Go Singleflight Deep Dive: Preventing Cache Stampede with One Line of Code

In high-concurrency scenarios, cache stampede can crush your database. Singleflight solves this with an elegantly simple design.

Code reference:
https://github.com/coredns/coredns/blob/v1.9.1/plugin/pkg/singleflight/singleflight.go

What is Singleflight?

A common pattern in projects:
    Set an expiration time for Redis cache. When cache misses, fetch from database.
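
For reference, here is a minimal sketch of that read path, with two plain maps standing in for Redis and MySQL (the names getUser, cache, and db are made up for illustration, not taken from the code below):

package main

import (
    "errors"
    "fmt"
)

// Stand-ins for the real stores: cache plays Redis, db plays MySQL.
var (
    cache = map[string]string{}
    db    = map[string]string{"zqw": "data stored in mysql"}
)

// getUser is the cache-aside read path: try the cache first, fall back to
// the database on a miss, then backfill the cache (a real setup also sets a TTL).
func getUser(key string) (string, error) {
    if v, ok := cache[key]; ok {
        return v, nil // cache hit
    }
    v, ok := db[key] // cache miss: hit the database
    if !ok {
        return "", errors.New("not found")
    }
    cache[key] = v // backfill so later reads hit the cache
    return v, nil
}

func main() {
    fmt.Println(getUser("zqw")) // first read comes from the database
    fmt.Println(getUser("zqw")) // second read is served from the cache
}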

Cache stampede means a flood of requests bypasses the cache and hits the database directly.

As the diagram shows, when the cache expires, thousands of requests can reach step 1 before step 2 completes, putting massive pressure on the database. How do we solve this? Singleflight offers an elegant approach.

../images/2112211222.jpg
Cache Stampede Scenario

The idea: let only one request through to the database; all the others wait at step 3 and reuse its result.

Here’s the singleflight source code:

package singleflight

import "sync"

// call tracks a single in-flight (or completed) invocation of fn.
type call struct {
    wg  sync.WaitGroup
    val interface{}
    err error
}

// Group deduplicates concurrent calls that share the same key.
type Group struct {
    mu sync.Mutex       // protects m
    m  map[uint64]*call // lazily initialized
}

func (g *Group) Do(key uint64, fn func() (interface{}, error)) (interface{}, error) {
    g.mu.Lock()
    if g.m == nil {
        g.m = make(map[uint64]*call) // lazy init on first use
    }
    if c, ok := g.m[key]; ok { // a request for this key is already in flight
        g.mu.Unlock()
        c.wg.Wait()         // block until the first request finishes
        return c.val, c.err // reuse its result
    }
    // First request for this key: it can't find an entry in the map (cache miss),
    // so it creates one (a pointer). Later requests find this pointer and block
    // on the WaitGroup until the first request has a value to share.
    c := new(call)
    c.wg.Add(1)
    g.m[key] = c
    g.mu.Unlock()

    // Only the first request executes fn; the result is stored on the call
    // and every waiter reuses it.
    c.val, c.err = fn()
    c.wg.Done()

    // Only the first request reaches here. Once the waiters have been released,
    // the map entry is deleted (the value is in Redis by now, so it isn't needed).
    g.mu.Lock()
    delete(g.m, key)
    g.mu.Unlock()

    return c.val, c.err
}
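
To see what Do buys us, here is a minimal usage sketch. It assumes the Group and call types above are defined in the same package; the loadFromDB function is made up for illustration. Ten goroutines ask for the same key at once, and only one of them runs the load function:

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var g Group // the singleflight Group from the snippet above
    var wg sync.WaitGroup

    // Hypothetical expensive load; with singleflight it should run only once.
    loadFromDB := func() (interface{}, error) {
        fmt.Println("loading from database")
        time.Sleep(100 * time.Millisecond) // keeps all ten calls in flight together
        return "user data", nil
    }

    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Every caller uses the same key, so they all share one result.
            val, err := g.Do(1, loadFromDB)
            fmt.Println(val, err)
        }()
    }
    wg.Wait()
    // Expected: "loading from database" once, then "user data <nil>" ten times.
}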

Simulating Cache Stampede

Approach:

  1. Write a server with httprouter and use two maps to simulate Redis and MySQL. Initially only the MySQL map has data, so 5k simultaneous requests produce a cache stampede
  2. Have the client fire 5k concurrent requests and observe the results

Server

See the code for:

  1. The HTTP endpoint (handler) definitions
  2. The simulated data stores

To reproduce the cache stampede more realistically, GetFromMySql sleeps for a second before reading, ensuring plenty of requests pile up on MySQL when singleflight isn't used and making the comparison clearer.

// server/main.go
package main

import (
    "errors"
    "fmt"
    "log"
    "net/http"
    "sync"
    "time"

    "github.com/julienschmidt/httprouter"
)

// Group is the singleflight Group shown above; it is assumed to be
// defined in (or copied into) this package.
var group *Group
var mux sync.RWMutex   // guards the hit counters
var rwMux sync.RWMutex // guards the simulated Redis map
var redisDataBase map[string]string
var mysqlDataBase map[string]string
var countRedisHit int
var countMysqlHit int

func GetFromRedis(key string) (string, error) {
    rwMux.RLock() // the TTL goroutine deletes keys concurrently, so reads need the lock too
    data, ok := redisDataBase[key]
    rwMux.RUnlock()
    if ok {
        mux.Lock()
        countRedisHit++
        mux.Unlock()
        return data, nil
    }
    // Cache miss: fall back to MySQL
    if data, err := GetFromMySql(key); err == nil {
        return data, nil
    } else {
        return "", err
    }
}

func GetFromMySql(key string) (string, error) {
    time.Sleep(time.Second * 1)
    if data, ok := mysqlDataBase[key]; ok {
        // write to redis
        rwMux.Lock()
        redisDataBase[key] = "data stored in redis"
        rwMux.Unlock()
        // Simulates 2s TTL
        // Every value stored in redis expires after 2s
        go func(key string) {
            time.Sleep(time.Second * 2)
            rwMux.Lock()
            delete(redisDataBase, key)
            rwMux.Unlock()
        }(key)
        mux.Lock()
        countMysqlHit++
        mux.Unlock()
        return data, nil
    } else {
        return "", errors.New("data not found!")
    }
}

// Normal version
func GetUserInfo(w http.ResponseWriter, req *http.Request, ps httprouter.Params) {
    queryValues := req.URL.Query()
    if res, err := GetFromRedis(queryValues.Get("name")); err != nil {
        // log.Fatal here would kill the whole server on a single failed lookup
        http.Error(w, err.Error(), http.StatusInternalServerError)
    } else {
        fmt.Fprint(w, res) // res is data, not a format string
    }
}

// Singleflight version
func GetUserInfo1(w http.ResponseWriter, req *http.Request, ps httprouter.Params) {
    queryValues := req.URL.Query()
    function := func() (interface{}, error) {
        if res, err := GetFromRedis(queryValues.Get("name")); err == nil {
            return res, nil
        } else {
            return "", err
        }
    }
    // All handlers must share the same package-level group, otherwise requests
    // are never coalesced. A fixed key works here because every request asks for
    // the same name; a real service would derive the key from the request itself.
    res, err := group.Do(uint64(1), function)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    fmt.Fprint(w, res.(string))
}

func ShowHitCount(w http.ResponseWriter, req *http.Request, _ httprouter.Params) {
    mux.RLock()
    fmt.Fprintf(w, "redis:%d\nmysql:%d\n", countRedisHit, countMysqlHit)
    mux.RUnlock()
}

func init() {
    redisDataBase, mysqlDataBase = map[string]string{}, map[string]string{}
    mysqlDataBase["zqw"] = "data stored in mysql" // Initially no data in redis
    group = &Group{}
}

func hello(w http.ResponseWriter, req *http.Request, ps httprouter.Params) {
    queryValues := req.URL.Query()
    fmt.Fprintf(w, "hello, %s!\n", queryValues.Get("name"))
}

func main() {
    router := httprouter.New()
    router.GET("/", hello)
    router.GET("/userinfo", GetUserInfo)
    router.GET("/userinfo1", GetUserInfo1)
    router.GET("/count", ShowHitCount)
    log.Fatal(http.ListenAndServe(":9090", router))
}

Client

The approach is simple (the logic is in the code): fire the concurrent requests, print the hit counts, and time the whole run.

package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "sync"
    "time"
)

func curl(str string) string {
    resp, err := client.Get(str)
    if err != nil {
        log.Println("error:", err)
        return "" // don't touch a nil response on failure
    }
    defer resp.Body.Close()
    res, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Println("read error:", err)
        return ""
    }
    return string(res)
}

var client http.Client
func init() {
    client = http.Client{}
}
func main() {
    timeStampA := time.Now()
    defer client.CloseIdleConnections()
    var wg sync.WaitGroup
    for i := 0; i < 5000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Alternate between these two
            //curl("http://localhost:9090/userinfo?name=zqw")
            curl("http://localhost:9090/userinfo1?name=zqw")
        }()
    }
    wg.Wait()
    res := curl("http://localhost:9090/count")
    fmt.Println(res)
    timeStampB := time.Now()
    fmt.Println("Total time: ", timeStampB.Sub(timeStampA).Seconds())
}

Results

Without singleflight:

redis:0
mysql:5000

Total time:  1.5597763709999999

With singleflight:

redis:0
mysql:1  # Other 4999 requests all shared first request's result

Total time:  1.216015732

The results are reproducible.
If you see errors, your OS may allow too few open file descriptors; raise the limit or lower the concurrency.
I had already set ulimit -n 8192.

At 10k concurrent requests the results became inconsistent.
At 100k the runs panicked constantly (I/O errors on the client, or the server crashed outright).

Conclusion

Singleflight is compact yet powerful, and the concept is worth studying.
Essentially it replaces thousands of redundant database (disk) reads with a single read whose result is shared in memory.

Issues I ran into while coding:

  1. The server crashed frequently (panics on concurrent map writes) when backfilling Redis (really just a map) after reading from MySQL; fixed by adding locks
  2. Wrote Lock where Unlock was needed, deadlocking the server
  3. The singleflight group wasn't a package-level variable and was being recreated inside each handler, so requests never coalesced; found the bug during code review
  4. Above roughly 50k concurrent requests all kinds of odd errors appear (runtime I/O panics) and total time grows 50x or more, an effective 10x+ performance loss, so I gave up pushing further