A common pattern in projects: cache data in Redis with an expiration time, and on a cache miss, fetch from the database and repopulate the cache.
A cache stampede is what happens when a hot key expires: a flood of requests all miss the cache at once and hit the database directly.
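For concreteness, here is a minimal sketch of that read path, using in-memory maps as stand-ins for Redis and MySQL (the names and helper are mine, for illustration only):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

var (
	mu    sync.RWMutex
	cache = map[string]string{}                         // stand-in for Redis
	db    = map[string]string{"zqw": "stored in mysql"} // stand-in for MySQL
)

// getUser is the classic cache-aside read path. When a hot key expires,
// every concurrent caller misses at the same moment and all of them fall
// through to the database together: that is the cache stampede.
func getUser(key string) string {
	mu.RLock()
	val, ok := cache[key]
	mu.RUnlock()
	if ok {
		return val // cache hit
	}
	val = db[key] // cache miss: this is the database hit
	mu.Lock()
	cache[key] = val // repopulate the cache
	mu.Unlock()
	go func() { // crude TTL: evict the key after 2 seconds
		time.Sleep(2 * time.Second)
		mu.Lock()
		delete(cache, key)
		mu.Unlock()
	}()
	return val
}

func main() {
	fmt.Println(getUser("zqw"))
}
```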
As the diagram below shows, when the cache expires, thousands of requests miss the cache before the first database fetch can complete and repopulate it, putting massive pressure on the database. How do we solve this? Singleflight provides an elegant approach.
(Diagram: Cache Stampede Scenario)
The idea: let only one request penetrate to the database; all the others queue up and wait, then share that request's result.
```go
package singleflight

import "sync"

// call tracks one in-flight (or completed) request.
type call struct {
	wg  sync.WaitGroup
	val interface{}
	err error
}

type Group struct {
	mu sync.Mutex
	m  map[uint64]*call // lazily initialized
}

func (g *Group) Do(key uint64, fn func() (interface{}, error)) (interface{}, error) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[uint64]*call)
	}
	// If the key is already in the map, another request is in flight:
	// grab the shared *call, block until it finishes, and reuse its result.
	if c, ok := g.m[key]; ok {
		g.mu.Unlock()
		c.wg.Wait()
		return c.val, c.err
	}
	// First request for this key: it can't find the key (cache miss),
	// so it creates the map entry that later requests will wait on.
	c := new(call)
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	// Only the first request executes fn; the result is stored once,
	// and wg.Done releases every waiter to read it.
	c.val, c.err = fn()
	c.wg.Done()

	// Only the first request reaches here. Once the result has been shared,
	// delete the entry (the value is in Redis by now, so we don't need it).
	g.mu.Lock()
	delete(g.m, key)
	g.mu.Unlock()

	return c.val, c.err
}
```
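For comparison, the production implementation lives in golang.org/x/sync/singleflight; it keys on a string and additionally reports whether the result was shared. A minimal usage sketch (the key name and fetch body are placeholders):

```go
package main

import (
	"fmt"

	"golang.org/x/sync/singleflight"
)

func main() {
	var g singleflight.Group
	// Concurrent callers passing the same key share a single execution of fn.
	v, err, shared := g.Do("user:zqw", func() (interface{}, error) {
		return "data stored in mysql", nil // placeholder for the real fetch
	})
	fmt.Println(v, err, shared) // shared reports whether this result was reused
}
```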
Simulating Cache Stampede
Approach:
Write a server using httprouter, and use two maps to simulate Redis and MySQL. Initially only MySQL has data, so when 5,000 goroutines hit at once, it's a cache stampede.
To reproduce the stampede more realistically, sleep for one second before reading from MySQL (simulating a slow query), ensuring lots of requests hit MySQL when singleflight isn't used, for a better comparison.
```go
// server/main.go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
	"sync"
	"time"

	"github.com/julienschmidt/httprouter"
)

var group *Group       // the singleflight Group shown above (same package)
var mux sync.RWMutex   // guards the hit counters
var rwMux sync.RWMutex // guards redisDataBase
var redisDataBase map[string]string
var mysqlDataBase map[string]string
var countRedisHit int
var countMysqlHit int

func GetFromRedis(key string) (string, error) {
	// Guard the read too: an unsynchronized read races with the
	// TTL goroutine's delete and panics the runtime.
	rwMux.RLock()
	data, ok := redisDataBase[key]
	rwMux.RUnlock()
	if ok {
		mux.Lock()
		countRedisHit++
		mux.Unlock()
		return data, nil
	}
	return GetFromMySql(key)
}

func GetFromMySql(key string) (string, error) {
	time.Sleep(time.Second * 1) // simulate a slow query
	if data, ok := mysqlDataBase[key]; ok {
		// Write through to "redis".
		rwMux.Lock()
		redisDataBase[key] = "data stored in redis"
		rwMux.Unlock()
		// Simulate a 2s TTL: every value stored in redis expires after 2s.
		go func(key string) {
			time.Sleep(time.Second * 2)
			rwMux.Lock()
			delete(redisDataBase, key)
			rwMux.Unlock()
		}(key)
		mux.Lock()
		countMysqlHit++
		mux.Unlock()
		return data, nil
	}
	return "", errors.New("data not found")
}

// Normal version.
func GetUserInfo(w http.ResponseWriter, req *http.Request, ps httprouter.Params) {
	queryValues := req.URL.Query()
	if res, err := GetFromRedis(queryValues.Get("name")); err != nil {
		log.Fatal("err:", err)
	} else {
		fmt.Fprint(w, res)
	}
}

// Singleflight version.
func GetUserInfo1(w http.ResponseWriter, req *http.Request, ps httprouter.Params) {
	queryValues := req.URL.Query()
	function := func() (interface{}, error) {
		res, err := GetFromRedis(queryValues.Get("name"))
		if err != nil {
			return "", err
		}
		return res, nil
	}
	// The group must be the shared global; a group recreated per handler call
	// never deduplicates anything (the bug described in the conclusion).
	// A fixed key works here because there is only one resource; real code
	// would derive the key from the cache key.
	res, err := group.Do(uint64(1), function)
	if err != nil {
		log.Fatal("err from singleflight:", err)
	}
	fmt.Fprint(w, res.(string))
}

func ShowHitCount(w http.ResponseWriter, req *http.Request, _ httprouter.Params) {
	mux.RLock()
	defer mux.RUnlock()
	fmt.Fprintf(w, "redis:%d\nmysql:%d\n", countRedisHit, countMysqlHit)
}

func init() {
	redisDataBase, mysqlDataBase = map[string]string{}, map[string]string{}
	mysqlDataBase["zqw"] = "data stored in mysql"
	// Initially there is no data in redis.
	group = &Group{}
}

func hello(w http.ResponseWriter, req *http.Request, ps httprouter.Params) {
	queryValues := req.URL.Query()
	fmt.Fprintf(w, "hello, %s!\n", queryValues.Get("name"))
}

func main() {
	router := httprouter.New()
	router.GET("/", hello)
	router.GET("/userinfo", GetUserInfo)
	router.GET("/userinfo1", GetUserInfo1)
	router.GET("/count", ShowHitCount)
	log.Fatal(http.ListenAndServe(":9090", router))
}
```
Client
A simple approach, with the logic in code: fire concurrent requests, print the results, and time the whole run.
```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"sync"
	"time"
)

var client http.Client

func init() {
	client = http.Client{}
}

func curl(str string) string {
	resp, err := client.Get(str)
	if err != nil {
		log.Println("error:", err)
		return "" // resp is nil when the request itself failed
	}
	defer resp.Body.Close()
	res, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Println("error:", err)
	}
	return string(res)
}

func main() {
	timeStampA := time.Now()
	defer client.CloseIdleConnections()
	var wg sync.WaitGroup
	for i := 0; i < 5000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Alternate between these two endpoints to compare:
			// curl("http://localhost:9090/userinfo?name=zqw")
			curl("http://localhost:9090/userinfo1?name=zqw")
		}()
	}
	wg.Wait()
	res := curl("http://localhost:9090/count")
	fmt.Println(res)
	timeStampB := time.Now()
	fmt.Println("Total time: ", timeStampB.Sub(timeStampA).Seconds())
}
```
Results
Without singleflight (all 5,000 requests miss the cache during the one-second MySQL sleep, so every one of them hits MySQL):
```
redis:0
mysql:5000
Total time: 1.5597763709999999
```
With singleflight (only the first request executes the fetch; the rest wait on it and share its result):
```
redis:0
mysql:1   # the other 4999 requests all shared the first request's result
Total time: 1.216015732
```
The results are reproducible. If you see errors, your OS may allow too few open file descriptors; try lowering the concurrency. I had already raised mine with ulimit -n 8192.
At 10k concurrent requests the results were inconsistent; at 100k there were constant panics (I/O errors on the client, or the server crashed outright).
Conclusion
Singleflight is compact yet powerful, and the idea is worth studying: it collapses many identical concurrent fetches into one, essentially shifting pressure from disk access to memory access, since the waiting goroutines just read a shared in-memory result.
Issues I ran into while coding:
The server crashed frequently (concurrent map access panics) when writing to "Redis" (actually a map) after the MySQL read; fixed by adding locks, as in the sketch after this list.
Wrote Lock where Unlock was intended, deadlocking the server.
The singleflight Group wasn't global and was being recreated inside each handler call, so requests never shared a flight and it wasn't working; found the bug during code review.
Above 50k concurrency, all kinds of weird errors appeared (runtime I/O panics) and total time grew by 50x or more, effectively a 10x+ performance degradation, so I gave up there.
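On the first item above: Go's runtime deliberately panics on unsynchronized map access ("concurrent map writes"). A minimal sketch of the fix, wrapping a shared map in a sync.RWMutex (the type and method names are mine, for illustration):

```go
package cache

import "sync"

// safeCache guards a shared map so concurrent handlers can't corrupt it.
type safeCache struct {
	mu sync.RWMutex
	m  map[string]string
}

func (c *safeCache) Get(key string) (string, bool) {
	c.mu.RLock() // multiple readers may proceed in parallel
	defer c.mu.RUnlock()
	v, ok := c.m[key]
	return v, ok
}

func (c *safeCache) Set(key, val string) {
	c.mu.Lock() // writers take exclusive access
	defer c.mu.Unlock()
	c.m[key] = val
}
```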