Microservices isn’t just about splitting up a monolith. This post documents 5 real pitfalls I hit in Go microservices projects, and how to avoid them.
1. Deprecated go get Breaking CI Builds
1.1 The Problem
One day, CI suddenly failed:
```bash
go get -u github.com/golang/protobuf/protoc-gen-go
# go: go get -u github.com/golang/protobuf/protoc-gen-go:
# installing executables with 'go get' in module mode is deprecated.
```
1.2 Root Cause
Starting with Go 1.17, go get behavior changed:
| Version | go get behavior |
|---|---|
| 1.16 and earlier | Downloads dependencies and installs binaries |
| 1.17 | Still installs, but installing executables with go get is deprecated |
| 1.18+ | Only manages go.mod dependencies; the -d behavior is the default |
1.3 Solution
```bash
# Old way (deprecated)
go get -u github.com/golang/protobuf/protoc-gen-go

# New way: use go install with an explicit version
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
```
1.4 CI Script Fix
```yaml
# .github/workflows/build.yml
jobs:
  build:
    steps:
      - name: Install protoc plugins
        run: |
          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.31.0
          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.3.0
```
Lesson: Pin tool versions, don’t use @latest.
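One way to follow that lesson is the community "tools.go" pattern, which turns the generators into ordinary module dependencies so their versions are pinned in go.mod and go.sum. A sketch (the `tools` build tag name is conventional, not required):

```go
//go:build tools

// tools.go records generator binaries as module dependencies. The build
// tag keeps this file out of normal builds, while go mod tidy still
// tracks the imports and pins their versions in go.mod.
package tools

import (
    _ "google.golang.org/grpc/cmd/protoc-gen-go-grpc"
    _ "google.golang.org/protobuf/cmd/protoc-gen-go"
)
```

CI can then run plain go install google.golang.org/protobuf/cmd/protoc-gen-go (no @version suffix) inside the module and get the pinned version. Go 1.24 and later can record tools first-class with go get -tool, which adds a tool directive to go.mod.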
2. gRPC Version Conflict Causing Runtime Panic
2.1 The Problem
Service crashes on startup:
```
panic: proto: extension number 1001 is already registered
```
2.2 Root Cause
Project depends on two protobuf library versions:
```bash
go mod graph | grep protobuf
# github.com/golang/protobuf@v1.4.3
# google.golang.org/protobuf@v1.31.0
```
github.com/golang/protobuf is the old library; google.golang.org/protobuf is the new one. When mismatched versions of the two end up in the same binary, both try to register the same extension numbers, and the second registration panics.
2.3 Solutions
Option 1: Pin the legacy module to a shim release

Recent releases of github.com/golang/protobuf (v1.4+) are implemented on top of google.golang.org/protobuf and share its registry, so pinning the old module to a single shim version removes the double registration. Note that a replace directive must point at a module that declares the same module path, so you can pin the old library but not swap it for the new one:

```go
// go.mod
require (
    google.golang.org/protobuf v1.31.0
)

// Pin the legacy shim. Replacing it with google.golang.org/protobuf
// would fail, since that module declares a different path.
replace github.com/golang/protobuf => github.com/golang/protobuf v1.5.3
```
Option 2: Upgrade all dependencies
```bash
go get -u ./...
go mod tidy
```
Option 3: Use go mod why to trace
```bash
# Find who's pulling in the old version (-m queries the module)
go mod why -m github.com/golang/protobuf
```
Lesson: Run go mod tidy regularly to keep your dependency graph clean.
3. Service Discovery Failure: DNS Resolution Timeout
3.1 The Problem
Intermittent timeouts on service-to-service calls, logs show:
```
context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```
3.2 Debugging
```go
// Problem code
conn, err := grpc.Dial(
    "user-service:8080", // Using the K8s Service name
    grpc.WithInsecure(),
)
```
tcpdump showed DNS resolution occasionally taking 5+ seconds.
3.3 Root Cause
K8s cluster’s CoreDNS was configured with upstream external DNS. When internal resolution fails, it tries external resolution, causing delays.
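The ndots interaction is easiest to see in the Pod's /etc/resolv.conf. Exact values vary by cluster, but a typical kubelet-generated file looks like:

```
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
```

With ndots:5, a short name like user-service has fewer than five dots, so the resolver walks the search domains in turn before trying the literal name; any attempt that escalates to a slow external upstream stalls the whole lookup.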
3.4 Solutions
Option 1: Use FQDN
```go
// Specify the full domain name, avoiding search domain probing
conn, err := grpc.Dial(
    "user-service.default.svc.cluster.local:8080",
    grpc.WithInsecure(),
)
```
Option 2: K8s DNS config optimization
```yaml
# In the Pod spec
dnsConfig:
  options:
    - name: ndots
      value: "2"  # Reduce DNS search attempts
```
Option 3: Bypass gRPC's built-in DNS resolution

```go
import "google.golang.org/grpc/resolver"

func init() {
    // passthrough hands the target to the dialer unchanged, so name
    // resolution happens in the OS resolver (and its cache) instead
    // of gRPC's own DNS resolver
    resolver.SetDefaultScheme("passthrough")
}
```
Lesson: Microservice network issues often aren’t in the code layer.
4. Context Leak Causing Goroutine Explosion
4.1 The Problem
After running for a while, memory keeps growing. pprof shows tons of goroutines waiting:
```bash
go tool pprof http://localhost:6060/debug/pprof/goroutine
# 50000 goroutines, 90% blocked in select {}
```
4.2 Problem Code
```go
func HandleRequest(ctx context.Context, req *Request) {
    // A new context is created, but its cancel func is discarded
    newCtx, _ := context.WithTimeout(ctx, 10*time.Second)

    go func() {
        // This goroutine waits on newCtx.Done(). If the request
        // returns early, newCtx isn't released until the 10s timer
        // fires, so goroutines pile up under load
        <-newCtx.Done()
        cleanup()
    }()
    // ... handle request
}
```
4.3 Correct Approach
```go
func HandleRequest(ctx context.Context, req *Request) {
    newCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
    defer cancel() // Key: ensure the context is always cancelled

    done := make(chan struct{})
    go func() {
        defer close(done)
        // Async work
    }()

    select {
    case <-done:
        // Normal completion
    case <-newCtx.Done():
        // Timeout
    }
}
```
```go
// Use goleak to detect goroutine leaks in tests
import "go.uber.org/goleak"

func TestMain(m *testing.M) {
    goleak.VerifyTestMain(m)
}
```
Lesson: The cancel function returned by context.WithCancel/Timeout must be called.
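The effect is easy to demonstrate outside pprof with runtime.NumGoroutine. A minimal, self-contained sketch (leaky, tidy, and measure are hypothetical names; the counts assume an otherwise idle process):

```go
package main

import (
    "context"
    "fmt"
    "runtime"
    "time"
)

// leaky discards the cancel func, so each spawned goroutine lives
// until the 10s timer fires even though the caller returned long ago.
func leaky() {
    ctx, _ := context.WithTimeout(context.Background(), 10*time.Second)
    go func() { <-ctx.Done() }()
}

// tidy defers cancel, so the goroutine is released as soon as the
// caller returns.
func tidy() {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    go func() { <-ctx.Done() }()
}

// measure reports goroutine growth after 100 calls of each variant.
func measure() (leaked, released int) {
    base := runtime.NumGoroutine()
    for i := 0; i < 100; i++ {
        leaky()
    }
    time.Sleep(200 * time.Millisecond)
    leaked = runtime.NumGoroutine() - base // ~100 stuck goroutines

    mid := runtime.NumGoroutine()
    for i := 0; i < 100; i++ {
        tidy()
    }
    time.Sleep(200 * time.Millisecond)
    released = runtime.NumGoroutine() - mid // ~0, cancel freed them
    return leaked, released
}

func main() {
    leaked, released := measure()
    fmt.Printf("leaky: +%d goroutines, tidy: +%d goroutines\n", leaked, released)
}
```

At a realistic request rate, the leaky variant is exactly the pattern that produced the 50,000-goroutine pprof dump above.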
5. Graceful Shutdown: Killing Requests Mid-Flight
5.1 The Problem
During service restarts, users report failed requests.
5.2 Problem Code
```go
func main() {
    srv := &http.Server{Addr: ":8080"}
    go srv.ListenAndServe()

    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGTERM)
    <-quit
    srv.Close() // Closes immediately, doesn't wait for in-flight requests!
}
```
5.3 Correct Approach
```go
func main() {
    srv := &http.Server{Addr: ":8080"}

    go func() {
        if err := srv.ListenAndServe(); err != http.ErrServerClosed {
            log.Fatalf("listen: %s\n", err)
        }
    }()

    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
    <-quit
    log.Println("Shutting down server...")

    // Graceful shutdown: wait up to 30 seconds for existing requests
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil {
        log.Fatal("Server forced to shutdown:", err)
    }
    log.Println("Server exited")
}
```
5.4 K8s PreStop Hook
```yaml
# Pod spec
lifecycle:
  preStop:
    exec:
      # Give K8s time to update Endpoints and stop routing new traffic
      command: ["/bin/sh", "-c", "sleep 5"]
```
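Keep in mind that the preStop sleep and the in-process Shutdown window both count against the Pod's termination grace period; if their sum exceeds it, the kubelet sends SIGKILL anyway. A sketch (the 40s value is an illustrative assumption):

```yaml
# Pod spec: preStop sleep (5s) + srv.Shutdown timeout (30s) must fit
# inside the grace period, or the container is force-killed
terminationGracePeriodSeconds: 40  # default is 30
```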
5.5 gRPC Graceful Shutdown
```go
func main() {
    srv := grpc.NewServer()
    // ... register services

    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGTERM)
    <-quit

    // GracefulStop waits for all in-flight RPCs to complete
    srv.GracefulStop()
}
```
Lesson: Graceful shutdown isn’t just code — you also need K8s preStop and terminationGracePeriodSeconds.
6. Summary
| Problem | Root Cause | Solution |
|---|---|---|
| go get deprecated | Go 1.17 behavior change | go install pkg@version |
| protobuf panic | Old/new library conflict | Unify versions + go mod tidy |
| DNS timeout | K8s DNS config | Use FQDN + adjust ndots |
| Goroutine leak | Context not cancelled | defer cancel() |
| Request interrupted | No graceful shutdown | srv.Shutdown() + preStop |
Core lesson: Microservice complexity isn’t in the split — it’s in the edge cases that distributed environments bring. Every problem requires thinking across code, config, and infrastructure layers.