Dive into K8s 02: Cilium Replaces Flannel — Goodbye VXLAN Tax, Hello eBPF Native Networking
Last post, we hand-rolled Flannel in a Kind cluster and used tcpdump to observe the 50-byte per-packet VXLAN encapsulation overhead. This time, we replace Flannel with Cilium, use eBPF-based native routing to eliminate this “network tax”, and use Hubble for traffic visualization.
1. Why Switch to Cilium?
1.1 Revisiting VXLAN’s Cost
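In the previous post, the tcpdump capture showed every Pod-to-Pod packet wrapped in an outer UDP/VXLAN header, which adds 50 bytes per packet.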
For typical web apps, 50 bytes is negligible. But in large-model AI training:
- AllReduce operations sync gradients across multiple GPUs
- Each iteration can generate GB-level network traffic
- Encap/decap CPU overhead consumes precious compute resources
- Latency jitter means the slowest GPU holds back the entire training job
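To put the 50 bytes in proportion, here is a minimal sketch of where they come from and how large a share of the wire traffic they take at different packet sizes. The breakdown follows the standard VXLAN-over-IPv4 layout; the packet sizes in the loop are illustrative choices, not measurements from this series.

```python
# Bytes that VXLAN adds around each inner IP packet (IPv4 underlay, no options).
VXLAN_OVERHEAD = {
    "outer_ipv4": 20,
    "outer_udp": 8,
    "vxlan_header": 8,
    "inner_ethernet": 14,
}


def overhead_bytes() -> int:
    """Total encapsulation bytes added per packet."""
    return sum(VXLAN_OVERHEAD.values())


def overhead_share(inner_packet_bytes: int) -> float:
    """Fraction of the encapsulated packet that is pure encapsulation."""
    extra = overhead_bytes()
    return extra / (inner_packet_bytes + extra)


if __name__ == "__main__":
    # Illustrative inner packet sizes: small, medium, and a full 1450-byte packet.
    for size in (100, 512, 1450):
        print(f"{size:>5}-byte packet: {overhead_share(size):.1%} encapsulation")
```

The share shrinks as packets grow, which is why small, latency-sensitive packets feel the tax most.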
1.2 Cilium’s Secret Weapon: eBPF
Cilium uses eBPF (Extended Berkeley Packet Filter) to handle packets directly in kernel space, without going through traditional iptables or VXLAN encapsulation:
| Feature | Flannel (VXLAN) | Cilium (eBPF) |
|---|---|---|
| Encapsulation overhead | 50 bytes/packet | 0 (native routing) |
| NAT implementation | iptables (netfilter rule chains) | eBPF (kernel) |
| Network policies | Not supported natively (needs an add-on such as Calico) | Native support |
| Observability | Needs external tools | Hubble built-in |
2. Environment Prep: Clean Flannel Residue
2.1 Delete Flannel
| |
2.2 Clean Node Network Residue
This step is critical: leftover Flannel interfaces and CNI configuration on the nodes will otherwise conflict with Cilium.
| |
War Story: The first time, I skipped this step. Cilium installed, but Pod networking was chaos: cilium status reported “BPF NodePort: Disabled” and Pods couldn’t communicate. I spent hours in the logs before discovering that the cni0 bridge had not been cleaned up.
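A quick check before installing Cilium would have saved those hours. The sketch below scans the output of ip -o link show for Flannel leftovers. It assumes the two usual names: cni0 (the bridge from the war story) and flannel.1 (Flannel's default VXLAN device). The SAMPLE text is a made-up illustration, not output captured from this cluster.

```python
import re

# Interface names Flannel typically leaves behind on a node (assumed defaults).
FLANNEL_LEFTOVERS = {"cni0", "flannel.1"}


def leftover_interfaces(ip_link_output: str) -> list[str]:
    """Return Flannel-related interface names found in `ip -o link show` output."""
    found = []
    for line in ip_link_output.splitlines():
        # Each line starts like "5: cni0: <BROADCAST,...>" or "6: veth123@if2: <...>".
        match = re.match(r"\d+:\s+([^:@\s]+)", line)
        if match and match.group(1) in FLANNEL_LEFTOVERS:
            found.append(match.group(1))
    return found


# Hypothetical sample output, for illustration only.
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
6: veth1a2b3c4d@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
"""

if __name__ == "__main__":
    print(leftover_interfaces(SAMPLE))
```

An empty list on every node is a reasonable signal that the interface cleanup worked; it does not cover leftover CNI config files.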
3. Installing Cilium
3.1 Using Cilium CLI
| |
Key parameters explained:
- routingMode=native: Use direct routing instead of encapsulation
- kubeProxyReplacement=true: Completely replace kube-proxy with eBPF
- hubble.relay.enabled=true: Enable Hubble observability
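As a sketch of how these three parameters fit together, the snippet below assembles a cilium install command line from them. It assumes a recent Cilium CLI whose install command accepts Helm-style --set flags; the helper name cilium_install_args is made up for this example, and a real cluster may need further values (for instance ipv4NativeRoutingCIDR) that are not covered here.

```python
import shlex

# Helm values taken from the parameter list above.
CILIUM_VALUES = {
    "routingMode": "native",
    "kubeProxyReplacement": "true",
    "hubble.relay.enabled": "true",
}


def cilium_install_args(values: dict[str, str]) -> list[str]:
    """Build an argv list for `cilium install`, one --set flag per Helm value."""
    args = ["cilium", "install"]
    for key, value in values.items():
        args += ["--set", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(cilium_install_args(CILIUM_VALUES)))
```

Keeping the values in one dictionary makes it easy to diff the configuration between experiments.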
3.2 Wait and Verify
| |
Success output:
| |
4. eBPF Packet Capture: Witnessing Zero Encapsulation
4.1 Deploy Test Pods
| |
4.2 Observe Traffic with Hubble
| |
4.3 Compare Capture Results
In Cilium’s Native Routing mode:
| |
Capture result:
| |
Key Findings:
- No UDP encapsulation: the capture shows ICMP packets going directly from Pod IP to Pod IP
- Each packet is only 84 bytes, compared to 134 bytes under Flannel
- Encapsulation overhead: 0 bytes
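The two sizes can be reconstructed from header lengths. This sketch assumes the test traffic was a default Linux ping (56-byte payload), which is consistent with the 84-byte figure but is not stated explicitly above.

```python
IPV4_HEADER = 20     # bytes, IPv4 header without options
ICMP_HEADER = 8      # bytes, ICMP echo header
PING_PAYLOAD = 56    # bytes, assumed default payload of Linux ping
VXLAN_OVERHEAD = 50  # bytes, the per-packet figure from the Flannel capture


def native_packet_bytes() -> int:
    """Size of the ICMP echo packet as seen with native routing."""
    return IPV4_HEADER + ICMP_HEADER + PING_PAYLOAD


def vxlan_packet_bytes() -> int:
    """Size of the same packet once VXLAN encapsulation is added."""
    return native_packet_bytes() + VXLAN_OVERHEAD


def bytes_saved_share() -> float:
    """Fraction of bytes saved by dropping the encapsulation."""
    return 1 - native_packet_bytes() / vxlan_packet_bytes()


if __name__ == "__main__":
    print(native_packet_bytes(), vxlan_packet_bytes(), f"{bytes_saved_share():.1%}")
```

For a ping-sized packet the saving is more than a third of the bytes; for full-size packets it is far smaller.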
5. Hubble Visualization: Network Topology at a Glance
5.1 Launch Hubble UI
| |
Open a browser at http://localhost:12000.
What Hubble UI shows:
- Real-time traffic relationship graph
- Protocol, port, bytes for each connection
- Network policy hit status
- DNS query tracing
5.2 Hubble CLI in Action
| |
6. Performance Comparison Data
A simple iperf3 test on the Kind cluster:
| Metric | Flannel (VXLAN) | Cilium (Native) | Improvement |
|---|---|---|---|
| Throughput | 8.2 Gbps | 9.4 Gbps | +15% |
| Latency (P99) | 0.42 ms | 0.31 ms | -26% |
| CPU usage | 12% | 5% | -58% |
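Note: These numbers come from a Kind cluster, where every node is a container on the same host, so treat them as indicative only; results on real multi-host hardware may differ and could show larger gains.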
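The Improvement column is a plain relative change against the Flannel baseline. A small sketch that recomputes it from the raw numbers in the table:

```python
def percent_change(baseline: float, new: float) -> float:
    """Relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100


# (Flannel, Cilium) pairs copied from the table above.
RESULTS = {
    "Throughput (Gbps)": (8.2, 9.4),
    "Latency P99 (ms)": (0.42, 0.31),
    "CPU usage (%)": (12, 5),
}

if __name__ == "__main__":
    for metric, (flannel, cilium) in RESULTS.items():
        print(f"{metric}: {percent_change(flannel, cilium):+.0f}%")
```

Note that the sign means different things per row: higher is better for throughput, lower is better for latency and CPU.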
7. Implications for AI Infra
7.1 Why Does AI Training Need Cilium More?
- AllReduce traffic characteristics: Lots of small packets, bursty traffic, latency-sensitive
- Ring/Tree AllReduce topologies make every node a traffic hotspot
- GPU time is precious: Network wait = GPU idle = burning money
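To get a feel for the traffic volume, here is a rough sketch of the textbook ring AllReduce cost, in which each node sends 2(N-1)/N times the gradient size per AllReduce. The 1 GiB gradient size and the node counts are illustrative assumptions, not measurements, and real frameworks add their own bucketing and overlap on top.

```python
def ring_allreduce_bytes_per_node(gradient_bytes: int, num_nodes: int) -> float:
    """Bytes each node sends in one ring AllReduce (reduce-scatter + all-gather)."""
    if num_nodes < 2:
        return 0.0  # a single node has nothing to exchange
    return 2 * (num_nodes - 1) / num_nodes * gradient_bytes


if __name__ == "__main__":
    gib = 1024 ** 3
    # Illustrative setup: 1 GiB of gradients on clusters of different sizes.
    for nodes in (2, 4, 8):
        sent = ring_allreduce_bytes_per_node(gib, nodes)
        print(f"{nodes} nodes: {sent / gib:.2f} GiB sent per node per AllReduce")
```

The per-node volume approaches twice the gradient size as the ring grows, and it recurs every iteration, which is why per-packet overhead and jitter add up.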
7.2 Going Further: RDMA and GPUDirect
Cilium is just the first step. True high-end AI Infra needs:
- RDMA over Converged Ethernet (RoCE): Bypass kernel, direct memory access
- GPUDirect RDMA: GPU directly reads/writes remote GPU memory
Cilium’s native routing mode paves the way for these technologies by removing the constraints of an overlay network.
8. Summary
| Step | Takeaway |
|---|---|
| Clean Flannel | Deep understanding of CNI plugin architecture |
| Install Cilium | Master eBPF network mode configuration |
| Compare captures | Quantify elimination of encapsulation overhead |
| Hubble visualization | Gain production-grade network observability |
Next up: deploy PyTorch distributed training on a multi-node cluster and compare actual training throughput under Flannel and Cilium.
Series
- Previous: Dive into K8s 01 — Hand-Rolling CNI with Kind, From CrashLoop to VXLAN Packet Capture
- Next: Dive into K8s 03 — eBPF Deep Dive: Hand-Writing a Simple CNI Plugin (Planned)