Custom Linux Distro: From Yocto Build to 60% Boot Time Optimization

A key user experience metric for NAS products: how long from pressing power to accessing shared folders? Our target was 30 seconds, but the original took 75 seconds. This post documents the complete journey from 75s to 30s.

1. Why Customize Linux?

1.1 NAS-Specific Requirements

Unlike general servers, NAS devices need:

  • Fast boot: it’s a consumer device; users won’t tolerate server-like boot times
  • Tight resources: an ARM processor with 1GB RAM
  • High stability: 24×7 operation, so unnecessary services must be minimized
  • Hardware-specific: only one SoC and its peripherals need support

Generic distros (like Ubuntu Server) ship plenty of features we don’t need, and we pay for them with long boot times and high memory usage.

1.2 Solution Options

| Option | Pros | Cons |
| --- | --- | --- |
| Yocto | Full control, minimal | Steep learning curve |
| Buildroot | Simple, lightweight | Weak package management |
| Trimmed Ubuntu | Easy to start | Hard to fully slim down |

We chose Yocto because we need long-term maintenance and fine control.
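As a taste of that fine control, here is a minimal, illustrative local.conf fragment. The feature lists are examples, not our exact build, and the `:remove` override syntax requires Yocto 3.4 (Honister) or newer:

```conf
# conf/local.conf (illustrative fragment, not our exact configuration)
# Strip desktop/audio/wireless support from every recipe that honors DISTRO_FEATURES
DISTRO_FEATURES:remove = "x11 wayland alsa bluetooth 3g nfc"
# Ship only one locale
IMAGE_LINGUAS = "en-us"
# A read-only rootfs suits a 24x7 appliance
IMAGE_FEATURES += "read-only-rootfs"
```

Every feature removed here shrinks the image and, usually, the boot-time service set.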

2. Boot Time Analysis

2.1 Baseline Measurement

Using systemd-analyze:

# Check total boot time
systemd-analyze

# Output
Startup finished in 8.5s (kernel) + 66.2s (userspace) = 74.7s

74.7 seconds — way too slow!

2.2 Bottleneck Identification

# Check each service's boot time (blame = who's at fault)
systemd-analyze blame | head -20

Output:

         32.1s network-online.target
         18.5s cloud-init.service
          8.2s snapd.service
          5.3s systemd-journal-flush.service
          4.1s docker.service
          3.8s accounts-daemon.service
          ...

Problems found:

  1. network-online.target waiting for network for 32 seconds
  2. cloud-init is for cloud servers — we don’t need it
  3. snapd also not needed
  4. Many services starting serially
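Triage like this can be scripted. A hedged sketch that flags units above a threshold in `systemd-analyze blame` output (it only handles plain `Ns` entries; `min`/`ms` entries would need extra parsing):

```shell
# flag_slow_units: given `systemd-analyze blame` text on stdin, print the
# names of units slower than $1 seconds (default 5).
flag_slow_units() {
    awk -v max="${1:-5}" '
        $1 ~ /^[0-9.]+s$/ {          # e.g. "32.1s network-online.target"
            t = $1; sub(/s$/, "", t) # strip the trailing "s"
            if (t + 0 > max) print $2
        }'
}
```

Usage: `systemd-analyze blame | flag_slow_units 5` prints every unit that spent more than 5 seconds starting.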

2.3 Dependency Visualization

# Generate boot timeline chart
systemd-analyze plot > boot-timeline.svg

From the chart:

  • Many services waiting for network-online.target
  • But our core service (NAS file sharing) only needs NIC initialization, not fully “online”

3. Optimization Strategies

3.1 Disable Unnecessary Services

# Disable cloud-init (for cloud servers)
systemctl disable cloud-init.service
systemctl disable cloud-config.service
systemctl disable cloud-final.service

# Disable snapd
systemctl disable snapd.service
systemctl disable snapd.seeded.service

# Disable unneeded account services
systemctl disable accounts-daemon.service
systemctl disable whoopsie.service

Effect: 30 seconds shaved off the boot!

3.2 Optimize Network Wait

Problem: network-online.target by default waits for DHCP to assign an IP, which can take 30+ seconds.

Option 1: Reduce DHCP timeout

# /etc/systemd/network/eth0.network
[Match]
Name=eth0

[Network]
DHCP=yes

[DHCP]
RouteMetric=100
UseDNS=true
UseMTU=true

Note: systemd-networkd has no `Timeout=` option in the [DHCP] section; the wait is actually enforced by systemd-networkd-wait-online.service, so cap it there with a drop-in:

# /etc/systemd/system/systemd-networkd-wait-online.service.d/timeout.conf
[Service]
ExecStart=
# was waiting up to 60 seconds; path may be /usr/lib/systemd/ on some distros
ExecStart=/lib/systemd/systemd-networkd-wait-online --timeout=10

Option 2: Make core services not depend on network-online

# /lib/systemd/system/nas-agent.service
[Unit]
Description=NAS Agent
# Change to only depend on network.target (NIC init), not network-online.target (got IP)
After=network.target
Wants=network.target

Effect: another 20 seconds saved!
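A maintenance note: files under /lib/systemd/system are overwritten by package updates, and drop-ins can only add ordering dependencies, never remove them. To make this change stick, shadow the vendor unit in /etc. The unit body below is illustrative; the ExecStart path is an assumption:

```ini
# /etc/systemd/system/nas-agent.service
# A unit here shadows the copy in /lib and survives package updates.
[Unit]
Description=NAS Agent
After=network.target
Wants=network.target

[Service]
# assumed binary path, for illustration
ExecStart=/usr/bin/nas-agent

[Install]
WantedBy=multi-user.target
```

Run `systemctl daemon-reload` after creating it so systemd picks up the shadow unit.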

3.3 Service Parallelization

Check critical path:

systemd-analyze critical-chain nas-agent.service

Output:

nas-agent.service @35.2s
└─multi-user.target @35.1s
  └─docker.service @30.5s +4.5s
    └─containerd.service @25.3s +5.1s
      └─local-fs.target @25.2s
        └─...

nas-agent was queued behind docker, even though it has no actual dependency on it!

Fix:

# /lib/systemd/system/nas-agent.service
[Unit]
Description=NAS Agent
After=network.target
# Remove unnecessary ordering dependency

Effect: docker and nas-agent now start in parallel, saves 5 seconds.
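The `+N.Ns` figures in critical-chain output are each unit's own activation time; summing them estimates how much serial time a chain contributes. A small helper, pure text processing over output like the sample above:

```shell
# chain_serial_seconds: sum the "+N.Ns" activation times in
# `systemd-analyze critical-chain` output read from stdin.
chain_serial_seconds() {
    grep -o '+[0-9.]*s' | tr -d '+s' | awk '{ sum += $1 } END { printf "%.1f\n", sum }'
}
```

Usage: `systemd-analyze critical-chain nas-agent.service | chain_serial_seconds` — a number that shrinks as you break up the chain.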

3.4 Kernel Optimization

3.4.1 Disable unneeded kernel modules

# Check loaded modules
lsmod | wc -l  # 150+!

Create module blacklist:

# /etc/modprobe.d/blacklist-nas.conf
blacklist bluetooth
blacklist btusb
blacklist snd_hda_intel
blacklist nouveau
blacklist i2c_piix4
# ...

Effect: Kernel boot phase reduced 2 seconds.
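Writing the blacklist by hand is tedious; a hedged sketch that drafts it from `lsmod` output instead (the keep list is illustrative — build yours from the hardware you actually ship, and review the result before rebooting, since blacklisting a needed driver can make the box unbootable):

```shell
# suggest_blacklist: read `lsmod` output on stdin and print blacklist lines
# for every loaded module not in the space-separated keep list ($1).
suggest_blacklist() {
    awk -v keep="$1" '
        BEGIN { n = split(keep, k, " "); for (i = 1; i <= n; i++) keepset[k[i]] = 1 }
        NR > 1 && !($1 in keepset) { print "blacklist " $1 }'
}
```

Usage: `lsmod | suggest_blacklist "ext4 e1000e" >> /etc/modprobe.d/blacklist-nas.conf`.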

3.4.2 Tune initramfs

# Check initramfs contents
lsinitramfs /boot/initrd.img-$(uname -r) | wc -l  # 300+ files

Slim down initramfs (only keep necessary drivers):

# /etc/initramfs-tools/initramfs.conf
MODULES=dep  # Only include dependent modules, not all

Regenerate:

update-initramfs -u

Effect: initramfs from 50MB to 15MB, load time reduced 3 seconds.

4. Systemd Tuning Tips

4.1 Type=simple vs Type=oneshot

# Fast-starting services use simple (default)
[Service]
Type=simple
ExecStart=/usr/bin/daemon

# Services that need to complete use oneshot
[Service]
Type=oneshot
ExecStart=/usr/bin/init-script
RemainAfterExit=yes
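A third type worth knowing for boot ordering is Type=notify: the daemon signals its own readiness via sd_notify, so dependents start exactly when it is usable rather than when the process was forked. A hypothetical unit (it only works if the daemon actually calls sd_notify("READY=1")):

```ini
# Hypothetical: requires sd_notify support in the daemon
[Service]
Type=notify
ExecStart=/usr/bin/nas-agent
```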

4.2 Lazy Loading: Socket Activation

Not all services need to start immediately. Use socket activation to start on demand:

# /etc/systemd/system/nas-api.socket
[Socket]
ListenStream=8080
Accept=false

[Install]
WantedBy=sockets.target
# /etc/systemd/system/nas-api.service
[Service]
ExecStart=/usr/bin/nas-api
# Only starts when connection to port 8080 arrives

Effect: 3 fewer services at boot, saves 2 seconds.

4.3 ReadWritePaths Sandboxing

# Restrict the service's filesystem view via a mount namespace
[Service]
ReadWritePaths=/var/lib/nas-agent
ProtectSystem=strict
ProtectHome=true

Strictly speaking this is hardening rather than a speedup: the per-service mount namespace costs almost nothing to set up, and the confinement is exactly what a 24×7 appliance wants. (`systemd-analyze security <unit>` scores how well a unit is sandboxed.)

5. Final Results

5.1 Before/After Comparison

| Stage | Before | After | Reduction |
| --- | --- | --- | --- |
| Kernel boot | 8.5s | 5.5s | -3s |
| initramfs | 12.0s | 6.0s | -6s |
| Systemd (network) | 32.0s | 8.0s | -24s |
| Systemd (other services) | 22.0s | 10.0s | -12s |
| Total | 74.7s | 29.5s | -60% |

5.2 Key Optimizations Summary

| Optimization | Time Saved |
| --- | --- |
| Disable cloud-init/snapd | 30s |
| Reduce network wait | 20s |
| Service parallelization | 5s |
| Kernel module trimming | 2s |
| initramfs trimming | 6s |
| Socket activation | 2s |

6. Continuous Monitoring

6.1 CI Integration for Boot Time Testing

#!/bin/bash
# boot-time-check.sh

MAX_BOOT_TIME=35

# Total is the last field of the first line (e.g. "... = 74.7s");
# the old `awk '{print $4}'` grabbed the kernel time, not the total
BOOT_TIME=$(systemd-analyze | head -n1 | awk '{print $NF}' | tr -d 's')

if (( $(echo "$BOOT_TIME > $MAX_BOOT_TIME" | bc -l) )); then
    echo "Boot time regression! Current: ${BOOT_TIME}s, Threshold: ${MAX_BOOT_TIME}s"
    exit 1
fi

echo "Boot time normal: ${BOOT_TIME}s"
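One trap: above 60 seconds, systemd formats timespans like "1min 14.7s", which a single-field parse would misread. A hedged normalizer that converts either format to plain seconds:

```shell
# total_seconds: convert a systemd-analyze total like "74.7s" or
# "1min 14.7s" (read from stdin) into plain seconds.
total_seconds() {
    awk '{
        total = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /min$/)    { v = $i; sub(/min$/, "", v); total += v * 60 }
            else if ($i ~ /s$/) { v = $i; sub(/s$/,  "", v); total += v }
        }
        printf "%.1f\n", total
    }'
}
```

Piping the extracted total through this keeps the regression check working on either side of the 60-second line.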

6.2 New Service Checklist

When adding new services, always check:

  • Can it use socket activation?
  • Does it really need network-online?
  • Can it start in parallel?
  • Are reasonable resource limits set?

7. Summary

| Technique | Core Command/Config |
| --- | --- |
| Boot analysis | systemd-analyze blame/plot |
| Dependency chain | systemd-analyze critical-chain |
| Disable services | systemctl disable |
| Network optimization | DHCP timeout + network.target |
| Parallelization | check and remove unnecessary After= |
| Kernel trimming | modprobe blacklist |
| initramfs | MODULES=dep |

Core principle: Don’t assume, measure. Back every optimization with data.

