Contents

Production-Grade String Parsing: From atoi to State Machine

LeetCode’s atoi is a classic interview problem, but production-grade implementations are far more complex. This post starts from the interview version and gradually analyzes Go and Rust standard library implementations to understand “production-grade” code design.

1. Interview Version atoi

1.1 LeetCode 8: String to Integer

Basic requirements:

  • Skip leading whitespace
  • Handle positive/negative signs
  • Convert digit characters
  • Handle overflow
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
func myAtoi(s string) int {
    i, n := 0, len(s)
    
    // Skip whitespace
    for i < n && s[i] == ' ' {
        i++
    }
    
    if i >= n {
        return 0
    }
    
    // Handle sign
    sign := 1
    if s[i] == '+' {
        i++
    } else if s[i] == '-' {
        sign = -1
        i++
    }
    
    // Convert digits
    result := 0
    for i < n && s[i] >= '0' && s[i] <= '9' {
        digit := int(s[i] - '0')
        
        // Overflow check
        if result > (math.MaxInt32-digit)/10 {
            if sign == 1 {
                return math.MaxInt32
            }
            return math.MinInt32
        }
        
        result = result*10 + digit
        i++
    }
    
    return sign * result
}

This passes LeetCode, but it’s far from production-grade.

1.2 What’s Missing?

Interview VersionProduction Requirements
Only handles int32Supports multiple types (int8, int64, uint…)
Only returns resultNeeds detailed error messages
Only base 10Supports multiple bases (2, 8, 16…)
Simple overflow handlingPrecise overflow detection
Fixed formatSupports 0x, 0o, 0b prefixes

2. Go Standard Library Analysis

2.1 strconv.ParseInt Signature

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
func ParseInt(s string, base int, bitSize int) (i int64, err error)

// Parameters:
//   s: input string
//   base: radix (0 means auto-detect)
//   bitSize: target type bits (8, 16, 32, 64)

// Returns:
//   i: parsed result
//   err: error info (includes detailed reason)

2.2 Error Types

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
type NumError struct {
    Func string // Function name ("ParseInt")
    Num  string // Input string
    Err  error  // Specific error
}

var (
    ErrRange = errors.New("value out of range")
    ErrSyntax = errors.New("invalid syntax")
)

2.3 Source Analysis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
// src/strconv/atoi.go (simplified)
func ParseInt(s string, base int, bitSize int) (i int64, err error) {
    // Empty string
    if s == "" {
        return 0, syntaxError("ParseInt", s)
    }
    
    // Handle sign
    s0 := s
    neg := false
    if s[0] == '+' {
        s = s[1:]
    } else if s[0] == '-' {
        neg = true
        s = s[1:]
    }
    
    // Handle base prefix
    if base == 0 {
        base = 10
        if s[0] == '0' {
            switch {
            case len(s) >= 2 && lower(s[1]) == 'x':
                base = 16
                s = s[2:]
            case len(s) >= 2 && lower(s[1]) == 'o':
                base = 8
                s = s[2:]
            case len(s) >= 2 && lower(s[1]) == 'b':
                base = 2
                s = s[2:]
            default:
                base = 8
                s = s[1:]
            }
        }
    }
    
    // Calculate overflow bounds
    var cutoff uint64
    switch bitSize {
    case 8:
        cutoff = uint64(1<<7 - 1)
    case 16:
        cutoff = uint64(1<<15 - 1)
    case 32:
        cutoff = uint64(1<<31 - 1)
    case 64:
        cutoff = uint64(1<<63 - 1)
    }
    if neg {
        cutoff++
    }
    
    // Character-by-character conversion
    var n uint64
    for _, c := range []byte(s) {
        var d byte
        switch {
        case '0' <= c && c <= '9':
            d = c - '0'
        case 'a' <= lower(c) && lower(c) <= 'z':
            d = lower(c) - 'a' + 10
        default:
            return 0, syntaxError("ParseInt", s0)
        }
        
        if d >= byte(base) {
            return 0, syntaxError("ParseInt", s0)
        }
        
        // Overflow check (before multiplication)
        if n >= cutoff {
            return 0, rangeError("ParseInt", s0)
        }
        
        n *= uint64(base)
        n1 := n + uint64(d)
        
        // Check addition overflow
        if n1 < n || n1 > cutoff {
            return 0, rangeError("ParseInt", s0)
        }
        n = n1
    }
    
    if neg {
        return -int64(n), nil
    }
    return int64(n), nil
}

2.4 Design Highlights

  1. Overflow detection before operation: Avoids detecting after already overflowing
  2. Detailed error info: Includes original input for debugging
  3. Unified signed/unsigned handling: Calculate with uint64, convert at end
  4. Auto base detection: 0x, 0o, 0b prefixes

3. State Machine Implementation

3.1 Why State Machine?

Complex parsing needs are hard to handle cleanly with if-else:

1
2
3
4
5
6
7
8
9
Valid input examples:
"  -42"      → -42
"0x1A"       → 26 (auto-detect base 16)
"   +0o17"   → 15 (auto-detect base 8)

Invalid input examples:
"--42"       → syntax error
"0x"         → syntax error (no digits)
"123abc"     → ? (Go errors, some languages accept 123)

3.2 State Machine Model

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
         ┌─────────────────────────────────────────────────────┐
         │                                                      │
    ┌────▼────┐  space   ┌─────────┐  +/-   ┌─────────┐         │
    │  START  │────────▶│ SPACES  │───────▶│  SIGN   │         │
    └─────────┘         └────┬────┘        └────┬────┘         │
         │                   │                   │              │
         │ 0                 │ digit             │ digit        │
         ▼                   │                   │              │
    ┌─────────┐              │                   │              │
    │ PREFIX  │◄─────────────┴───────────────────┘              │
    └────┬────┘                                                 │
         │ x/o/b/digit                                          │
         ▼                                                      │
    ┌─────────┐  digit                                          │
    │ DIGITS  │────────┐                                        │
    └────┬────┘        │                                        │
         │             │                                        │
         │ non-digit   │                                        │
         ▼             ▼                                        │
    ┌─────────┐   ┌─────────┐                                   │
    │  ERROR  │   │   END   │───────────────────────────────────┘
    └─────────┘   └─────────┘

3.3 Go Implementation

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
type State int

const (
    StateStart State = iota
    StateSign
    StatePrefix
    StateDigits
    StateEnd
    StateError
)

type Parser struct {
    state   State
    sign    int
    base    int
    result  uint64
    bitSize int
    cutoff  uint64
}

func NewParser(bitSize int) *Parser {
    cutoff := uint64(1<<(bitSize-1) - 1)
    return &Parser{
        state:   StateStart,
        sign:    1,
        base:    10,
        bitSize: bitSize,
        cutoff:  cutoff,
    }
}

func (p *Parser) Feed(c byte) error {
    switch p.state {
    case StateStart:
        return p.handleStart(c)
    case StateSign:
        return p.handleSign(c)
    case StatePrefix:
        return p.handlePrefix(c)
    case StateDigits:
        return p.handleDigits(c)
    default:
        return errors.New("invalid state")
    }
}

func (p *Parser) handleStart(c byte) error {
    switch {
    case c == ' ' || c == '\t':
        // Stay in Start state
        return nil
    case c == '+':
        p.state = StateSign
        return nil
    case c == '-':
        p.sign = -1
        p.cutoff++ // Negative can be one more
        p.state = StateSign
        return nil
    case c == '0':
        p.state = StatePrefix
        return nil
    case c >= '1' && c <= '9':
        p.state = StateDigits
        return p.addDigit(c - '0')
    default:
        p.state = StateError
        return errors.New("invalid character")
    }
}

func (p *Parser) handlePrefix(c byte) error {
    switch lower(c) {
    case 'x':
        p.base = 16
        p.state = StateDigits
        return nil
    case 'o':
        p.base = 8
        p.state = StateDigits
        return nil
    case 'b':
        p.base = 2
        p.state = StateDigits
        return nil
    default:
        p.base = 8
        p.state = StateDigits
        return p.addDigit(c - '0')
    }
}

func (p *Parser) addDigit(d byte) error {
    if p.result > p.cutoff/uint64(p.base) {
        return errors.New("overflow")
    }
    p.result = p.result*uint64(p.base) + uint64(d)
    if p.result > p.cutoff {
        return errors.New("overflow")
    }
    return nil
}

func (p *Parser) Result() (int64, error) {
    if p.state == StateError {
        return 0, errors.New("parse error")
    }
    if p.sign < 0 {
        return -int64(p.result), nil
    }
    return int64(p.result), nil
}

4. Rust Standard Library Comparison

4.1 Rust’s from_str

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
// Rust uses trait implementation
impl FromStr for i32 {
    type Err = ParseIntError;
    
    fn from_str(s: &str) -> Result<i32, ParseIntError> {
        i32::from_str_radix(s, 10)
    }
}

// Usage
let n: i32 = "42".parse()?;
let n = i32::from_str_radix("0x2A", 16)?;

4.2 Rust’s Error Handling

1
2
3
4
5
6
7
pub enum IntErrorKind {
    Empty,           // Empty string
    InvalidDigit,    // Invalid character
    PosOverflow,     // Positive overflow
    NegOverflow,     // Negative overflow
    Zero,            // Unexpected zero (some contexts)
}

4.3 Comparison

FeatureGoRust
Return type(int64, error)Result<T, ParseIntError>
BaseParameterMethod name distinction
Error granularityError interfaceEnum type
Sign handlingInternalType system distinction

5. Practical Applications

5.1 Config File Parsing

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
type Config struct {
    Port     int
    Timeout  time.Duration
    MaxSize  int64
}

func parseSize(s string) (int64, error) {
    s = strings.TrimSpace(s)
    if len(s) == 0 {
        return 0, nil
    }
    
    // Support K, M, G suffixes
    unit := int64(1)
    switch s[len(s)-1] {
    case 'K', 'k':
        unit = 1024
        s = s[:len(s)-1]
    case 'M', 'm':
        unit = 1024 * 1024
        s = s[:len(s)-1]
    case 'G', 'g':
        unit = 1024 * 1024 * 1024
        s = s[:len(s)-1]
    }
    
    n, err := strconv.ParseInt(s, 0, 64)
    if err != nil {
        return 0, err
    }
    
    // Check multiplication overflow
    if n > 0 && unit > math.MaxInt64/n {
        return 0, errors.New("size overflow")
    }
    
    return n * unit, nil
}

// Usage
parseSize("512M")  // 536870912
parseSize("0x100") // 256

5.2 Command Line Arguments

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
func parsePort(s string) (int, error) {
    port, err := strconv.ParseInt(s, 10, 16)  // 16 bits enough for port
    if err != nil {
        return 0, fmt.Errorf("invalid port: %w", err)
    }
    if port < 1 || port > 65535 {
        return 0, fmt.Errorf("port must be 1-65535, got %d", port)
    }
    return int(port), nil
}

6. Summary

LevelImplementation
Interviewif-else, basic cases only
ProductionState machine, complete error handling
Standard libraryHighly optimized, multi-type multi-base

Key takeaways:

  1. Overflow detection before operation, not after
  2. State machines make complex parsing logic clearer
  3. Error messages should be detailed, include original input
  4. Boundary conditions are what separate interview code from production code