From TimSort to pdqsort: Engineering Optimizations in Production Sorting

Paul included in Go

2018-06-21 1267 words 6 minutes

Textbook quicksort is just the starting point. Real-world sorting algorithms have undergone decades of engineering optimization. This post deep-dives into Python’s TimSort and Go/Rust’s pdqsort, revealing the gap between “industrial-grade” and “textbook-grade”.

1. Limitations of Textbook Sorting

1.1 Quicksort: Theory vs Reality

Textbook quicksort:

1
2
3
4
5
6
7
func quickSort(arr []int, low, high int) {
    if low < high {
        pivot := partition(arr, low, high)
        quickSort(arr, low, pivot-1)
        quickSort(arr, pivot+1, high)
    }
}

Theory: Average O(n log n), in-place, perfect.

Reality problems:

Worst case O(n²): Sorted/reversed arrays degrade
Recursion overhead: Depth can reach n, stack overflow risk
Cache unfriendly: Random access pattern
Unstable: Equal element order may change

1.2 Language Choices

Language	Default Sort	Characteristics
Python	TimSort	Stable, exploits existing order
Go 1.19+	pdqsort	Fast, unstable, avoids worst case
Rust	pdqsort	Same as Go
Java	TimSort (objects) / DualPivot (primitives)	Optimized per type
C++	IntroSort	Quicksort + heapsort hybrid

2. TimSort: Designed for Real Data

2.1 Core Insight

TimSort’s designer Tim Peters (Python core dev) observed:

“Real-world data is often partially sorted”

Logs sorted by time, database records by ID, append-only data — all have “run” characteristics.

2.2 Algorithm Structure

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
Input: [1, 2, 3, 8, 7, 6, 5, 10, 11, 12]

Step 1: Identify runs
  Run 1: [1, 2, 3] (ascending)
  Run 2: [8, 7, 6, 5] → reverse to [5, 6, 7, 8]
  Run 3: [10, 11, 12] (ascending)

Step 2: Merge runs (merge sort)
  [1, 2, 3] + [5, 6, 7, 8] → [1, 2, 3, 5, 6, 7, 8]
  + [10, 11, 12] → [1, 2, 3, 5, 6, 7, 8, 10, 11, 12]

2.3 Key Optimizations

1. Minimum run length

If run is too short, use insertion sort to extend to minrun (typically 32-64).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
# CPython source (simplified)
def binary_insertion_sort(arr, lo, hi, start):
    """Binary insertion sort in [lo, hi)"""
    for i in range(start, hi):
        pivot = arr[i]
        # Binary search for insert position
        left, right = lo, i
        while left < right:
            mid = (left + right) // 2
            if arr[mid] > pivot:
                right = mid
            else:
                left = mid + 1
        # Move elements
        arr[left+1:i+1] = arr[left:i]
        arr[left] = pivot

Why insertion sort for small arrays?

Small constant factor
Cache friendly (sequential access)
O(n) for nearly-sorted data

2. Galloping Mode

When merging two runs, if one run keeps “winning”, data distribution is uneven:

1
2
3
4
5
# Normal merge: compare one by one
[1, 2, 3, 4, 100] + [50, 51, 52, 53, 54]

# After detecting run1 consistently smaller, switch to galloping:
# Exponential search to find where 100 should insert in [50, 51, 52, 53, 54]

3. Run stack balancing

TimSort maintains a run stack, following “merge policy” to ensure balanced merging:

1
2
3
4
5
Rules (counting from stack top):
  A > B + C
  B > C

Violation triggers merge, guarantees stack height O(log n)

2.4 Performance Characteristics

Scenario	Time Complexity
Fully sorted	O(n)
Reversed	O(n) (reverse)
Random	O(n log n)
Partially sorted	Between O(n) and O(n log n)

Stability: Guarantees equal elements keep order, friendly for object sorting.

3. pdqsort: Modern Quicksort’s Peak

3.1 Design Goals

pdqsort (Pattern-Defeating Quicksort) goals:

Average performance close to quicksort
Worst case O(n log n)
Detect special patterns to accelerate
Unstable but extremely fast

3.2 Core Techniques

1. Adaptive pivot selection

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
// Go source (sort package) simplified
func choosePivot(data []int, a, b int) int {
    l := b - a
    
    // Small array: take middle
    if l < 8 {
        return a + l/2
    }
    
    // Medium array: median of three
    if l < 50 {
        m := a + l/2
        return medianOfThree(data, a, m, b-1)
    }
    
    // Large array: ninther (median of 9)
    m := a + l/2
    return ninther(data, a, m, b-1)
}

2. Bad pivot detection

If pivot splits too unevenly (<1/8), might be “killer input”:

1
2
3
4
5
6
7
8
if partitionSize < length/8 {
    badPivotCount++
    if badPivotCount >= maxBadPivots {
        // Switch to heapsort, guarantee O(n log n)
        heapSort(data, a, b)
        return
    }
}

This is what “Pattern-Defeating” means: detect and defeat inputs trying to trigger worst case.

3. Equal element partition (Dutch National Flag)

1
2
3
4
5
6
Input: [5, 3, 5, 7, 5, 2, 5]
pivot = 5

Three-way partition:
[2, 3] [5, 5, 5, 5] [7]
 < 5       = 5       > 5

For arrays with many duplicates, skip the middle section entirely.

4. Small array optimization

1
2
3
4
5
// Less than 12 elements, use insertion sort
if b-a < 12 {
    insertionSort(data, a, b)
    return
}

3.3 Go 1.19 sort Package Source

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
// src/sort/zsortinterface.go
func pdqsort(data Interface, a, b, limit int) {
    const maxInsertion = 12
    
    for {
        length := b - a
        
        // Small arrays use insertion sort
        if length < maxInsertion {
            insertionSort(data, a, b)
            return
        }
        
        // Too many bad pivots, switch to heapsort
        if limit == 0 {
            heapSort(data, a, b)
            return
        }
        
        // Choose pivot and partition
        pivot := choosePivot(data, a, b)
        mid := partition(data, a, b, pivot)
        
        // Detect bad partition
        if mid-a < length/8 || b-mid < length/8 {
            limit--
        }
        
        // Recurse (tail call optimization: recurse small, loop large)
        if mid-a < b-mid {
            pdqsort(data, a, mid, limit)
            a = mid + 1
        } else {
            pdqsort(data, mid+1, b, limit)
            b = mid
        }
    }
}

4. Performance Comparison

4.1 Test Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
func BenchmarkSorts(b *testing.B) {
    sizes := []int{100, 1000, 10000, 100000}
    patterns := []string{"random", "sorted", "reversed", "partial", "duplicates"}
    
    for _, size := range sizes {
        for _, pattern := range patterns {
            b.Run(fmt.Sprintf("%s-%d", pattern, size), func(b *testing.B) {
                data := generateData(pattern, size)
                b.ResetTimer()
                for i := 0; i < b.N; i++ {
                    arr := make([]int, len(data))
                    copy(arr, data)
                    sort.Ints(arr)
                }
            })
        }
    }
}

4.2 Typical Results

Data Pattern	100K elements	Winner
Random	6ms	pdqsort (quicksort genes)
Sorted	0.5ms	TimSort (direct detection)
Reversed	0.5ms	TimSort (reverse)
Partially sorted	2ms	TimSort
Many duplicates	2ms	pdqsort (3-way partition)

5. Engineering Insights

5.1 No Silver Bullet

1
2
3
4
5
// Need stable sort?
sort.SliceStable(data, less)  // TimSort-inspired stable sort

// Maximum performance?
sort.Slice(data, less)  // pdqsort

5.2 Custom Sort Gotchas

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
// Wrong: inconsistent comparison
sort.Slice(users, func(i, j int) bool {
    return users[i].Name <= users[j].Name  // <= instead of <
})
// May cause infinite loop!

// Correct
sort.Slice(users, func(i, j int) bool {
    return users[i].Name < users[j].Name
})

5.3 When to Write Your Own Sort?

Almost never. Standard library is already optimal.

Exceptions:

Ultra-large distributed sorting (MapReduce)
Hardware-specific optimization (GPU sorting)
External sorting (disk merge)

6. Summary

Algorithm	Design Philosophy	Use Case
TimSort	Exploit existing runs	Python/Java object sorting
pdqsort	Defend against worst case	Go/Rust high-performance
IntroSort	Quicksort + heapsort hybrid	C++ STL

Key takeaways:

Real data is often partially sorted — TimSort exploits this
Malicious input can attack quicksort — pdqsort detects and defends
Insertion sort for small arrays — every implementation agrees
Engineering optimization matters more than algorithm choice

Contents