Contents

From TimSort to pdqsort: Engineering Optimizations in Production Sorting

Textbook quicksort is just the starting point. Real-world sorting algorithms have undergone decades of engineering optimization. This post deep-dives into Python’s TimSort and Go/Rust’s pdqsort, revealing the gap between “industrial-grade” and “textbook-grade”.

1. Limitations of Textbook Sorting

1.1 Quicksort: Theory vs Reality

Textbook quicksort:

1
2
3
4
5
6
7
func quickSort(arr []int, low, high int) {
    if low < high {
        pivot := partition(arr, low, high)
        quickSort(arr, low, pivot-1)
        quickSort(arr, pivot+1, high)
    }
}

Theory: Average O(n log n), in-place, perfect.

Reality problems:

  1. Worst case O(n²): Sorted/reversed arrays degrade
  2. Recursion overhead: Depth can reach n, stack overflow risk
  3. Cache unfriendly: Random access pattern
  4. Unstable: Equal element order may change

1.2 Language Choices

LanguageDefault SortCharacteristics
PythonTimSortStable, exploits existing order
Go 1.19+pdqsortFast, unstable, avoids worst case
RustpdqsortSame as Go
JavaTimSort (objects) / DualPivot (primitives)Optimized per type
C++IntroSortQuicksort + heapsort hybrid

2. TimSort: Designed for Real Data

2.1 Core Insight

TimSort’s designer Tim Peters (Python core dev) observed:

“Real-world data is often partially sorted”

Logs sorted by time, database records by ID, append-only data — all have “run” characteristics.

2.2 Algorithm Structure

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
Input: [1, 2, 3, 8, 7, 6, 5, 10, 11, 12]

Step 1: Identify runs
  Run 1: [1, 2, 3] (ascending)
  Run 2: [8, 7, 6, 5] → reverse to [5, 6, 7, 8]
  Run 3: [10, 11, 12] (ascending)

Step 2: Merge runs (merge sort)
  [1, 2, 3] + [5, 6, 7, 8] → [1, 2, 3, 5, 6, 7, 8]
  + [10, 11, 12] → [1, 2, 3, 5, 6, 7, 8, 10, 11, 12]

2.3 Key Optimizations

1. Minimum run length

If run is too short, use insertion sort to extend to minrun (typically 32-64).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
# CPython source (simplified)
def binary_insertion_sort(arr, lo, hi, start):
    """Binary insertion sort in [lo, hi)"""
    for i in range(start, hi):
        pivot = arr[i]
        # Binary search for insert position
        left, right = lo, i
        while left < right:
            mid = (left + right) // 2
            if arr[mid] > pivot:
                right = mid
            else:
                left = mid + 1
        # Move elements
        arr[left+1:i+1] = arr[left:i]
        arr[left] = pivot

Why insertion sort for small arrays?

  • Small constant factor
  • Cache friendly (sequential access)
  • O(n) for nearly-sorted data

2. Galloping Mode

When merging two runs, if one run keeps “winning”, data distribution is uneven:

1
2
3
4
5
# Normal merge: compare one by one
[1, 2, 3, 4, 100] + [50, 51, 52, 53, 54]

# After detecting run1 consistently smaller, switch to galloping:
# Exponential search to find where 100 should insert in [50, 51, 52, 53, 54]

3. Run stack balancing

TimSort maintains a run stack, following “merge policy” to ensure balanced merging:

1
2
3
4
5
Rules (counting from stack top):
  A > B + C
  B > C

Violation triggers merge, guarantees stack height O(log n)

2.4 Performance Characteristics

ScenarioTime Complexity
Fully sortedO(n)
ReversedO(n) (reverse)
RandomO(n log n)
Partially sortedBetween O(n) and O(n log n)

Stability: Guarantees equal elements keep order, friendly for object sorting.

3. pdqsort: Modern Quicksort’s Peak

3.1 Design Goals

pdqsort (Pattern-Defeating Quicksort) goals:

  • Average performance close to quicksort
  • Worst case O(n log n)
  • Detect special patterns to accelerate
  • Unstable but extremely fast

3.2 Core Techniques

1. Adaptive pivot selection

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
// Go source (sort package) simplified
func choosePivot(data []int, a, b int) int {
    l := b - a
    
    // Small array: take middle
    if l < 8 {
        return a + l/2
    }
    
    // Medium array: median of three
    if l < 50 {
        m := a + l/2
        return medianOfThree(data, a, m, b-1)
    }
    
    // Large array: ninther (median of 9)
    m := a + l/2
    return ninther(data, a, m, b-1)
}

2. Bad pivot detection

If pivot splits too unevenly (<1/8), might be “killer input”:

1
2
3
4
5
6
7
8
if partitionSize < length/8 {
    badPivotCount++
    if badPivotCount >= maxBadPivots {
        // Switch to heapsort, guarantee O(n log n)
        heapSort(data, a, b)
        return
    }
}

This is what “Pattern-Defeating” means: detect and defeat inputs trying to trigger worst case.

3. Equal element partition (Dutch National Flag)

1
2
3
4
5
6
Input: [5, 3, 5, 7, 5, 2, 5]
pivot = 5

Three-way partition:
[2, 3] [5, 5, 5, 5] [7]
 < 5       = 5       > 5

For arrays with many duplicates, skip the middle section entirely.

4. Small array optimization

1
2
3
4
5
// Less than 12 elements, use insertion sort
if b-a < 12 {
    insertionSort(data, a, b)
    return
}

3.3 Go 1.19 sort Package Source

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
// src/sort/zsortinterface.go
func pdqsort(data Interface, a, b, limit int) {
    const maxInsertion = 12
    
    for {
        length := b - a
        
        // Small arrays use insertion sort
        if length < maxInsertion {
            insertionSort(data, a, b)
            return
        }
        
        // Too many bad pivots, switch to heapsort
        if limit == 0 {
            heapSort(data, a, b)
            return
        }
        
        // Choose pivot and partition
        pivot := choosePivot(data, a, b)
        mid := partition(data, a, b, pivot)
        
        // Detect bad partition
        if mid-a < length/8 || b-mid < length/8 {
            limit--
        }
        
        // Recurse (tail call optimization: recurse small, loop large)
        if mid-a < b-mid {
            pdqsort(data, a, mid, limit)
            a = mid + 1
        } else {
            pdqsort(data, mid+1, b, limit)
            b = mid
        }
    }
}

4. Performance Comparison

4.1 Test Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
func BenchmarkSorts(b *testing.B) {
    sizes := []int{100, 1000, 10000, 100000}
    patterns := []string{"random", "sorted", "reversed", "partial", "duplicates"}
    
    for _, size := range sizes {
        for _, pattern := range patterns {
            b.Run(fmt.Sprintf("%s-%d", pattern, size), func(b *testing.B) {
                data := generateData(pattern, size)
                b.ResetTimer()
                for i := 0; i < b.N; i++ {
                    arr := make([]int, len(data))
                    copy(arr, data)
                    sort.Ints(arr)
                }
            })
        }
    }
}

4.2 Typical Results

Data Pattern100K elementsWinner
Random6mspdqsort (quicksort genes)
Sorted0.5msTimSort (direct detection)
Reversed0.5msTimSort (reverse)
Partially sorted2msTimSort
Many duplicates2mspdqsort (3-way partition)

5. Engineering Insights

5.1 No Silver Bullet

1
2
3
4
5
// Need stable sort?
sort.SliceStable(data, less)  // TimSort-inspired stable sort

// Maximum performance?
sort.Slice(data, less)  // pdqsort

5.2 Custom Sort Gotchas

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
// Wrong: inconsistent comparison
sort.Slice(users, func(i, j int) bool {
    return users[i].Name <= users[j].Name  // <= instead of <
})
// May cause infinite loop!

// Correct
sort.Slice(users, func(i, j int) bool {
    return users[i].Name < users[j].Name
})

5.3 When to Write Your Own Sort?

Almost never. Standard library is already optimal.

Exceptions:

  • Ultra-large distributed sorting (MapReduce)
  • Hardware-specific optimization (GPU sorting)
  • External sorting (disk merge)

6. Summary

AlgorithmDesign PhilosophyUse Case
TimSortExploit existing runsPython/Java object sorting
pdqsortDefend against worst caseGo/Rust high-performance
IntroSortQuicksort + heapsort hybridC++ STL

Key takeaways:

  1. Real data is often partially sorted — TimSort exploits this
  2. Malicious input can attack quicksort — pdqsort detects and defends
  3. Insertion sort for small arrays — every implementation agrees
  4. Engineering optimization matters more than algorithm choice