Algorithms for the 21st Century (2006)
Stephen C. Johnson's paper argues that traditional algorithm analysis needs to be reevaluated for modern hardware, which rewards sequential memory access, and calls for collaboration between software and hardware developers.
The paper "Algorithms for the 21st Century" by Stephen C. Johnson critiques the traditional algorithms taught in computer science, arguing that they do not match the performance characteristics of modern machines. Algorithms are usually analyzed under a uniform memory model, but contemporary systems reward sequential memory access rather than mere locality of reference, a discrepancy that matters most for large problems requiring 64-bit architectures. Johnson measures performance with CPU cycle counts, which are precise but can be skewed by system load and hardware configuration. His experiments show that the order of operations when summing large arrays has a dramatic effect: sequential access is far faster than visiting the same elements in a strided or otherwise non-linear order. The findings suggest that traditional data structures and algorithms may no longer be optimal once data sizes grow beyond cache capacities. The author calls for a reevaluation of algorithm design principles, advocating a trial-and-error approach to tuning large-data applications, and concludes by urging software and hardware developers to collaborate on algorithms and data structures better suited to contemporary computing environments.
- Traditional algorithms may not perform well on modern hardware.
- Sequential memory access is favored over locality of reference in current systems.
- Performance can vary significantly based on the order of operations in algorithms.
- A trial-and-error approach may be necessary for optimizing algorithms for large data.
- Collaboration between software and hardware developers is essential for future algorithm design.
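To make the summing experiment concrete, here is a minimal C++ sketch (not the paper's original benchmark code): it sums the same large array of doubles once sequentially and once in large strides. Both passes perform identical arithmetic; on typical current hardware the strided pass is noticeably slower once the array outgrows the caches.

```cpp
// Minimal sketch (not the paper's original benchmark): sum a large array of
// doubles sequentially, then again with a large stride that defeats caching.
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t n = std::size_t(1) << 25;   // 32M doubles (~256 MB), well beyond typical caches
    std::vector<double> a(n, 1.0);

    auto time_sum = [&](std::size_t stride) {
        const auto t0 = std::chrono::steady_clock::now();
        double sum = 0.0;
        // Every element is visited exactly once, just in a different order.
        for (std::size_t start = 0; start < stride; ++start)
            for (std::size_t i = start; i < n; i += stride)
                sum += a[i];
        const auto t1 = std::chrono::steady_clock::now();
        std::printf("stride %6zu: sum=%.0f, %.3f s\n", stride, sum,
                    std::chrono::duration<double>(t1 - t0).count());
    };

    time_sum(1);      // sequential pass
    time_sum(4096);   // strided pass: identical arithmetic, far worse locality
    return 0;
}
```

The array size and stride are arbitrary choices; the only requirement is that the working set exceed the last-level cache, and the exact gap will vary with compiler flags and hardware.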
In my humble opinion as a volunteer educator, algorithms are already very complicated and students being introduced to them don’t need to concern themselves with this stuff at the same time.
I just spent a week shaving 98% off of the latency in a single RPC call by going from O(N^2) to O(1). Data locality is not a problem I have. :)
I wonder whether an optimizing compiler can detect and optimize the sequential writes of the same values, possibly by replacing them with something like a blitter. It would be interesting to see the difference in speed if the data to be written was random, rather than the same value (1.0) every time.
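One way to probe that question (a rough sketch, not taken from the paper or the comment above) is to time a constant fill against a random fill of the same buffer and compare the generated code. Mainstream compilers such as GCC and Clang often vectorize a constant-fill loop, and can lower a zero fill to a memset-style bulk store, though whether that happens depends on the compiler and optimization flags. Note that the random pass also pays for generating the numbers, so the timing comparison is only suggestive.

```cpp
// Rough sketch of the experiment suggested above: time a constant fill
// against a random fill of the same buffer.
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const std::size_t n = std::size_t(1) << 25;   // 32M doubles (~256 MB)
    std::vector<double> a(n);

    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        a[i] = 1.0;                    // constant fill: a candidate for vectorized bulk stores
    const auto t1 = std::chrono::steady_clock::now();

    // Read the data back so the constant-fill loop cannot be treated as dead stores.
    double check = 0.0;
    for (std::size_t i = 0; i < n; i += 4096)
        check += a[i];

    std::mt19937_64 rng(42);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    const auto t2 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        a[i] = dist(rng);              // random fill: same sequential writes, but no constant to exploit
    const auto t3 = std::chrono::steady_clock::now();

    std::printf("constant fill: %.3f s\nrandom fill:   %.3f s (check=%.0f, last=%f)\n",
                std::chrono::duration<double>(t1 - t0).count(),
                std::chrono::duration<double>(t3 - t2).count(),
                check, a[n - 1]);
    return 0;
}
```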
Related
C++ Design Patterns for Low-Latency Applications
The article delves into C++ design patterns for low-latency applications, emphasizing optimizations for high-frequency trading. Techniques include cache prewarming, constexpr usage, loop unrolling, and hotpath/coldpath separation. It also covers comparisons, datatypes, lock-free programming, and memory access optimizations, underscoring the importance of code optimization.
What Is Analog Computing?
Analog computing, utilizing continuous systems, contrasts with digital computing's binary approach. Researchers are revisiting it for energy efficiency, especially in AI, amid rising computational demands and energy consumption.
Do low-level optimizations matter? Faster quicksort with cmov (2020)
The article emphasizes the importance of low-level optimizations in sorting algorithms, highlighting how modern CPU features and minimizing conditional branches can enhance performance, particularly with the new `swap_if` primitive.
When Bloom filters don't bloom (2020)
The author explored Bloom filters for IP spoofing but faced performance issues due to memory access times. A simpler hash table approach ultimately provided better performance for large datasets.
The Byte Order Fallacy
The article argues that native byte order is irrelevant for programmers, advocating for code that handles data streams independently, avoiding complexity and bugs from practices like byte swapping.