10B Integers Walk into an Array
Donald Raab's experiment with Pharo Smalltalk stored 10 billion SmallInteger instances in a single array, showcasing the advantages of 64-bit addressing over Java's array limits and hinting at how future environments may handle very large datasets.
The article discusses an experiment conducted by Donald Raab using Pharo Smalltalk to store 10 billion SmallInteger instances in an array, highlighting the capabilities of 64-bit addressing in modern programming environments. Raab reflects on his past experiences with 32-bit Smalltalk and the limitations of Java arrays, which are capped at 2³¹-1 elements. He notes that while third-party Java libraries can handle larger arrays, they add complexity. The experiment was made possible by his MacBook Pro M2 Max, which has 96GB of RAM. Raab successfully created arrays of up to 10 billion elements, observing that Pharo optimizes SmallInteger storage by inlining values rather than storing references, allowing for efficient memory usage. He contrasts this with Java, where arithmetic that exceeds the maximum long value silently overflows. The article also includes code snippets demonstrating the experiments in both Pharo and Java, emphasizing the differences in handling large datasets between the two languages. Raab concludes that while most developers may not require such large arrays, advances in hardware and programming languages are paving the way for handling larger datasets in the future.
- Pharo Smalltalk arrays can grow far beyond Java's 2³¹-1 element cap thanks to 64-bit addressing; the experiment reached 10 billion elements.
- Java arrays are limited to 2³¹-1 elements, necessitating third-party libraries for larger datasets.
- The experiment utilized a MacBook Pro M2 Max with 96GB of RAM.
- Pharo optimizes memory by inlining SmallInteger values directly in the array, where Java collections store references to boxed objects.
- The article highlights the potential for future programming environments to handle larger data sets efficiently.
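As a rough Java analogue of the inlining point (a sketch, not code from the article): a primitive `long[]` stores the 64-bit values directly in the array, while a boxed `Long[]` stores references to separate heap objects, much as Pharo's tagged SmallIntegers avoid per-element objects. Sizes in the comments are typical 64-bit Hotspot figures, assumed rather than measured.

```java
// Sketch: inline primitive storage vs. boxed reference storage on a
// typical 64-bit Hotspot JVM (sizes are approximations).
public class InlineVsBoxed {
    public static void main(String[] args) {
        int n = 100_000_000;

        // Primitive array: 8 bytes per element, values stored inline (~0.8 GB).
        long[] inline = new long[n];
        inline[0] = 1_000L;

        // Boxed array: a 4-byte compressed reference per slot plus a
        // ~16-byte Long object per value outside the -128..127 cache.
        Long[] boxed = new Long[n];
        boxed[0] = 1_000L; // autoboxing allocates a Long object here

        System.out.println(inline[0] + " " + boxed[0]);
    }
}
```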
The funny thing is that a 64-bit node ID is the same size as the location the node element contains (32-bit longitude + 32-bit latitude) [0].
[0] https://taginfo.openstreetmap.org/reports/database_statistic...
One of the issues we discussed is trying to do this with Java arrays or collections. As he observes, Java arrays have a maximum size of 2^31 - 1 elements, because they're indexed by a signed 32-bit int. In practice, the Hotspot JVM has a maximum array size of 2^31 - 2 elements. (I'm not entirely sure why this is. I'd guess that it's because the various GC implementations have a maximum memory block size of 2^31 words, and two words are consumed by the object header.)
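A quick way to see that limit on Hotspot (a sketch; the exact boundary and error message can vary by JVM version and collector):

```java
public class ArrayLimit {
    public static void main(String[] args) {
        // With enough heap (e.g. -Xmx4g), the largest length Hotspot
        // typically accepts is Integer.MAX_VALUE - 2:
        byte[] ok = new byte[Integer.MAX_VALUE - 2];
        System.out.println(ok.length); // 2147483645

        // Asking for Integer.MAX_VALUE fails regardless of heap size:
        // java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        byte[] tooBig = new byte[Integer.MAX_VALUE];
    }
}
```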
The author considered trying to store 10B elements in a Java Collection, but there are some roadblocks that make this either difficult or insurmountable.
A Java ArrayList stores all its elements in a single array; thus it has the same limitation as Java's arrays. This is a pretty hard limit.
A Java LinkedList can actually store more than 2^31 elements. It's a non-intrusive linked list, so each element has a separate list node, which is another Java object. Each list node consumes 40 bytes (including the object header and fields), which means that storing two billion elements in a LinkedList incurs 80GB of overhead, not including the actual data elements. (And ten billion elements would require 400GB of memory for overhead alone.) This would probably work with enough memory, but that amount of overhead seems prohibitive.
The author instead used the `fastutil` library to work around these limitations.
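For reference, a minimal sketch of what that looks like with fastutil's big arrays (assuming fastutil 8.x, whose BigArrays helper addresses a segmented long[][] with 64-bit indices; filling 10B longs needs on the order of 80GB of heap):

```java
import it.unimi.dsi.fastutil.BigArrays;
import it.unimi.dsi.fastutil.longs.LongBigArrays;

// Sketch: a 10-billion-element "array" via fastutil's segmented big arrays.
public class TenBillionLongs {
    public static void main(String[] args) {
        long size = 10_000_000_000L;                    // well past 2^31 - 1
        long[][] big = LongBigArrays.newBigArray(size); // long[][] of segments

        for (long i = 0; i < size; i++) {
            BigArrays.set(big, i, i + 1);               // 64-bit index
        }
        System.out.println(BigArrays.get(big, size - 1)); // 10000000000
    }
}
```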
Pretty impressive that Pharo Smalltalk is able to do this.
The strongest case I've seen for this is memory mapping (mmap).
There's long been a tradeoff between simple-data & complex-algorithm and complex-data & simple-algorithm. And the complex-data approach has all but won in the typical OO or web-dev shop.
In short, mmap gives you the power to treat files as arrays. You can just open a 200GB file and run qsort on it, or do a binary search or whatever.
The operating system takes care of when to move things in and out of ram, and it's pretty good at it. Not only that, but it stays in the cache between invocations of your program. So you can crash out, and resume processing as fast as you can hit ctrl-c, up, enter.
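In Java the classic route is FileChannel.map, though, fittingly for this thread, a single MappedByteBuffer is itself capped at 2³¹-1 bytes, so a 200GB file has to be mapped in chunks. A minimal read-only sketch, assuming a pre-existing file at the hypothetical path data.bin:

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = FileChannel.open(Path.of("data.bin"),
                StandardOpenOption.READ)) {
            // One MappedByteBuffer covers at most Integer.MAX_VALUE bytes,
            // so a 200 GB file would need ~94 such mappings.
            long chunk = Math.min(ch.size(), Integer.MAX_VALUE);
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, chunk);

            // The OS pages data in and out on demand; access looks like
            // ordinary indexing into a byte array.
            System.out.println(buf.getLong(0)); // first 8 bytes as a long
        }
    }
}
```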
I can't reproduce it here, but I found it really funny that "short" was monospaced, like it was the type.
For example, 64-bit UNIX epoch will work for 292 billion years. That's longer than the age of the universe and obviously enough. Meanwhile, signed 64-bit nanosecond (!) timestamps only range from 1678 AD to 2262 AD. That seems quite restrictive. So ultimately somewhere around 96-bits of time seems to be "enough" for any practical application.
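A quick sketch to verify that arithmetic:

```java
// Verifies the ranges claimed above for signed 64-bit timestamps.
public class TimestampRange {
    public static void main(String[] args) {
        double secondsPerYear = 86_400.0 * 365.25;

        // 64-bit seconds since 1970: about 292 billion years either way.
        System.out.printf("seconds:     +/- %.0f billion years%n",
                Long.MAX_VALUE / secondsPerYear / 1e9);

        // 64-bit nanoseconds: only ~292 years either way around 1970,
        // i.e. roughly 1678 AD to 2262 AD.
        double years = Long.MAX_VALUE / 1e9 / secondsPerYear;
        System.out.printf("nanoseconds: %d AD to %d AD%n",
                Math.round(1970 - years), Math.round(1970 + years));
    }
}
```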
For audio we've probably crossed the point already. Uncompressed audio is in the ballpark of a few Mbit/s. Humans will probably never need Gbit audio. What's the limit for video? That still seems undetermined.
All of this will limit the max compute resources humanity will need for certain applications. Maybe we will see the widespread use of 128-bit computers during our lifetimes. Mostly for virtual addressing fun. But will 256-bit ever be needed?
Some even come in tower format, so we can call them high-end technical workstations if we really want to.
Related
Some Tricks from the Scrapscript Compiler
The Scrapscript compiler implements optimization tricks like immediate objects, small strings, and variants for better performance. It introduces immediate variants and const heap to enhance efficiency without complexity, seeking suggestions for future improvements.
Overview of cross-architecture portability problems
Michał Górny discusses cross-architecture portability challenges between 32-bit and 64-bit systems, highlighting issues with memory allocation, file size limitations, and the Y2K38 problem affecting multiple programming languages.
Eliminating Intermediate Array Allocations
The article highlights Ruby's memory allocation optimization, particularly in tokenizers and array operations, emphasizing that simple literals do not allocate memory and that idiomatic Ruby can be performant.
The costs of the i386 to x86-64 upgrade
The article examines the shift from 32-bit to 64-bit architecture, discussing its impact on code density, memory usage, performance, and the evolution of software demands and hardware capabilities.