July 8th, 2024

Show HN: Simulating 20M Particles in JavaScript

This article discusses optimizing JavaScript performance for simulating 1,000,000 particles in a browser. It covers data access optimization, multi-threading with SharedArrayBuffers and web workers, and memory management strategies.

This article explores the speed of JavaScript by simulating 1,000,000 particles using only the CPU in a browser environment. The author discusses the importance of optimizing data access for performance, highlighting the use of TypedArrays to ensure tightly packed contiguous arrays of data. The implementation involves utilizing SharedArrayBuffers and web workers for multi-threading without the complexity of locks. The simulation involves updating particle positions and rendering them to the screen using ImageData on an HTML canvas. The code structure includes decoupling simulation and rendering, distributing work across CPU cores, and managing communication between the main thread and worker threads. The author also shares insights on avoiding memory access conflicts in multi-threaded environments to maintain determinism. The article concludes with a demonstration of the simulation running smoothly with minimal garbage collection, leaving room for further enhancements like interactivity. The approach taken emphasizes performance optimization and efficient memory management in JavaScript for handling large-scale computations.
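A minimal sketch of the data layout the summary describes, assuming four 32-bit floats per particle (x, y, dx, dy) packed into one Float32Array backed by a SharedArrayBuffer. The names `STRIDE`, `stepRange`, and `chunkRanges`, the canvas size, and the particle count are illustrative assumptions, not the author's code. Each worker would be handed a disjoint index range, so no two threads write to the same particle and no locks are needed.

```javascript
// Assumed layout: [x0, y0, dx0, dy0, x1, y1, dx1, dy1, ...]
const STRIDE = 4;
const PARTICLE_COUNT = 1000; // kept small for illustration
const WIDTH = 800;           // assumed canvas size
const HEIGHT = 600;

// One contiguous block of memory that worker threads could share.
const buffer = new SharedArrayBuffer(
  PARTICLE_COUNT * STRIDE * Float32Array.BYTES_PER_ELEMENT
);
const particles = new Float32Array(buffer);

// Random starting positions and velocities.
for (let i = 0; i < PARTICLE_COUNT; i++) {
  const o = i * STRIDE;
  particles[o] = Math.random() * WIDTH;
  particles[o + 1] = Math.random() * HEIGHT;
  particles[o + 2] = (Math.random() * 2 - 1) * 10;
  particles[o + 3] = (Math.random() * 2 - 1) * 10;
}

// Advance particles in [start, end) by dt. A worker would only ever
// touch its own range, which keeps the update free of write conflicts.
function stepRange(data, start, end, dt) {
  for (let i = start; i < end; i++) {
    const o = i * STRIDE;
    data[o] += data[o + 2] * dt;
    data[o + 1] += data[o + 3] * dt;
  }
}

// Split `count` particles into at most `workers` contiguous ranges.
function chunkRanges(count, workers) {
  const size = Math.ceil(count / workers);
  const ranges = [];
  for (let start = 0; start < count; start += size) {
    ranges.push([start, Math.min(start + size, count)]);
  }
  return ranges;
}

// Single-threaded stand-in for what four workers would do in parallel.
for (const [start, end] of chunkRanges(PARTICLE_COUNT, 4)) {
  stepRange(particles, start, end, 1 / 60);
}
```

In a browser, the same `buffer` would be passed to each worker with `postMessage`, and each worker would build its own Float32Array view over it.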

Related

Why Google Sheets ported its calculation worker from JavaScript to WasmGC

Google Sheets transitioned its calculation worker to WasmGC from JavaScript for improved performance. Collaboration between Sheets and Chrome teams led to optimizations, overcoming challenges for near-native speed on the web.

Optimizing JavaScript for Fun and for Profit

Optimizing JavaScript code for performance involves benchmarking, avoiding unnecessary work, string comparisons, and diverse object shapes. JavaScript engines optimize based on object shapes, impacting array/object methods and indirection. Creating objects with the same shape improves optimization, cautioning against slower functional programming methods. Costs of indirection like proxy objects and function calls affect performance. Code examples and benchmarks demonstrate optimization variances.

Scan HTML faster with SIMD instructions: .NET/C# Edition

WebKit and Chromium enhance HTML content scanning with fast SIMD routines, boosting performance significantly. .NET8 now supports speedy SIMD instructions for C#, achieving impressive speeds comparable to C/C++ implementations.

Show HN → Parallel DOM: Upgrade your DOM to be multithreaded

Parallel DOM accelerates web apps by parallelizing heavy DOM tasks. It integrates easily, runs React components concurrently, and ensures security through sandboxed iframes. Users can self-host or deploy with Vercel.

The Cost of JavaScript

JavaScript significantly affects website performance due to download, execution, and parsing costs. Optimizing with strategies like code-splitting, minification, and caching is crucial for faster loading and interactivity, especially on mobile devices. Various techniques enhance JavaScript delivery and page responsiveness.

25 comments
By @andai - 6 months
Nice! I'd suggest embedding the simulation in the blog. I had to scroll up and down for a while before finding a link to the actual simulation.

(You might want to pick a value that runs reasonably well on old phones, or have it adjust based on frame rate. Alternatively, just put some links at the top of the article.)

See https://ciechanow.ski/ (very popular on this website) for a world-class example of just how cool it is to embed simulations right in the article.

(Obligatory: back in my day, every website used to embed cool interactive stuff!)

--

Also, I think you can run a particle sim on GPU without WebGPU.

e.g. https://news.ycombinator.com/item?id=19963640

By @jekude - 6 months
Demo on mobile [0], pretty incredible to play with.

[0] https://dgerrells.com/sabby

By @franciscop - 6 months
Random question (genuine, I do not know if it's possible):

> I decided to have each particle be represented by 4 numbers an x, y, dx, and dy. These will each be 32-bit floating point numbers.

Would it be possible to encode this data into a single JS number (53-bit number, given that MAX_SAFE_INTEGER is 2^53 - 1 = 9,007,199,254,740,991). Or -3.4e38 to 3.4e38, which is the range of the Float32Array used in the blog.

For example, I understand for the screen position you might have a 1000x1000 canvas, which can be represented with 0-1,000,000 numbers. Even if we add 10 sub-pixel divisions, that's still 100,000,000, which still fits very comfortably within JS.

Similar for speed (dx, dy), I see you are doing "(Math.random()*2-1)*10" for calculating the speed, which should go from -10,+10 with arbitrary decimal accuracy, but I wonder if limiting it to 1 decimal would be enough for the simulation, which would be [-10.0, +10.0] and can also be converted to the -100,+100 range in integers. Which needs 10,000 numbers to represent all of the possible values.

If you put both of those together, that gives 10,000 * 100,000,000 = 1,000,000,000,000 (1T) numbers needed to represent the particles, which still fits within JS' MAX_SAFE_INTEGER. So it seems you might be able to fit all of the data for a single particle within a single MAX_SAFE_INTEGER or a single Float32Array element? Then you don't need the stride and can be a lot more sure about data consistency.

It might be that the encoding/decoding of the data into a single number is slower than the savings in memory and so it's totally not worth it though, which I don't know.
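The packing idea in this comment can be sketched as a mixed-radix encoding. This is an illustration under stated assumptions, not anything from the article: positions are limited to 0.0 through 999.9 in 0.1 steps, velocities to -10.0 through +10.0 in 0.1 steps (201 values per axis, so slightly more than the comment's estimate), and `pack` / `unpack` are hypothetical names. The result fits comfortably below MAX_SAFE_INTEGER, but it does not survive storage in a Float32Array element, because a 32-bit float only represents integers exactly up to 2^24; a Float64Array would be needed.

```javascript
const POS_STEPS = 10000; // 1000 px * 10 sub-pixel steps per axis (assumed)
const VEL_STEPS = 201;   // -10.0..+10.0 in 0.1 steps per axis (assumed)

// Encode one particle as a single non-negative integer.
function pack(x, y, dx, dy) {
  const xi = Math.round(x * 10);
  const yi = Math.round(y * 10);
  const dxi = Math.round(dx * 10) + 100; // shift to 0..200
  const dyi = Math.round(dy * 10) + 100;
  return ((xi * POS_STEPS + yi) * VEL_STEPS + dxi) * VEL_STEPS + dyi;
}

// Decode by peeling the fields off in reverse order.
function unpack(n) {
  const dyi = n % VEL_STEPS;
  n = Math.floor(n / VEL_STEPS);
  const dxi = n % VEL_STEPS;
  n = Math.floor(n / VEL_STEPS);
  const yi = n % POS_STEPS;
  const xi = Math.floor(n / POS_STEPS);
  return { x: xi / 10, y: yi / 10, dx: (dxi - 100) / 10, dy: (dyi - 100) / 10 };
}

const packed = pack(123.4, 567.8, -3.2, 9.9);
const particle = unpack(packed);

// The largest encodable value is still a safe integer...
const largest = pack(999.9, 999.9, 10, 10);
const fitsInNumber = Number.isSafeInteger(largest);

// ...but rounding through 32-bit float precision changes the value.
const survivesFloat32 = Math.fround(packed) === packed;
```

Every read and write now costs several divisions and modulo operations, which is the trade-off the commenter raises at the end.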

By @codelikeawolf - 6 months
This is really awesome!

I did have a question about this:

> Javascript does support an Atomics API but it uses promises which are gross. Eww sick.

With the exception of waitAsync[1], the Atomics APIs don't appear to use promises. I've used Atomics before and never needed to mess with any async/promise code. Is it using promises behind the scenes or is there something else I'm missing?

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

Edit: formatting
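A short sketch consistent with the point made here: the core Atomics operations return plain values synchronously, and only `Atomics.waitAsync` hands back a promise. The variable names and values are illustrative.

```javascript
// A small shared Int32Array; Atomics works on integer typed arrays.
const shared = new Int32Array(
  new SharedArrayBuffer(4 * Int32Array.BYTES_PER_ELEMENT)
);

// Each call returns a number immediately; no promises are involved.
const previous = Atomics.add(shared, 0, 5);               // old value at index 0
const current = Atomics.load(shared, 0);                  // value after the add
const swapped = Atomics.compareExchange(shared, 0, 5, 9); // old value; stores 9 if it was 5
Atomics.store(shared, 1, 42);

// notify returns the number of waiting agents it woke (none here).
const woken = Atomics.notify(shared, 0, 1);
```

`Atomics.wait` is also synchronous, but it blocks the calling thread, which is why browsers do not allow it on the main thread.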

By @morphle - 6 months
By @dado3212 - 6 months
The videos look awesome but the "try it out here" codesandbox links don't work for me on MacOS Chrome desktop. I get 'Uncaught ReferenceError: SharedArrayBuffer is not defined' and some CORS errors: 'ERR_BLOCKED_BY_RESPONSE.NotSameOriginAfterDefaultedToSameOriginByCoep'.
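For context on errors like these: browsers only expose SharedArrayBuffer on cross-origin isolated pages, which requires the page to be served with the two response headers below. The sketch shows those headers and a feature check; `withIsolationHeaders` and `sharedMemoryAvailable` are hypothetical helper names, and whether this is the exact cause of the sandbox failure is an assumption.

```javascript
// Response headers that opt a page into cross-origin isolation.
const ISOLATION_HEADERS = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Merge the isolation headers into whatever a server already sends.
function withIsolationHeaders(headers = {}) {
  return { ...headers, ...ISOLATION_HEADERS };
}

// Feature check: the constructor must exist, and in browsers the
// `crossOriginIsolated` flag must not be false. The flag is absent in
// environments such as Node, so absence is treated as acceptable here.
function sharedMemoryAvailable(scope = globalThis) {
  const isolated = scope.crossOriginIsolated !== false;
  return typeof scope.SharedArrayBuffer === "function" && isolated;
}
```

A page could call `sharedMemoryAvailable()` before starting its workers and show a fallback message instead of throwing a ReferenceError.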
By @edweis - 6 months
Marvelous. I spent an hour to understand the code and play with it. Here is a live implementation: https://particules.kapochamo.com/index.html
By @mjgoeke - 6 months
You might check if chrome://tracing helps give more insights: I came across it here

https://youtu.be/easvMCCBFkQ?t=114

By @tired_and_awake - 6 months
Seriously impressive engineering OP, thanks for the awesome writeup too. Looks like you've got a ton of fans now, well earned!
By @hereforcomments - 6 months
Oh, man, can't wait to send it to the UI team who write dead slow React apps. JS is blazing fast. Especially if written well.
By @hopfog - 6 months
Great article and very relevant for me since I'm building a game in JavaScript based on "falling sand" physics, which is all about simulating massive numbers of particles (think Noita meets Factorio - feel free to wishlist if you think it sounds interesting).

My custom engine is built on a very similar solution using SharedArrayBuffers but there are still many things in this article that I'm eager to try, so thanks!

By @int0x29 - 6 months
Might want a strobe warning. At least for Firefox and Chromium in Linux on a desktop it strobes heavily in the starting state.
By @purple-leafy - 6 months
Such a clever fellow.

How does one get this good with understanding hardware level details like L1 caches and the software implications?

I graduated as an Electrical Engineer, moved into Software for career. Feel like I’m missing some skills.

Specifically how can I better understand and use:

- The Chrome profiler? It’s scary to me currently.
- Graphics programming
- Optimisations?

By @Seb-C - 6 months
Nice article.

I did a somewhat similar experiment a while ago and managed to fit quite a lot of particles with a basic physics simulation.

https://github.com/Seb-C/gravity

By @pdsouza - 6 months
Love this. Enjoyed riding your train of thought from challenge conception through each performance pass to the final form. Surprisingly fun to play around with this sim too. Looking forward to more posts!
By @llmblockchain - 6 months
Is the code available somewhere? I'd like to see the full code and run locally. It looks like the code sandbox isn't working anymore.
By @thomasfromcdnjs - 6 months
Inspiring tutorial!

Does anyone know why/how it maintains state if you tab out? Does Chrome eventually try to clean up the cache or is it locked in?

By @iEchoic - 6 months
Very cool, thank you for sharing.

Has anyone done similar experimentation and/or benchmarking on using webgpu for neural nets in JS?

By @itvision - 6 months
I've saved it to Web Archive just in case, sadly it doesn't work that way.
By @lbj - 6 months
Anyone else having trouble with that web vscode he's using?
By @a-dub - 6 months
so when do we get WebBLAS and WebFORTRAN?

kinda joking, kinda not.

By @kragen - 6 months
super cool! i'm thinking webgpu might be usable for a speedup, not sure if webgl would be
By @dangoodmanUT - 6 months
This is great work
By @randall - 6 months
super helpful!!! thanks for this!!