June 20th, 2024

20x Faster Background Removal in the Browser Using ONNX Runtime with WebGPU

Using ONNX Runtime with WebGPU and WebAssembly in browsers achieves 20x speedup for background removal, reducing server load, enhancing scalability, and improving data security. ONNX models run efficiently with WebGPU support, offering near real-time performance. Leveraging modern technology, IMG.LY aims to enhance design tools' accessibility and efficiency.

Using ONNX Runtime with WebGPU and WebAssembly can achieve a 20x speedup for background removal directly in the browser, compared to traditional methods. This approach reduces server load, enhances scalability, and improves data security by processing tasks on the client side. The ONNX Runtime by Microsoft enables running ONNX models in the browser, with WebGPU support offering significant performance boosts. By converting models to fp16 and QUINT8 datatypes, download sizes can be reduced, although this may impact output quality. Evaluation shows that WebGPU outperforms CPU execution, providing near real-time performance for background removal. Leveraging modern technology like ONNX Runtime and WebGPU, IMG.LY aims to make design tools more accessible and efficient across various platforms. Future plans include expanding background removal to videos and exploring new creative editing experiences.
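As a rough illustration of why smaller datatypes shrink the download and can cost some output quality: fp32 weights take 4 bytes each, fp16 takes 2, and an 8-bit unsigned type takes 1. The sketch below shows affine (scale and zero-point) quantization of fp32 values to uint8 and measures the rounding error it introduces. It is not the article's conversion pipeline or the ONNX Runtime quantization tool; the names `quantizeUint8`, `dequantize`, `maxAbsError` and the sample weights are hypothetical.

```typescript
// Illustrative affine quantization of fp32 weights to uint8.
// All names and sample values are assumptions made for this example.

interface Quantized {
  data: Uint8Array;  // one byte per weight instead of four
  scale: number;     // real-valued step between adjacent uint8 codes
  zeroPoint: number; // uint8 code that represents 0.0
}

function quantizeUint8(weights: Float32Array): Quantized {
  // Start the range at [0, 0] so that 0.0 is always exactly representable.
  let min = 0;
  let max = 0;
  for (let i = 0; i < weights.length; i++) {
    if (weights[i] < min) min = weights[i];
    if (weights[i] > max) max = weights[i];
  }
  const scale = max > min ? (max - min) / 255 : 1;
  const zeroPoint = Math.round((0 - min) / scale);
  const data = new Uint8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    const q = Math.round(weights[i] / scale) + zeroPoint;
    data[i] = Math.min(255, Math.max(0, q)); // clamp to the uint8 range
  }
  return { data, scale, zeroPoint };
}

function dequantize(q: Quantized): Float32Array {
  const out = new Float32Array(q.data.length);
  for (let i = 0; i < q.data.length; i++) {
    out[i] = (q.data[i] - q.zeroPoint) * q.scale;
  }
  return out;
}

function maxAbsError(a: Float32Array, b: Float32Array): number {
  let worst = 0;
  for (let i = 0; i < a.length; i++) {
    worst = Math.max(worst, Math.abs(a[i] - b[i]));
  }
  return worst;
}

const weights = new Float32Array([-1.0, -0.25, 0.0, 0.5, 1.5]);
const quantized = quantizeUint8(weights);
const restored = dequantize(quantized);

// 5 weights: 20 bytes as fp32, 5 bytes as uint8.
console.log(weights.byteLength, quantized.data.byteLength);
// The worst-case rounding error is about half a quantization step.
console.log(maxAbsError(weights, restored), quantized.scale / 2);
```

The restored values differ from the originals by up to roughly half a step, which is the kind of precision loss that can show up as reduced output quality in a quantized model.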

Related

Show HN: Eidos – Offline alternative to Notion

The Eidos project on GitHub offers a personal data management framework as a Progressive Web App with AI features. Customizable with extensions and scripting, it leverages sqlite-wasm technology for Chromium-based browsers.

Mip-Splatting: Alias-Free 3D Gaussian Splatting

The paper introduces Mip-Splatting, enhancing 3D Gaussian Splatting by addressing artifacts with a 3D smoothing filter and a 2D Mip filter, achieving alias-free renderings and improved image fidelity in 3D rendering applications.

Gren 0.4: New Foundations

Gren 0.4 updates its functional language with enhanced core packages, a new compiler, revamped FileSystem API, improved functions, and a community shift to Discord. These updates aim to boost usability and community engagement.

Homegrown Rendering with Rust

Embark Studios develops a creative platform for user-generated content, emphasizing gameplay over graphics. They leverage Rust for 3D rendering, introducing the experimental "kajiya" renderer for learning purposes. The team aims to simplify rendering for user-generated content, utilizing Vulkan API and Rust's versatility for GPU programming. They seek to enhance Rust's ecosystem for GPU programming.

Microsoft shelves its underwater data center

Microsoft has ended its underwater data center experiment, noting improved server longevity underwater. Despite success, Microsoft shifts focus to other projects like AI supercomputers and nuclear ambitions, discontinuing further underwater endeavors.

13 comments
By @DaiPlusPlus - 5 months
Background Removal can be thought of as Foreground Segmentation, inverted. That is no trivial feat; my undergraduate thesis was on segmentation (using only “mechanical” approaches, no NNs, etc.), hence my appreciation!

But here’s something I don’t understand (and someone please correct me if I’m wrong!): I do understand that NNs are to software what FPGAs are to hardware, and the ability to pick any node and mess with it (delete, clone, more connections, fewer connections, link weights, swap out the activation functions, etc.) means they’re perfect for evolutionary algorithms that mutate, spawn, and cull these NNs until they solve some problem (e.g. playing Super Mario on a NES (props to Tom7) or, in this case, photo background segmentation).

…now, assuming the analogy to FPGAs still holds, with NNs being an incredibly inefficient way to encode and execute steps in a data-processing pipeline (but very efficient at evolving that pipeline), doesn’t it then mean that whatever process is encoded in the NN should be possible to represent in some more efficient form (i.e. computer program code, even if it’s highly parallelised), and that “compiling” it down is essential for performance? And if so, then why are models/systems like this being kept in NN form?

(I look forward to revisiting this post a decade from now and musing at my current misconceptions)

By @andrewstuart - 5 months
Worth noting that background removal is built in to Preview on macOS.

By @forgotusername6 - 5 months
"Therefore, the first run of the network will take ~300 ms and consecutive runs will be ~100 ms"

I only skimmed the article, but I don't think they mention the size of the image. 100ms is not that impressive when you consider that you need to be three times as fast for acceptable video frame rate.
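The arithmetic in this comment can be checked with a short sketch. The 30 fps target is an assumption (the comment only says "acceptable video frame rate"), and the helper names are hypothetical; the 300 ms and 100 ms figures are the ones quoted above.

```typescript
// Convert a per-frame latency into frames per second.
function framesPerSecond(latencyMs: number): number {
  return 1000 / latencyMs;
}

// How many times faster inference must get to reach a target frame rate.
function requiredSpeedup(latencyMs: number, targetFps: number): number {
  return targetFps / framesPerSecond(latencyMs);
}

// Average latency over `runs` frames when only the first run pays the warm-up cost.
function averageLatencyMs(firstMs: number, steadyMs: number, runs: number): number {
  return (firstMs + steadyMs * (runs - 1)) / runs;
}

console.log(framesPerSecond(100));           // 10 frames per second at 100 ms per frame
console.log(requiredSpeedup(100, 30));       // 3, matching "three times as fast"
console.log(averageLatencyMs(300, 100, 10)); // 120 ms averaged over the first 10 runs
```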

By @pjmlp - 5 months
As long as one uses a Chrome distribution.

WebGPU is at least one year away from becoming usable for cross-browser deployment.
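One common way to cope with uneven browser support is to prefer WebGPU when the browser exposes it and fall back to WebAssembly otherwise. The sketch below is a minimal illustration of that idea, not code from the article: `pickExecutionProviders` and `NavigatorLike` are hypothetical names, while `"webgpu"` and `"wasm"` are execution provider names used by ONNX Runtime Web.

```typescript
// Minimal shape of the browser's `navigator` object that this sketch cares about.
// WebGPU-capable browsers expose `navigator.gpu`; others leave it undefined.
type NavigatorLike = { gpu?: unknown };

// Return execution providers in priority order: WebGPU first when available,
// with WebAssembly as the fallback that also works without WebGPU.
function pickExecutionProviders(nav: NavigatorLike | undefined): string[] {
  const hasWebGpu = nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  return hasWebGpu ? ["webgpu", "wasm"] : ["wasm"];
}

console.log(pickExecutionProviders({ gpu: {} })); // WebGPU-capable browser
console.log(pickExecutionProviders({}));          // browser without WebGPU
console.log(pickExecutionProviders(undefined));   // no navigator at all
```

In a page, the resulting list could be passed as the `executionProviders` option when creating an ONNX Runtime Web inference session. Note that the presence of `navigator.gpu` does not guarantee a usable adapter, so a real implementation would likely also handle a failed adapter request.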

By @tlarkworthy - 5 months
ONNX is cool. The other option is TensorFlow.js, which I have found quite nice as a usable matrix lib for JS with shockingly good perf. Would love to know how well they compare.

By @wruza - 5 months
Interesting, there’s also a Node version in /packages.

By @jvdvegt - 5 months
MS Teams does this already, right? (I assume they do, as it didn't work in Firefox until recently)

Or do they do it server side?

By @tommek4077 - 5 months
If I run it in a browser on my client, why go to a website in the first place?