June 21st, 2024

What Happens When You Put a Database in the Browser?

WebAssembly (Wasm) lets high-performance applications run directly in the browser, including DuckDB for ad-hoc queries and Pyodide for Python environments. DuckDB Wasm is embedded in the interfaces of lakeFS, Evidence, and Count, and MotherDuck uses it to run queries locally in the browser.

WebAssembly (Wasm) lets browsers run high-performance applications directly. DuckDB, an embedded database written in C++, uses Wasm to run in the browser, where it supports use cases such as ad-hoc queries, dynamic filtering, and educational tools. Other projects show the same potential, for example Pyodide, which brings a Python environment to the browser. DuckDB Wasm is already integrated into the lakeFS interface and into products from companies like Evidence and Count. A Firefox extension demonstrates DuckDB Wasm querying Parquet files in GCP Cloud Storage. MotherDuck uses DuckDB Wasm for responsive local querying that avoids a round trip to the cloud, and its Wasm SDK lets developers build data-driven applications on the same foundation. The post argues that DuckDB Wasm opens the door to faster analytics applications, and closes by inviting readers to try MotherDuck for free and experiment with the Wasm SDK.
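To make the Parquet use case concrete: a Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`, so a client-side reader can locate the metadata footer with small HTTP range requests instead of downloading the whole file. The sketch below is a minimal Python illustration of that file layout, not DuckDB Wasm's actual code. The function names `footer_length`, `footer_range`, and `range_header` are hypothetical, and it assumes an unencrypted file on a server that honors `Range` headers.

```python
import struct

PARQUET_MAGIC = b"PAR1"
TRAILER_SIZE = 8  # 4-byte little-endian footer length + 4-byte magic


def footer_length(trailer: bytes) -> int:
    """Return the metadata footer length stored in the last 8 bytes of a Parquet file."""
    if len(trailer) != TRAILER_SIZE:
        raise ValueError("expected exactly the last 8 bytes of the file")
    if trailer[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file: trailing magic bytes missing")
    (length,) = struct.unpack("<I", trailer[:4])
    return length


def footer_range(file_size: int, trailer: bytes) -> tuple[int, int]:
    """Return the inclusive byte range (start, end) occupied by the metadata footer."""
    length = footer_length(trailer)
    end = file_size - TRAILER_SIZE - 1   # last footer byte sits just before the trailer
    start = end - length + 1
    if start < len(PARQUET_MAGIC):       # the file also begins with the 4-byte magic
        raise ValueError("footer length exceeds file size")
    return start, end


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"
```

A reader following this layout would first request the final 8 bytes, then request the footer range, and only then fetch the column chunks a query actually needs. For a 132-byte file with a 20-byte footer, `footer_range` returns `(104, 123)`.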

Related

20x Faster Background Removal in the Browser Using ONNX Runtime with WebGPU

Using ONNX Runtime with WebGPU and WebAssembly in browsers achieves 20x speedup for background removal, reducing server load, enhancing scalability, and improving data security. ONNX models run efficiently with WebGPU support, offering near real-time performance. Leveraging modern technology, IMG.LY aims to enhance design tools' accessibility and efficiency.

Eight million pixels and counting: improving texture atlas allocation in Firefox (2021)

Improving texture atlas allocation in WebRender with the guillotiere crate reduces texture memory usage. The guillotine algorithm was replaced due to fragmentation issues, leading to a more efficient allocator. Visualizing the atlas in SVG aids debugging. Rust's simplicity and Cargo fuzz testing are praised for code development and robustness. Enhancements in draw call batching and texture upload aim to boost performance on low-end Intel GPUs by optimizing texture atlases.

Show HN: Eidos – Offline alternative to Notion

The Eidos project on GitHub offers a personal data management framework as a Progressive Web App with AI features. Customizable with extensions and scripting, it uses sqlite-wasm and targets Chromium-based browsers.

Farm: Fast vite compatible build tool written in Rust

Farm is a Rust-based web building engine for efficient web programming. It accelerates React/Vue projects with fast updates, incremental building, module-level caching, and support for popular technologies like Sass, Less, Vue, and React.

AWS Lambda Web Adapter

The GitHub repository provides details on the AWS Lambda Web Adapter, allowing developers to build web apps on AWS Lambda with features like endpoint support, response encoding, and local debugging.

9 comments
By @xnorswap - 5 months
I don't understand these "DB in browser" products.

If the data "belongs" to the server, why not send the query to the server and run it there?

If the data "belongs" on the client, why have it in database form, particularly a "data-lake" structured db, at all?

A lot of the benefits of such databases are their ability to optimise queries for improving performance in a context where the data can't fit in memory (and possibly not even on single disks/machines), as well as additional durability and atomicity improvements. If the data is small enough to be reasonable to send to a client, then it's small enough to fit in memory, which means it'll be fast to query no matter how you go about it.

The page says one advantage is "Ad-hoc queries on data lakes", but isn't that possible with the most basic form that simply sends a query to the database?

What am I failing to understand about this category of products?

By @zX41ZdbW - 5 months
I tried https://shell.duckdb.org/, but it was a very rough experience.

The "delete" button does not work. The "home" button inserts a whitespace. Pasting with "Ctrl+v" also does not work. Every keypress results in blinking, and there is a notable input lag.

When I tried a query

    duckdb> SELECT * FROM 'https://clickhouse-public-datasets.s3.amazonaws.com/github_events/partitioned_json/*.gz'
       ...> ;
    Catalog Error: Table with name https://clickhouse-public-datasets.s3.amazonaws.com/github_events/partitioned_json/*.gz does not exist!
    Did you mean "sqlite_master"?
    LINE 1: SELECT * FROM 'https://clickhouse-public-datasets.s3....

Suggesting the "sqlite_master" database is also misleading.

By @jeroenhd - 5 months
Why run a database in WASM when IndexedDB exists? Browsers already have a database built in; I don't see the need to download another one.

By @Zambyte - 5 months
Another interesting option is PouchDB[0], which is a JavaScript implementation of the CouchDB[1] synchronization API. It allows you to achieve eventual consistency between a client with intermittent connectivity and a backend database.

[0] https://pouchdb.com/

[1] https://couchdb.apache.org/

By @gregw2 - 5 months
I am not 100% clear how this works...

If you query a Parquet file from your lake via DuckDB-in-browser, does DuckDB run in WASM on the web client and pull the compressed parquet to your browser where it is decompressed? Or are you connecting some DuckDB on the web client to some DuckDB component on a server somewhere?

I presume yes to the first and no to the second but just checking I have my mental model correct.

By @threeseed - 5 months
They didn't mention the lifecycle of the database.

Because if it's anything longer lived than a week then it could be used by marketers to evade Apple's ITP for retargeting.

Which would be a huge win for advertisers and a loss for privacy.

By @cryptonector - 5 months
Supercookies?

By @voidUpdate - 5 months
So when I want to browse your website on my phone with a limited data plan, I have to download an entire database client and database, as well as any of your other huge JS libraries?