SQLite: 35% Faster Than the Filesystem
SQLite outperforms traditional file I/O by achieving 35% faster reads of small blobs and using 20% less disk space for 10-kilobyte blobs, despite some latency issues in specific scenarios.
SQLite demonstrates a read and write performance advantage over traditional filesystem operations, achieving reads of small blobs approximately 35% faster than direct file I/O via fread() and fwrite(). Additionally, a single SQLite database can store 10-kilobyte blobs using about 20% less disk space than individual files, primarily due to reduced system-call overhead and more efficient data packing. The performance benefit stems from the fact that SQLite minimizes open and close system calls, which are required for each file when every blob is stored as a separate file.
While the 35% speed increase is an average figure, actual performance can vary based on hardware and operating system conditions. Some users have reported higher latency with SQLite compared to direct I/O, particularly in scenarios with a cold filesystem cache. However, the overall findings challenge the assumption that relational databases are inherently slower than direct filesystem access.
Further tests indicate that SQLite can outperform filesystems like Btrfs and ext4 in real-world workloads. Benchmarks on various systems, including Windows and Android, show that SQLite can be significantly faster, especially when using optimized read methods like sqlite3_blob_read() and memory-mapped I/O. Write operations, however, are generally slower, with performance varying widely depending on the method used and on whether transactional safety measures are enabled.
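For concreteness, here is a minimal sketch of the fast read path mentioned above, using SQLite's incremental blob API; the `files(id, data)` table is a hypothetical schema for illustration, not one from the article:

```c
#include <sqlite3.h>
#include <stdlib.h>

/* Read one blob via sqlite3_blob_read(). Assumes a hypothetical table:
   CREATE TABLE files(id INTEGER PRIMARY KEY, data BLOB). */
int read_blob(sqlite3 *db, sqlite3_int64 rowid,
              unsigned char **out, int *out_len) {
    sqlite3_blob *blob;
    int rc = sqlite3_blob_open(db, "main", "files", "data",
                               rowid, 0 /* read-only */, &blob);
    if (rc != SQLITE_OK) return rc;

    int n = sqlite3_blob_bytes(blob);          /* size of this blob */
    unsigned char *buf = malloc(n);
    rc = sqlite3_blob_read(blob, buf, n, 0);   /* n bytes from offset 0 */
    sqlite3_blob_close(blob);

    if (rc != SQLITE_OK) { free(buf); return rc; }
    *out = buf;
    *out_len = n;
    return SQLITE_OK;
}
```

Paired with `PRAGMA mmap_size` to enable memory-mapped I/O, this path skips both the per-file open()/close() pair and an extra copy through user-space buffers.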
Related
Optimizing Large-Scale OpenStreetMap Data with SQLite
The article discusses optimizing large-scale OpenStreetMap data with SQLite. Converting OSMPBF to SQLite enhanced search functionalities. Indexing, full-text search, and compression improved query performance, despite some false positives.
First Contact with SQLite
The article explores surprising aspects of SQLite, like limitations in altering columns and data types. It notes the recent jsonb support and handling date/time values, praising SQLite's streaming features but favoring PostgreSQL.
Notion about their usage of WASM SQLite
Notion enhanced browser performance by integrating WebAssembly SQLite, OPFS, and Web Workers technologies. Overcoming challenges, they improved page navigation by 20%, optimizing SQLite usage for efficient cross-tab queries and compatibility.
Show HN: SQLite Transaction Benchmarking Tool
The "sqlite-bench" GitHub project tests SQLite transaction behavior. It provides code, compilation guidelines, and running instructions. To protect SSD, run benchmarks on in-memory filesystem first. Docker image available.
Gotchas with SQLite in Production
SQLite is popular for web apps but has limitations like single-threaded access, network challenges, and constraints in concurrency and transactions. Consider alternatives like MySQL for complex scenarios.
Makes sense when you consider you're throwing out functionality and disregarding general-purpose design.
If you used a FUSE mapping to SQLite, mounted that directory, and accessed it through the filesystem, you'd probably see very similar performance (perhaps even worse) and storage use, since you'd need additional columns in the table to track file attributes.
I have no doubt that you could create a custom tuned file system on a dedicated mount with attributes disabled, minimized file table and correct/optimized block size and get very near to this perf.
Let's not forget the simplicity of being able to use shell commands (like rsync) to browse and manipulate those files without running the application or an SQL client to debug.
It does make sense for developers to use SQLite for this use case, though, e.g. for an appliance-type application or for packaged static assets (this is already commonplace in game development: a CAB file is essentially the same concept).
Let's also note the 4x speed increase on Windows 10, once again underlining just how slow Windows filesystem calls are compared to direct access and to other (kernel, filesystem) combinations.
I was debating what storage layer to use and decided to try SQLite because of its speed claims — essentially a single table where each row is a MIDI event from the piano (note on, note off, control pedal, velocity, timestamp). No transactions, just raw inserts on every possible event. It so far has worked beautifully: it’s performant AND I can do fun analysis later on, e.g. to see what keys I hit more than others or what my average note velocity is.
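As a rough illustration of that design (the schema and column names below are my guesses, not the commenter's actual table), each event becomes one autocommit insert through a reusable prepared statement:

```c
#include <sqlite3.h>

/* Hypothetical single-table MIDI event log, as described above. */
static const char *kSchema =
    "CREATE TABLE IF NOT EXISTS midi_events("
    "  ts       INTEGER,"   /* timestamp */
    "  kind     TEXT,"      /* 'note_on', 'note_off', 'control' */
    "  note     INTEGER,"
    "  velocity INTEGER)";

static const char *kInsert =
    "INSERT INTO midi_events(ts, kind, note, velocity) VALUES(?,?,?,?)";

/* One autocommit insert per event, no explicit transaction, matching
   the comment. Prepare kInsert once with sqlite3_prepare_v2() and reuse. */
int log_event(sqlite3_stmt *ins, sqlite3_int64 ts,
              const char *kind, int note, int velocity) {
    sqlite3_bind_int64(ins, 1, ts);
    sqlite3_bind_text(ins, 2, kind, -1, SQLITE_STATIC);
    sqlite3_bind_int(ins, 3, note);
    sqlite3_bind_int(ins, 4, velocity);
    int rc = sqlite3_step(ins);
    sqlite3_reset(ins);
    return rc == SQLITE_DONE ? SQLITE_OK : rc;
}
```

Since the later analyses (key frequency, average velocity) are plain aggregates over this one table, the "fun analysis later" part comes essentially for free.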
I wonder how io_uring compares.
I've used SQLite blob fields for storing files extensively.
Note that there is a 2GB blob maximum: https://www.sqlite.org/limits.html
To read/write blobs, you have to serialize/deserialize your objects to bytes. This is not only tedious but also varies across object types, and it isn't a first-class citizen in other tools, so serialization kept breaking as my dependencies upgraded.
As my app matured, I found that I often wanted hierarchical folder-like functionality. Rather than recreating this mess in db relationships, it was easier to store the path and other folder-level metadata in sqlite so that I could work with it in Python. E.g. `os.listdir(my_folder)`.
Also, if you want to interact with other systems/services, then you need files. SQLite can't be safely used over NFS (e.g. AWS EFS), and by design it has no server to handle requests, so I found myself caching files to disk for export/import.
SQLite has some settings for handling parallel requests from multiple services, but when I experimented with them I always wound up with a locked db due to competing requests.
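For reference, the knobs usually meant here are WAL mode and a busy timeout; this sketch shows the typical configuration (a guess at what was tried, not the commenter's actual setup):

```c
#include <sqlite3.h>

/* WAL lets readers proceed alongside a single writer; a busy timeout
   makes a connection wait for a lock instead of failing immediately
   with SQLITE_BUSY. */
int configure_for_concurrency(sqlite3 *db) {
    int rc = sqlite3_exec(db, "PRAGMA journal_mode=WAL;", 0, 0, 0);
    if (rc != SQLITE_OK) return rc;
    return sqlite3_busy_timeout(db, 5000);  /* wait up to 5000 ms */
}
```

Even in WAL mode there is still only one writer at a time, so long write transactions from competing services can still surface as a locked database, which matches the experience described.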
For one reason or another, you will end up with hybrid (blob/file) ways of persisting data.
It surprised me that almost all required PostgreSQL, and most of those that didn't opted for something otherwise complex such as Mongo or MySQL.
SQLite, with no dependencies, would have simplified the process no end.
All fs/drive access is managed by the OS. No DB systems have raw access to sectors or direct raw access to files.
Having a database file on the disk offers a "cluster" of successive blocks on the drive (if it's not fragmented), resulting in relatively short seek distances for the drive head. The same sectors stay occupied even after heavy insert/write/delete activity; the DB file's position on the drive doesn't change at all. With SSDs this isn't an issue, though.
So the following layering applies:
client -> DB -> OS -> Filesystem
I think you can already see that the DB part is an extra layer. So if one didn't have it, things would be "faster" in terms of execution time. Always.
If it's slower, then you're using suboptimal settings for your use case/filesystem.
My father did this once. He took H2 and made it even faster :) Incredibly fast on Windows in a direct comparison of stock H2 against the modified H2 with the same data.
So a DBMS is built around design decisions that serve certain domains and their problems. Having one is convenient, but that doesn't mean it's the most optimized way of doing things.
* https://news.ycombinator.com/item?id=14550060 7 years ago
* https://news.ycombinator.com/item?id=20729930 5 years ago
* https://news.ycombinator.com/item?id=27137834 3 years ago
* https://news.ycombinator.com/item?id=27897427 3 years ago
* https://news.ycombinator.com/item?id=34387407 2 years ago
In my opinion, SQLite could be faster for big reads and writes too, but the team didn't optimise for that as much (e.g. loading the whole content into memory), perhaps because it was not the main use case of the project. My hope is that we will see even faster speeds in the future.
[1] https://pack.ac
[2] https://forum.lazarus.freepascal.org/index.php/topic,66281.m...
What is SQLite not doing that filesystems are?
It's important. But understandable.
With OpenBSD's introduction of pinning all syscalls to libc, doesn't this block SQLite from making syscalls directly?
Not a native speaker; what does it mean?
Which is a bit d'oh, since being faster for some things is one of the main motivations for a database in the first place.