July 27th, 2024

SQLite: 35% Faster Than the Filesystem

SQLite outperforms traditional file I/O by achieving 35% faster reads of small blobs and using 20% less disk space for 10-kilobyte blobs, despite some latency issues in specific scenarios.


SQLite demonstrates a read and write performance advantage over traditional filesystem operations, achieving reads of small blobs approximately 35% faster than using direct file I/O methods like fread() and fwrite(). Additionally, a single SQLite database can store 10-kilobyte blobs using about 20% less disk space compared to individual files, primarily due to reduced overhead from system calls and more efficient data packing. The performance benefits stem from the fact that SQLite minimizes the number of open and close system calls, which are required for each file when using individual blobs.
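The open()/close()-per-blob overhead is easy to see in a quick sketch. The following hypothetical Python micro-benchmark (blob count, file names, and layout are illustrative, not from the article) stores the same 10-kilobyte blobs both ways and reads them back:

```python
import os
import sqlite3
import tempfile
import time

N = 100
BLOB = os.urandom(10_000)  # 10 KB, as in the article

workdir = tempfile.mkdtemp()

# --- SQLite side: one open() for the database, then N indexed lookups ---
db = sqlite3.connect(os.path.join(workdir, "blobs.db"))
db.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")
db.executemany("INSERT INTO blobs VALUES (?, ?)",
               [(i, BLOB) for i in range(N)])
db.commit()

t0 = time.perf_counter()
rows = [db.execute("SELECT data FROM blobs WHERE id = ?", (i,)).fetchone()[0]
        for i in range(N)]
sqlite_time = time.perf_counter() - t0

# --- Filesystem side: one open()/close() pair per blob ---
for i in range(N):
    with open(os.path.join(workdir, f"{i}.bin"), "wb") as f:
        f.write(BLOB)

t0 = time.perf_counter()
files = []
for i in range(N):
    with open(os.path.join(workdir, f"{i}.bin"), "rb") as f:
        files.append(f.read())
fs_time = time.perf_counter() - t0

assert rows == files  # identical bytes either way
```

The timings themselves will vary with hardware, OS, and cache state, as the article notes; the point of the sketch is only that the per-file syscall pattern differs.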

While the 35% speed increase is an average figure, actual performance can vary based on hardware and operating system conditions. Some users have reported higher latency with SQLite compared to direct I/O, particularly in scenarios with a cold filesystem cache. However, the overall findings challenge the assumption that relational databases are inherently slower than direct filesystem access.

Further studies indicate that SQLite can outperform other filesystems like Btrfs and Ext4 in real-world workloads. Performance tests conducted on various systems, including Windows and Android, show that SQLite can be significantly faster, especially when using optimized read methods like sqlite3_blob_read() and memory-mapped I/O. Write operations, however, are generally slower, with performance varying widely depending on the method used and whether transactional safety measures are implemented.

25 comments
By @tgtweak - 4 months
No filesystem attributes or metadata on records, which also means no xattrs/fattrs being written or updated, no checks for whether it's a regular file or a pipe/symlink, no permission checks, no block-size alignment mismatches, and a single open call.

Makes sense when you consider you're throwing out functionality and disregarding general purpose design.

If you used a FUSE mapping to SQLite, mounted that directory and accessed it, you'd probably see very similar performance (perhaps even slower) and storage use, since you'd need to add additional columns in the table to track these attributes.

I have no doubt that you could create a custom tuned file system on a dedicated mount with attributes disabled, minimized file table and correct/optimized block size and get very near to this perf.

Let's not forget the simplicity of being able to use shell commands (like rsync) to browse and manipulate those files without running the application or an SQL client to debug.

Makes sense for developers to use SQLite for this use case though, for an appliance-type application or for packaged static assets (this is already commonplace in game development; a cab file is essentially the same concept).

By @lc64 - 4 months
That's a very rigorously written article.

Let's also note the 4x speed increase on Windows 10, once again underlining just how slow Windows filesystem calls are compared to direct access and to other (kernel, filesystem) combinations.

By @robertclaus - 4 months
I did some research in a database research lab, and we had a lot of colleagues working on OS research. It was always interesting to compare the constraints and assumptions across the two systems. I remember one of the big differences was the scale of individual records we expected to be working with, which in turn affected how memory and disk was managed. Most relational databases are very much optimized for small individual records and eventual consistency, which allows them to cache a lot more in memory. On the other hand, performance often drops sharply with the size of your rows.
By @igammarays - 4 months
This is precisely why I'm considering appending to a sqlite DB in WAL2 mode instead of plain text log files. Almost no performance penalty for writes but huge advantages for reading/analysis. No more Grafana needed.
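A minimal sketch of that logging approach, assuming stock SQLite (WAL2 exists only in a separate SQLite branch, so plain WAL is used here; the table and column names are made up):

```python
import os
import sqlite3
import tempfile
import time

db_path = os.path.join(tempfile.mkdtemp(), "app_log.db")
log = sqlite3.connect(db_path)
# WAL turns each small commit into an append to the write-ahead log
# instead of a rewrite of database pages.
mode = log.execute("PRAGMA journal_mode=WAL").fetchone()[0]

log.execute("CREATE TABLE IF NOT EXISTS log (ts REAL, level TEXT, message TEXT)")

def write_log(level, message):
    # One small insert-and-commit per event, like appending a log line.
    log.execute("INSERT INTO log VALUES (?, ?, ?)",
                (time.time(), level, message))
    log.commit()

write_log("INFO", "service started")
write_log("ERROR", "upstream timeout")

# The reading/analysis advantage: SQL instead of grep.
errors = log.execute(
    "SELECT COUNT(*) FROM log WHERE level = 'ERROR'").fetchone()[0]
assert mode == "wal"
assert errors == 1
```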
By @freedmand - 4 months
I recently had the idea to record every note coming out of my digital piano in real-time. That way if I come up with a good idea when noodling around I don’t have to hope I can remember it later.

I was debating what storage layer to use and decided to try SQLite because of its speed claims — essentially a single table where each row is a MIDI event from the piano (note on, note off, control pedal, velocity, timestamp). No transactions, just raw inserts on every possible event. It so far has worked beautifully: it’s performant AND I can do fun analysis later on, e.g. to see what keys I hit more than others or what my average note velocity is.
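A sketch of what such a schema might look like (the column names and events here are invented for illustration; the commenter's actual schema isn't shown):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE midi_events (
    ts REAL,           -- wall-clock timestamp of the event
    kind TEXT,         -- 'note_on', 'note_off', 'control_change', ...
    note INTEGER,      -- MIDI note number (NULL for pedal events)
    velocity INTEGER)""")

def on_event(kind, note, velocity):
    # Raw autocommit insert per event, no explicit transactions.
    con.execute("INSERT INTO midi_events VALUES (?, ?, ?, ?)",
                (time.time(), kind, note, velocity))

# Simulated noodling:
on_event("note_on", 60, 80)   # middle C
on_event("note_off", 60, 0)
on_event("note_on", 64, 95)

# The fun analysis part: which keys get hit most, average velocity, etc.
avg_vel = con.execute(
    "SELECT AVG(velocity) FROM midi_events WHERE kind = 'note_on'"
).fetchone()[0]
assert avg_vel == 87.5
```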

By @leni536 - 4 months
> The performance difference arises (we believe) because when working from an SQLite database, the open() and close() system calls are invoked only once, whereas open() and close() are invoked once for each blob when using blobs stored in individual files. It appears that the overhead of calling open() and close() is greater than the overhead of using the database

I wonder how io_uring compares.

By @Upvoter33 - 4 months
When something built on top of the filesystem is "faster" than the filesystem, it just means "when you use the filesystem in a less-than-optimal manner, it will be slower than an app that uses it in a sophisticated manner." An interesting point, but perhaps obvious...
By @Kalanos - 4 months
TLDR; don't do it.

I've used SQLite blob fields for storing files extensively.

Note that there is a 2GB blob maximum: https://www.sqlite.org/limits.html

To read/write blobs, you have to serialize/deserialize your objects to bytes. This process is not only tedious, but also varies for different objects and it's not a first-class citizen in other tools, so serialization kept breaking as my dependencies upgraded.
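For illustration, the round-trip looks roughly like this with pickle (a hypothetical example; pickle in particular is sensitive to dependency upgrades, which is exactly the breakage described above, so a stable format like JSON or msgpack is often safer):

```python
import pickle
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE objects (name TEXT PRIMARY KEY, data BLOB)")

obj = {"weights": [0.1, 0.2], "epoch": 3}
# Serialize the object to bytes before it can go into a BLOB column.
con.execute("INSERT INTO objects VALUES (?, ?)", ("model", pickle.dumps(obj)))

# Deserialize on the way out.
blob = con.execute(
    "SELECT data FROM objects WHERE name = 'model'").fetchone()[0]
restored = pickle.loads(blob)
assert restored == obj
```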

As my app matured, I found that I often wanted hierarchical folder-like functionality. Rather than recreating this mess in db relationships, it was easier to store the path and other folder-level metadata in sqlite so that I could work with it in Python. E.g. `os.listdir(my_folder)`.
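A sketch of that path-as-metadata approach, with a hypothetical `files` table; a "folder" listing becomes a prefix query:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (path TEXT PRIMARY KEY, size INTEGER)")
con.executemany("INSERT INTO files VALUES (?, ?)", [
    ("projects/a/readme.md", 120),
    ("projects/a/data.csv", 5400),
    ("projects/b/notes.txt", 80),
])

# The rough equivalent of os.listdir("projects/a"), but against the
# metadata table rather than the filesystem:
listing = [p for (p,) in con.execute(
    "SELECT path FROM files WHERE path LIKE 'projects/a/%' ORDER BY path")]
assert listing == ["projects/a/data.csv", "projects/a/readme.md"]
```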

Also, if you want to interact with other systems/services, then you need files. SQLite can't be safely used over NFS (e.g. AWS EFS) because file locking is unreliable there, and by design it has no server to accept requests, so I found myself caching files to disk for export/import.

SQLite has some settings for handling parallel requests from multiple services, but when I experimented with them I always wound up with a locked db due to competing requests.
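For what it's worth, the usual mitigations are WAL mode plus a busy timeout, so a blocked connection retries for a while instead of raising "database is locked" immediately; a small sketch:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "shared.db")

# timeout=5.0 makes this connection wait up to 5 s on a locked database.
writer = sqlite3.connect(db_path, timeout=5.0)
# Under WAL, readers don't block the writer and vice versa.
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
writer.execute("INSERT INTO kv VALUES ('a', '1')")
writer.commit()

reader = sqlite3.connect(db_path, timeout=5.0)

# Open (but don't commit) a write transaction on the writer connection...
writer.execute("UPDATE kv SET v = '2' WHERE k = 'a'")
# ...and the reader still proceeds, seeing the last committed snapshot.
value = reader.execute("SELECT v FROM kv WHERE k = 'a'").fetchone()[0]
assert value == "1"
writer.commit()
```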

For one reason or another, you will end up with hybrid (blob/file) ways of persisting data.

By @throwaway211 - 4 months
I was looking at self hosted RSS readers recently. The application is single user. I don't expect it to be doing a lot of DB intensive stuff.

It surprised me that almost all required PostgreSQL, and most of those that didn't opted for something otherwise complex such as Mongo or MySQL.

SQLite, with no dependencies, would have simplified the process no end.

By @theGeatZhopa - 4 months
Depends, depends... but just as a matter of logic:

All fs/drive access is managed by the OS. No DB systems have raw access to sectors or direct raw access to files.

A database file on disk occupies a "cluster" of successive blocks on the hard drive (if it's not fragmented), resulting in relatively short seek distances for the drive head to reach the necessary sectors. The same sectors stay occupied even after vast insert/write/delete operations, so the DB file's position on the hard drive doesn't change at all. It's not a concern with SSDs, though.

So, the following apply:

client -> DB -> OS -> Filesystem

I think you can already see that the DB is an extra layer. So without it, things would always be "faster" in terms of execution time.

If it's slower, then you're using settings that aren't optimal for your use case/filesystem.

My father did this once. He took H2 and made it even faster :) incredibly fast on Windows in a direct comparison of stock H2 vs. the modified H2 with the same data.

So a DBMS is convenient, and its design decisions serve certain domains and their problems. But convenient doesn't mean it's the most optimized way of doing things.

By @RaiausderDose - 4 months
numbers are from 2017, update would be cool
By @OttoCoddo - 4 months
SQLite can be faster than FileSystem for small files. For big files, it can do more than 1 GB/s. On Pack [1], I benchmarked these speeds, and you can go very fast. It can be even 2X faster than tar [2].

In my opinion, SQLite can be faster in big reads and writes too, but the team didn't optimise it as much (like loading the whole content into memory) as maybe it was not the main use of the project. My hope is that we will see even faster speeds in the future.

[1] https://pack.ac [2] https://forum.lazarus.freepascal.org/index.php/topic,66281.m...

By @jstummbillig - 4 months
Let's assume that filesystems are fairly optimized pieces of software. Let's assume that the people building them heard of databases and at some point along the way considered things like the costs of open/close calls.

What is SQLite not doing that filesystems are?

By @cedws - 4 months
How much more performance could you get by bypassing the filesystem and writing directly to the block device? Of course, you'd need to effectively implement your own filesystem, but you'd be able to optimise it more for the specific workload.
By @throwaway211 - 4 months
i.e. opening and closing many files from disk is slower than opening and closing one file and using memory.

It's important. But understandable.

By @me551ah - 4 months
Why hasn’t someone made sqlitefs yet?
By @alberth - 4 months
Slight OT: does this apply to SQLite on OpenBSD?

Because with OpenBSD's introduction of pinning all syscalls to libc, doesn't this block SQLite from making syscalls directly?

https://news.ycombinator.com/item?id=38579913

By @throwaway81523 - 4 months
Deleting a lot of rows from an sqlite database can be awfully slow, compared with deleting a file.
By @efilife - 4 months
> Reading is about an order of magnitude faster than writing

not a native speaker, what does it mean?

By @The_Colonel - 4 months
* for certain operations.

Which is a bit d'oh, since being faster for some things is one of the main motivations for a database in the first place.