Many Small Queries Are Efficient in SQLite
SQLite efficiently manages numerous small queries without performance degradation, unlike traditional databases. Its architecture supports both complex and simple queries, with page generation latency typically under 25 milliseconds.
SQLite can efficiently handle a large number of small queries, unlike traditional client/server databases such as MySQL or PostgreSQL, where excessive queries can lead to performance issues due to message round-trips. The SQLite website's dynamic pages often execute around 200 SQL statements, which has drawn criticism for being excessive. However, this criticism is less applicable to SQLite because it operates within the same process as the application, eliminating the overhead associated with inter-process communication. This allows SQLite to perform many smaller queries without significant performance degradation.

The Fossil version control system, which generates these dynamic pages, uses both complex queries and the N+1 query pattern, where multiple smaller queries are executed sequentially. This approach allows for a clear separation of code responsibilities and adapts to the varying data needs of different types of objects. Profiling indicates that the latency for generating a timeline page is typically under 25 milliseconds, with minimal time spent in the database engine. Thus, SQLite's architecture supports both large complex queries and numerous smaller queries, giving application developers the flexibility to choose the most effective method for each task.
- SQLite can efficiently handle many small queries without performance issues.
- The N+1 query pattern is acceptable in SQLite due to its architecture.
- Dynamic pages on the SQLite website often execute around 200 SQL statements.
- The Fossil version control system exemplifies effective use of both complex and simple queries.
- Latency for generating pages in SQLite is typically under 25 milliseconds.
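To make the pattern concrete, here is a minimal Python sketch using the standard-library sqlite3 module and a hypothetical two-table schema (table and column names are illustrative, not from the article). The "1" query fetches the timeline rows; the "N" queries are per-row primary-key lookups, each of which is roughly a function call plus a B-tree probe rather than a network round trip:

```python
import sqlite3

# In-memory database with a hypothetical two-table schema.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE user(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event(id INTEGER PRIMARY KEY, user_id INTEGER, comment TEXT);
""")
db.executemany("INSERT INTO user VALUES (?, ?)", [(1, "alice"), (2, "bob")])
db.executemany("INSERT INTO event VALUES (?, ?, ?)",
               [(i, 1 + i % 2, f"checkin {i}") for i in range(200)])

# The "1" query: fetch the timeline rows.
events = db.execute("SELECT id, user_id, comment FROM event ORDER BY id").fetchall()

# The "N" queries: one small primary-key lookup per row.  Against a
# client/server database each of these would cost a network round trip;
# in-process, each is essentially an indexed B-tree probe.
for event_id, user_id, comment in events:
    (name,) = db.execute("SELECT name FROM user WHERE id = ?", (user_id,)).fetchone()
```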
You can run an unbelievable # of select statements per unit time against a SQLite database. I think of it like a direct method invocation to members of some List<T> within the application itself.
Developers who still protest SQLite are really sleeping on the latency advantages. 2-3 orders of magnitude are really hard to fight against. This opens up entirely new use cases that hosted solutions cannot consider.
Databases that are built into the application by default are the future. Our computers are definitely big enough. There are many options for log replication (sync & async), as well as the obvious hypervisor snapshot approach (WAL can make this much less dangerous).
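A rough way to sanity-check the throughput claim is a micro-benchmark like the hypothetical one below (absolute numbers will vary widely by machine and build settings):

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, val TEXT)")
db.executemany("INSERT INTO t VALUES (?, ?)", [(i, str(i)) for i in range(10_000)])

# Time a large batch of point lookups on the primary key.
n = 100_000
start = time.perf_counter()
for i in range(n):
    db.execute("SELECT val FROM t WHERE id = ?", (i % 10_000,)).fetchone()
elapsed = time.perf_counter() - start
print(f"{n / elapsed:,.0f} point SELECTs/second")
```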
The biggest problem with GraphQL is how easy it becomes to accidentally trigger N+1 queries. As this article explains, if you're using SQLite you don't need to worry about pages that accidentally run 100s of queries, provided those queries are fast - and it's not too hard to build a GraphQL API where all of the edge resolutions are indexed lookups with a limit to the number of rows they'll consider.
I had a lot of fun building a Datasette GraphQL plugin a while ago: https://datasette.io/plugins/datasette-graphql
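As a sketch of what "edge resolutions as indexed lookups with a row limit" might look like (hypothetical schema and resolver function, not taken from the Datasette plugin):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE post(id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE INDEX post_by_author ON post(author_id);
""")

def resolve_posts(author_id: int, first: int = 10):
    # One indexed lookup with a hard row cap: even if a GraphQL query
    # fans out into hundreds of these resolutions, each one stays cheap.
    return db.execute(
        "SELECT id, title FROM post WHERE author_id = ? ORDER BY id LIMIT ?",
        (author_id, first),
    ).fetchall()
```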
But if you do a ton of small queries instead of one big one, you could be depriving the database of the opportunity to choose an efficient execution plan.
For example, if you do a join by querying one table for a bunch of IDs and then looking up each key with an individual select, you're forcing the database into doing a nested loop join. Maybe a merge join or hash join would have been faster.
Or maybe not. Sometimes the way you write the queries corresponds to what the database would have done anyway. Just not necessarily, so it's something to keep in mind.
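A concrete sketch of the two query shapes being compared (hypothetical tables): the first hard-codes a nested-loop join in application code, while the second leaves the join strategy up to the planner:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders(id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'a'), (2, 'b');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
""")

# Shape 1: N+1 -- the application has hard-coded a nested-loop join.
for order_id, customer_id in db.execute("SELECT id, customer_id FROM orders"):
    db.execute("SELECT name FROM customers WHERE id = ?", (customer_id,)).fetchone()

# Shape 2: one query -- the planner picks the strategy itself (in engines
# with hash or merge joins, that choice can matter much more).
rows = db.execute("""
    SELECT o.id, c.name
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
""").fetchall()
```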
As a pedant, I've been referring to it as a "1+n" problem, but haven't managed to make it catch on yet!
It's interesting to consider what this would look like if you put Postgres, rather than SQLite, in process. With PGlite we can actually look at it.
I'm hesitant to post a link to benchmarks after the last 24 hours on Twitter... but I have some basic micro-benchmarks comparing WASM SQLite to PGlite (WASM Postgres that runs in-process): https://pglite.dev/benchmarks#round-trip-time-benchmarks
It's very much in the same ballpark. Postgres is a heavier database and is understandably slower, but not by much. There is a lot of nuance to these benchmarks though, as the underlying VFSs differ a little from each other, and PGlite uses a WAL whereas SQLite here is in its rollback journal mode (I believe this is why PGlite is faster for some inserts/updates).
But essentially I think Postgres when in-process would be able to perform similarly to SQLite with many small queries and embracing n+1. But having said that I think the other comments here about query planning are important to consider, if you can minimise your queries, you minimise the scans of indexes and tables, which is surely better.
What this does show, particularly when you look at the "native" comparison at the end, is that removing the network (or at least a local socket) from the stack brings the two much closer together.
The network round trip time can also add up if you run into resource constraints doing this.
On a remote database, you also have to contend with multiple users and so complicated locking techniques can come into play depending upon the complexity of all database activity.
Many databases have options to return multiple result sets from one connection which helps control the overhead caused by this usage pattern.
EDIT: This also brings back horrible memories of developers doing this in a client/server database architecture. They would often not close their DB connections when done, so you could have thousands of active connections doing basically nothing. Luckily, this problem was solved with better database connection handling.
1. How are they managing concurrent writes? I know about locks and queues, but if my stack needs multiple servers, then what?
2. How do they manage a durable data store? Since SQLite doesn't have a built-in replication mechanism, how do people handle losing transactions?
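For the single-machine part of question 1, the usual mitigation is WAL mode plus a busy timeout, as in the sketch below; this does not answer the multi-server case, where people typically stream the WAL to external storage with a tool like Litestream or route all writes through a single node:

```python
import os
import sqlite3
import tempfile

# WAL mode requires a real file; it is not supported for in-memory databases.
path = os.path.join(tempfile.mkdtemp(), "app.db")
db = sqlite3.connect(path)

# WAL: readers no longer block the single writer, and vice versa.
db.execute("PRAGMA journal_mode=WAL")
# If another connection holds the write lock, retry for up to 5 seconds
# instead of failing immediately with SQLITE_BUSY.
db.execute("PRAGMA busy_timeout=5000")
```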
The only concern I have is that it isn't portable. Move to Postgres and your app will choke. This can be acceptable while scaling, because SQLite can go much farther than people realize. But it is good to keep in mind.
Related
Optimizing Large-Scale OpenStreetMap Data with SQLite
The article discusses optimizing large-scale OpenStreetMap data with SQLite. Converting OSMPBF to SQLite enhanced search functionalities. Indexing, full-text search, and compression improved query performance, despite some false positives.
First Contact with SQLite
The article explores surprising aspects of SQLite, like limitations in altering columns and data types. It notes the recent jsonb support and handling date/time values, praising SQLite's streaming features but favoring PostgreSQL.
Gotchas with SQLite in Production
SQLite is popular for web apps but has limitations like single-threaded access, network challenges, and constraints in concurrency and transactions. Consider alternatives like MySQL for complex scenarios.
SQLite: 35% Faster Than the Filesystem
SQLite outperforms traditional file I/O by achieving 35% faster reads of small blobs and using 20% less disk space for 10-kilobyte blobs, despite some latency issues in specific scenarios.
SQLite vs. PostgreSQL
SQLite is ideal for simple, self-hosted projects with low latency, while PostgreSQL is better for applications needing advanced features and geographical distribution. The choice depends on project requirements.