August 28th, 2024

Many Small Queries Are Efficient in SQLite

SQLite efficiently handles large numbers of small queries without performance degradation, unlike traditional client/server databases. Its in-process architecture supports both complex and simple queries, with page generation latency typically under 25 milliseconds.

SQLite can efficiently handle a large number of small queries, unlike traditional client/server databases such as MySQL or PostgreSQL, where excessive queries can lead to performance issues due to message round-trips. The SQLite website's dynamic pages often execute around 200 SQL statements, which has drawn criticism for being excessive. However, this criticism is less applicable to SQLite because it operates within the same process as the application, eliminating the overhead associated with inter-process communication. This allows SQLite to perform many smaller queries without significant performance degradation.

The Fossil version control system, which generates these dynamic pages, uses both complex queries and the N+1 query pattern, where multiple smaller queries are executed sequentially. This approach allows for a clear separation of code responsibilities and adapts to varying data needs for different types of objects.

Profiling indicates that the latency for generating a timeline page is typically under 25 milliseconds, with minimal time spent in the database engine. Thus, SQLite's architecture supports both large complex queries and numerous smaller queries, giving application developers the flexibility to choose the most effective method for their specific tasks.

- SQLite can efficiently handle many small queries without performance issues.

- The N+1 query pattern is acceptable in SQLite due to its architecture.

- Dynamic pages on the SQLite website often execute around 200 SQL statements.

- The Fossil version control system exemplifies effective use of both complex and simple queries.

- Latency for generating pages in SQLite is typically under 25 milliseconds.
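
As a rough illustration of the pattern described above, here is what an N+1 page render can look like against an in-process SQLite database. This is only a sketch using Python's built-in sqlite3 module; the event/tag schema is made up for illustration and is not Fossil's actual schema.

    import sqlite3

    # In-process connection: each query below is a library call, not a network round-trip.
    db = sqlite3.connect("project.db")

    def render_timeline(limit=50):
        # One query for the "parent" rows...
        events = db.execute(
            "SELECT id, kind, summary FROM event ORDER BY mtime DESC LIMIT ?",
            (limit,),
        ).fetchall()

        lines = []
        for event_id, kind, summary in events:
            # ...then one small indexed lookup per row (the N+1 pattern).
            tags = db.execute(
                "SELECT name FROM tag WHERE event_id = ?", (event_id,)
            ).fetchall()
            lines.append(f"{kind}: {summary} [{', '.join(t[0] for t in tags)}]")
        return "\n".join(lines)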

11 comments
By @bob1029 - about 2 months
I built a rules engine around SQLite queries for the last product I was working on. This article was the reason.

You can run an unbelievable # of select statements per unit time against a SQLite database. I think of it like a direct method invocation to members of some List<T> within the application itself.

Developers who still protest SQLite are really sleeping on the latency advantages. 2-3 orders of magnitude are really hard to fight against. This opens up entirely new use cases that hosted solutions cannot consider.

Databases that are built into the application by default are the future. Our computers are definitely big enough. There are many options for log replication (sync & async), as well as the obvious hypervisor snapshot approach (WAL can make this much less dangerous).
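
A minimal sketch of what that kind of query-per-evaluation loop can look like, assuming Python's built-in sqlite3 module rather than .NET, with a made-up rule table:

    import sqlite3
    import time

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE rule (id INTEGER PRIMARY KEY, threshold REAL)")
    db.executemany("INSERT INTO rule VALUES (?, ?)", [(i, i * 0.5) for i in range(1000)])

    def rule_allows(rule_id, value):
        # Each evaluation is a single indexed SELECT; with the database in-process
        # this behaves much more like a method call than a remote query.
        row = db.execute("SELECT threshold FROM rule WHERE id = ?", (rule_id,)).fetchone()
        return row is not None and value >= row[0]

    start = time.perf_counter()
    passed = sum(rule_allows(i % 1000, 100.0) for i in range(100_000))
    print(f"{passed} passing checks in {time.perf_counter() - start:.2f}s")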

By @simonw - about 2 months
An interesting result of this is that SQLite is a really good database for implementing GraphQL.

The biggest problem with GraphQL is how easy it becomes to accidentally trigger N+1 queries. As this article explains, if you're using SQLite you don't need to worry about pages that accidentally run 100s of queries, provided those queries are fast - and it's not too hard to build a GraphQL API where all of the edge resolutions are indexed lookups with a limit to the number of rows they'll consider.

I had a lot of fun building a Datasette GraphQL plugin a while ago: https://datasette.io/plugins/datasette-graphql
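
A sketch of what such an edge resolver could look like, independent of any particular GraphQL library or of how the Datasette plugin is actually implemented; the post table, author_id column, and the 100-row cap are illustrative assumptions:

    import sqlite3

    db = sqlite3.connect("blog.db")

    def resolve_posts_for_author(author_id, first=20):
        # Per-edge resolver: an indexed lookup with an explicit row cap, so even if
        # it runs once per author node on a page, each call stays cheap.
        return db.execute(
            "SELECT id, title FROM post WHERE author_id = ?"
            " ORDER BY created_at DESC LIMIT ?",
            (author_id, min(first, 100)),  # never consider more than 100 rows per edge
        ).fetchall()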

By @adrianmonk - about 2 months
It's true that RPC latency doesn't exist, and that opens up other possibilities that are worth considering.

But if you do a ton of small queries instead of one big one, you could be depriving the database of the opportunity to choose an efficient execution plan.

For example, if you do a join by querying one table for a bunch of IDs and then looking up each key with an individual select, you're forcing the database into doing a nested loop join. Maybe a merge join or hash join would have been faster.

Or maybe not. Sometimes the way you write the queries corresponds to what the database would have done anyway. Just not necessarily, so it's something to keep in mind.
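
As a rough illustration of that trade-off, here is the same aggregation written both ways against a hypothetical orders table, using Python's sqlite3; the second form leaves the join/scan strategy to the planner:

    import sqlite3

    db = sqlite3.connect("shop.db")

    def totals_n_plus_one(customer_ids):
        # Hand-rolled join: one indexed lookup per id forces a nested-loop shape.
        return {
            cid: db.execute(
                "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
                (cid,),
            ).fetchone()[0]
            for cid in customer_ids
        }

    def totals_single_query(customer_ids):
        # One statement: the planner can pick whatever join/scan strategy it likes.
        # (Customers with no orders are simply absent from this result.)
        placeholders = ",".join("?" for _ in customer_ids)
        rows = db.execute(
            "SELECT customer_id, COALESCE(SUM(amount), 0) FROM orders"
            f" WHERE customer_id IN ({placeholders}) GROUP BY customer_id",
            list(customer_ids),
        ).fetchall()
        return dict(rows)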

By @crabmusket - about 2 months
Pet peeve: why is the common term "n+1 query problem" when it's the 1 that happens first and the n that is the problem?

As a pedant, I've been referring to it as a "1+n" problem, but haven't managed to make it catch on yet!

By @samwillis - about 2 months
I love this explanation.

It's interesting to consider what this would look like if you put Postgres, rather than SQLite, in process. With PGlite we can actually look at it.

I'm hesitant to post a link to benchmarks after the last 24 hours on Twitter... but I have some basic micro-benchmarks comparing WASM SQLite to PGlite (WASM Postgres that runs in-process): https://pglite.dev/benchmarks#round-trip-time-benchmarks

It's very much in the same ballpark. Postgres is a heavier database and is understandably slower, but not by much. There is a lot of nuance to these benchmarks though, as the underlying VFSs are a little different from each other, and PGlite has a WAL whereas SQLite is in its rollback journal mode (I believe this is why PGlite is faster for some inserts/updates).

But essentially I think Postgres when in-process would be able to perform similarly to SQLite with many small queries and embracing n+1. But having said that, I think the other comments here about query planning are important to consider: if you can minimise your queries, you minimise the scans of indexes and tables, which is surely better.

What this does show, particularly when you look at the "native" comparison at the end, is that removing the network (or at least a local socket) from the stack brings the two much closer together.

By @treebeard901 - about 2 months
The article doesn't explain why this is a bad idea for database servers hosted on a remote machine. The first reason is obvious... Network connections take memory and processing power on both client and server. Each additional query causes more resource usage. It is unnecessary overhead, which is why things like multiple active result sets were created for SQL Server.

The network round trip time can also add up if you run into resource constraints doing this.

On a remote database, you also have to contend with multiple users and so complicated locking techniques can come into play depending upon the complexity of all database activity.

Many databases have options to return multiple result sets from one connection which helps control the overhead caused by this usage pattern.

EDIT: This also brings back horrible memories where developers would do this in a db client server architecture. Then they would often not close the DB connections when done. So you could have thousands of active connections basically doing nothing. Luckily, this problem was solved with better database connection handling.

By @aynyc - about 2 months
I still can't figure out how people are using sqlite in a server context.

1. How are they managing the concurrent writes? I know about lock or queue, but if my stack needs multiple server, then what?

2. How do they manage durable data store? Since sqlite doesn't have a replication mechanism, how do people handle losing transactions?

By @eyelidlessness - about 2 months
One thing I don’t see mentioned yet is that efficiency isn’t the only reason multiple queries can be problematic. There’s also consistency. This might not matter for many SQLite use cases, or in general for blocking single-reader calls. But if your database might be handling concurrent writes that could occur between any two reads, reducing the overhead to zero still might not be worth the tradeoff.
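
One common way to get that consistency back is to wrap the related reads in a single transaction so they all observe one snapshot. A minimal sketch with Python's built-in sqlite3 module; the orders table is illustrative:

    import sqlite3

    # isolation_level=None lets us manage the transaction explicitly.
    db = sqlite3.connect("app.db", isolation_level=None)

    def consistent_report():
        # All reads inside one transaction see a single, stable view of the database,
        # so a write landing between the two SELECTs can't skew the numbers.
        db.execute("BEGIN")
        try:
            (order_count,) = db.execute("SELECT COUNT(*) FROM orders").fetchone()
            (total,) = db.execute(
                "SELECT COALESCE(SUM(amount), 0) FROM orders"
            ).fetchone()
            return order_count, total
        finally:
            db.execute("ROLLBACK")  # read-only transaction; nothing to persist
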
By @prisenco - about 2 months
Effectively solving the N+1 query problem (not completely, but for general real-world use) is one of many reasons why SQLite is my first choice for greenfield projects.

The only concern I have is it isn't portable. Move to Postgres and your app will choke. This can be acceptable while scaling because SQLite can go much farther than people realize. But it is good to keep in mind.

By @nh2 - about 2 months
Anybody have a benchmark that shows that postgres is actually slower when on the same machine with same durability settings?
By @slaymaker1907 - about 2 months
Oh, I've definitely seen N+1 problems with a SQLite database before. It's more resilient than a traditional DB server, but it can still run into performance issues when doing too many select queries.