August 17th, 2024

Just use Postgres

Ethan McCue recommends Postgres as the default database for web applications due to its reliability and ease of use, contrasting it with SQLite and various NoSQL and proprietary databases.

Read original articleLink Icon
Just use Postgres

Ethan McCue advocates for using Postgres as the default database for new applications requiring persistent data storage, particularly for web applications. He contrasts Postgres with other databases like SQLite, which is limited to single-file storage and is better suited for desktop or mobile apps. While SQLite can work for web applications, it lacks the benefits of cloud platforms that offer automatic backups and scalability. McCue also discusses NoSQL databases like DynamoDB, Cassandra, and MongoDB, emphasizing their complexity and the need for upfront knowledge of access patterns, which can hinder flexibility. He critiques Redis, Datomic, XTDB, Kafka, ElasticSearch, and proprietary databases like MSSQL and Oracle DB, highlighting their limitations and potential risks. He notes that while these databases have specific use cases, they often require more management and come with trade-offs that may not be suitable for all applications. McCue concludes that unless there is a compelling reason, developers should default to Postgres for its reliability, support, and ease of use.

- Postgres is recommended for persistent data storage in web applications.

- SQLite is limited to single-file storage, making it less suitable for web apps.

- NoSQL databases require upfront knowledge of access patterns, reducing flexibility.

- Proprietary databases like MSSQL and Oracle DB can lead to costly lock-in.

- Postgres offers reliability and ease of use, making it a safe default choice.

Link Icon 51 comments
By @DarkNova6 - 5 months
> If you see a college student or fresh grad using MongoDB stop them. They need help. They have been led astray.

I like this sentence way more than I should.

By @adityapatadia - 5 months
Almost all statements about MongoDB are wrong.

> You know exactly what your app needs to do, up-front

No one does. Mongodb still perfectly fits.

> You know exactly what your access patterns will be, up-front

This one also no one knows when they start. We successfully scaled MongoDB from a few users a day to millions of queries an hour.

> You have a known need to scale to really large sizes of data

This is exactly a great point. When data size goes to a billion rows, Postgres is tough. MongoDB just works without issue.

> You are okay giving up some level of consistency

This is said for ages about MongoDB. Today, it provides very good consistency.

> This is because this sort of database is basically a giant distributed hash map.

Putting MongoDB in category of Dynamo is a big mistake. It's NOT a giant distributed hash map.

> Arbitrary questions like "How many users signed up in the last month" can be trivially answered by writing a SQL query, perhaps on a read-replica if you are worried about running an expensive query on the same machine that is dealing with customer traffic. It's just outside the scope of this kind of database. You need to be ETL-ing your data out to handle it.

This shows the author has no idea how MongoDB aggregation works.

I don't want fresh grads to use SQL just because they learn relations (and consistency and constraints and what not). It's perfectly fine to start on MongoDB and make it the primary DB.

By @rcarmo - 5 months
The "SQLite is just a file" thing is actually an advantage. The example of a website is actually a pretty poor one, since any website that needs to scale beyond a single box has many options. The two easiest ones are:

- Mix static and dynamic content generation (and let's face it, most websites are mostly static from a server perspective)

- Designate a writer node and use any of the multiple SQLite replication features

But, in short, if you use an ORM that supports both SQLite and Postgres you'll have the option to upgrade if your site brings in enough traffic. Which might never happen, and in that case you have a trivial backup strategy and no need to maintain, secure and tweak a database server.

By @arpinum - 5 months
It's not worth pointing out the technical flaws in the post[1]. It is obvious the author does not have a strong grasp of the tools he is criticising. A better example of this style of post is Oxide's evaluation[2] for control plane storage that actually goes over their specific needs and context.

[1] Ok, just one, Rick Houlihan is currently at MongoDB.

[2] https://rfd.shared.oxide.computer/rfd/53

By @tetha - 5 months
On the MySQL vs Postgres topic: We migrated for two reasons.

The first is that I consider everything remotely owned by Oracle as a business risk. Personal opinion and maybe too harsh, but Oracle licenses are made to be violated accidentially so you can be sued and put on the license hook once you're audited, try as you might.

But besides that, Postgres gives you more tools to keep your data consistent and the extension world can save a lot of dev-time with very good solutions.

For example, we're often exporting tenants at an SQL level and import somewhere else. This can turn out very weird if those are 12 year old on-prem tenants. MySQL in such a case has you turn of all foreign key validations and whatever happens happens. A lot of fun with every future DB migration is what happens. With Postgres, you just turn on deferred foreign key validation. That way it imports the dump, eventually complains and throws it all away. No migration issues in the future.

Or the overall tooling ecosystem around PostgreSQL just feels more mature and complete to me at least. HA (Patroni and such), Backups (pgbackrest, ...), pg_crypto, pg_partman and so on just offer a lot of very mature solutions to common operational and dev-issues.

By @refset - 5 months
If Postgres already had decent temporal table support (per SQL:2011 system time + application time "bitemporal" versioning) we never would have gone down the road of building XTDB. From the perspective of anyone building applications on top of SQL with complex reporting requirements in heavily regulated sectors (FS, Insurance, Healthcare etc.), "just use temporal tables" would be the ideal default choice. To get an idea of why, see https://docs.xtdb.com/tutorials/financial-usecase/time-in-fi...
By @Kiro - 5 months
When people say "Just use SQLite. It's almost as good as Postgres and you won't need anything more" I'm trying to understand why I shouldn't just use Postgres. It's not like it's hard to install or has any significant overhead. Please enlighten me.
By @christkv - 5 months
Really just reads as an article reaffirming his own bias. For Mongo at least most of it is wrong.

- Secondaries are read replicas and you can specify if you want to read from them using the drivers selecting that you are ok with eventual consistency.

- You can shard to get a distributed system but for small apps you will probably never have to. Sharing can also be geo specific so you query for french data on the french shards etc lowering latency while keeping a global unified system.

- JSON schema can be used to enforce integrity on collections.

- You can join but this I definitely don’t recommend if possible.

- I personally like the pipeline concept for queries and wish there was something like this for relational databases to make writing queries easier.

- The AI query generator based on the data using Atlas has reduced the pain of writing good pipelines. Chat gpt helps a lot here too.

- The change streams are awesome and has let us create a unified trigger system that works outside of the database and it’s easy to use.

We run postgres as well for some parts of the system and it also is great. Just pick the tool that makes the most sense for your usecase.

By @emccue - 5 months
Okay, I am very sorry that I got Rick Houlihan's name wrong.

In my defense, I hadn't watched his talks _recently_ and we've all been Berenstain Bear'ed a few times.

But also the comparison of DynamoDB/Cassandra to MongoDB comes directly from his talks. He currently works at MongoDB. I understand MongoDB has more of a flowery API with some more "powerful" operators. It is still a database where you store denormalized information and therefore is inflexible to changes in access patterns.

By @endisneigh - 5 months
Use what you know, ship useful stuff.
By @tormeh - 5 months
MySQL is like Javascript: Full of bad decisions and footguns. It works perfectly fine, but I don’t see why you’d use it when Postgres exists.
By @cosmicradiance - 5 months
1. With the recent developments at CockroachDB one may like to bundle it along with MSSQL and Oracle.

2. Like the author, I will like to understand "Why not MariaDB? (a free variant of MySql)".

By @MrThoughtful - 5 months
Their reasoning is that some platforms like Heroku do not support SQLite.

Why use those then and not a platform that supports it, like Glitch?

I have used Postgres, MySql etc, but having the project storage in a single file is making things so much easier, I would never ever want to lose that again.

By @throw0101d - 5 months
For MySQL, for smaller deployments, I've found Galera to really be a handy HA system to get going:

> Galera Cluster is a synchronous multi-master database cluster, based on synchronous replication and MySQL and InnoDB. When Galera Cluster is in use, database reads and writes can be directed to any node. Any individual node can be lost without interruption in operations and without using complex failover procedures.

* https://galeracluster.com/library/documentation/overview.htm...

* https://packages.debian.org/search?keywords=galera

The closest out-of-box solution that I know of for Postgres is the proprietary BDR:

* https://www.enterprisedb.com/docs/pgd/4/bdr/

* https://wiki.postgresql.org/wiki/BDR_Project

There are systems like Bucardo, but they are trigger-based and external to the Postgres software:

* https://www.percona.com/blog/multi-master-replication-soluti...

Having a built-in 3-node MMR (or 2N+1arb[0]) solution would solve a bunch of 'simple' HA situations.

[0] https://packages.debian.org/search?keywords=galera-arbitrato...

By @KingOfCoders - 5 months
Postgres will do to other databases, what Linux did to other Unix(/BSD-like) operating systems (IRIX, SunOs, ...).
By @CuriouslyC - 5 months
I'd like to mention that CouchDB is really useful for one reason - a very robust sync story with clients, and a javascript version called PouchDB that can run on the browser and do bidirectional sync with remote Couch instances.

This can be done with sqlite by jumping through a few extra hoops, and now with in-browser WASM postgres, there as well with a few more hoops, but the Couch -> Pouch story is easy and robust.

By @dangoodmanUT - 5 months
There's a lot of _very_ arguably false statements in here, esp around mongo and dynamo.

Postgres still has to "rewrite" data if you need another index. In fact it's about the same amount if you had to add an index for dynamodb...

Also, when's the last time you changed your primary key in a postgres table? Or are you just adding indexes?

By @Longwelwind - 5 months
> It's annoying because, especially with MongoDB, people come into it having been sold on it being a more "flexible" database. Yes, you don't need to give it a schema. Yes, you can just dump untyped JSON into collections. No, this is not a flexible kind of database. It is an efficient one.

I really like this sentence because it perfectly encapsulates a mistake that, I think, people do when considering using MongoDB.

They believe that the schemaless nature of NoSQL database is an advantage because you don't need to do migrations when adding features (adding columns, splitting them, ...). But that's not why NoSQL database should be used. They are used when you are at a scale when the constraints of a schema become too costly and you want your database to be more efficient.

By @dangoodmanUT - 5 months
These posts are always so biased to the person that's used postgres 100x more than any other DB.
By @jb3689 - 5 months
Every database has issues and quirks whether they be about how you design your application, how you need to scale, or how you need to maintain your database. You can play this game “just use XYZ and have no problems”, but it isn’t realistic. Production databases at scale require heavy dedicated infra to stay highly available and performant, and even out of the box solutions require you to understand what is going on and tune them else you run into “surprises” which are almost always that no one RTFM. Pretty much every mainstream database is capable of both highly available and highly consistent workloads at scale. The storage engine largely shouldn’t matter as much as the application tuning.
By @graemep - 5 months
I do not get the reasoning around SQLite.

SQlite is easy to backup, especially if you are OK with write locking for long enough to copy a file. It now has a backup API too of you are not OK with that.

Lots of things do not scale enough to need more than one application server. A lot of the time, even though I mostly use Postgres, the DB and the application are on the same server, which gets rid of the difficulties of working over a network (more configuration, more security issues, more maintenance).

The main reasons I do not use SQLite are its far more limited data types and and its lack of support for things like ALTER COLUMN (other comments have covered these individually).

By @j45 - 5 months
It has never made sense to me why someone uses no, and then proceed to little by little to make their implementation into having relations.

It’s way less work just to learn sql or an orm.

Nosql is great at being a document store.

I’ve used MySQL longer, it’s been a good default option, the jump to how Postgres works and what it offers is too much to ignore.

Postgres can act as a queue, many of the functions that a nosql has, handle being ann embedding db, and do so until a decent volume. It can be the backbone of many low code tools like supabase, hasura, etc. the only thing that’s different is there seems to be nice currents for MySQL but you get the hang of it pretty quick.

By @kingkongjaffa - 5 months
Does anyone know why seemingly all the introductory courses advocated nosql stuff like using mongoDB

Even freecodecamp who is excellent, does this.

They have a rel-db course https://www.freecodecamp.org/learn/relational-database/ but their backend course uses mongodb https://www.freecodecamp.org/learn/back-end-development-and-...

By @aussieguy1234 - 5 months
If you're just starting a startup, go with Postgres for most things. With limited devops resources it'll be good enough in most scenarios.

You can always over optimise later on.

By @yigitcan07 - 5 months
I've found out key/value databases pushes for better architectural designs in enterprise environments. Especially in companies where different teams are responsible for a given business capability and it needs to scale above 1+ million users.

Postgres flexibility enables for design that is hard to scale. Both in terms of maintainability and performance. Enforcing K/V as a default database in one of my previous companies worked wonders.

By @Jupe - 5 months
From the article, DynamoDB-likes are good IF:

* You know exactly what your app needs to do, up-front

But isn't this true of any database? Generally, adding a new index to a 50 million row table is a pain in most RDBs. As is adding a column, or in some cases, even deleting an index. These operations usually incur downtime, or some tricky table duplication with migration process that is rather compute + I/O intensive... and risky.

By @hk__2 - 5 months
> Why not MySQL? MySQL is owned by Oracle.

Ok, so what about MariaDB?

By @andrewstuart - 5 months
Totally agree - I have tried many databases of all flavors, but I always come back to Postgres.

HOWEVER - this blog post is missing a critical point.... the quote should be:

---> Just use Postgres

AND

---> Just use SQL

"Program the machine" stop using abstractions, ORMs, libraries and layers.

Learn how to write SQL - or at least learn how to debug the very good SQL that ChatGPT writes.

Please, use all the very powerful features of Postgres - Full-Text Search, Hstore, Common Table Expressions (CTEs) with Recursive Queries, Window Functions, Foreign Data Wrappers (FDW), put JSON in, get JSON out, Array Data Type, Exclusion Constraints, Range Types, Partial Indexes, Materialized Views, Unlogged Tables, Generated Columns, Event Triggers, Parallel Queries, Query Rewriting with RULES, Logical Replication, PartialIndexes, Policy-Based Row-Level Security (RLS), Publication/Subscription for Logical Replication.

Push all your business logic into big long stored procedures/functions - don't be pulling the data back and munging it in some other language - make the database do the work!

All this stuff you get from programming the machine. Stop using that ORM/lib and write SQL.

EDIT:

People replying saying "only use generic SQL so you cans switch databases!" - to that I say - rubbish!

I nearly wrote a final sentence in the above saying "forget that old wives tale about the dangers of using a databases functionality because you'll need to switch databases in the future and then you'll be stuck!"

Because the reason people switch databases is when they switch to Postgres after finding some other thing didn't get the job done.

The old "tut tut, don't use the true power of a database because you'll need to switch to Oracle/MySQL/SQL server/MongoDB" - that just doesn't hold.

By @christophilus - 5 months
For everyone saying, “Just use SQLite”, how do you deal with pathological queries causing a denial of service? SQLite is synchronous, so you end up blocking your entire application when a query takes a long time. It’s a problem in Postgres, too, especially if the query involves table locks, but your app can Postgres can generally hobble along.
By @nsonha - 5 months
> AI is a bubble

why does it even matter? I know that I need multimodal search in my product, and that is why I need vector DB. You're not saying anything interesting by saying "AI is a bubble". If you say something like I may not actually need RAG/mutimodal/semantic search/dedicated vector db then you may have my attention.

By @cpursley - 5 months
Great post - the comparison to specific tech was really useful.

Just added it to my "Postgres Is Enough" gist: https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...

By @mgaunard - 5 months
Not everyone builds the same kind of application and has the same amount of data with the same kind of interactions.
By @ginko - 5 months
I wish postgres had a library only mode that directly stored to a file like sqlite. That'd make starting development a lot easier since you don't have to jump through the hoops of setting up a postgres server. You could then switch to a "proper" DB when your application grows.
By @louwrentius - 5 months
> You can only have so much RAM. You can have a lot more than you'd think, but its still pretty limited compared to hard drives.

Your data fits in ram[0]. [0]: https://yourdatafitsinram.net

By @geenat - 5 months
Horizontal scale of writes.

Citus would be alright if the HA story was better: https://github.com/citusdata/citus/issues/7602

By @evilmonkey19 - 5 months
Personally, most of the projects i do are in self-hosted servers. The traffic isnt big. In such cases sqlite has been way better than postgres. Many times i see postgres not well used. Its meant for big project, not small ones.
By @wood_spirit - 5 months
My own advice would be start with SQLite and do a trivial migration to Postgres if warranted.
By @nu11ptr - 5 months
I don't hate SQL and I agree for many applications it makes sense, but I disagree 100% with "default to a SQL database" (like Postgres). Instead, figure out what you need based on your app.

Recently I had the opportunity to rewrite an application from scratch in a new language. This was a career first for me and I won't go into the why aspect. Anyway, the v1 of the app used SQL and v2 was written against MongoDb. I planned the data access patterns based on knowledge that my DB was effectively document/key/value. The end result: it is much simpler. The v1 DB had like 100+ tables with lots of relations and needs lots of documentation. The v2 DB has like 10 "tables" (or whatever mongo calls them) yet does the same thing. Granted, I could have made 10 equivalent SQL tables as well but this would have defeated the purpose of using SQL in the first place. This isn't to say MongoDB is "better". If I had tons of fancy queries and relations I needed it would be easier with SQL, but for this particular app, it is a MUCH better choice.

TL;DR Don't default to anything, look at your requirements and make an intelligent choice.

By @worik - 5 months
And often no databases manager is the best solution.

Literally

If you are not storing much data no datase manager is th best

By @dmezzetti - 5 months
I'm always cautious with a one-size-fits-all approach. If a team is working on a small project and SQLite works then great. You can use a SQLite database on something like a $4/month DigitalOcean droplet. Can't say the same for Postgres.

> AI is a bubble

Many say this but Generative AI and LLMs have gotten bunched up with everything else. There is a clear need for vectors and multimodal search. There is no core SQL statement to find concepts within an image for example. Machine learning models support that with arrays of numbers (i.e. vectors). pgvector adds vector storage and similarity search for Postgres. There was a recent post about storing vectors in SQLite (https://github.com/asg017/sqlite-vec).

> Even if your business is another AI grift, you probably only need to import openai.

There's much more than this. There are frameworks such as LangChain, LlamaIndex and txtai (disclaimer I'm the primary author of https://github.com/neuml/txtai) that handle generating embeddings locally or with APIs and storing them in databases such as Postgres.

By @sjeneenee - 5 months
Ah yes, the I don't have any other use case therefor all others are not good
By @PeterZaitsev - 5 months
Note... you can use PostgreSQL as MongoDB... with FerretDB :)
By @frithsun - 5 months
Yup. With nothing but love for sqlite.
By @PeterZaitsev - 5 months
Here we go again... Just use X, forever, in all cases, is misguided whatever X is - a database, programming language, ... a vehicle.

PostgreSQL is good for many things and default to PostgreSQL and use something else if clearly justified is a sound advice, but assuming there is no room for anything else but PostgreSQL is not.

By @hit8run - 5 months
Just use SQLite3. You will with 99.99% chance never need more. Now what?
By @fulafel - 5 months
Missing sqlite comparison point: data types. SQLite is like JS with column datatypes, except even looser.

The claim about Datomic only working with JVM languages isn't right, it has a rest api there are eg python and js client libs using that.

By @dom96 - 5 months
This should be titled "Just use Sqlite", you really rarely need anything more unless you're Google or Facebook.
By @bob1029 - 5 months
There is absolutely no reason you can't make SQLite go all the way. Starting with it is the only thing that makes sense to me.

It is certainly a higher performance solution in the fair comparison of a hermetically sealed VM using SQLite vs application server + Postgres instance + Ethernet cable. We're talking 3-4 orders of magnitude difference in latency. It's not even a contest.

There are also a lot of resilience strategies for SQLite that work so much better. For instance, you can just snapshot your VM in AWS every x minutes. This doesn't work for some businesses, but you can also use one of the log replication libraries (perhaps in combination with snapshots). If snapshots work for your business, it's the most trivial thing imaginable to configure and use. Hosted SQL solutions will never come close to this level of simplicity.

I personally got 4 banks to agree to the snapshot model with SQLite for a frontline application. Losing 15 minutes of state was not a big deal given that we've still not had any outages related to SQLite in the 8+ years we've been using it in prod.