Just use Postgres
Ethan McCue recommends Postgres as the default database for web applications due to its reliability and ease of use, contrasting it with SQLite and various NoSQL and proprietary databases.
Ethan McCue advocates for using Postgres as the default database for new applications requiring persistent data storage, particularly for web applications. He contrasts Postgres with other databases like SQLite, which is limited to single-file storage and is better suited for desktop or mobile apps. While SQLite can work for web applications, it lacks the benefits of cloud platforms that offer automatic backups and scalability. McCue also discusses NoSQL databases like DynamoDB, Cassandra, and MongoDB, emphasizing their complexity and the need for upfront knowledge of access patterns, which can hinder flexibility. He critiques Redis, Datomic, XTDB, Kafka, ElasticSearch, and proprietary databases like MSSQL and Oracle DB, highlighting their limitations and potential risks. He notes that while these databases have specific use cases, they often require more management and come with trade-offs that may not be suitable for all applications. McCue concludes that unless there is a compelling reason, developers should default to Postgres for its reliability, support, and ease of use.
- Postgres is recommended for persistent data storage in web applications.
- SQLite is limited to single-file storage, making it less suitable for web apps.
- NoSQL databases require upfront knowledge of access patterns, reducing flexibility.
- Proprietary databases like MSSQL and Oracle DB can lead to costly lock-in.
- Postgres offers reliability and ease of use, making it a safe default choice.
Related
Just Use Postgres for Everything
The article promotes using Postgres extensively in tech stacks to simplify development, improve scalability, and reduce operational complexity. By replacing various technologies with Postgres, developers can enhance productivity, focus on customer value, and potentially cut costs.
Just Use Postgres for Everything
The blog post advocates for using PostgreSQL extensively in tech stacks to simplify development, improve productivity, and reduce complexity. It highlights benefits like scalability, efficiency, and cost-effectiveness, promoting a consolidated approach.
Gotchas with SQLite in Production
SQLite is popular for web apps but has limitations: single-writer access, trouble over network filesystems, and constraints on concurrency and transactions. Consider alternatives like MySQL for complex scenarios.
Is an All-in-One Database the Future?
Specialized databases are emerging to tackle complex data challenges, leading to intricate infrastructures. A universal, all-in-one database remains unrealized due to optimization trade-offs and the unique challenges of different database types.
SQLite vs. PostgreSQL
SQLite is ideal for simple, self-hosted projects with low latency, while PostgreSQL is better for applications needing advanced features and geographical distribution. The choice depends on project requirements.
I like this sentence way more than I should.
> You know exactly what your app needs to do, up-front
No one does. MongoDB still fits perfectly.
> You know exactly what your access patterns will be, up-front
No one knows this one either when they start. We successfully scaled MongoDB from a few users a day to millions of queries an hour.
> You have a known need to scale to really large sizes of data
This one is a genuinely good point. When data size grows to billions of rows, Postgres gets tough. MongoDB just works without issue.
> You are okay giving up some level of consistency
This has been said about MongoDB for ages. Today, it provides very good consistency.
> This is because this sort of database is basically a giant distributed hash map.
Putting MongoDB in the same category as Dynamo is a big mistake. It's NOT a giant distributed hash map.
> Arbitrary questions like "How many users signed up in the last month" can be trivially answered by writing a SQL query, perhaps on a read-replica if you are worried about running an expensive query on the same machine that is dealing with customer traffic. It's just outside the scope of this kind of database. You need to be ETL-ing your data out to handle it.
This shows the author has no idea how MongoDB aggregation works.
I don't want fresh grads pushed onto SQL just because they learned relations (and consistency and constraints and what not) in school. It's perfectly fine to start on MongoDB and make it the primary DB.
- Mix static and dynamic content generation (and let's face it, most websites are mostly static from a server perspective)
- Designate a writer node and use any of the multiple SQLite replication features
But, in short, if you use an ORM that supports both SQLite and Postgres you'll have the option to upgrade if your site brings in enough traffic. Which might never happen, and in that case you have a trivial backup strategy and no need to maintain, secure and tweak a database server.
[1] Ok, just one, Rick Houlihan is currently at MongoDB.
The first is that I consider everything remotely owned by Oracle a business risk. Personal opinion, and maybe too harsh, but Oracle licenses seem designed to be violated accidentally, so that once you're audited you can be sued and put on the licensing hook, try as you might to comply.
But besides that, Postgres gives you more tools to keep your data consistent and the extension world can save a lot of dev-time with very good solutions.
For example, we often export tenants at the SQL level and import them somewhere else. This can turn out very weird if those are 12-year-old on-prem tenants. MySQL in such a case has you turn off all foreign key validations, and whatever happens happens. A lot of fun with every future DB migration is what happens. With Postgres, you just turn on deferred foreign key validation. That way it either imports the dump cleanly or complains at commit and throws it all away. No migration issues in the future.
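A minimal sketch of that pattern, with hypothetical `customers`/`orders` tables standing in for real tenant data:

```sql
-- Declare (or re-declare) the foreign key as deferrable.
ALTER TABLE orders
  ADD CONSTRAINT orders_customer_fk
  FOREIGN KEY (customer_id) REFERENCES customers (id)
  DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
-- Postpone FK checks until COMMIT, for this transaction only.
SET CONSTRAINTS ALL DEFERRED;
-- ... load the tenant dump here, rows in whatever order they arrive ...
COMMIT;  -- all FK checks run now; if any fail, the whole import rolls back
```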
Beyond that, the overall tooling ecosystem around PostgreSQL just feels more mature and complete, to me at least. HA (Patroni and such), backups (pgbackrest, ...), pgcrypto, pg_partman and so on offer a lot of very mature solutions to common operational and dev issues.
- Secondaries are read replicas, and you can opt in to reading from them via the driver, declaring that you're OK with eventual consistency.
- You can shard to get a distributed system, but for small apps you will probably never have to. Sharding can also be geo-specific, so you query for French data on the French shards, etc., lowering latency while keeping a globally unified system.
- JSON Schema can be used to enforce integrity on collections.
- You can join, but I definitely don't recommend it if you can avoid it.
- I personally like the pipeline concept for queries and wish there were something like this for relational databases to make writing queries easier.
- The AI query generator in Atlas, which works from your data, has reduced the pain of writing good pipelines. ChatGPT helps a lot here too.
- The change streams are awesome and have let us create a unified trigger system that works outside of the database, and it's easy to use.
We run Postgres as well for some parts of the system, and it also is great. Just pick the tool that makes the most sense for your use case.
In my defense, I hadn't watched his talks _recently_ and we've all been Berenstain Bear'ed a few times.
But also the comparison of DynamoDB/Cassandra to MongoDB comes directly from his talks. He currently works at MongoDB. I understand MongoDB has more of a flowery API with some more "powerful" operators. It is still a database where you store denormalized information and therefore is inflexible to changes in access patterns.
2. Like the author, I would like to understand: "Why not MariaDB?" (a free variant of MySQL).
Why use those then and not a platform that supports it, like Glitch?
I have used Postgres, MySQL, etc., but having the project's storage in a single file makes things so much easier that I would never want to lose that again.
> Galera Cluster is a synchronous multi-master database cluster, based on synchronous replication and MySQL and InnoDB. When Galera Cluster is in use, database reads and writes can be directed to any node. Any individual node can be lost without interruption in operations and without using complex failover procedures.
* https://galeracluster.com/library/documentation/overview.htm...
* https://packages.debian.org/search?keywords=galera
The closest out-of-box solution that I know of for Postgres is the proprietary BDR:
* https://www.enterprisedb.com/docs/pgd/4/bdr/
* https://wiki.postgresql.org/wiki/BDR_Project
There are systems like Bucardo, but they are trigger-based and external to the Postgres software:
* https://www.percona.com/blog/multi-master-replication-soluti...
Having a built-in 3-node MMR (or 2N+1arb[0]) solution would solve a bunch of 'simple' HA situations.
[0] https://packages.debian.org/search?keywords=galera-arbitrato...
This can be done with SQLite by jumping through a few extra hoops, and now with in-browser WASM Postgres as well, with a few more hoops, but the Couch -> Pouch story is easy and robust.
Postgres still has to "rewrite" data if you need another index. In fact, it's about the same amount of rewriting as adding an index to DynamoDB...
Also, when's the last time you changed your primary key in a Postgres table? Or are you just adding indexes?
I really like this sentence because it perfectly encapsulates a mistake that, I think, people make when considering MongoDB.
They believe that the schemaless nature of NoSQL databases is an advantage because you don't need to do migrations when adding features (adding columns, splitting them, ...). But that's not why NoSQL databases should be used. They are for when you are at a scale where the constraints of a schema become too costly and you want your database to be more efficient.
SQLite is easy to back up, especially if you are OK with write-locking for long enough to copy a file. It now has a backup API too, if you are not OK with that.
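(On SQLite 3.27+ a consistent snapshot is even a one-liner in plain SQL, for what it's worth:)

```sql
-- Writes a transactionally consistent snapshot of the live database
-- to a new file, without holding a write lock for a whole file copy.
VACUUM INTO '/backups/app-snapshot.db';
```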
Lots of things do not scale enough to need more than one application server. A lot of the time, even though I mostly use Postgres, the DB and the application are on the same server, which gets rid of the difficulties of working over a network (more configuration, more security issues, more maintenance).
The main reasons I do not use SQLite are its far more limited data types and its lack of support for things like ALTER COLUMN (other comments have covered these individually).
It's way less work just to learn SQL or an ORM.
NoSQL is great at being a document store.
I've used MySQL longer, and it's been a good default option, but the jump to how Postgres works and what it offers is too much to ignore.
Postgres can act as a queue, cover many of the functions a NoSQL store has, handle being an embedding DB, and do all of this up to a decent volume. It can be the backbone of many low-code tools like Supabase, Hasura, etc. The only thing is that some of the tooling around MySQL feels nicer, but you get the hang of it pretty quick.
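The queue case, for instance, is a few lines of SQL once you know the `FOR UPDATE SKIP LOCKED` idiom (a sketch; the `jobs` table is made up):

```sql
CREATE TABLE jobs (
  id      bigserial PRIMARY KEY,
  payload jsonb NOT NULL,
  done    boolean NOT NULL DEFAULT false
);

-- Each worker claims one pending job. SKIP LOCKED means concurrent
-- workers never block on, or double-claim, the same row.
WITH next AS (
  SELECT id FROM jobs
  WHERE NOT done
  ORDER BY id
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
UPDATE jobs SET done = true
FROM next
WHERE jobs.id = next.id
RETURNING jobs.id, jobs.payload;
```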
Even freeCodeCamp, which is excellent, does this.
They have a relational-database course https://www.freecodecamp.org/learn/relational-database/ but their back-end course uses MongoDB https://www.freecodecamp.org/learn/back-end-development-and-...
You can always optimize later on.
Postgres's flexibility enables designs that are hard to scale, both in terms of maintainability and performance. Enforcing K/V as the default database at one of my previous companies worked wonders.
* You know exactly what your app needs to do, up-front
But isn't this true of any database? Generally, adding a new index to a 50-million-row table is a pain in most RDBs. As is adding a column or, in some cases, even deleting an index. These operations usually incur downtime, or some tricky table-duplication-and-migration process that is rather compute- and I/O-intensive... and risky.
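(Postgres's usual answer here, for what it's worth, is the non-blocking variant, which trades downtime for a slower build and an INVALID index to drop and retry if it fails midway:)

```sql
-- Builds the index without blocking concurrent reads or writes.
CREATE INDEX CONCURRENTLY idx_users_created_at ON users (created_at);
```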
Ok, so what about MariaDB?
HOWEVER - this blog post is missing a critical point... the quote should be:
---> Just use Postgres
AND
---> Just use SQL
"Program the machine" stop using abstractions, ORMs, libraries and layers.
Learn how to write SQL - or at least learn how to debug the very good SQL that ChatGPT writes.
Please, use all the very powerful features of Postgres - Full-Text Search, Hstore, Common Table Expressions (CTEs) with Recursive Queries, Window Functions, Foreign Data Wrappers (FDW), put JSON in, get JSON out, Array Data Type, Exclusion Constraints, Range Types, Partial Indexes, Materialized Views, Unlogged Tables, Generated Columns, Event Triggers, Parallel Queries, Query Rewriting with RULES, Logical Replication (publication/subscription), Policy-Based Row-Level Security (RLS).
Push all your business logic into big long stored procedures/functions - don't be pulling the data back and munging it in some other language - make the database do the work!
All this stuff you get from programming the machine. Stop using that ORM/lib and write SQL.
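As one concrete taste of what staying in ORM-land hides from you (schema invented for illustration), here is a recursive CTE walking a category tree, combined with a window function ranking products within each category, in a single query:

```sql
WITH RECURSIVE subtree AS (
  SELECT id FROM categories WHERE id = 42
  UNION ALL
  SELECT c.id
  FROM categories c
  JOIN subtree s ON c.parent_id = s.id
)
SELECT p.name,
       p.price,
       rank() OVER (PARTITION BY p.category_id
                    ORDER BY p.price DESC) AS price_rank
FROM products p
JOIN subtree s ON p.category_id = s.id;
```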
EDIT:
People replying saying "only use generic SQL so you can switch databases!" - to that I say: rubbish!
I nearly wrote a final sentence in the above saying "forget that old wives' tale about the dangers of using a database's functionality because you'll need to switch databases in the future and then you'll be stuck!"
Because the reason people switch databases is when they switch to Postgres after finding some other thing didn't get the job done.
The old "tut tut, don't use the true power of a database because you'll need to switch to Oracle/MySQL/SQL server/MongoDB" - that just doesn't hold.
Why does it even matter? I know that I need multimodal search in my product, and that is why I need a vector DB. You're not saying anything interesting by saying "AI is a bubble". If you were to argue that I may not actually need RAG / multimodal / semantic search / a dedicated vector DB, then you might have my attention.
Just added it to my "Postgres Is Enough" gist: https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...
Your data fits in RAM[0]. [0]: https://yourdatafitsinram.net
Citus would be alright if the HA story was better: https://github.com/citusdata/citus/issues/7602
Recently I had the opportunity to rewrite an application from scratch in a new language. This was a career first for me and I won't go into the why aspect. Anyway, the v1 of the app used SQL and v2 was written against MongoDB. I planned the data access patterns based on the knowledge that my DB was effectively document/key-value. The end result: it is much simpler. The v1 DB had 100+ tables with lots of relations and needed lots of documentation. The v2 DB has about 10 "tables" (collections, in Mongo terms) yet does the same thing. Granted, I could have made 10 equivalent SQL tables as well, but this would have defeated the purpose of using SQL in the first place. This isn't to say MongoDB is "better". If I had tons of fancy queries and relations I needed, it would be easier with SQL, but for this particular app, it is a MUCH better choice.
TL;DR Don't default to anything, look at your requirements and make an intelligent choice.
Literally
If you are not storing much data, no database manager at all is the best option.
> AI is a bubble
Many say this, but generative AI and LLMs have gotten bunched up with everything else. There is a clear need for vectors and multimodal search. There is no core SQL statement to find concepts within an image, for example. Machine learning models support that with arrays of numbers (i.e., vectors). pgvector adds vector storage and similarity search to Postgres. There was a recent post about storing vectors in SQLite (https://github.com/asg017/sqlite-vec).
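A small sketch of what that looks like with pgvector (toy 3-dimensional vectors; real embedding models emit hundreds of dimensions):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE images (
  id        bigserial PRIMARY KEY,
  caption   text,
  embedding vector(3)
);

INSERT INTO images (caption, embedding)
VALUES ('a red bicycle', '[0.9, 0.1, 0.2]'),
       ('a blue car',    '[0.1, 0.8, 0.3]');

-- Nearest neighbors by cosine distance to a query embedding.
SELECT caption
FROM images
ORDER BY embedding <=> '[0.85, 0.15, 0.25]'
LIMIT 5;
```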
> Even if your business is another AI grift, you probably only need to import openai.
There's much more than this. There are frameworks such as LangChain, LlamaIndex and txtai (disclaimer I'm the primary author of https://github.com/neuml/txtai) that handle generating embeddings locally or with APIs and storing them in databases such as Postgres.
PostgreSQL is good for many things, and "default to PostgreSQL, use something else if clearly justified" is sound advice. But assuming there is no room for anything else but PostgreSQL is not.
The claim about Datomic only working with JVM languages isn't right: it has a REST API, and there are e.g. Python and JS client libs using it.
It is certainly a higher performance solution in the fair comparison of a hermetically sealed VM using SQLite vs application server + Postgres instance + Ethernet cable. We're talking 3-4 orders of magnitude difference in latency. It's not even a contest.
There are also a lot of resilience strategies for SQLite that work so much better. For instance, you can just snapshot your VM in AWS every x minutes. This doesn't work for some businesses, but you can also use one of the log replication libraries (perhaps in combination with snapshots). If snapshots work for your business, it's the most trivial thing imaginable to configure and use. Hosted SQL solutions will never come close to this level of simplicity.
I personally got 4 banks to agree to the snapshot model with SQLite for a frontline application. Losing 15 minutes of state was not a big deal given that we've still not had any outages related to SQLite in the 8+ years we've been using it in prod.