June 27th, 2024

Using short-lived Postgres servers for testing

Database servers like PostgreSQL can be set up quickly for short-lived environments or CI/CD pipelines by creating new data directories and populating them efficiently with pg_basebackup. This method simplifies testing and demo setups.

Database servers are typically long-lived, crucial pieces of infrastructure. However, for scenarios like short-lived test environments or CI/CD pipelines, setting up a new PostgreSQL instance can be simple: create a new data directory, initialize it, and start PostgreSQL inside it. Multiple instances can run on the same server this way, without the need for Docker.

To populate instances with data, cloning from a stable instance using pg_basebackup is efficient; it creates exact copies, including users, databases, and configuration. In Kubernetes, dynamic PostgreSQL instances can be set up in pods using the same technique. While pg_basebackup does not work across different PostgreSQL major versions, it offers a quick way to stand up databases for testing.

This approach is particularly useful for temporary environments for manual testing or demos. The author also shares how this method was implemented in the eugene trace command, simplifying the setup process for users.
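A minimal sketch of the flow, assuming a local PostgreSQL installation with initdb, pg_ctl, and pg_basebackup on the PATH; the directories, port, hostname, and replication user below are illustrative, not from the article:

```sh
# Create and initialize a throwaway data directory
initdb -D ./pgdata-test -U postgres

# Start a server on a non-default port so several instances can coexist
pg_ctl -D ./pgdata-test -o "-p 5433" -l ./pgdata-test/server.log start

# Or, instead of starting empty, clone a stable instance. This assumes
# a user with the REPLICATION attribute and a pg_hba.conf entry that
# permits replication connections.
pg_basebackup -h stable-instance.example -p 5432 -U replicator \
  -D ./pgdata-clone -X stream

# When testing is done, stop the server and discard everything
pg_ctl -D ./pgdata-test stop
rm -rf ./pgdata-test ./pgdata-clone
```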

Related

Schema changes and the Postgres lock queue

Schema changes in Postgres can cause downtime due to locking issues. Tools like pgroll help manage migrations by handling lock acquisition failures, preventing application unavailability. Setting lock_timeout on DDL statements is crucial for smooth schema changes.

Automatically setup pgAdmin with a Docker database

Developer experience is enhanced by configuring a local PostgreSQL environment with Docker Compose. Defining environment variables and organizing configuration into folders streamlines setup, and integrating pgAdmin automates connections, improving efficiency and simplifying future deployments.

Postgres Schema Changes and Locking

Schema changes in Postgres can cause downtime by locking out reads and writes. Breakages during migrations can affect client apps or lock database objects, leading to unavailability: a DDL statement queued behind a long-running query blocks every operation that arrives after it. Setting lock_timeout on DDL statements prevents indefinite queuing, and tools like pgroll offer backoff-and-retry strategies for lock acquisition failures. Understanding schema changes and DDL locking impact helps ensure smoother migrations and less downtime.
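The lock_timeout pattern both of the summaries above mention can be sketched as a single psql invocation; the timeout value and table name are made up for illustration:

```sh
# Abort the DDL if its lock isn't granted within 2 seconds, rather than
# queuing behind long-running queries and blocking later reads/writes
psql -c "SET lock_timeout = '2s'; ALTER TABLE orders ADD COLUMN note text;"
```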

Supabase (YC S20) Is Hiring Postgres SREs

Supabase seeks a Site Reliability Engineer to manage Postgres databases remotely. Responsibilities include enhancing reliability, ensuring high availability, and optimizing performance. Ideal candidates possess multi-tenant database experience, Postgres tools proficiency, and AWS deployment skills. Benefits include remote work, equity, health coverage, and tech allowance.

PostgreSQL Statistics, Indexes, and Pareto Data Distributions

Close's Dialer system faced performance challenges as its data grew. Adjusting PostgreSQL statistics targets and separating datasets improved performance. Tips include managing dead rows and optimizing indexes for efficient operation.

10 comments
By @whalesalad - 4 months
Echoing the other comment here, https://testcontainers.com/ is a great tool for this. (Or just vanilla docker in general)

In our core makefile we have `make pytest` to run standalone unit tests, and then `make pytest_db` to run full database tests using a docker container. The test suite fires up a completely fresh DB, runs migrations against it, and then proceeds as usual (roughly the flow sketched below). On a per-module basis, a simple fixture determines whether the db should be reset for each test, for each file, or once for the entire module.

Works great on Github actions, too.

When the test is done, the container and all noise is automatically cleaned up. No state left behind.
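A rough shell equivalent of that flow; the image tag, credentials, port, and make targets are illustrative, not the commenter's actual setup:

```sh
# Start a throwaway Postgres container; --rm discards all state on stop
docker run -d --name test-db --rm \
  -e POSTGRES_PASSWORD=test -p 5433:5432 postgres:16

# Wait until the fresh server accepts connections
until docker exec test-db pg_isready -U postgres; do sleep 0.5; done

# Run migrations against the empty database, then the db test suite
make migrate DB_URL=postgres://postgres:test@localhost:5433/postgres
make pytest_db

# Tear down; nothing is left behind
docker stop test-db
```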

By @boustrophedon - 4 months
Related self-promotion: I built pgtemp (https://github.com/boustrophedon/pgtemp) to automate exactly these mkdir/initdb/load/destroy steps.
By @suralind - 4 months
I started using testcontainers for this, but my biggest problem is that each test needs to start its own container, which is slow (my PoC currently has about 20 test cases).

testcontainers allows stopping and starting containers and resetting their state to a snapshot, but the Go implementation seems to have a bug, which I reported recently [1]. It's going to be huge when they fix it, as it will make tests much faster.

[1]: https://github.com/testcontainers/testcontainers-go/issues/2...

By @sakras - 4 months
Disclaimer: I work at Neon

Neon makes this very ergonomic in a cloud setting (actually it works somewhat similarly to how the article does it). You can create a copy-on-write version of your prod database (a branch) and use it for testing. There's also some automation we've built up for doing it from GitHub CI: https://neon.tech/docs/guides/neon-github-app

By @treybrick - 4 months
Disclaimer: I work at Tonic AI

We have a product called Ephemeral (in some ways similar to Neon) that orchestrates a Kubernetes cluster to spin ephemeral databases up and down from "snapshots" for exactly this use case.

We also have a more established product called Structural that does de-identification and anonymization of production data, and there's a pretty clean integration between the two products to make creating your fleet of ephemeral testing databases nice and easy.

In case you're interested -> https://www.tonic.ai/ephemeral

By @iteratorx - 4 months
You can use Bunnyshell Ephemeral environments for this (bunnyshell.com):

- The Postgres instances will be deployed in Kubernetes (bring your own infra if you want)
- You can configure seeding
- You can run any other containers next to Postgres (for example E2E tests)
- You can trigger the process manually or automatically (PR, GH actions, cli tool, API, etc.)

For example, the ArgoCD project uses Bunnyshell to create ephemeral envs on each PR. https://github.com/argoproj/argo-cd/pull/18851

Disclaimer: I work for Bunnyshell

By @_DeadFred_ - 4 months
Needs a reminder warning: if you have compliance requirements (HIPAA, PCI, customer confidentiality agreements, etc.), you need scripts to massage the data in any test database. It's always the worst to live with all kinds of productivity-limiting restrictions to meet compliance, only to get a self-own when a compliance scan picks up a random test database hanging out somewhere.
By @twh270 - 4 months
If you work in the JVM ecosystem, I can recommend (Java/Kotlin) + Quarkus + JDBI with Postgres. Quarkus supports testcontainers out of the box for tests, and it just plain works.

I don't have experience with it, but some people also recommend jooq.

By @esafak - 4 months
I'd eventually like to use dagger.io so I can use the same technology in testing, CI, and CD, but it is not mature enough yet, so I'm using testcontainers for the testing part, with success.
By @ludamn - 4 months
For those interested in an easier way to set up databases (and maybe other containerized services) for testing, I recommend TestContainers[1]. There are integrations for well-known languages like Go, Java, and Node.js, and the setup is easier than what is described in the article (at least if you're already developing on a container-ready workspace).

[1]: https://testcontainers.com/