June 26th, 2024

PostgreSQL Statistics, Indexes, and Pareto Data Distributions

Close's Dialer system faced challenges due to data growth affecting performance. Adjusting PostgreSQL statistics targets and separating datasets improved performance. Tips include managing dead rows and optimizing indexes for efficient operation.

Read original article

PostgreSQL Statistics, Indexes, and Pareto Data Distributions

The article discusses the challenges faced by Close's Dialer system due to data growth impacting performance. Despite having indexes, queries were slow due to PostgreSQL statistics inaccurately reflecting null values. By adjusting statistics targets and analyzing data at appropriate times, performance improved. The article emphasizes separating hot and cold datasets to optimize database performance. Additional tips include managing dead rows and closing transactions during external calls to reduce database load. The Close team's journey highlights the importance of understanding PostgreSQL statistics, index optimization, and database schema design for efficient system operation.

Schema changes and the Postgres lock queue

Schema changes in Postgres can cause downtime due to locking issues. Tools like pgroll help manage migrations by handling lock acquisition failures, preventing application unavailability. Setting lock_timeout on DDL statements is crucial for smooth schema changes.

Lessons Learned from Scaling to Multi-Terabyte Datasets

Insights on scaling to multi-terabyte datasets, emphasizing algorithm evaluation before scaling. Tools like Joblib and GNU Parallel for single machine scaling, transitioning to multiple machines, and comparing performance/cost implications. Recommendations for parallel workloads and analytical tasks using AWS Batch, Dask, and Spark. Considerations for tool selection based on team size and workload.

Automatically setup pgAdmin with a Docker database

Developer experience is enhanced by configuring a PostgreSQL local environment with Docker Compose. Defining variables and leveraging folders streamlines setup. Integrating pgAdmin automates connections, improving efficiency and simplifying future deployments.

Postgres Schema Changes and Locking

Schema changes in Postgres can cause downtime by locking out reads and writes. Migration tools help mitigate issues. Breakages during migrations can affect client apps or lock database objects, leading to unavailability. Long queries with DDL statements can block operations. Setting lock_timeout on DDL statements can prevent queuing. Tools like pgroll offer backoff and retry strategies for lock acquisition failures. Understanding schema changes and DDL impact helps ensure smoother migrations and less downtime.

Post-mortem: Postgres out of shared memory error

An investigation into PostgreSQL's memory error highlighted insights on transaction isolation and lock management. Increasing max_locks_per_transaction can resolve the "out of shared memory" issue, crucial for maintaining data consistency and preventing memory errors.

3 comments

By @r34 - 10 months

Well, what came to my head pretty quickly when reading this was "there should be separate tables";) Thank you for the article, I didn't know about the statistic configuration in postgres. I can use your lesson straight away, as I'm desining a database for online gaming platform and now I'm sure that there should be separate tables for currently running and finished games:)

Also: very clear and easy to follow language, props to the author. Cheers.

By @wojcikstefan - 10 months

Hi HN community, author of the post here. Let me know please if you have any questions, comments, or feedback!

By @shinzui - 10 months

It’s amazing how many of those types of problems are easily avoided or solved with event sourcing and CQRS.

PostgreSQL Statistics, Indexes, and Pareto Data Distributions

Related

Schema changes and the Postgres lock queue

Lessons Learned from Scaling to Multi-Terabyte Datasets

Automatically setup pgAdmin with a Docker database

Postgres Schema Changes and Locking

Post-mortem: Postgres out of shared memory error

Related

Schema changes and the Postgres lock queue

Lessons Learned from Scaling to Multi-Terabyte Datasets

Automatically setup pgAdmin with a Docker database

Postgres Schema Changes and Locking

Post-mortem: Postgres out of shared memory error