Tuning Postgres Configurations

Recently I published my dating app on the Google Play Store and shared it with some of my friends. Actually, a lot of them! Then someone pinged me and said that she was getting an internal server error during signup.

BTW, here’s the link if you want to find a dating partner https://kippy.app/ 😋.

Ok, so I checked the logs and found this error:

panic: pq: sorry, too many clients already

So let’s discuss some of the PostgreSQL configuration parameters and their optimal values. To tune these parameters you need to edit the postgresql.conf file. You can usually find it at /etc/postgresql/<version>/main/postgresql.conf, or locate it by running the following command from your shell:

$ psql -U postgres -c 'SHOW config_file'

Let’s get started

max_connections

This was the root cause of the above error. As the name suggests, it is the maximum number of concurrent client connections allowed. Its default value is 100. As soon as the number of connections hits this value, the above error is thrown.
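As a quick sketch of how you might diagnose and fix this (the new limit of 200 is just an example, not a recommendation — size it for your workload):

```sql
-- Check the current limit and how many connections are actually in use
SHOW max_connections;
SELECT count(*) FROM pg_stat_activity;

-- Raise the limit; this requires a server restart to take effect
ALTER SYSTEM SET max_connections = 200;
```

Note that if the connections are coming from a single application opening too many of them, a connection pool on the app side is often a better fix than raising the server limit.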

Obviously, the higher max_connections is, the more active and idle connections your system can handle and the more database operations you can perform in a given time. But every connection consumes system resources, so raising it is not free.

shared_buffers

Merely increasing max_connections may not help. You’ll also need to increase the shared_buffers space, which specifies how much memory is dedicated to PostgreSQL for caching data. It is recommended to keep it at around 25% of your system memory. So, if you have 1GB of RAM, you should set shared_buffers = 256MB.
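In postgresql.conf, following the 25%-of-RAM rule of thumb for a 1GB machine, that would look like (a restart is required for this setting):

```conf
# ~25% of system RAM on a 1GB machine; requires a server restart
shared_buffers = 256MB
```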

effective_cache_size

It is the planner’s estimate of the total disk cache available to a single query, combining Postgres’s own shared buffers with the OS-level disk cache. The PostgreSQL query planner uses this value to judge how much of a query’s data is likely to already be in memory, and the choice of plan may vary based on it: a higher value makes index scans more likely, while a lower value makes sequential scans more likely. The value is used for estimation only; no memory is actually allocated.
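Since it accounts for the OS cache as well, it is usually set much higher than shared_buffers. A sketch for a dedicated 1GB database server (the 50–75%-of-RAM range is a common rule of thumb, not an official figure):

```conf
# shared_buffers plus the expected OS page cache;
# commonly ~50-75% of RAM on a dedicated database server
effective_cache_size = 512MB
```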


Read more about effective_cache_size here — https://www.cybertec-postgresql.com/en/effective_cache_size-better-set-it-right/

maintenance_work_mem

It is the maximum amount of memory used for PostgreSQL maintenance tasks, which include VACUUM, CREATE INDEX, ALTER TABLE ... ADD FOREIGN KEY, etc. VACUUM reclaims storage occupied by dead tuples.

It is recommended to set it to about 10% of system memory, and to increase it if your VACUUM operations are slow.
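Following that 10% rule of thumb on a 1GB machine, a sketch would be:

```conf
# ~10% of system RAM on a 1GB machine;
# used by VACUUM, CREATE INDEX, ALTER TABLE ... ADD FOREIGN KEY
maintenance_work_mem = 100MB
```

It can also be raised just for the current session (SET maintenance_work_mem = '512MB';) before running a big CREATE INDEX, without changing the global setting.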

wal_buffers

The amount of shared memory used for WAL (Write-Ahead Logging) data that has not yet been written to disk. By default it is set automatically to 1/32 (about 3%) of shared_buffers, capped at 16MB. You may want to increase it if you have many concurrent connections writing at the same time.
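The default value of -1 means “derive it from shared_buffers”; setting it explicitly pins it instead (a restart is required):

```conf
# default is -1: auto-sized to 1/32 of shared_buffers, capped at 16MB.
# An explicit value can help on busy, write-heavy servers.
wal_buffers = 16MB
```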

effective_io_concurrency

Sets the number of concurrent disk I/O operations that PostgreSQL expects can be executed simultaneously. The allowed range is 1 to 1000, or zero to disable the issuance of asynchronous I/O requests. A higher value uses more resources, but on storage that handles parallel requests well it can make certain queries (notably bitmap heap scans) noticeably faster.
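The right value depends heavily on the storage hardware; these are commonly suggested starting points rather than official recommendations:

```conf
# single spinning disk: leave at 1 (the default) or 2;
# SSDs and large RAID arrays handle far more parallel requests,
# so values in the hundreds are commonly used there
effective_io_concurrency = 200
```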


seq_page_cost

Sets the planner’s estimate of the cost of a disk page fetch that is part of a series of sequential fetches. The default is 1.0.

random_page_cost

Sets the planner’s estimate of the cost of a non-sequentially-fetched (random) disk page. The default is 4.0.

Generally, for an HDD, random access is more expensive than sequential access, roughly 4X more, which is exactly the ratio of the two defaults above.

The planner compares these costs when deciding how to fetch pages, so whichever access pattern comes out cheaper for a given query will be preferred. By default, Postgres is optimised for HDDs and therefore leans towards sequential access. On an SSD, however, random reads are nearly as cheap as sequential ones, so the default random_page_cost of 4.0 penalises index scans more than the hardware warrants.

I’m sharing a link to an article where the author was running on an SSD and was able to make his queries 50X faster by setting random_page_cost = 1.
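On SSD storage, the tuning sketched below brings the two cost estimates close together (1.1 is a commonly suggested SSD value; the article above went all the way to 1). Unlike the memory settings, this one takes effect on a config reload, no restart needed:

```sql
-- On SSDs, random reads cost nearly the same as sequential reads
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
```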

work_mem

It specifies the amount of memory to be used by internal sort operations, such as those behind ORDER BY and DISTINCT, before they spill to temporary disk files. So, if you are running complex queries and have plenty of memory, you may prefer to do these sorts in memory rather than on disk, i.e. increase this value.

Also, keep in mind that each concurrent sort operation gets up to this much memory, so the actual memory in use can be far more than the work_mem value itself. For example, set it to 50MB and have 30 users submitting sorting queries, and you are soon using 1.5GB of real memory.
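Because of that multiplication effect, one safer pattern is to keep the global value modest and raise it only for a known-heavy query. SET LOCAL scopes the change to the current transaction:

```sql
-- Raise work_mem for one transaction only, instead of globally
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... run the big ORDER BY / DISTINCT query here ...
COMMIT;
```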


That was all I could discover and understand. If you think there are other Postgres config parameters that we can tweak for faster queries, please share them in the responses.


Anup Kumar Panwar · Product Engineer @ Gojek | Founder at The Algorithms | GSoC’18 @ FOSSASIA | And lots of Music & Football