Why Monitor PostgreSQL Database Health?
Just like monitoring our applications, it is necessary to monitor our database system. We want to track things at a much finer granularity than system-level metrics, such as:
- How many queries actually make use of the index?
- How effective is the database cache?
- How many connections are currently open?
And so on.
Hardware metrics are essential, but they can only get us so far; they will not help us make concrete decisions, such as which queries to optimize or how we are using our database in general. Since PostgreSQL is an open-source database, it offers a lot of these insights for free, without having to rely on costly tools. Most of the metrics we are going to track are available by default, since the query planner requires them, but there are other critical areas for which we have to enable certain extensions/contrib modules that ship with PostgreSQL.
Introduction to the Arctype Tool
Arctype is a very neat database tool that can be used not just as a SQL editor but also as a platform for building and sharing custom dashboards with developers and users. To understand its true power, we are going to work through a practical use case: monitoring the health of a PostgreSQL database.
Getting Started with Arctype Dashboards
To get familiar with the dashboard capabilities, we are going to build a very simple dashboard with just three components. Dashboards are built right into the Arctype tool itself. To access the functionality, just click on the Dashboards button as shown below.
Click on the plus icon and create a dashboard named Postgres Health Monitoring.
Now that the dashboard is created, let's create some components.
Creating a List of PostgreSQL Tables from Information_Schema
Adding a table component or chart component is covered in one of our previous articles. Drag and drop the Table component into the empty dashboard pane. Next, we will use a simple query that asks the underlying PostgreSQL database for all tables and their schemas, excluding information_schema and the system tables.
SELECT
table_schema, table_name
FROM
information_schema.tables
WHERE
table_schema <> 'information_schema'
AND table_name NOT LIKE 'pg_%'
ORDER BY
table_schema,
table_name;
Visualizing Row Counts of Tables in a Database Schema
Let's add a Chart component (similar to the table component) next to it, using the query below:
SELECT
relname as table_name,
reltuples as rows
FROM
pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE
nspname NOT IN ('pg_catalog', 'information_schema')
AND relkind = 'r'
ORDER BY
reltuples DESC;
This query returns the estimated row count for every ordinary table in our database; reltuples is the planner's estimate, refreshed by VACUUM and ANALYZE, rather than an exact count. In the right pane, select table_name for the X-axis and rows for the Y-axis.
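If an exact figure is ever needed for a specific table, a plain count works. A minimal sketch, where public.users is a hypothetical table name (an exact count scans the whole table, so it can be slow on large ones):
SELECT
  count(*)  -- exact, but reads every visible row
FROM
  public.users;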
Adding Images to an Arctype Dashboard
After adding the table and chart, let's add an image to our dashboard from the PostgreSQL website, along with a text field for explanation. After completing these steps, our dashboard will look like this:
At this point you should have a decent feel for dashboards, so now let's build a real health dashboard and see how it can help us visualize our database health.
Identifying Critical Areas in the PostgreSQL Database
PostgreSQL exposes many areas we can focus on to understand the database's health. In the sections below, we will look at a handful of them and demonstrate how you can use Arctype to visualize them.
Querying to Check PostgreSQL Open Connections
It is always a good idea to monitor how many open connections are currently present in our database cluster. We can get this from the pg_stat_activity view:
SELECT
COUNT(*) as connections,
backend_type
FROM
pg_stat_activity
GROUP BY
backend_type
ORDER BY
connections DESC;
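To drill further into client connections specifically, a sketch like the following groups them by the state column that pg_stat_activity also exposes, separating active sessions from idle ones:
SELECT
  state,
  COUNT(*) as connections
FROM
  pg_stat_activity
WHERE
  backend_type = 'client backend'
GROUP BY
  state
ORDER BY
  connections DESC;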
Determining the Size of a PostgreSQL Database
As our database expands over time, it is important to keep the database size in check. We can get a list of databases and their respective sizes using the below query:
SELECT
datname as database_name,
pg_database_size(datname)/1024/1024 as size_mb
FROM
pg_database
WHERE
datistemplate = false;
We could use the pg_size_pretty function instead of calculating the megabytes directly, but we want to plot this as a bar chart to compare sizes, so a numeric value is required (a pg_size_pretty variant is sketched after the note below). Keeping tabs on individual database sizes helps us get an overall picture.
Note: These sizes can differ from the sizes reported on disk, since PostgreSQL does a lot of compression behind the scenes.
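Where a human-readable listing is preferable to a plottable number, a minimal pg_size_pretty variant of the same query looks like this:
SELECT
  datname as database_name,
  pg_size_pretty(pg_database_size(datname)) as size
FROM
  pg_database
WHERE
  datistemplate = false;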
Calculating Database Cache-Hit Ratios in PostgreSQL
Any relational or operational database serves most of its traffic from its cache; if that is not happening, we definitely need to look into it. The PostgreSQL cache area is known as shared_buffers, which acts as the database-layer cache on top of the cache provided by the operating system. We can use the query below to compute the cache hit ratio:
SELECT
sum(heap_blks_read) as reads,
sum(heap_blks_hit) as hits,
ROUND(
sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)),
4
) as hit_ratio
FROM
pg_statio_user_tables;
As covered in the query plan blog post, tuples can come from heap reads or heap hits, and the pg_statio_user_tables view holds this information. In the above query, we simply take the totals of hits and reads (the latter coming from disk) and compute the ratio, rounded to four digits. It is important to note that in a healthy production database, this ratio should be around 0.97 or higher, meaning roughly 97% of reads are served from the cache.
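A similar ratio can be computed for index blocks using the pg_statio_user_indexes view. This is a minimal sketch mirroring the query above, with a NULLIF guard so an idle database does not cause a division by zero:
SELECT
  sum(idx_blks_read) as reads,
  sum(idx_blks_hit) as hits,
  ROUND(
    sum(idx_blks_hit) / NULLIF(sum(idx_blks_hit) + sum(idx_blks_read), 0),
    4
  ) as hit_ratio
FROM
  pg_statio_user_indexes;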
Scanning for Unused Indexes
The statistics collector in PostgreSQL records which indexes were accessed how frequently. We can construct a query from one of the statistics tables called pg_stat_all_indexes
.
SELECT
COUNT(*) as count,
relname as table_name
FROM
pg_stat_all_indexes
WHERE
idx_scan = 0
and schemaname = 'public'
GROUP BY
table_name;
This gives us an indication of which tables have the greatest number of unused indexes. The idx_scan = 0 filter selects indexes that have never been used even once; we can adjust this threshold depending on our database traffic pattern.
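To see the individual indexes and how much space dropping them would reclaim, the same view can be combined with pg_relation_size. A sketch, keeping the schemaname = 'public' filter from above:
SELECT
  indexrelname as index_name,
  relname as table_name,
  pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM
  pg_stat_all_indexes
WHERE
  idx_scan = 0
  AND schemaname = 'public'
ORDER BY
  pg_relation_size(indexrelid) DESC;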
Inspecting PostgreSQL Database Caches
In order to get more insight out of our cache, we need to enable another contrib module called pg_buffercache.
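A minimal sketch, assuming we have the privileges to create extensions:
-- Enable the pg_buffercache contrib module; it ships with a standard
-- PostgreSQL installation.
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
After enabling it and running a few queries to warm the cache, we can inspect its contents: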
SELECT
c.relname AS entity_name,
count(*) AS buffers
FROM
pg_buffercache b
INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
AND b.reldatabase IN (
0,
(
SELECT
oid
FROM
pg_database
WHERE
datname = current_database()
)
)
WHERE
c.relname NOT LIKE 'pg_%'
GROUP BY
c.relname
ORDER BY
2 DESC;
This query was taken from the official documentation and then modified a little. It shows how many pages in shared_buffers are occupied by each table and index. You can treat pages as roughly proportional to the amount of data accessed: the more a large table is queried, the more of its pages get moved into the cache. This can also suggest that some tables could be moved to an application cache or a fast key-value store, such as Redis, both for faster querying and to free up the database cache for indexes.
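The same module can also report overall shared_buffers utilization. A minimal sketch, assuming the default 8 kB block size (unused buffers have a null relfilenode in pg_buffercache):
SELECT
  count(*) as total_buffers,
  count(relfilenode) as used_buffers,
  pg_size_pretty(count(relfilenode) * 8192) as used_size
FROM
  pg_buffercache;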
Monitoring Average Postgres Query Performance
For this part, we need to enable the pg_stat_statements extension. It is a built-in extension/contrib module that captures a ton of information regarding query performance, and some cloud providers enable it by default.
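If it is not already enabled, it typically takes two steps: adding the module to shared_preload_libraries in postgresql.conf (which requires a server restart) and then creating the extension in the target database. A minimal sketch:
-- postgresql.conf (requires a restart to take effect):
--   shared_preload_libraries = 'pg_stat_statements'
-- Then, in the database we want to monitor:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
With the extension in place, we can build a query that reports query performance alongside the number of rows each statement scanned: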
SELECT
mean_time as "timing in ms",
rows
FROM
pg_stat_statements
ORDER BY
mean_time DESC
LIMIT
100;
Ordering by mean_time and limiting the output gives us the 100 slowest queries on average, and the rows column indicates the total number of rows retrieved or affected by each statement. We can also pull the top 10 worst queries by running time from the same view, as sketched below. This can serve as an alternative to slow-log analysis, since it is closer to real time.
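For example, a sketch of that "top 10 worst queries" view (note that mean_time was split into mean_exec_time and mean_plan_time in PostgreSQL 13+):
SELECT
  query,
  calls,
  mean_time as "mean time in ms"
FROM
  pg_stat_statements
ORDER BY
  mean_time DESC
LIMIT
  10;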
Building a Dashboard with PostgreSQL Metrics
If we combine all of the queries with the chart types mentioned, the result looks like a proper analytics dashboard that can be easily visualized and consumed:
The mapping for each of the components is shown in the table below. Arctype is not limited to these chart types; it is an incredibly feature-rich platform for representing visualizations.
| Module | Type |
| --- | --- |
| Open Connections | Pie Chart |
| Cache Hit Ratio | Normal Table |
| Unused Indexes | Bar Chart |
| Database Size | Bar Chart |
| Shared Buffers | Bar Chart |
| Query Response Times | Scatter Chart |
Each widget here has its own description, and the graphs detail what is on the X-axis and Y-axis. It is important that the dashboard is self-explanatory; after all, a picture is worth a thousand words. These charts may look different depending on the underlying data. They can be expanded and shrunk by simply dragging their borders while in edit mode, which is very useful when the dataset is large and charts would otherwise overflow.
The above dashboard was created on an example database that I had used before. Let's create the dashboard over a more realistic database, such as the Covid DB that comes built into Arctype (available during the signup flow). Since the dataset is huge, I have created the dashboard in two parts:
As we have more data points, the dashboard looks a lot richer and more meaningful. The goal is not to be a "silver bullet" solution, but rather to provide some perspective as to potential areas of focus for an operational database. There are a lot of tools out there that offer different functionalities, but at the end of the day, I believe database developers should know their databases in more depth. What's better than a tool that easily lets us translate queries into really nice charts and graphs with almost no learning overhead? Answer: probably not much.
Insights and Intelligence from the Health Dashboard
This dashboard allows us to monitor the database in real time and helps us better understand some features unique to PostgreSQL. In database optimization, the rule is that "you cannot improve what you cannot measure," and this dashboard solves the first (yet often overlooked) step: actually measuring things. The Covid DB demo database has a 99% cache hit ratio, which means that almost all of the traffic is served from the cache (shared_buffers) and will respond very quickly. This is one good sign of a well-optimized database.
Monitoring our database during peak traffic times will give us a lot of insight into which areas can be improved and how to measure this improvement. Some of the ways in which we can improve database performance based upon these metrics are:
- Using a client-side/server-side connection pool.
- Code optimizations to release connections back to the pool/close the connection once the work is done.
- Doing proper database modeling and splitting the traffic between different clusters.
- Increasing the size of the shared buffers to accommodate more data/indexes in memory.
- Optimizing poorly written queries by either adding an index or re-writing them entirely.
- Removing unused indexes to speed up writes (see the sketch after this list).
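For instance, dropping an index flagged by the unused-index query is a single statement. A minimal sketch, where idx_orders_status is a hypothetical index name; CONCURRENTLY avoids blocking writes while the index is removed:
-- 'idx_orders_status' is a placeholder for an index reported with idx_scan = 0.
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_status;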
Whatever we have seen so far is only the tip of the iceberg. Database maintenance and improvement is an art of its own, one that takes years of hands-on experience. But it is always important to use a toolchain that is friendly, cost-effective, and helps us achieve our goals easily.
Keeping the Dashboard Up-to-Date
After building the dashboard, it is important to keep it up to date. This can be done in various ways, depending on our needs.
Manually Refreshing Arctype Dashboards
At the top right beside the Edit Mode button, there is a reload button, which we can click to reload the entire dashboard.
Automatically Refreshing the Entire Dashboard
We can also configure the whole dashboard to be refreshed periodically.
Refreshing Individual Components Automatically
By clicking on an individual component and then clicking Advanced, we can configure the component in question to refresh automatically at a specified interval.
We can choose which refresh type we want, but keep in mind that the queries used to monitor the database may themselves be resource-intensive and can therefore degrade performance.
Sharing the Dashboard (and Closing Thoughts)
Another very important feature that is missing from many database tools, but present in Arctype, is the ability to share dashboards.
By clicking the Share button at the top right of the dashboard viewer, we get various options to share.
Sharing an Arctype Workspace via Email
This would invite the user to be part of the workspace. For more advanced controls, check out the teams functionality.
Linking to Your Workspace
This is the same as above, except that anyone with the link can join the workspace.
Creating a Public Read-Only Link
This will send a read-only link that can be viewed in the browser as well. This can be very handy for sharing dashboards with different (perhaps non-technical) stakeholders on a team.
Note: You cannot share a locally-hosted database (or any affiliated dashboards).
Conclusion
These dashboards are incredibly useful when:
- Running a performance test.
- Monitoring a live database for a specified period.
- General database monitoring.
The fine-grained control provided by Arctype makes these dashboards easily shareable with colleagues, teams, and even the general internet. Go ahead and download Arctype if you haven't already, and start creating these dashboards. Knowing your database's health and monitoring it should no longer be limited to certain tools or people!