Slow Queries and the AutoVacuum/Analyze System

I was recently asked by another PostgreSQL DBA what could cause queries to run slow while their application is running but then run fast when the application is stopped. While I have given several presentations on this subject, each one covered a specific cause rather than all of them together, so I decided to compile them into a list on this page.

These slow queries are reported to me by my developers when they start taking minutes, hours, or even days in their test and/or production databases. When they stop the application or load up a fresh copy of the database, the same queries run in milliseconds. I have found that this is caused by the AutoVacuum/Analyze system, or the lack thereof.

Here is a list of some of the symptoms you may be seeing.

  • Small table that gets thousands of rows added and deleted. This table is then joined to a large table to view a set of results.
    • Effect: Your queries get slower and slower until a query that once took 10 ms now takes 5 seconds or more, but if you stop and start your application, the problem fixes itself.
    • Caused by AutoAnalyze not updating the statistics. Stale statistics slow down query planning and cause the planner to pick the wrong query plan, which makes a query that should run in milliseconds take minutes. A manual ANALYZE will fix this issue. You can verify this issue by checking the actual row count versus the estimated table rows in the stats and/or by monitoring the query planning time.
  • Large table with tens of millions of rows where you are deleting and adding millions of rows per hour.
    • Effect: You find that your queries are slowing down, and when you look at your AutoVacuum activity it seems to be AutoVacuuming this table 24 hours per day, but if you stop your application for a few days or dump and restore your database, the problem fixes itself.
    • Caused by the AutoVacuum not having enough time to complete before the next churn cycle. This makes each AutoVacuum pass take longer and longer, with the table bloat getting worse and worse, until a single pass takes over 24 hours to complete. Stopping the application lets the AutoVacuum catch up, and then the queries run fast again. This can be fixed either by stopping the churn and doing a manual VACUUM/ANALYZE or by increasing the AutoVacuum's I/O rate via the AutoVacuum cost limit. You can verify this by watching the currently running AutoVacuums and the estimated dead rows in the table stats.
  • Large table with tens of millions of rows where you are deleting and then adding millions of rows per hour with the same index values inside a single transaction.
    • Effect: Your queries get slower and slower, but if you dump and restore your database, everything is fast again.
    • Caused by the deleted records needing to remain in the index at the same time as the new records. This means the index pages must be split, causing the index to double in size (or more) to make room for the new entries. The space is not recovered by the AutoVacuum system because it does not do a true vacuum of the index: AutoVacuum will only delete an index page that is 100% empty, and it does nothing with partially filled pages except remove dead index entries. You can verify this either by looking for index bloat or by replacing the index and confirming that the query runs fast again.
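For the first symptom, the stale-statistics check can be sketched with a couple of queries. Here `small_table` is a placeholder name, not a table from any real schema:

```sql
-- Planner's row estimate, refreshed only when ANALYZE/AutoAnalyze runs
SELECT relname, reltuples AS estimated_rows
FROM pg_class
WHERE relname = 'small_table';

-- Actual row count right now
SELECT count(*) AS actual_rows FROM small_table;

-- When did AutoAnalyze last touch this table?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'small_table';

-- If the estimate is far off from the actual count, refresh the stats
ANALYZE small_table;
```

If the estimate and the actual count differ by orders of magnitude, the planner is working from fiction and the wrong plan is almost guaranteed.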
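For the second symptom, one way to watch the currently running AutoVacuums and the dead-row estimates, and to raise the AutoVacuum cost limit, is sketched below. The cost-limit value of 2000 is just an illustration; the right number depends on your hardware:

```sql
-- AutoVacuum workers running right now
SELECT pid, query_start, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';

-- Estimated dead rows per table, worst offenders first
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Give AutoVacuum more I/O budget per cycle (requires superuser;
-- the default of -1 falls back to vacuum_cost_limit, which is 200)
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
SELECT pg_reload_conf();
```

If the same table shows up in pg_stat_activity around the clock while n_dead_tup keeps climbing, AutoVacuum is losing the race against the churn.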
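For the third symptom, a quick way to spot a suspiciously large index and rebuild it is sketched below; `large_table` and `large_table_idx` are placeholder names, and REINDEX CONCURRENTLY requires PostgreSQL 12 or later:

```sql
-- Compare index sizes against the table; a bloated index is often
-- several times larger than a freshly built copy would be
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'large_table';

-- Rebuild the index without blocking writes (PostgreSQL 12+);
-- on older versions, create a duplicate index with CREATE INDEX
-- CONCURRENTLY and drop the old one
REINDEX INDEX CONCURRENTLY large_table_idx;
```

If queries are fast immediately after the rebuild, index bloat was the culprit.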

You can learn more about these issues by viewing my presentations on the AutoVacuum System and on Index Bloat.