UPDATED 2/6/2020
This has been fully fixed as of PostgreSQL 12. The fix was large enough that it had to be split between Postgres 11 and Postgres 12.
The Postgres security team decided not to call this a security bug, because doing so would have required back-patching the fix to all currently supported Postgres versions. I personally think it would be better to back-patch this fix, but since that has not happened, this is a big reason to upgrade to Postgres 12.
UPDATED 07/30/2018
After the PostgreSQL security team looked at this issue, it was released to the PostgreSQL Hackers list on 7/19/2018. I would like to thank Michael Paquier, who has since been writing patches to fix these issues. Michael has written back patches for all current versions of PostgreSQL, which means back to PostgreSQL 9.3.
Link to the Bug Report and Hackers followup:
-------------------
I was asked by Jan Wieck, Jim Mlodgenski, Jim Nasby, Jeremy Schneider, and Denis Lussier to un-publish my original article and then re-submit my bug post to the security mailing list. They also mentioned that the security group may wish to remove bugs #15182 & 6183 from the archives until a patch has been created for these issues. Please note that I have been updating the original blog page, and now this page, as I find new related issues.
UPDATED 5/3/2018 - TRUNCATE TABLE by unprivileged user.
Security Implications
Start of the Original Blog Post
Over the last several weeks our developers accidentally caused a denial-of-service attack against ourselves. When looking at the log files, I noticed that we had authentication timeouts during these time periods. In researching the problem, I found this is due to locks being held on shared system catalog items, i.e. system catalog items that are shared between all databases on the same cluster/server. It can be triggered by one connection beginning a long running transaction that queries pg_stat_activity, pg_roles, pg_database, etc., while another connection runs REINDEX DATABASE, REINDEX SYSTEM, or VACUUM FULL.

This issue is of particular importance to database resellers who put multiple clients on the same cluster/server, as two clients can cause this issue inadvertently, or a single client can cause it either maliciously or inadvertently. Note: the large cloud providers give each of their clients their own cluster/server, so this will not affect one cloud client from another, but it can still affect an individual client. The problem is that traditional hosting companies will have all clients from one or more web servers share the same PostgreSQL cluster/server. This means that one or two clients could inadvertently stop all the other clients from being able to connect to their databases until the first client does either a COMMIT or ROLLBACK of their transaction, which they could hold open for hours; this is what happened to us internally.
In this article I am going to cover how to reproduce this problem and then cover some ways to solve this problem.
Reproducing the Problem:
In Connection 1 we need to BEGIN a transaction and then query a shared system item (pg_authid, pg_database, etc.) or a view that depends on a shared system item (pg_stat_activity, pg_roles, etc.).
BEGIN;
SELECT * FROM pg_stat_activity;
Optional: Using a connection to a database that is not used by Connections 1 through 3, let's look at the locks obtained by the query in Connection 1.
SELECT pg_locks.pid,
       pg_namespace.nspname AS schema_name,
       pg_class.relname AS object_name,
       pg_locks.mode,
       pg_locks.granted
FROM pg_locks
LEFT JOIN pg_class ON pg_locks.relation = pg_class.oid
LEFT JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
WHERE relname IS NOT NULL
  AND relisshared = true
ORDER BY 1, 2, 3;
Locks Result - Results filtered to Connection 1 (18326)

| pid | schema_name | object_name | mode | granted |
|---|---|---|---|---|
| 18326 | pg_catalog | pg_database | AccessShareLock | True |
| 18326 | pg_catalog | pg_authid | AccessShareLock | True |
| 18326 | pg_catalog | pg_database_datname_index | AccessShareLock | True |
| 18326 | pg_catalog | pg_authid_rolname_index | AccessShareLock | True |
| 18326 | pg_catalog | pg_database_oid_index | AccessShareLock | True |
| 18326 | pg_catalog | pg_authid_oid_index | AccessShareLock | True |
We can see that Connection 1 was granted several AccessShareLocks on shared system indexes. This will be a problem for Connection 2, which is going to REINDEX those indexes. Connection 2 can be to the same database or a different database on the same cluster/server, and can even be made by a different user, BUT Connection 2 must be made by the owner of the database being used or by a superuser. You can use either the REINDEX DATABASE or REINDEX SYSTEM command, as either of these commands will cause the problem.
REINDEX DATABASE postgres;
OR
REINDEX SYSTEM postgres;
Optional: Now let's look at the locks obtained by both connections via this query.
SELECT pg_locks.pid,
       pg_namespace.nspname AS schema_name,
       pg_class.relname AS object_name,
       pg_locks.mode,
       pg_locks.granted
FROM pg_locks
LEFT JOIN pg_class ON pg_locks.relation = pg_class.oid
LEFT JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
WHERE relname IS NOT NULL
  AND relisshared = true
ORDER BY 1, 2, 3;
Locks Result (Postgres 10.3) - Results filtered to Connections 1 (18326) & 2 (23227)

| pid | schema_name | object_name | mode | granted |
|---|---|---|---|---|
| 18326 | pg_catalog | pg_database | AccessShareLock | True |
| 18326 | pg_catalog | pg_authid | AccessShareLock | True |
| 18326 | pg_catalog | pg_database_datname_index | AccessShareLock | True |
| 18326 | pg_catalog | pg_authid_rolname_index | AccessShareLock | True |
| 18326 | pg_catalog | pg_database_oid_index | AccessShareLock | True |
| 18326 | pg_catalog | pg_authid_oid_index | AccessShareLock | True |
| 23227 | pg_catalog | pg_database | ShareLock | True |
| 23227 | pg_catalog | pg_database_datname_index | AccessExclusiveLock | False |
Note: In Postgres 9.4.12 the AccessExclusiveLock will be held on pg_catalog.pg_authid_rolname_index.
Exclusive locks must wait for all previously granted share locks to be released before the exclusive lock is granted, and all subsequent share lock requests must wait for the exclusive lock to be released. This means that Connection 2's AccessExclusiveLock request has to wait for the AccessShareLock currently held by Connection 1.
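Optional: if you are on PostgreSQL 9.6 or newer, a quick sketch using pg_blocking_pids() shows this waiting chain directly (run it from a connection that was already open before the blockage, since new connections may themselves hang):

-- Lists every backend that is blocked on a lock and the pids blocking it
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;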
Connection 3 can be to any database by any user on the cluster/server.
psql -h sqltest-alt -d sandbox
You will notice that you never receive your prompt to start entering your SQL.
Optional: Now let's look at the locks obtained by all three connections via this query.
SELECT pg_locks.pid,
       pg_namespace.nspname AS schema_name,
       pg_class.relname AS object_name,
       pg_locks.mode,
       pg_locks.granted
FROM pg_locks
LEFT JOIN pg_class ON pg_locks.relation = pg_class.oid
LEFT JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
WHERE relname IS NOT NULL
  AND relisshared = true
ORDER BY 1, 2, 3;
Locks Result (Postgres 10.3) - Results filtered to Connections 1 (18326), 2 (23227) & 3 (14846)

| pid | schema_name | object_name | mode | granted |
|---|---|---|---|---|
| 14846 | pg_catalog | pg_database | ShareLock | True |
| 14846 | pg_catalog | pg_database_datname_index | AccessShareLock | False |
| 18326 | pg_catalog | pg_database | AccessShareLock | True |
| 18326 | pg_catalog | pg_authid | AccessShareLock | True |
| 18326 | pg_catalog | pg_database_datname_index | AccessShareLock | True |
| 18326 | pg_catalog | pg_authid_rolname_index | AccessShareLock | True |
| 18326 | pg_catalog | pg_database_oid_index | AccessShareLock | True |
| 18326 | pg_catalog | pg_authid_oid_index | AccessShareLock | True |
| 23227 | pg_catalog | pg_database | ShareLock | True |
| 23227 | pg_catalog | pg_database_datname_index | AccessExclusiveLock | False |
Connection 3 has to wait for an AccessShareLock on the index that Connection 2 is waiting to lock exclusively, and Connection 2 is in turn waiting on Connection 1. Connection 3 may eventually get a "Canceling authentication due to timeout" in the log file after authentication_timeout seconds expire. psql version 9.4.9 seems to ignore this timeout setting in my tests, but the setting does affect our applications, so we do see "Canceling authentication due to timeout" in our log files. authentication_timeout is settable in your postgresql.conf file and has a default value of 60, with settable values from 1 to 600; this means the default setting is 1 minute.
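For reference, a minimal postgresql.conf sketch showing the setting at its default (the value is in seconds unless another unit is given):

authentication_timeout = 60s    # default is 1 minute; allowed range is 1s to 600s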
How to re-index without causing this problem
To bypass this problem, you can run this inline function that will only REINDEX user tables.
-- Loops through all user tables, including temp and toast tables,
-- and reindexes all the indexes attached to said tables
DO $$
DECLARE
  r RECORD;
BEGIN
  FOR r IN
    SELECT *
    FROM information_schema.tables
    WHERE table_type != 'VIEW'
      AND table_schema NOT IN ('pg_catalog', 'information_schema')
  LOOP
    EXECUTE 'REINDEX TABLE ' || quote_ident(r.table_schema) || '.' || quote_ident(r.table_name) || ';';
  END LOOP;
END $$;
Instead of the inline function, you can add this function to your database and then run a SELECT on the new function.
CREATE OR REPLACE FUNCTION tools.reindex_user_tables()
RETURNS void AS
$body$
-- Loops through all user tables, including temp and toast tables,
-- and reindexes all the indexes attached to said tables
DECLARE
  r RECORD;
BEGIN
  FOR r IN
    SELECT *
    FROM information_schema.tables
    WHERE table_type != 'VIEW'
      AND table_schema NOT IN ('pg_catalog', 'information_schema')
  LOOP
    EXECUTE 'REINDEX TABLE ' || quote_ident(r.table_schema) || '.' || quote_ident(r.table_name) || ';';
  END LOOP;
END;
$body$ LANGUAGE 'plpgsql';

SELECT tools.reindex_user_tables();
The problem with the above function is that it does not re-index the non-shared system tables, such as pg_class, where all your indexes, tables, etc. are stored. This means that if you create and drop many real or temp tables, you can get bloat on the pg_class table and the indexes associated with it. If you perform actions like this, then you will want to be able to re-index the non-shared system tables as well. The following set of functions will re-index all indexes except the shared system indexes. To run this DO block, you must be a superuser!
-- Loops through all indexes excluding any shared system indexes
DO $$
DECLARE
  r RECORD;
BEGIN
  FOR r IN
    SELECT pg_namespace.nspname AS index_schema,
           pg_class.relname AS index_name
    FROM pg_catalog.pg_class
    LEFT JOIN pg_catalog.pg_namespace ON pg_class.relnamespace = pg_namespace.oid
    WHERE pg_class.relkind = 'i'
      AND pg_class.relisshared = false
  LOOP
    EXECUTE 'REINDEX INDEX ' || quote_ident(r.index_schema) || '.' || quote_ident(r.index_name) || ';';
  END LOOP;
END $$;
Instead of the inline function, you can add this function to your database and then run a SELECT on the new function. This function must be created by a superuser, who then grants EXECUTE privilege on the function to the database owner or to PUBLIC (see the example grant after the function below).
CREATE OR REPLACE FUNCTION tools.reindex_non_shared_indexes()
RETURNS void AS
$body$
-- Loops through all indexes excluding any shared system indexes
-- and reindexes each one individually
DECLARE
  r RECORD;
BEGIN
  FOR r IN
    SELECT pg_namespace.nspname AS index_schema,
           pg_class.relname AS index_name
    FROM pg_catalog.pg_class
    LEFT JOIN pg_catalog.pg_namespace ON pg_class.relnamespace = pg_namespace.oid
    WHERE pg_class.relkind = 'i'
      AND pg_class.relisshared = false
  LOOP
    EXECUTE 'REINDEX INDEX ' || quote_ident(r.index_schema) || '.' || quote_ident(r.index_name) || ';';
  END LOOP;
END;
$body$ LANGUAGE 'plpgsql' SECURITY DEFINER;

SELECT tools.reindex_non_shared_indexes();
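A minimal sketch of the grants described above; web_app is a placeholder name for your database owner or application role (you could also grant to PUBLIC instead):

-- Lock the SECURITY DEFINER function down, then grant it only where needed
REVOKE ALL ON FUNCTION tools.reindex_non_shared_indexes() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION tools.reindex_non_shared_indexes() TO web_app;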
More Information
A share lock held on any index of a shared system catalog table will prevent sessions in other databases from obtaining an exclusive lock on that same index, but only the indexes used by the connection/authentication code will actually stop or delay new connections.
This version independent query will help you find all shared items in your database.
SELECT pg_class.oid,
       pg_namespace.nspname AS schema,
       pg_class.relname AS item,
       CASE pg_class.relkind
         WHEN 'r' THEN 'TABLE'
         WHEN 'i' THEN 'INDEX'
         WHEN 'S' THEN 'SEQUENCE'
         WHEN 'v' THEN 'VIEW'
         WHEN 'm' THEN 'MATERIALIZED VIEW'
         WHEN 'c' THEN 'COMPOSITE TYPE'
         WHEN 't' THEN 'TOAST TABLE'
         WHEN 'f' THEN 'FOREIGN TABLE'
         WHEN 'p' THEN 'PARTITIONED TABLE'
       END::text AS relation_kind
FROM pg_class
LEFT JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
WHERE relisshared = true
ORDER BY 1;
Query to Find All Shared Items Results - Postgres 10.3

| oid | schema | item | relation_kind |
|---|---|---|---|
| 1136 | pg_catalog | pg_pltemplate | TABLE |
| 1137 | pg_catalog | pg_pltemplate_name_index | INDEX |
| 1213 | pg_catalog | pg_tablespace | TABLE |
| 1214 | pg_catalog | pg_shdepend | TABLE |
| 1232 | pg_catalog | pg_shdepend_depender_index | INDEX |
| 1233 | pg_catalog | pg_shdepend_reference_index | INDEX |
| 1260 | pg_catalog | pg_authid | TABLE |
| 1261 | pg_catalog | pg_auth_members | TABLE |
| 1262 | pg_catalog | pg_database | TABLE |
| 2396 | pg_catalog | pg_shdescription | TABLE |
| 2397 | pg_catalog | pg_shdescription_o_c_index | INDEX |
| 2671 | pg_catalog | pg_database_datname_index | INDEX |
| 2672 | pg_catalog | pg_database_oid_index | INDEX |
| 2676 | pg_catalog | pg_authid_rolname_index | INDEX |
| 2677 | pg_catalog | pg_authid_oid_index | INDEX |
| 2694 | pg_catalog | pg_auth_members_role_member_index | INDEX |
| 2695 | pg_catalog | pg_auth_members_member_role_index | INDEX |
| 2697 | pg_catalog | pg_tablespace_oid_index | INDEX |
| 2698 | pg_catalog | pg_tablespace_spcname_index | INDEX |
| 2846 | pg_toast | pg_toast_2396 | TOAST TABLE |
| 2847 | pg_toast | pg_toast_2396_index | INDEX |
| 2964 | pg_catalog | pg_db_role_setting | TABLE |
| 2965 | pg_catalog | pg_db_role_setting_databaseid_rol_index | INDEX |
| 2966 | pg_toast | pg_toast_2964 | TOAST TABLE |
| 2967 | pg_toast | pg_toast_2964_index | INDEX |
| 3592 | pg_catalog | pg_shseclabel | TABLE |
| 3593 | pg_catalog | pg_shseclabel_object_index | INDEX |
| 4060 | pg_toast | pg_toast_3592 | TOAST TABLE |
| 4061 | pg_toast | pg_toast_3592_index | INDEX |
| 6000 | pg_catalog | pg_replication_origin | TABLE |
| 6001 | pg_catalog | pg_replication_origin_roiident_index | INDEX |
| 6002 | pg_catalog | pg_replication_origin_roname_index | INDEX |
| 6100 | pg_catalog | pg_subscription | TABLE |
| 6114 | pg_catalog | pg_subscription_oid_index | INDEX |
| 6115 | pg_catalog | pg_subscription_subname_index | INDEX |
Note: There can also be shared catalog items in the pg_toast schema if any row of data is larger than 8K. This is why the above query outputs the schema location for each shared catalog item.
Additional Issues
While this problem can be caused by the REINDEX command, it is also possible to cause this same issue with the VACUUM FULL command.
VACUUM FULL pg_authid;
vacuumdb -f -h sqltest-alt -d lloyd -U lalbin
I have not yet tested the pg_repack command, but I suspect pg_repack will have the same issues when you use the -f (full) flag.
CREATE EXTENSION pg_repack;
pg_repack -f -h sqltest-alt -d lloyd -t pg_authid; pg_repack -f -h sqltest-alt -d lloyd -t pg_catalog;
psql Issue:
psql does not honor the authentication_timeout in the sense that psql will hang forever without returning an error, even though the connection appears to be authorized in the second line of the connection log. I have let psql run overnight and it stayed hung the entire time without completing the login or giving an error message. This affects the psql client for Postgres 10; Postgres 9.4 gets the timeout properly.
2018-04-30 22:41:51 PDT [8828]: [1-1] user=[unknown],db=[unknown] LOG: connection received: host=1.1.1.1 port=23829
2018-04-30 22:41:51 PDT [8828]: [2-1] user=lalbin,db=lloyd LOG: connection authorized: user=lalbin database=lloyd
2018-04-30 22:41:52 PDT [8828]: [3-1] user=lalbin,db=lloyd LOG: process 8828 still waiting for AccessShareLock on relation 2671 of database 0 after 1000.243 ms
2018-04-30 22:41:52 PDT [8828]: [4-1] user=lalbin,db=lloyd DETAIL: Process holding the lock: 8088. Wait queue: 8434, 8496, 8497, 8498, 24769, 8553, 8828.
2018-05-01 10:54:54 PDT [8379]: [1-1] user=[unknown],db=[unknown] LOG: connection received: host=10.6.164.66 port=38997
2018-05-01 10:54:55 PDT [8379]: [2-1] user=postgres,db=postgres LOG: process 8379 still waiting for AccessShareLock on relation 2676 of database 0 after 1000.309 ms
2018-05-01 10:54:55 PDT [8379]: [3-1] user=postgres,db=postgres DETAIL: Process holding the lock: 8176. Wait queue: 8284, 8316, 8379.
2018-05-01 10:55:54 PDT [8379]: [4-1] user=postgres,db=postgres FATAL: canceling authentication due to timeout
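As a possible client-side workaround (a sketch only, not a fix for the underlying lock), libpq's connect_timeout connection parameter or the PGCONNECT_TIMEOUT environment variable may make psql give up after a set number of seconds instead of hanging indefinitely:

psql "host=sqltest-alt dbname=sandbox connect_timeout=10"
PGCONNECT_TIMEOUT=10 psql -h sqltest-alt -d sandbox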
Big Additional Issue - VACUUM FULL:
If Connection 2 is made by a non-superuser who is not the database owner and they run VACUUM FULL pg_authid;, which they do not have permission to perform, they can still cause the blocking of Connection 3, because the WARNING about missing permissions is only given after all the AccessShareLocks held by Connection 1 have been released. If you try to perform REINDEX TABLE pg_authid; or REINDEX INDEX pg_authid_rolname_index; as a non-superuser, you get an error immediately, without waiting for the exclusive lock to be obtained.
Big Additional Issue - TRUNCATE TABLE:
If Connection 2 is made by a non-superuser who is not the database owner and they run TRUNCATE TABLE pg_authid;, which they do not have permission to perform, they can still cause the blocking of Connection 3, because the warning about missing permissions is only given after all the AccessShareLocks held by Connection 1 have been released. Tested with PostgreSQL 9.4.12, 9.6.6, and 10.3.
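For reference, this is the statement as tested, run as an ordinary role with no special privileges; per the behavior described above, the lock request is queued before the permission check reports a failure.

TRUNCATE TABLE pg_authid;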
Patching PostgreSQL Ideas
We have bounced around several ideas at work about how the PostgreSQL community could fix this issue.
For DBAs: What do I do if this happens to our server?
First and foremost, turn on "log_lock_waits = 'on'" in your postgresql.conf or postgresql.auto.conf file. This will allow you to see these locking/blocking problems. Then, when you find your server in this situation, you should be able to find entries like the following in your log file.
2018-05-01 10:54:32 PDT [8284]: [3-1] user=postgres,db=postgres LOG: process 8284 still waiting for AccessExclusiveLock on relation 2676 of database 0 after 1000.345 ms
2018-05-01 10:54:32 PDT [8284]: [4-1] user=postgres,db=postgres DETAIL: Process holding the lock: 8176. Wait queue: 8284.
2018-05-01 10:54:32 PDT [8284]: [5-1] user=postgres,db=postgres STATEMENT: REINDEX DATABASE postgres;
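For completeness, a minimal sketch of how these settings look in postgresql.conf (deadlock_timeout is shown at its default; it also controls how long a lock wait lasts before the message above is logged):

log_lock_waits = on      # log a message when a session waits longer than deadlock_timeout
deadlock_timeout = 1s    # default; threshold for the "still waiting for ... lock" messages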
If you have a preexisting open connection to the server, you can run the following command, using the pid number from the [] in the log entry above, inside the pg_terminate_backend function below.
SELECT pg_terminate_backend(8284);
This will have fixed your problem, unless you don't have a preexisting open connection, in which case the problem becomes what to do next. If you can SSH into the server, you can perform a "kill pid" on that backend process. You should then contact the user in the log file who performed the REINDEX and have them run the DO command shown above, which does not block connections.
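A minimal sketch of that shell approach, using the example pid 8284 from the log above (substitute the pid from your own log). SIGTERM is the safe signal here and is equivalent to pg_terminate_backend; never use SIGKILL against a backend, as that forces a crash recovery of the whole cluster.

kill -TERM 8284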
When writing applications, it is recommended not to connect as the database owner. Once VACUUM FULL and TRUNCATE TABLE are fixed, these problems will no longer be triggerable via SQL injection by a totally unprivileged user, such as a web form interface.
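A minimal sketch of that recommendation; the role name web_app is a placeholder, and the database name lloyd is just the example database used earlier in this post:

-- Create a login role that is neither a superuser nor the database owner,
-- and grant it only what the application actually needs.
CREATE ROLE web_app LOGIN PASSWORD 'change_me';
GRANT CONNECT ON DATABASE lloyd TO web_app;
GRANT USAGE ON SCHEMA public TO web_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO web_app;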
References
The oldest reference to this that I can find on the Internet relates back to 2011 and was caused by a problem with vacuumdb and the pg_authid table. There was a patch in 2011 by Tom Lane that should have fixed this for VACUUM FULL, but I am not sure if it is working properly or has been accidentally undone since then. The REINDEX issue was not talked about until the 2017 email, and no one that I know of has ever talked about an unprivileged user trying to vacuum the pg_authid table being able to cause this same locking issue.
Conclusion
I have tested this against Postgres 9.4.12 and 10.3, and from my testing I suspect that this affects all versions of PostgreSQL. Because two unsuspecting people can cause all other connections to be blocked (a DoS attack) using valid SQL commands, I think that this is a very big issue that needs to be dealt with right away, and so I filed a new PostgreSQL bug report on 4/30/2018 as Bug #15182.
Thank You's
SELECT pg_roles - Tobin Stelling - Fred Hutch - Developer #1 causing the in house DOS Attack
REINDEX DATABASE - David Costanzo - Fred Hutch - Developer #2 causing the in house DOS Attack
TRUNCATE TABLE - Jeremy Schneider - Amazon AWS - PostgreSQL Engineer
Lloyd Albin
Database Administrator
Statistical Center for HIV/AIDS Research and Prevention (SCHARP)
Fred Hutchinson Cancer Research Center (FHCRC / Fred Hutch)