I have found an edge case where pg_dump and pg_restore fails to restore databases with check constraints that points to data in another table that is restored after the table containing the check constraint. This has been tested and found to be a problem on PostgreSQL 9.5.2, 10.3 and 10.4.
In this how to, we will go over the following information:
Creating a database with this problem:
Let setup create our test database and then log into that test database.
# Create our source database createdb test_db_1 # Start psql to create the data within the database psql -d test_db_1
Now we need to create two schemas, with one table each and also create a function that can be used by a check constraint to check the contents of one table's field against the other table. The key in causing this failure is to place your key table in a schema alphabetically before your the schema where you use the data. This check constraint acts like a multi foreign key constraint but with a one field constrained by two fields.
-- Create the schema for the lookup table CREATE SCHEMA schema_b; -- Create the lookup table CREATE TABLE schema_b.table_b ( type VARCHAR NOT NULL, "desc" VARCHAR NOT NULL, CONSTRAINT table_b_pk PRIMARY KEY(type, "desc") ); -- Create the schema for our main table CREATE SCHEMA schema_a; -- Create the check constraint function CREATE OR REPLACE FUNCTION schema_b.check_lu_table_b (varchar, varchar) RETURNS boolean AS' SELECT CASE WHEN b."desc" IS NULL THEN FALSE ELSE TRUE END FROM (SELECT $2 AS "desc", $1 AS type) a LEFT JOIN schema_b.table_b b USING (type, "desc") 'LANGUAGE 'sql' VOLATILE RETURNS NULL ON NULL INPUT; -- Create the main table, using the check constraint function CREATE TABLE schema_a.table_a ( tp_id SERIAL, billing_acct_type VARCHAR(20), CONSTRAINT "table_a_pk" PRIMARY KEY(tp_id), CONSTRAINT "table_a_chk_billing_acct_type" CHECK (schema_b.check_lu_table_b('BillAcctType', billing_acct_type)) ); -- Insert the test data INSERT INTO schema_b.table_b VALUES ('BillAcctType', 'Invoice'); INSERT INTO schema_a.table_a (billing_acct_type) VALUES ('Invoice');
Backing up the database
We now can perform a normal backup of our database.
#Backup the test_db_1 database to test the restoration pg_dump -Fc test_db_1 -f test_db_1.pg_dump
Restoring the database
When we restore this database we will run into problems because the data does not exist in table_b at the time the data is being loaded into table_a.
# Create the Database into createdb test_db_2 # Failed Load pg_restore -d test_db_2 test_db_1.pg_dump pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 2770; 0 20015425 TABLE DATA table_a postgres pg_restore: [archiver (db)] COPY failed for table "table_a": ERROR: new row for relation "table_a" violates check constraint "table_a_chk_billing_acct_type" DETAIL: Failing row contains (1, Invoice). CONTEXT: COPY table_a, line 1: "1 Invoice" WARNING: errors ignored on restore: 1
Customized restoration the database - Work's
When we restore this database we need to restore items in a specific order. First we need to load the pre-data section of the backup, aka DDL for the tables, views, etc. Before we load the data section of the backup, we need to load the table_b lookup table. Now we can load the data section for the backup, but this will also double fill the table_b lookup table. The simple solution is to truncate table_b and load it's data again. We are now ready to load the post-data section of the backup and the database is ready for use.
# Re-Create Database dropdb test_db_2 createdb test_db_2 # Load pre-data (DDL) pg_restore -d test_db_2 --section=pre-data test_db_1.pg_dump # Load tatype.lu_tblTelco so Data load won't fail pg_restore -d test_db_2 -a -t table_b test_db_1.pg_dump # Load Data pg_restore -d test_db_2 --section=data test_db_1.pg_dump # Truncate duplicated data psql -d test_db_2 -c 'TRUNCATE schema_b.table_b;' # Reload single set of data pg_restore -d test_db_2 -a -t table_b test_db_1.pg_dump # Load Post Data (Indexes, Foreign Keys, etc.) pg_restore -d test_db_2 --section=post-data test_db_1.pg_dump
You can also create a several pg_restore list files, but I believe that the above solution is simpler to implement.
Even simpler bug demo
In this simpler version we have both tables inside the same schema. The lookup table needs to be alphabetically after the table that uses the data. This simplified test returns a different error, using Postgres 10.3
-- Create the lookup table CREATE TABLE public.table_b ( type VARCHAR NOT NULL, "desc" VARCHAR NOT NULL, CONSTRAINT table_b_pk PRIMARY KEY(type, "desc") ); -- Create the check constraint function CREATE OR REPLACE FUNCTION public.check_lu_table_b (varchar, varchar) RETURNS boolean AS' SELECT CASE WHEN b."desc" IS NULL THEN FALSE ELSE TRUE END FROM (SELECT $2 AS "desc", $1 AS type) a LEFT JOIN public.table_b b USING (type, "desc") 'LANGUAGE 'sql' VOLATILE RETURNS NULL ON NULL INPUT; -- Create the main table, using the check constraint function CREATE TABLE public.table_a ( tp_id SERIAL, billing_acct_type VARCHAR(20), CONSTRAINT "table_a_pk" PRIMARY KEY(tp_id), CONSTRAINT "table_a_chk_billing_acct_type" CHECK (public.check_lu_table_b('BillAcctType', billing_acct_type)) ); -- Insert the test data INSERT INTO public.table_b VALUES ('BillAcctType', 'Invoice'); INSERT INTO public.table_a (billing_acct_type) VALUES ('Invoice');
Backup and Restoration of simpler bug demo
Even though this is the same problem we get a different error message. Instead of the "violates check constraint" we now see a "relation "table_b" does not exist" error message.
#Backup the test_db_1 database to test the restoration pg_dump -Fc test_db_1 -f test_db_1.pg_dump # Create the Database into createdb test_db_2 # Failed Load pg_restore -d test_db_2 test_db_1.pg_dump pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 2774; 0 1322703910 TABLE DATA table_a postgres pg_restore: [archiver (db)] COPY failed for table "table_a": ERROR: relation "table_b" does not exist LINE 4: table_b b USING (type, "desc") ^ QUERY: SELECT CASE WHEN b."desc" IS NULL THEN FALSE ELSE TRUE END FROM (SELECT $2 AS "desc", $1 AS type) a LEFT JOIN table_b b USING (type, "desc") CONTEXT: SQL function "check_lu_table_b" during inlining COPY table_a, line 1: "1 Invoice" WARNING: errors ignored on restore: 1
The cause of the Problem
The problem is not truly caused by the order the tables are restored in. It is cause by where the check constraints are being restored. When we look at the sql being generated by pg_restore, we can see that when table_a is created it includes the check constraint. Foreign key constraints are restored in the post-data section, but not shown in this example. This is why they don't have this same problem.
-- -- PostgreSQL database dump -- -- Dumped from database version 10.3 -- Dumped by pg_dump version 10.3 SET statement_timeout = 0; SET lock_timeout = 0; SET idle_in_transaction_session_timeout = 0; SET client_encoding = 'UTF8'; SET standard_conforming_strings = on; SELECT pg_catalog.set_config('search_path', '', false); SET check_function_bodies = false; SET client_min_messages = warning; SET row_security = off; -- -- Name: schema_a; Type: SCHEMA; Schema: -; Owner: postgres -- CREATE SCHEMA schema_a; ALTER SCHEMA schema_a OWNER TO postgres; -- -- Name: schema_b; Type: SCHEMA; Schema: -; Owner: postgres -- CREATE SCHEMA schema_b; ALTER SCHEMA schema_b OWNER TO postgres; -- -- Name: plpgsql; Type: EXTENSION; Schema: -; Owner: -- CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog; -- -- Name: EXTENSION plpgsql; Type: COMMENT; Schema: -; Owner: -- COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language'; -- -- Name: check_lu_table_b(character varying, character varying); Type: FUNCTION; Schema: schema_b; Owner: postgres -- CREATE FUNCTION schema_b.check_lu_table_b(character varying, character varying) RETURNS boolean LANGUAGE sql STRICT AS $_$ SELECT CASE WHEN b."desc" IS NULL THEN FALSE ELSE TRUE END FROM (SELECT $2 AS "desc", $1 AS type) a LEFT JOIN schema_b.table_b b USING (type, "desc") $_$; ALTER FUNCTION schema_b.check_lu_table_b(character varying, character varying) OWNER TO postgres; SET default_tablespace = ''; SET default_with_oids = false; -- -- Name: table_a; Type: TABLE; Schema: schema_a; Owner: postgres -- CREATE TABLE schema_a.table_a ( tp_id integer NOT NULL, billing_acct_type character varying(20), CONSTRAINT table_a_chk_billing_acct_type CHECK (schema_b.check_lu_table_b('BillAcctType'::character varying, billing_acct_type)) ); ALTER TABLE schema_a.table_a OWNER TO postgres; -- -- Name: table_a_tp_id_seq; Type: SEQUENCE; Schema: schema_a; Owner: postgres -- CREATE SEQUENCE schema_a.table_a_tp_id_seq AS integer START WITH 1 INCREMENT BY 1 NO MINVALUE NO MAXVALUE CACHE 1; ALTER TABLE schema_a.table_a_tp_id_seq OWNER TO postgres; -- -- Name: table_a_tp_id_seq; Type: SEQUENCE OWNED BY; Schema: schema_a; Owner: postgres -- ALTER SEQUENCE schema_a.table_a_tp_id_seq OWNED BY schema_a.table_a.tp_id; -- -- Name: table_b; Type: TABLE; Schema: schema_b; Owner: postgres -- CREATE TABLE schema_b.table_b ( type character varying NOT NULL, "desc" character varying NOT NULL ); ALTER TABLE schema_b.table_b OWNER TO postgres; -- -- Name: table_a tp_id; Type: DEFAULT; Schema: schema_a; Owner: postgres -- ALTER TABLE ONLY schema_a.table_a ALTER COLUMN tp_id SET DEFAULT nextval('schema_a.table_a_tp_id_seq'::regclass); -- -- Data for Name: table_a; Type: TABLE DATA; Schema: schema_a; Owner: postgres -- COPY schema_a.table_a (tp_id, billing_acct_type) FROM stdin; 1 Invoice \. -- -- Data for Name: table_b; Type: TABLE DATA; Schema: schema_b; Owner: postgres -- COPY schema_b.table_b (type, "desc") FROM stdin; BillAcctType Invoice \. -- -- Name: table_a_tp_id_seq; Type: SEQUENCE SET; Schema: schema_a; Owner: postgres -- SELECT pg_catalog.setval('schema_a.table_a_tp_id_seq', 1, true); -- -- Name: table_a table_a_pk; Type: CONSTRAINT; Schema: schema_a; Owner: postgres -- ALTER TABLE ONLY schema_a.table_a ADD CONSTRAINT table_a_pk PRIMARY KEY (tp_id); -- -- Name: table_b table_b_pk; Type: CONSTRAINT; Schema: schema_b; Owner: postgres -- ALTER TABLE ONLY schema_b.table_b ADD CONSTRAINT table_b_pk PRIMARY KEY (type, "desc"); -- -- PostgreSQL database dump complete --
-- -- PostgreSQL database dump -- -- Dumped from database version 10.3 -- Dumped by pg_dump version 10.3 SET statement_timeout = 0; SET lock_timeout = 0; SET idle_in_transaction_session_timeout = 0; SET client_encoding = 'UTF8'; SET standard_conforming_strings = on; SELECT pg_catalog.set_config('search_path', '', false); SET check_function_bodies = false; SET client_min_messages = warning; SET row_security = off; -- -- Name: plpgsql; Type: EXTENSION; Schema: -; Owner: -- CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog; -- -- Name: EXTENSION plpgsql; Type: COMMENT; Schema: -; Owner: -- COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language'; -- -- Name: check_lu_table_b(character varying, character varying); Type: FUNCTION; Schema: public; Owner: postgres -- CREATE FUNCTION public.check_lu_table_b(character varying, character varying) RETURNS boolean LANGUAGE sql STRICT AS $_$ SELECT CASE WHEN b."desc" IS NULL THEN FALSE ELSE TRUE END FROM (SELECT $2 AS "desc", $1 AS type) a LEFT JOIN table_b b USING (type, "desc") $_$; ALTER FUNCTION public.check_lu_table_b(character varying, character varying) OWNER TO postgres; SET default_tablespace = ''; SET default_with_oids = false; -- -- Name: table_a; Type: TABLE; Schema: public; Owner: postgres -- CREATE TABLE public.table_a ( tp_id integer NOT NULL, billing_acct_type character varying(20), CONSTRAINT table_a_chk_billing_acct_type CHECK (public.check_lu_table_b('BillAcctType'::character varying, billing_acct_type)) ); ALTER TABLE public.table_a OWNER TO postgres; -- -- Name: table_a_tp_id_seq; Type: SEQUENCE; Schema: public; Owner: postgres -- CREATE SEQUENCE public.table_a_tp_id_seq AS integer START WITH 1 INCREMENT BY 1 NO MINVALUE NO MAXVALUE CACHE 1; ALTER TABLE public.table_a_tp_id_seq OWNER TO postgres; -- -- Name: table_a_tp_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: postgres -- ALTER SEQUENCE public.table_a_tp_id_seq OWNED BY public.table_a.tp_id; -- -- Name: table_b; Type: TABLE; Schema: public; Owner: postgres -- CREATE TABLE public.table_b ( type character varying NOT NULL, "desc" character varying NOT NULL ); ALTER TABLE public.table_b OWNER TO postgres; -- -- Name: table_a tp_id; Type: DEFAULT; Schema: public; Owner: postgres -- ALTER TABLE ONLY public.table_a ALTER COLUMN tp_id SET DEFAULT nextval('public.table_a_tp_id_seq'::regclass); -- -- Data for Name: table_a; Type: TABLE DATA; Schema: public; Owner: postgres -- COPY public.table_a (tp_id, billing_acct_type) FROM stdin; 1 Invoice \. -- -- Data for Name: table_b; Type: TABLE DATA; Schema: public; Owner: postgres -- COPY public.table_b (type, "desc") FROM stdin; BillAcctType Invoice \. -- -- Name: table_a_tp_id_seq; Type: SEQUENCE SET; Schema: public; Owner: postgres -- SELECT pg_catalog.setval('public.table_a_tp_id_seq', 1, true); -- -- Name: table_a table_a_pk; Type: CONSTRAINT; Schema: public; Owner: postgres -- ALTER TABLE ONLY public.table_a ADD CONSTRAINT table_a_pk PRIMARY KEY (tp_id); -- -- Name: table_b table_b_pk; Type: CONSTRAINT; Schema: public; Owner: postgres -- ALTER TABLE ONLY public.table_b ADD CONSTRAINT table_b_pk PRIMARY KEY (type, "desc"); -- -- PostgreSQL database dump complete --
Patching PostgreSQL Ideas
If the creation of the check constraint is moved from the pre-data section to the post-data section of the backup, this would fix the restoration problem.
Conclusion
While this is an edge case, I think this should be a simple fix by the Postgresql hackers. This bug has been submitted as PostgreSQL Bug #15316 by Lloyd Albin
Thank You's
Julie Bernhardt - Corilate, Inc. - Managing Director
Lloyd Albin
Database Administrator
Statistical Center for HIV/AIDS Research and Prevention (SCHARP)
Fred Hutchinson Cancer Research Center (FHCRC / Fred Hutch)