I am running a python script that inserts a large amount of data into a Postgres database, I use a single query to perform multiple row inserts:
INSERT INTO table (col1,col2) VALUES ('v1','v2'),('v3','v4') ... etc
I was wondering what would happen if it hits a duplicate key for the insert. Will it stop the entire query and throw an exception? Or will it merely ignore the insert of that specific row and move on?
                        
The
INSERTwill just insert all rows and nothing special will happen, unless you have some kind of constraint disallowing duplicate / overlapping values (PRIMARY KEY,UNIQUE,CHECKorEXCLUDEconstraint) - which you did not mention in your question. But that's what you are probably worried about.Assuming a
UNIQUEor PK constraint on(col1,col2), you are dealing with a textbookUPSERTsituation. Many related questions and answers to find here.Generally, if any constraint is violated, an exception is raised which (unless trapped in subtransaction like it's possible in a procedural server-side language like plpgsql) will roll back not only the statement, but the whole transaction.
Without concurrent writes
I.e.: No other transactions will try to write to the same table at the same time.
Exclude rows that are already in the table with
WHERE NOT EXISTS ...or any other applicable technique:Select rows which are not present in other table
And don't forget to remove duplicates within the inserted set as well, which would not be excluded by the semi-anti-join
WHERE NOT EXISTS ...One technique to deal with both at once would be
EXCEPT:EXCEPTwithout the key wordALLfolds duplicate rows in the source. If you know there are no dupes, or you don't want to fold duplicates silently, useEXCEPT ALL(or one of the other techniques). See:Generally, if the target table is big,
WHERE NOT EXISTSin combination withDISTINCTon the source will probably be faster:If there can be many dupes, it pays to fold them in the source first. Else use one subquery less.
Related:
With concurrent writes
Use the Postgres
UPSERTimplementationINSERT ... ON CONFLICT ...in Postgres 9.5 or later:Further reading:
Documentation:
Craig's reference answer for
UPSERTproblems: