I am extracting data from a business system supplied by a third party to use in reporting. I am using a single SELECT statement issued from an SSIS data flow task source component that joins across multiple tables in the source system to create the dataset I want. We are using the default read-committed isolation level.
To my surprise I regularly find this extraction query is deadlocking and being selected as the victim. I didn't think a SELECT in a read-committed transaction could do this, but according to this SO answer it is possible: Can a readcommitted isolation level ever result in a deadlock (Sql Server)?
Through the use of the trace flags 1204 and 12222 I've identified the conflicting statement, and the object and index in question. Essentially, the contention is over a data page in the primary key of one of the tables. I need to extract from this table using a join on its key (so I'm taking out an S lock), the conflicting statement is performing an INSERT and is requesting an IX lock on the index data page.
(Side note: the above SO talks about this issue occurring with non-clustered indexes, but this appears to be occurring in the clustered PK. At least, that is what I believe based on my interpretation of the deadlock information in the event log and the "associatedObjectId" property.)
Here are my constraints:
- The conflicting statement is in an encrypted stored procedure supplied by a third party as part of off-the-shelf software. There is no possibility of getting the plaintext code or having it changed.
- I don't want to use dirty-reads as I need my extracted data to maintain its integrity.
- It's not clear to me how or if restructuring my extract query could prevent this. The lock is on the PK of the table I'm most interested in, and I can't see any alternatives to using the PK.
- I don't mind my extract query being the victim as I prefer this over interrupting the operational use of the source system. However, this does cause the SSIS execution to fail, so if it must be this way I'd like a cleaner, more graceful way to handle this situation.
Can anyone suggestion ways to, preferably, prevent the deadlock, or if not, then handle the error better?
My assumption here is that you are attempting to INSERT into the same table that you are SELECTing from. If no, then a screenshot of the data flow tab would be helpful in determining the problem. If yes, then you're in luck - I have had this problem before.
Add a sort to the data flow as this is a fully blocking transformation (see below regarding blocking transformations). What this means is that the SELECT will be required to complete loading all data into the pipeline buffer before any data is allowed to pass down to the destination. Otherwise, SSIS is attempting to INSERT data while there is a lock on the table/index. You might be able to get creative with your indexing strategies here (I have not tried this). But, a fully blocking transformation will do the trick and eliminates the need for any additional indexes to the table (and the overhead that entails).
Note: never use NOLOCK query hints when selecting data from a table as an attempt to get around this. I have never tried this nor do I intend to. You (the royal you) run the risk of ingesting uncommitted data into your ETL.
Reference:
https://jorgklein.com/2008/02/28/ssis-non-blocking-semi-blocking-and-fully-blocking-components/