BULK INSERT fails - tried many variations on row delimiter


I am struggling to script a BULK INSERT for my .dat files. I am getting many errors depending on what I try, but I'm pretty sure it's related to the row delimiter. I can successfully use the SQL import wizard for my .dat files; under "Specify the characters that delimit the source file", I see the following defaults, which I accept:

Row delimiter: {CR}{LF}   
Column delimiter: Comma {,}

The 500k rows and 158 columns come in just beautifully thanks to the wizard.

When scripting, I have tried all manner of \r\n variations, 0x0a, etc., and have searched for help and reviewed similar posts. I can successfully script in a single line of real data, so I don't think the column parsing is the issue. It fails when I try two lines in a dummy file. The fewest errors occur with the following:

bulk insert abc
from 'C:\TestDat\try2.dat'
with (FIRSTROW = 1, fieldterminator = ',', rowterminator = '\r\n')
go

Msg 4863, Level 16, State 1, Line 3
Bulk load data conversion error (truncation) for row 1, column 158 (Column 157).

(Using 0x0a creates even more errors on a two-line file.)

.dat contents:

xxx,20080501,xxx,xxx:175000,55008654,0178636,202307011001,001859915,OK,W,xx,1.0,00000100,48,202306290100,R0,2,202306290200,R0,0,202306290300,R0,0,202306290400,R0,0,202306290500,R0,1,202306290600,R0,1,202306290700,R0,0,202306290800,R0,0,202306290900,R0,0,202306291000,R0,0,202306291100,R0,0,202306291200,R0,1,202306291300,R0,0,202306291400,R0,0,202306291500,R0,2,202306291600,R0,1,202306291700,R0,0,202306291800,R0,0,202306291900,R0,0,202306292000,R0,0,202306292100,R0,0,202306292200,R0,0,202306292300,R0,2,202306300000,R0,3,202306300100,R0,0,202306300200,R0,0,202306300300,R0,2,202306300400,R0,0,202306300500,R0,1,202306300600,R0,1,202306300700,R0,3,202306300800,R0,0,202306300900,R0,0,202306301000,R0,0,202306301100,R0,0,202306301200,R0,3,202306301300,R0,1,202306301400,R0,0,202306301500,R0,1,202306301600,R0,1,202306301700,R0,0,202306301800,R0,0,202306301900,R0,1,202306302000,R0,0,202306302100,R0,1,202306302200,R0,0,202306302300,R0,1,202307010000,R0,0
xxx,20080501,xxx,xxx:175000,55008654,0178636,202307011001,001859915,OK,W,xx,1.0,00000100,4,202307010100,R0,1,202307010200,R0,0,202307010300,R0,1,202307010400,R0,0

1 Answer

Luuk (accepted answer):

Ok, I created the try2.dat file,

and I created a table abc using the following (I left out Column3..Column157, but they have the same definition):

CREATE TABLE abc 
(
    Column1 VARCHAR(200),
    Column2 VARCHAR(200),
    ...
    Column158 VARCHAR(200),
    Column159 VARCHAR(200)
)
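
Typing 159 near-identical column definitions by hand is tedious; here is a minimal sketch (my addition, not part of the original answer, assuming SQL Server 2017+ for STRING_AGG) that generates the DDL instead:

-- Build the CREATE TABLE statement for Column1..Column159
SELECT 'CREATE TABLE abc (' + CHAR(13) + CHAR(10)
     + STRING_AGG(CAST('    Column' + CAST(n AS VARCHAR(3)) + ' VARCHAR(200)' AS VARCHAR(MAX)),
                  ',' + CHAR(13) + CHAR(10)) WITHIN GROUP (ORDER BY n)
     + CHAR(13) + CHAR(10) + ');'
FROM (SELECT TOP (159) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects) AS nums;   -- any row source with >= 159 rows works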

Then I started SQLCMD:

D:\TEMP>sqlcmd
1> use test
2> go
Changed database context to 'TEST'.
1> delete from abc;
2>
3> go

(1 rows affected)
1>
2> bulk insert abc from 'd:\temp\try2.dat' with (FIRSTROW = 1, fieldterminator = ',', rowterminator = '\r\n')
3> go

(1 rows affected)

So the load itself works here; however, when selecting the table in SSMS, I do see:

(screenshot: start of the row)

(screenshot: end of the row)

The last column does not seem to be OK. More investigation is needed.
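
One quick check of my own (assuming the abc table above): look for a stray carriage return or line feed that the row terminator did not consume and that ended up inside the last column:

-- CHARINDEX returns 0 when the character is absent
SELECT CHARINDEX(CHAR(13), Column159) AS cr_pos,
       CHARINDEX(CHAR(10), Column159) AS lf_pos,
       Column159
FROM abc;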

A workaround could be:

Create a table like this:

CREATE TABLE def
(
    Column1 VARCHAR(MAX)
);

and import using (since you do not have # characters in your file):

bulk insert def from 'd:\temp\try2.dat' 
with (FIRSTROW = 1, fieldterminator = '#', rowterminator = '\r\n')
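
To confirm that each physical line landed in exactly one row (a small check, my addition):

SELECT COUNT(*) AS rows_loaded,
       MIN(LEN(Column1)) AS shortest_line,
       MAX(LEN(Column1)) AS longest_line
FROM def;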

After that, you can do:

SELECT value
FROM def
CROSS APPLY STRING_SPLIT(Column1, ',');   -- CROSS APPLY, so this also works when def holds more than one row

Resulting in:

value
xxx
20080501
xxx
xxx:175000
55008654
0178636
202307011001
001859915
OK
W
xx
1.0
00000100
48
202306290100
R0
........

NOTE: When your version of SQL Server is new enough (SQL Server 2022, or Azure SQL Database), you can add the enable_ordinal argument to STRING_SPLIT.

NOTE2: Why was enable_ordinal added to STRING_SPLIT?

Because without it, STRING_SPLIT does not "care" about ordinal position, and the output order is not guaranteed (https://stackoverflow.com/a/64092709/724039).

I created a DBFIDDLE with your data; in that version, def has a line-number column R added.

Part of the output of SELECT R, value FROM def d CROSS APPLY STRING_SPLIT(d.Column1, ','); (around the transition from the first line to the second) is:

R value
...
1 1
1 202307010400
1 R0
1 0
2 xxx
2 20080501
2 xxx
2 xxx:175000
...

Unfortunately, without enable_ordinal it is not possible (AFAIK) to number the values within each line in a reliable way...
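
On SQL Server 2022 or Azure SQL Database, however, the enable_ordinal argument from the note above makes the numbering reliable. A minimal sketch (my addition, assuming the fiddle's def table with its line-number column R):

-- The third argument (1) enables the ordinal output column, which is
-- guaranteed to match each value's position within the string
SELECT d.R, s.ordinal, s.value
FROM def AS d
CROSS APPLY STRING_SPLIT(d.Column1, ',', 1) AS s
ORDER BY d.R, s.ordinal;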

ANOTHER ATTEMPT:

Seeing your comment: "The column count per row will equal 14+n*3 where n = the value in the 14th column." (For example, your second line has n = 4, so 14 + 4*3 = 26 columns.)

Define your import table (here given the clumsy name def2):

CREATE TABLE def2 (
  Column1 VARCHAR(200),
  Column2 VARCHAR(200),
  Column3 VARCHAR(200),
  Column4 VARCHAR(200),
  Column5 VARCHAR(200),
  Column6 VARCHAR(200),
  Column7 VARCHAR(200),
  Column8 VARCHAR(200),
  Column9 VARCHAR(200),
  Column10 VARCHAR(200),
  Column11 VARCHAR(200),
  Column12 VARCHAR(200),
  Column13 VARCHAR(200),
  Column14 VARCHAR(200),
  ColumnRest VARCHAR(MAX)
);

Import your data:

bulk insert def2 from 'D:\Temp\try2.dat' with (fieldterminator = ',');
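
Before splitting, a quick sanity check (my addition, assuming the 14+n*3 rule holds): the number of fields left in ColumnRest should be exactly three times the value in Column14:

-- fields = commas + 1; expect rest_fields = 3 * Column14 on every row
SELECT Column14,
       LEN(ColumnRest) - LEN(REPLACE(ColumnRest, ',', '')) + 1 AS rest_fields
FROM def2;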

Finally, extract the values; these can then be inserted into a (new) table:

WITH def2tmp AS (
   -- Number the rows, and append a trailing comma so every value ends with one
   SELECT ROW_NUMBER() OVER (ORDER BY ColumnRest) AS R,
          Column14,
          RTRIM(ColumnRest) + ',' AS ColumnRest
   FROM def2
), PosComma AS (
   -- Anchor: the first value, and the position of the first comma
   SELECT
       R,
       Column14,
       CHARINDEX(',', ColumnRest, 1) AS R2,
       SUBSTRING(ColumnRest, 1, CHARINDEX(',', ColumnRest, 1) - 1) AS value,
       ColumnRest
   FROM def2tmp
   UNION ALL
   -- Recursive step: walk from comma to comma, slicing out one value per step
   SELECT
       R,
       Column14,
       CHARINDEX(',', ColumnRest, R2 + 1),
       SUBSTRING(ColumnRest, R2 + 1,
                 CASE WHEN (CHARINDEX(',', ColumnRest, R2 + 1) - R2) - 1 < 0
                      THEN 100
                      ELSE (CHARINDEX(',', ColumnRest, R2 + 1) - R2) - 1
                 END),
       ColumnRest
   FROM PosComma
   WHERE R2 < LEN(ColumnRest) AND R <= 2   -- R <= 2: limited to the two test rows
)
SELECT
   R,
   R2,
   ROW_NUMBER() OVER (PARTITION BY R ORDER BY R2) AS R3,
   value
FROM PosComma
ORDER BY Column14 DESC, R2
OPTION (MAXRECURSION 20000)
;

This solution gets you two tables: def2 with the first 14 columns, and a second table with the values of each row (but you have to define and fill that table yourself!).

Output, where R is the row number from your import table, R3 is the sequence number of the value within that row, and R2 is the position of the comma in ColumnRest (sorry for the messy output):

R R2 R3 value
1 13 1 202306290100
1 16 2 R0
1 18 3 2
1 31 4 202306290200
1 34 5 R0
1 36 6 0
1 49 7 202306290300
1 52 8 R0
1 54 9 0
1 67 10 202306290400
1 70 11 R0
1 72 12 0
1 85 13 202306290500
1 ... ... ......
1 823 136 202306302200
1 826 137 R0
1 828 138 0
1 841 139 202306302300
1 844 140 R0
1 846 141 1
1 859 142 202307010000
1 862 143 R0
1 864 144 0
2 13 1 202307010100
2 16 2 R0
2 18 3 1
2 31 4 202307010200
2 34 5 R0
2 36 6 0
2 49 7 202307010300
2 52 8 R0
2 54 9 1
2 67 10 202307010400
2 70 11 R0
2 72 12 0
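
To go one step further (my own sketch, not part of the original answer): assuming each triple of values is a timestamp, a code, and a reading, and assuming the output above has been saved into a table split_values(R, R3, value), the triples can be pivoted into one row per reading:

-- R3 = 1,2,3 form the first triple, 4,5,6 the second, and so on
SELECT R,
       grp + 1 AS reading_no,
       MAX(CASE WHEN pos = 0 THEN value END) AS ts,
       MAX(CASE WHEN pos = 1 THEN value END) AS code,
       MAX(CASE WHEN pos = 2 THEN value END) AS reading
FROM (SELECT R, (R3 - 1) / 3 AS grp, (R3 - 1) % 3 AS pos, value
      FROM split_values) AS t
GROUP BY R, grp
ORDER BY R, reading_no;

For the first row this would yield (202306290100, R0, 2), (202306290200, R0, 0), and so on.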