Read too large CSV with with c#

309 views Asked by At

I have to extract specifict information from a too big CSV file with C#, the file is 233 MB in size, it has 1000008 rows and 28 columns and insert tha information in a datatable. actually i tried with OleDB and Aspose.cells, but both ways are too slow

ASPOSE CODE:

public static DataTable CsvReader(string CsvFile)
{
    DataTable DtTemp = new DataTable();
    try
    {
        System.Windows.Forms.Application.DoEvents();
        Aspose.Cells.LoadOptions loadOptions = new LoadOptions(LoadFormat.CSV);
        Aspose.Cells.Workbook workbook = new Aspose.Cells.Workbook(CsvFile, loadOptions);
        string SheetName = workbook.Worksheets[0].Name;
        DtTemp = workbook.Worksheets[SheetName].Cells.ExportDataTable(0, 0, workbook.Worksheets[SheetName].Cells.MaxDataRow + 1, workbook.Worksheets[SheetName].Cells.MaxDataColumn + 1);
        //filter DtTemp datatable based on 4th column
    }
    catch (Exception) { DtTemp.Reset(); }
    return DtTemp;
}

OleDB

public static DataTable CsvReader(string CsvFile)
{
    DataTable DtTemp = new DataTable();
    try
    {
        System.Windows.Forms.Application.DoEvents();
        OleDbConnection conn = null;
        string strConnString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + Path.GetDirectoryName(CsvFile) + "; Extended Properties='text; HDR=NO; IMEX=3; FMT=Delimited(,)';";
        string sql_select;
        conn = new OleDbConnection(strConnString.Trim());
        sql_select = "select * from [" + Path.GetFileName(CsvFile) + "] WHERE F4 Like '%27628%' OR F4 Like '%35627%'";
        conn.Open();
        OleDbCommand cmd = new OleDbCommand(sql_select, conn);
        OleDbDataAdapter obj_oledb_da = new OleDbDataAdapter(cmd);
        DataTable dtSchema = new DataTable();
        obj_oledb_da.FillSchema(dtSchema, SchemaType.Source);
        if (dtSchema != null)
        {
            writeSchema(dtSchema, CsvFile);
            obj_oledb_da.Fill(DtTemp);
        }
    }
    catch (Exception) { DtTemp.Reset(); }
    return DtTemp;
}
1

There are 1 answers

0
Amjad Sahi On

To extract large CSV data (line by line) in an event driven mode efficiently, you may try using LightCells APIs provided by Aspose.Cells for .NET which is a light weight mode to get better performance. Moreover you can fill data into a datatable row by row in the implementation of lightcells APIs.

See the following sample code segment for your complete reference:

e.g.

Sample code:

public void TestExportingDataTable()
{
            //here you may initialize the table according to the data
            //if possible, you can also connect to the database directly and send data to database directly in DataTableExporter
            DataTable dt = new DataTable();
            dt.Columns.Add("Column1", typeof(string));
            dt.Columns.Add("Column2", typeof(DateTime));
            dt.Columns.Add("Column3", typeof(double));
            
            TxtLoadOptions opts = new TxtLoadOptions();
            //if you can optimize the process of parsing data, disbaling the automatic conversion and use your own logic should be better for performance
            opts.ConvertDateTimeData = false;
            opts.ConvertNumericData = false;
            //using lightcells to improve performance for both time cost and memeory cost
            opts.LightCellsDataHandler = new DataTableExporter(dt);
            Workbook wb = new Workbook("largedata.csv", opts);
            //here you may check the generated data table by yourself.
}
private class DataTableExporter : LightCellsDataHandler
{
            private readonly DataTable mTable;
            private DataRow mDataRow;

            public DataTableExporter(DataTable dt)
            {
                mTable = dt;
            }
            public bool StartSheet(Worksheet sheet)
            {
                return true;
            }
            public bool StartRow(int row)
            {
                mDataRow = mTable.NewRow();
                mTable.Rows.Add(mDataRow);
                return true;
            }
            public bool ProcessRow(Row row)
            {
                return true;
            }
            public bool StartCell(int col)
            {
                return true;
            }
            public bool ProcessCell(Cell cell)
            {
                //in this sample, we just put values into the DataTable in memory
                //if you can save those data to database directly, we think it may improve performance significantly too.
                int col = cell.Column;
                if (col == 1)
                {
                    mDataRow[1] = DateTime.Parse(cell.StringValue);
                }
                else if (col == 2)
                {
                    mDataRow[2] = double.Parse(cell.StringValue) + col;
                }
                else
                {
                    mDataRow[col] = cell.StringValue;
                }
                return false;
            }
}

We hope this will help you.

You may also post your queries or discuss further in the dedicated forum.

PS. I am working as Support developer/ Evangelist at Aspose.