Reading Excel using Tablesaw causes OutOfMemoryError


I am using the following API (Tablesaw) to read data from an Excel file into a Table: https://jtablesaw.github.io/tablesaw/gettingstarted

The code is as follows:

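import tech.tablesaw.api.Table;
import tech.tablesaw.io.xlsx.XlsxReadOptions;
import tech.tablesaw.io.xlsx.XlsxReader;

// Assumed declaration; the original snippet does not show where tab is declared.
Table tab = null;
XlsxReader reader = new XlsxReader();
XlsxReadOptions options = XlsxReadOptions.builder("excel/file_example_XLSX_10.xlsx").build();
try {
    // Reads the sheet into an in-memory Table.
    tab = reader.read(options);
    // System.out.println(tab.print());
} catch (Exception e) {
    e.printStackTrace();
}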

The file file_example_XLSX_10.xlsx is around 120 MB in size, and I am getting an OutOfMemoryError.

Is there a way for me to read only specific columns from the file?


There are 2 answers

1
Conor Creagh On

I don't think there's a way to read only certain columns. Have you tried using Apache POI to read the Excel file instead, or increasing the memory available to the JVM when running?
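Before raising the heap limit, it can help to see how much heap the JVM actually has. The following is a minimal sketch using only the standard library; the class name HeapCheck and its method names are made up for illustration.

```java
public class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Heap currently in use (allocated minus free), in MiB. */
    public static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("Used heap: " + usedHeapMiB() + " MiB");
    }
}
```

Running it as `java -Xmx4g HeapCheck` should report a larger maximum than running it without the flag. The 4g value is only an example, and how much heap a 120 MB workbook needs is something you would have to find by experiment, since an .xlsx file is compressed XML and expands considerably once parsed.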

0
larry On

I'm not familiar with reading Excel files, but if you can export it as one or more CSVs, here are a couple of things to look at:

1) You can read files in a way that minimizes memory use. For convenience, Tablesaw does not use the smallest possible numeric types; it defaults to int and double. You can tell it to try to use less memory, so that it uses a short or float when the data fits.

    Table t = Table.read()
       .csv(CsvReadOptions.builder("../myfile.csv")
          .minimizeColumnSizes()
    );

This might also work for Excel, since it's defined in ReadOptions rather than the more specific CsvReadOptions.

2) Alternately, for CSV you can specify an array of ColumnTypes, one of which can be ColumnType.SKIP. Again this can be done using CsvReadOptions.
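If skipping columns through the reader is not an option in your version, a library-independent fallback is to trim the exported CSV before loading it. The sketch below is not Tablesaw code: CsvColumnFilter and keepColumns are hypothetical names, and it assumes a simple CSV with no quoted commas or embedded newlines. It streams line by line, so its memory use does not grow with file size.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

public class CsvColumnFilter {
    /**
     * Copies only the columns at the given zero-based indexes from in to out,
     * one line at a time.
     * Assumption: fields contain no quoted commas or embedded newlines.
     */
    public static void keepColumns(Reader in, Writer out, int... keep) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            // A limit of -1 keeps trailing empty fields.
            String[] fields = line.split(",", -1);
            StringBuilder row = new StringBuilder();
            for (int i = 0; i < keep.length; i++) {
                if (i > 0) {
                    row.append(',');
                }
                // On rows that are too short, a missing column becomes an empty field.
                if (keep[i] < fields.length) {
                    row.append(fields[keep[i]]);
                }
            }
            writer.write(row.toString());
            writer.write('\n');
        }
        writer.flush();
    }
}
```

For example, `CsvColumnFilter.keepColumns(new FileReader("myfile.csv"), new FileWriter("slim.csv"), 0, 3)` would write the first and fourth columns to a smaller file (both file names are placeholders), which can then be read with Table.read().csv(...) as in point 1.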

With CSV at least, 150MB isn't too big for a typical desktop app. I read an 800MB file yesterday without a problem and without touching the JVM memory settings in IDEA. OTOH, I'm not on the latest version, so YMMV.