PHP datastax driver - import large dataset

228 views Asked by At

I need to insert a large xlsx (near 1 million rows) file in the Cassandra database and i'm having doubts about how to do it because of memory limitations.

I'm working with batch inserts but that is proving to be near impossible due to huge memory impact.

$batch = new Cassandra\BatchStatement(Cassandra::BATCH_UNLOGGED);

    foreach ($workbook->createRowIterator($myWorksheetIndex) as $rowIndex => $values) {

        if ($count > 0) {

            $time = is_int($values[3]) ? $values[2]->format('d-m-Y') . ' ' . date('00:00:00') : $values[2]->format('d-m-Y') . ' ' . $values[3]->format('H:i:s');
            $date = date_convert(DateTime::createFromFormat('d-m-Y H:i:s', $time));


            $prepared = $session->prepare(
                "INSERT INTO teste (ptd_assoc,ref_equip,dates) " .
                "VALUES (?, ?, ?)"
            );

            $batch->add($prepared, array(
                'ptd_assoc' => $values[0],
                'ref_equip' => $values[1],
                'dates' => new Cassandra\Timestamp(strtotime($date)),
                //  'load' => 3.4454
            ));
        }

        $count++;
    }

    $session->execute($batch);

I have successfully transform the xlsx into a more readable csv file. It's possible to Copy it to the database using the Cassandra\SimpleStatement method?

1

There are 1 answers

3
Adam Holmberg On

If the data is in well-formatted CSV, you may not have to write a custom importer. Have a look at the cqlsh COPY FROM command (help copy;).