I need to insert a large xlsx (near 1 million rows) file in the Cassandra database and i'm having doubts about how to do it because of memory limitations.
I'm working with batch inserts but that is proving to be near impossible due to huge memory impact.
$batch = new Cassandra\BatchStatement(Cassandra::BATCH_UNLOGGED);
foreach ($workbook->createRowIterator($myWorksheetIndex) as $rowIndex => $values) {
if ($count > 0) {
$time = is_int($values[3]) ? $values[2]->format('d-m-Y') . ' ' . date('00:00:00') : $values[2]->format('d-m-Y') . ' ' . $values[3]->format('H:i:s');
$date = date_convert(DateTime::createFromFormat('d-m-Y H:i:s', $time));
$prepared = $session->prepare(
"INSERT INTO teste (ptd_assoc,ref_equip,dates) " .
"VALUES (?, ?, ?)"
);
$batch->add($prepared, array(
'ptd_assoc' => $values[0],
'ref_equip' => $values[1],
'dates' => new Cassandra\Timestamp(strtotime($date)),
// 'load' => 3.4454
));
}
$count++;
}
$session->execute($batch);
I have successfully transform the xlsx into a more readable csv file. It's possible to Copy it to the database using the Cassandra\SimpleStatement method?
If the data is in well-formatted CSV, you may not have to write a custom importer. Have a look at the
cqlshCOPY FROMcommand (help copy;).