I'm doing a batch insert into the database, and I'm running into a data type issue when inserting the rows as a batch into Cassandra.
Here I created a table named batch_rows; you can see the type of node_id is text:
("""
    CREATE TABLE IF NOT EXISTS batch_rows (
        local_pid int,
        node_id text,
        camera_id int,
        geolocation int,
        filepath int,
        PRIMARY KEY (local_pid, node_id)
    )
""")
After creating the table, I grab 500 rows of data and do the batch insert:
statement = 'BEGIN BATCH '
to_insert = 500
for i in range(to_insert):
    location_generator = self.generate_random_coordinate()
    local_pid = i
    node_id = 'P2'
    camera_id = 1
    geolocation = 4
    filepath = 3
    statement += ('INSERT INTO batch_rows (local_pid, node_id, camera_id, geolocation, filepath) '
                  'VALUES (%s, %s, %s, %s, %s) '
                  % (local_pid, node_id, camera_id, geolocation, filepath))
statement += 'APPLY BATCH'
statement = SimpleStatement(statement, consistency_level=ConsistencyLevel.QUORUM)
try:
    self.session.execute(statement, timeout=30.0)
except OperationTimedOut:
    ex_type, ex, tb = sys.exc_info()
    print("Batch wide row insertion timed out, this may require additional investigation")
    del tb
And finally it gave me this error:
[Syntax error in CQL query] message="line 1:103 no viable alternative at input ',' (... filepath) VALUES (0, [P2],...)">
So I'm kinda stuck at the moment; any help would be appreciated!
Your code has multiple CQL syntax errors, which is why it is not working.

The `BEGIN BATCH ... APPLY BATCH;` commands are only valid when executing CQL queries in cqlsh -- they are not designed to be used with the native CQL drivers. With the Python driver, the way to execute a batch operation is to add a `SimpleStatement` (or a prepared statement) to a `BatchStatement`. For details and more examples, see the Python driver API documentation on the `BatchStatement` class.

Warning
However, it is important to note that what you are trying to implement is a bad idea and is NOT recommended.
Batches are not an optimisation in Cassandra in the same way that batches are used in relational databases. Only use batches when necessary, or you risk overloading the coordinator node, which can affect the overall performance of your cluster.
When bulk-loading data, the recommended approach is to issue multiple asynchronous inserts to maximise the throughput of the cluster since the requests will be distributed to all nodes in the cluster instead of just one coordinator node.
I strongly recommend having a look at Batching inserts, updates and deletes in Cassandra, paying particular attention to the sections on when it is good to batch operations and when it is not.
Additionally, I'll reiterate what I said in your other question -- avoid making schema changes programmatically, to prevent schema disagreements. Cheers!