Receiving error "Hive cannot safely cast" when trying to insert data containing a timestamp using SQLAlchemy and PyHive


I am trying to use PyHive and SQLAlchemy to bulk insert data into a Hive database on a Hadoop cluster.

Here is the relevant part of my code:

from sqlalchemy import DateTime, String, Float
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *
engine = create_engine(...)

meta = MetaData()
con = engine.connect()
dataTable = Table(
    'data', meta, 
    Column("timestamp",DateTime),
    Column("id", String),
    ...
)
dbdata = []
...
for data in some_source:
    dbdata.append({
        "timestamp": data.time,  # a Python datetime.datetime object
        "id": data.id,
        ...
    })
con.execute(dataTable.insert(), dbdata)

I am receiving the following error:

(pyhive.exc.OperationalError) TExecuteStatementResp(status=TStatus(statusCode=3, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Error running query: 
[INCOMPATIBLE_DATA_FOR_TABLE.CANNOT_SAFELY_CAST] org.apache.spark.sql.AnalysisException: 
[INCOMPATIBLE_DATA_FOR_TABLE.CANNOT_SAFELY_CAST] Cannot write incompatible data for the table `spark_catalog`.`default`.`data`: Cannot safely cast `timestamp` "STRING" to "TIMESTAMP".:26:25', 
0
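The error suggests the driver is binding the datetime value as a quoted string literal, which the server then refuses to cast to TIMESTAMP. As a point of reference (this is an assumption about a possible workaround, not something from the question itself), one could pre-format each value into Hive's `yyyy-MM-dd HH:mm:ss[.f...]` timestamp-literal form before binding; `to_hive_timestamp` below is a hypothetical helper, not part of PyHive or SQLAlchemy:

```python
from datetime import datetime

def to_hive_timestamp(dt: datetime) -> str:
    # Format a datetime into Hive's timestamp literal layout:
    # yyyy-MM-dd HH:mm:ss.ffffff (microsecond precision)
    return dt.strftime("%Y-%m-%d %H:%M:%S.%f")

print(to_hive_timestamp(datetime(2023, 5, 1, 12, 30, 15, 250000)))
# 2023-05-01 12:30:15.250000
```

Whether the server accepts such a string still depends on its cast/store-assignment policy, so this is only a sketch of the formatting side.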
