I have a timestamp column
from pyspark.sql.functions import col, date_format, to_timestamp
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

data = [(1, '2023-01-22 09:00'), (2, '2023-09-11 00:09')]
schema = StructType([StructField("id", IntegerType(), False), StructField("ts", StringType(), True)])
main_df = spark.createDataFrame(data, schema)
main_df.printSchema()
root
|-- id: integer (nullable = false)
|-- ts: string (nullable = true)
main_df2 = main_df.withColumn('ts', date_format(to_timestamp(col('ts'), "yyyy-MM-dd HH:mm"), "yyyy-MM-dd HH:mm").cast("timestamp"))
main_df2.printSchema()
root
|-- id: integer (nullable = false)
|-- ts: timestamp (nullable = true)
main_df2.show()
+---+-------------------+
| id| ts|
+---+-------------------+
| 1|2023-01-22 09:00:00|
| 2|2023-09-11 00:09:00|
+---+-------------------+
Is it possible to have a timestamp datatype column, in Pyspark, without the seconds, like yyyy-MM-dd HH:mm?
Desired Output
+---+----------------+
| id| ts|
+---+----------------+
| 1|2023-01-22 09:00|
| 2|2023-09-11 00:09|
+---+----------------+
root
|-- id: integer (nullable = false)
|-- ts: timestamp (nullable = true)
Thanks in advance.
You don't need the .cast("timestamp") after date_format - just remove it and you'll get what you need: