SparkDataFrame.dtypes fails if a column has special chars - how to bypass this and read the CSV with inferSchema


Inferring the schema of a Spark DataFrame throws an error if the CSV file has a column containing special characters.

Test sample foo.csv:

id,comment
1, #Hi
2, Hello

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("footest").getOrCreate()
df = spark.read.load("foo.csv", format="csv", inferSchema="true", header="true")
print(df.dtypes)

This fails with:

raise ValueError("Could not parse datatype: %s" % json_value)

I found a comment from Dat Tran on inferSchema in the spark-csv package about how to resolve this... but can't we still infer the schema before cleaning the data?
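One way to keep inferSchema while the data is still dirty is to pre-clean the file before Spark touches it. Below is a minimal stdlib sketch (no Spark involved), assuming the stray "#" characters in the values are the problem; the sample content and the output filename foo_clean.csv are assumptions for illustration:

```python
import csv

# Recreate the question's sample file (content assumed from the question).
with open("foo.csv", "w", newline="") as f:
    f.write("id,comment\n1, #Hi\n2, Hello\n")

# Strip surrounding whitespace and any leading '#' from each field so that
# inferSchema later sees plain values. "foo_clean.csv" is an assumed name.
with open("foo.csv", newline="") as src, open("foo_clean.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow(field.strip().lstrip("#") for field in row)

print(open("foo_clean.csv").read())
```

Spark can then read foo_clean.csv with inferSchema="true" as in the snippet above.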

1 Answer

Answered by Ghost:

Use it like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('Test').enableHiveSupport().getOrCreate()

df = spark.read.format("csv").option("inferSchema", "true").option("header", "true").load("test19.csv")
print(df.dtypes)

Output:

[('id', 'int'), ('comment', 'string')]