Jack Doce - 1 year ago
Scala Question

Read .csv data in european format with Spark

I am currently making my first attempts with Apache Spark.
I would like to read a .csv file with an SQLContext object, but Spark does not produce the correct results, as the file uses the European format (comma as decimal separator, semicolon as value separator).
Is there a way to tell Spark to follow a different .csv syntax?

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val df = sqlContext.read


A row in the corresponding .csv looks like this:


Spark interprets the entire row as a single column:

|Col1;Col2,Col3;Col4;Col5 |
| 04.10.2016;12:51:...|

In previous attempts to get it working, Spark even printed more of the row content where it now says '...', but it eventually cut the row off at the comma in the third column.

Answer

You can just read the file as text and split each line on `;`, or set a custom delimiter for the CSV format with `.option("delimiter", ";")`.
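As a minimal sketch of the text-and-split approach (plain Scala, no Spark, with a hypothetical sample line): splitting on the semicolon separates the values, and a locale-aware number parser handles the comma decimal separator, which setting the delimiter alone would not convert.

```scala
import java.text.NumberFormat
import java.util.Locale

// Hypothetical sample line in European format:
// semicolons separate values, a comma marks the decimal point.
val line = "04.10.2016;12:51;3,25;foo;bar"

// Split on the value separator.
val fields = line.split(";")

// Parse the third column with a German locale so "3,25" becomes 3.25.
val nf = NumberFormat.getInstance(Locale.GERMANY)
val value = nf.parse(fields(2)).doubleValue()

println(fields.length) // 5
println(value)         // 3.25
```

The same `split`/parse logic can be applied inside a `map` over an RDD of lines read with `sc.textFile`.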
