zelenov aleksey - 1 month ago
Scala Question

Process large text file using Zeppelin and Spark

I'm trying to analyze (visualize, actually) some data from a large text file (over 50 GB) using Zeppelin (Scala). Examples on the web use CSV files with a known header and known datatypes for each column. In my case, the file contains lines of pure data with a " " delimiter. How do I get my data into a DataFrame, like in the code below?

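import spark.implicits._ // enables .toDF() on RDDs of case classes

case class Record(id: Int, name: String)

// myFile is assumed to be an RDD[String], e.g. from sc.textFile(...)
// Lines with a different number of fields will throw a MatchError.
val myFile1 = myFile.map(x => x.split(";")).map {
  case Array(id, name) => Record(id.toInt, name)
}

myFile1.toDF() // DataFrame will have columns "id" and "name"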


P.S. I want a DataFrame with columns "1", "2", ...
Thanks.
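The snippet above hard-codes two fields through a case class. A minimal sketch of how the same RDD route could be generalized to an unknown number of columns named "1", "2", ... is to build a `Row` per line and pass an explicit `StructType` to `createDataFrame`. The helper `linesToDF` and the placeholder `inputPath` are hypothetical names, and every column is kept as a string on the assumption that types are sorted out later.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import java.util.regex.Pattern

// Hypothetical helper (not from the question): converts an RDD of delimited
// lines into a DataFrame with string columns named "1", "2", ..., numCols.
def linesToDF(spark: SparkSession, lines: RDD[String],
              delimiter: String, numCols: Int): DataFrame = {
  val schema = StructType(
    (1 to numCols).map(i => StructField(i.toString, StringType, nullable = true)))
  val rows = lines.map { line =>
    // String.split takes a regex, so quote the delimiter to treat it literally;
    // the -1 limit keeps trailing empty fields instead of dropping them.
    val fields = line.split(Pattern.quote(delimiter), -1)
    // Pad short lines with nulls and ignore extra fields so every Row fits the schema.
    Row.fromSeq((0 until numCols).map(i => if (i < fields.length) fields(i) else null))
  }
  spark.createDataFrame(rows, schema)
}

// Usage sketch in a Zeppelin paragraph (`inputPath` is a placeholder):
// val lines = spark.sparkContext.textFile(inputPath)
// val n = lines.first().split(Pattern.quote(" "), -1).length
// val df = linesToDF(spark, lines, " ", n)
```

Taking the column count from the first line assumes the file is regular; if lines vary in length, passing the maximum expected count is safer, since short lines are padded with nulls.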

Answer

You can use the built-in CSV reader with a custom delimiter:

spark.read.option("delimiter", ";").csv(inputPath)
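Without a header, the CSV reader names the columns `_c0`, `_c1`, .... A minimal sketch of getting the "1", "2", ... names from the P.S. is to rename them with `toDF(names: _*)` right after reading. The helper name `readDelimited` is hypothetical, and `inputPath` is the placeholder from the line above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper (not from the answer): reads a header-less delimited
// file and renames Spark's default columns _c0, _c1, ... to "1", "2", ...
def readDelimited(spark: SparkSession, path: String, delimiter: String): DataFrame = {
  val raw = spark.read
    .option("delimiter", delimiter)
    .option("header", "false") // no header line, so Spark generates column names
    .csv(path)                 // no inferSchema: every column is read as a string
  raw.toDF((1 to raw.columns.length).map(_.toString): _*)
}

// Usage sketch (`inputPath` is a placeholder for your file):
// val df = readDelimited(spark, inputPath, " ")
// df.printSchema()
```

Leaving `inferSchema` off is deliberate here: per the `DataFrameReader.csv` documentation, schema inference makes an extra pass over the input, which is costly on a 50 GB file. If typed columns are needed, either supply a `StructType` via `.schema(...)` before `.csv(...)` or cast afterwards, e.g. `df.withColumn("1", df("1").cast("int"))`.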