zelenov aleksey - 3 years ago
Scala Question

Process large text file using Zeppelin and Spark

I'm trying to analyze (actually visualize) some data from a large text file (over 50 GB) using Zeppelin (Scala). Examples on the web use CSV files with a known header and known data types for each column. In my case, I have lines of plain data with a ";" delimiter. How do I achieve putting my data into a DataFrame like in the code below?

case class Record(id: Int, name: String)

val myFile1 = myFile.map(x => x.split(";")).map {
  case Array(id, name) => Record(id.toInt, name)
}

myFile1.toDF() // DataFrame will have columns "id" and "name"

P.S. I want a DataFrame with columns "1", "2", ...

Answer Source

You can use the built-in csv reader with a custom delimiter:

spark.read.option("delimiter", ";").csv(inputPath)
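Since the file has no header, Spark assigns default column names _c0, _c1, …; calling .toDF("1", "2") on the resulting DataFrame renames them to the numeric names the question asks for. If you prefer the RDD route from the question, the per-line parsing step is plain Scala and can be sketched Spark-free; the sample data and names below are illustrative, not from the original file:

```scala
// Plain-Scala sketch of the split/match parsing step from the question.
// `lines` stands in for the RDD of raw lines, so this runs without a
// Spark session; on a real RDD the same map/collect logic applies.
case class Record(id: Int, name: String)

val lines = Seq("1;alice", "2;bob") // illustrative stand-in data

// collect (with a partial function) silently skips malformed lines
// that do not split into exactly two fields.
val records = lines.map(_.split(";")).collect {
  case Array(id, name) => Record(id.toInt, name)
}
```

Using collect with a partial function instead of map avoids a MatchError on lines with a different number of fields, which matters on a 50 GB file where a few bad rows are likely.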