Rana Rana - 1 year ago 90
Scala Question

Apache Spark RDD Split "|"

I am trying to produce a formatted CSV file from a pipe ("|") delimited file using Apache Spark. The input file contains:


Blacktown| Bela vista| Greenacre


I am trying with:

val name = sc.textFile("input.txt")
val split = name.map(line => line.split("|")).map(x => (x(0), x(2)))





My required output is:


(Blacktown, Greenacre)


Answer Source

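The argument to the split function is a regular expression, so if you want to split on a pipe it has to be escaped:

line.split("\\|")

otherwise it is interpreted as an alternation between two empty patterns, and the line is split between every character. It is also better to validate the input: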

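// names is the RDD[String] returned by sc.textFile (called name in the question)
names.map(_.split("\\|")).collect {
  case Array(x, _, y) => (x, y)  // keep only lines with exactly three fields
}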
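
The difference is easy to check without a Spark cluster. The following is a minimal sketch in plain Scala, assuming Java 8 or later. The object name PipeSplitDemo, its method names, and the second (deliberately malformed) sample line are made up for illustration and are not part of the question. It also shows that Scala's split overload taking a Char treats the separator literally, which is an alternative to escaping.

```scala
object PipeSplitDemo {
  // Sample data: the first line comes from the question, the second is a
  // hypothetical malformed line with only two fields.
  val lines: Seq[String] = Seq(
    "Blacktown| Bela vista| Greenacre",
    "Penrith| Kingswood"
  )

  // Unescaped: "|" is a regex alternation of two empty patterns, so it
  // matches between every character and the line falls apart.
  def unescaped(line: String): Array[String] = line.split("|")

  // Escaped: "\\|" is the regex for a literal pipe character.
  def escaped(line: String): Array[String] = line.split("\\|")

  // Scala's split(Char) overload treats the separator literally.
  def viaChar(line: String): Array[String] = line.split('|')

  // collect with a partial function keeps only the arrays that match the
  // pattern (exactly three fields) and silently drops the rest.
  def firstAndLast(input: Seq[String]): Seq[(String, String)] =
    input.map(escaped).collect { case Array(x, _, y) => (x, y) }
}
```

With the unescaped pattern, x(0) and x(2) are single characters rather than fields, which is the symptom in the question. Note that the fields keep their leading spaces; call trim on them if that matters for the CSV output.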