Captcha - 1 year ago
Scala Question

remove pipe delimiter from data using spark

I am new to Spark. I am using Scala to split a pipe-delimited file and save it to HDFS without the pipe delimiter. For that I have written this code:

object WordCount {
  def main(args: Array[String]): Unit = {
    val textfile = sc.textFile("/user/cloudera/xxxx/xxxx")
    val word = textfile.map(l => l.split("|"))
    word.saveAsTextFile("/user/cloudera/xxxx/xxxx")
  }
}

When I execute it I don't get any errors, but in HDFS I get the data below.


I don't know what I am doing wrong.
Please help.

Answer Source

That's because you are splitting each string into an Array[String]. When you save with saveAsTextFile, each array is written out via its toString, which is why the output looks garbled. If you want a delimited line back, you'll need mkString(",") to join the array with commas. But I don't see any purpose in splitting and immediately re-joining.
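A plain-Scala sketch of the round trip (no Spark needed, since split and mkString are ordinary string/collection methods; the sample line is made up):

```scala
object MkStringDemo {
  def main(args: Array[String]): Unit = {
    val line = "a|b|c"
    val parts = line.split("\\|")   // Array("a", "b", "c")
    // An array's toString is an opaque reference, e.g. [Ljava.lang.String;@1b6d3586 --
    // this is what saveAsTextFile writes for an RDD[Array[String]].
    println(parts.toString)
    // mkString joins the elements back into one delimited line.
    println(parts.mkString(","))    // a,b,c
  }
}
```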

If you want to replace the pipe separator with a comma, you can use _.replaceAll("\\|", ",") instead and save the result:

val word = textfile.map(_.replaceAll("\\|", ",").replaceFirst(",", "").trim)

P.S.: You can replace the comma with anything you want, e.g. a whitespace, a word, etc.
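The same chain of calls can be tried on a single string, since replaceAll, replaceFirst and trim are ordinary java.lang.String methods (the sample line with a leading pipe is made up):

```scala
object ReplaceDemo {
  def main(args: Array[String]): Unit = {
    val line = "|alice|30|engineer"  // hypothetical input line starting with a pipe
    val cleaned = line
      .replaceAll("\\|", ",")        // every pipe becomes a comma -> ",alice,30,engineer"
      .replaceFirst(",", "")         // drop the leading comma     -> "alice,30,engineer"
      .trim                          // strip surrounding whitespace
    println(cleaned)                 // alice,30,engineer
  }
}
```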

So why does the pipe need to be escaped?

String.split takes a regular expression, not a literal string. In a regex, an unescaped | is the alternation operator, so "|" is parsed as "empty string or empty string" — it matches the zero-width position between every pair of characters and splits the line into single characters, which isn't what you mean. Escaping it as "\\|" makes it match a literal pipe.
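A quick sketch of the difference on a made-up string (the unescaped result shown is for Java 8+ runtimes, which no longer include a leading empty match):

```scala
object SplitDemo {
  def main(args: Array[String]): Unit = {
    val line = "a|b|c"
    // Unescaped: "|" is regex alternation, matching the empty string,
    // so the line is split at every character boundary.
    println(line.split("|").mkString("/"))    // a/|/b/|/c
    // Escaped: "\\|" matches a literal pipe character.
    println(line.split("\\|").mkString("/"))  // a/b/c
  }
}
```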
