deepseas - 3 months ago
Scala Question

Reading multiple files and extracting the first column using Scala

I am starting to learn Scala and ran into this simple problem. I would normally do this on the Unix command line with Bash and Awk, but I decided to use Scala as a learning exercise.

I want to parse multiple tab-separated text files and extract the first column, or any arbitrary column.

I also want to remove lines that start with "#", which I was able to do.

The code below prints only the first row of the chosen column from each file.
How do I get it to print all the rows?

import scala.io.Source

if (args.length > 0) {
  for (arg <- args) {
    val file = Source.fromFile(arg).getLines.filter(s => !(s contains "#")).mkString("\n").split("\t")
    println(file(2))
  }
} else
  Console.err.println("Please enter filename")


Thank you

Answer

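Calling mkString("\n") on the result of getLines joins the entire file into a single string. The split("\t") that follows then splits that whole string at once, so file(2) is just one element of one flat array, not one value per line. That is why you only see output for the first row. Note also that filter(s => !(s contains "#")) drops every line containing "#" anywhere, not only the lines that start with it.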
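
To make the difference concrete, here is a small sketch using two made-up tab-separated lines in place of a real file (the sample data is an assumption for illustration only). It compares joining first and splitting second against splitting each line on its own:

```scala
// Two hypothetical tab-separated lines standing in for a file's contents.
val sample = List("a\tb\tc", "d\te\tf")

// Join first, split second: one flat array for the whole input.
val flat = sample.mkString("\n").split("\t")
// flat holds 5 elements: "a", "b", "c\nd", "e", "f"
// flat(2) is the single element "c\nd", not one value per line.

// Split each line separately: the rows stay apart.
val perLine = sample.map(_.split("\t")(2))
// perLine == List("c", "f")
```

With the join-first approach, the newline ends up inside a field, so indexing the array can never walk down a column.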

The following snippet should work:

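  import scala.io.Source

  if (args.length > 0) {
    for (arg <- args) {
      // Split each line separately, then take the third field (index 2).
      val column = Source.fromFile(arg).getLines().filterNot(_.trim.startsWith("#")).map(_.split("\t")(2))
      println(column.mkString("\n"))
    }
  } else Console.err.println("Please enter filename")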
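
If you want an arbitrary column, one possible way is to wrap the same idea in a small helper. This is only a sketch: the name extractColumn and its parameters are hypothetical, and skipping blank lines and rows that are too short is an assumption about what you want, not something your question specifies. It also closes the file when done, which the one-liner above does not.

```scala
import scala.io.Source

// Hypothetical helper: returns column `col` (0-based) from every
// non-blank, non-comment line of a tab-separated file.
def extractColumn(path: String, col: Int): List[String] = {
  val source = Source.fromFile(path)
  try {
    source.getLines()
      .filterNot(line => line.trim.isEmpty || line.trim.startsWith("#"))
      .map(_.split("\t", -1)) // -1 keeps trailing empty fields
      .collect { case fields if fields.length > col => fields(col) } // skip short rows
      .toList // materialize before the file is closed
  } finally source.close()
}
```

In a script it could be used as args.foreach(f => extractColumn(f, 2).foreach(println)), which prints the third column of each file, one value per line.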