deepseas deepseas - 3 months ago 8
Scala Question

Convert a column from text files into a Set using Scala

I am trying to collect one column from multiple files into a Set. I get a result, but the code seems to treat the whole column from each file as a single set element, so duplicates are not removed.

I think I am failing to split the column into individual elements, so each file's column is added as one element.

import scala.io.Source

if (args.length > 0) {
  var ids: Set[String] = collection.immutable.HashSet()
  for (arg <- args) {
    ids += Source.fromFile(arg).getLines().filterNot(_.trim.startsWith("#")).map(_.split("\t")(0)).mkString("\n")
  }
  println(ids)
} else
  Console.err.println("Please enter filename")


Input files

File a:

#df
ABC 2
ABC 7
CVF 9


File b:

#dsdff
#
#
ABC 1
DFG 2
CVF 3


This is the output I get:

Set(ABC
DFG
CVF, ABC
ABC
CVF)


Desired output:

Set(ABC,DFG,CVF)

dhg dhg
Answer

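Remove the mkString call applied to each file's contents, and change ids += to ids ++= (since you're adding a collection rather than a single element). As written, mkString joins the whole column into one string, so each file contributes exactly one element to the set.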
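
As a minimal sketch of that fix, here is the question's loop with only those two changes applied. The wrapper name collectIds and its paths parameter are hypothetical, added only so the loop can be called on its own; like the original, it does not close the files it opens.

```scala
import scala.io.Source

// `collectIds` is a hypothetical wrapper (not in the original post);
// the loop body is the question's code with the two changes applied.
def collectIds(paths: Seq[String]): Set[String] = {
  var ids: Set[String] = Set.empty
  for (path <- paths) {
    // No mkString: the lines stay separate elements.
    // ++= adds every element of the iterator; += would add one single value.
    ids ++= Source.fromFile(path).getLines()
      .filterNot(_.trim.startsWith("#"))
      .map(_.split("\t")(0))
  }
  ids
}
```

In the original script this would be used as println(collectIds(args.toSeq)).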

Or you can clean it up a bit in this way:

val ids: Set[String] = 
  args.flatMap { arg =>
    Source.fromFile(arg).getLines()
      .filterNot(_.trim.startsWith("#"))
      .map(_.split("\t").head)
  }.toSet

or like this:

val ids: Set[String] = 
  (for {
    arg <- args
    line <- Source.fromFile(arg).getLines()
    if !line.trim.startsWith("#")
  } yield line.split("\t").head).toSet
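
Neither version above closes the files it opens. If that matters, one possible variant, assuming Scala 2.13 or later (for scala.util.Using), reads each file fully before closing it. The name readIds is hypothetical and not part of the original answer.

```scala
import scala.io.Source
import scala.util.Using

// Sketch only: assumes Scala 2.13+ for scala.util.Using.
// `readIds` is a hypothetical name, not from the original answer.
def readIds(paths: Seq[String]): Set[String] =
  paths.flatMap { path =>
    // Using.resource closes the Source when the block finishes.
    Using.resource(Source.fromFile(path)) { source =>
      source.getLines()
        .filterNot(_.trim.startsWith("#"))
        .map(_.split("\t").head)
        .toList // materialize the lazy iterator before the file is closed
    }
  }.toSet
```

The toList call is needed because getLines() is lazy; without it the iterator would be consumed after the file had already been closed.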