User1232187 - 3 months ago
Scala Question

How to filter a List with another List based on some conditions?

Let's say I have this code to figure out dups in a List based on a constructor parameter: (I ended up with this after parsing some text files which have duplicates.)

case class Line(ini: String, name:String, com:String)

val l0 = Line("X", "hello", "some text")
val l1 = Line("", "world", "some text")
val l2 = Line("X", "computer", "")
val l3 = Line("", "hello", "")
val l4 = Line("X", "world", "")
val l5 = Line("", "hello", "some stuff")

val lineList = List(l0,l1,l2,l3, l4, l5)

val dup = lineList.groupBy(_.name).collect { case (x, List(_,_,_*)) => x } // yields an Iterable containing "hello" and "world" (order not guaranteed)
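
For reference, the grouping step can be looked at on its own. The following is only a sketch written in REPL/script style: the names grouped and dupNames are hypothetical and not part of the original post, and a size check is used here in place of the List(_,_,_*) pattern.

```scala
case class Line(ini: String, name: String, com: String)

val lineList = List(
  Line("X", "hello", "some text"),
  Line("", "world", "some text"),
  Line("X", "computer", ""),
  Line("", "hello", ""),
  Line("X", "world", ""),
  Line("", "hello", "some stuff")
)

// groupBy buckets the lines by name; within a bucket the original order is kept
val grouped: Map[String, List[Line]] = lineList.groupBy(_.name)

// A name counts as a duplicate when its bucket holds more than one line
// (dupNames is a hypothetical name used for illustration)
val dupNames: Set[String] =
  grouped.collect { case (name, lines) if lines.size > 1 => name }.toSet
```

With the sample data, dupNames contains "hello" and "world", while "computer" appears only once and is left out.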


Now I know which one is a duplicate. But how can I filter the lineList again to filter out the dups based on some other rules?

In the end I want to have a List with no duplicates anymore, but I also want to retain as much information from the properties ini and com as possible. That means I want to keep the duplicate that follows one of the following rules:


  • Lines with content in property ini and com have precedence over all others, meaning: Line("X", "hello", "some text") vs Line("", "hello", "some text") vs Line("", "hello", "") should give back the first

  • Lines with content in property com have precedence over ini, meaning: Line("", "hello", "") vs Line("", "hello", "some text") should give back the last one

  • Lines with content in property ini have precedence over lines with nothing in ini or com, meaning: Line("X", "hello", "") vs Line("", "hello", "") should give back the first

  • In case both duplicates have information in ini and com, I don't care which one is selected.



I wonder if that's not overly complicated and there might be another way to solve this. All I want to accomplish is a List that has no more dups while keeping that dup that had the most information on it. How would one solve this?

Answer

You can define a chooseBetterLine function that does the logic you need for any two lines with the same name (I hope I followed it correctly) - and then use reduce on the values:

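def chooseBetterLine(l1: Line, l2: Line): Line = {
  // com outranks ini, so a difference in com decides first
  if (l1.com.nonEmpty != l2.com.nonEmpty) { if (l1.com.nonEmpty) l1 else l2 }
  // com is equally present (or absent) in both lines: fall back to ini
  else if (l1.ini.nonEmpty && l2.ini.isEmpty) l1
  else l2
}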

val result: Iterable[Line] = lineList.groupBy(_.name).values.map(_.reduce(chooseBetterLine))
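
One thing to keep in mind: groupBy returns a Map, so the order of result is not guaranteed to match the order of lineList. If that matters, a possible variant is to rank every line and pick the best one per name with maxBy, then walk the distinct names in their original order. This is only a sketch under the assumption that the rules boil down to "com counts more than ini"; the names key, best and deduped are hypothetical and do not come from the question.

```scala
case class Line(ini: String, name: String, com: String)

val lineList = List(
  Line("X", "hello", "some text"),
  Line("", "world", "some text"),
  Line("X", "computer", ""),
  Line("", "hello", ""),
  Line("X", "world", ""),
  Line("", "hello", "some stuff")
)

// Hypothetical ranking key: com first, then ini.
// Tuples compare element by element and false < true, so
// (true, true) > (true, false) > (false, true) > (false, false).
def key(l: Line): (Boolean, Boolean) = (l.com.nonEmpty, l.ini.nonEmpty)

// Best line per name according to the key above
val best: Map[String, Line] =
  lineList.groupBy(_.name).map { case (name, lines) => name -> lines.maxBy(key) }

// Walk the names in first-seen order so the result follows lineList
val deduped: List[Line] = lineList.map(_.name).distinct.map(best)
```

For the sample data this keeps Line("X", "hello", "some text"), Line("", "world", "some text") and Line("X", "computer", ""), in that order. When two lines of the same name rank equally, either one satisfies the rules, so the sketch makes no promise about which is kept.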