sparkr sparkr - 6 months ago 124
Scala Question

Compress a Given Text of String in Scala

I have been trying to compress a String. Given a String like this:

AAABBCAADEEFF, I would need to compress it like 3A2B1C2A1D2E2F

I was able to come up with a tail recursive implementation:

@scala.annotation.tailrec
def compress(str: List[Char], current: Seq[Char], acc: Map[Int, String]): String = str match {
case Nil =>
if (current.nonEmpty)
s"${acc.values.mkString("")}${current.length}${current.head}"
else
s"${acc.values.mkString("")}"
case List(x) if current.contains(x) =>
val newMap = acc ++ Map(acc.keys.toList.last + 1 -> s"${current.length + 1}${current.head}")
compress(List.empty[Char], Seq.empty[Char], newMap)
case x :: xs if current.isEmpty =>
compress(xs, Seq(x), acc)
case x :: xs if !current.contains(x) =>
if (acc.nonEmpty) {
val newMap = acc ++ Map(acc.keys.toList.last + 1 -> s"${current.length}${current.head}")
compress(xs, Seq(x), newMap)
} else {
compress(xs, Seq(x), acc ++ Map(1 -> s"${current.length}${current.head}"))
}
case x :: xs =>
compress(xs, current :+ x, acc)
}

// Produces 2F3A2B1C2A instead of 3A2B1C2A1D2E2F
compress("AAABBCAADEEFF".toList, Seq.empty[Char], Map.empty[Int, String])


It fails however for the given case! Not sure what edge scenario I'm missing! Any help?

So what I'm actually doing is, going over the sequence of characters, collecting identical ones into a new Sequence and as long as the new character in the original String input (the first param in the compress method) is found in the current (the second parameter in the compress method), I keep collecting it.

As soon as it is not the case, I empty the current sequence, count and push the collected elements into the Map! It fails for some edge cases that I'm not able to make out!

Answer Source

Took inspiration from the @nicodp code

def encode(word: String): String =
      word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
        acc match {
          case Nil => (e, 1) :: Nil
          case ((lastChar, lastCharCount) :: xs) if lastChar == e => (lastChar, lastCharCount + 1) :: xs
          case xs => (e, 1) :: xs
        }
      }.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)

First our intermediate result will be List[(Char, Int)]. List of tuples of chars each char will be accompanied by its count.

Now lets start going through the list one char at once using the Great! foldLeft

We will accumulate the result in the acc variable and e represents the current element.

acc is of type List[(Char, Int)] and e is of type Char

Now when we start, we are at first char of the list. Right now the acc is empty list. So, we attach first tuple to the front of the list acc with count one.

when acc is Nil do (e, 1) :: Nil or (e, 1) :: acc note: acc is Nil

Now front of the list is the node we are interested in.

Lets go to the second element. Now acc has one element which is the first element with count one.

Now, we compare the current element with the front element of the list if it matches, increment the count and put the (element, incrementedCount) in the front of the list in place of old tuple.

if current element does not match the last element, that means we have new element. So, we attach new element with count 1 to the front of the list and so on.

then to convert the List[(Char, Int)] to required string representation.

Note: We are using front element of the list which is accessible in O(1) (constant time complexity) has buffer and increasing the count in case same element is found.

Scala REPL

scala> :paste
// Entering paste mode (ctrl-D to finish)

def encode(word: String): String =
      word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
        acc match {
          case Nil => (e, 1) :: Nil
          case ((lastChar, lastCharCount) :: xs) if lastChar == e => (lastChar, lastCharCount + 1) :: xs
          case xs => (e, 1) :: xs
        }
      }.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)

// Exiting paste mode, now interpreting.

encode: (word: String)String

scala> encode("AAABBCAADEEFF")
res0: String = 3A2B1C2A1D2E2F

Bit more concise with back ticks e instead of guard in pattern matching

   def encode(word: String): String =
      word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
        acc match {
          case Nil => (e, 1) :: Nil
          case ((`e`, lastCharCount) :: xs) => (e, lastCharCount + 1) :: xs
          case xs => (e, 1) :: xs
        }
      }.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download