timmyc31 - 2 months ago
Scala Question

Parsing matrix data with Scala

I have a log file with the following format:

    3
    1 2 3
    1 2 3
    1 2 3
    1 2 3
    4
    1 2 3 4
    1 2 3 4
    1 2 3 4
    1 2 3 4


The single number gives the width of the matrix; the height is not stored because it is always the same. There can be several matrices in the same log file. I want to parse the matrix data into an array. I read the lines with scala.io.Source.fromFile(f).getLines.mkString, but I'm struggling to fill the array.

    for (i <- 0 to 3) {
      for (j <- 0 to N - 1) {
        matrix(i)(j) = ...
      }
    }


If the lines were indexed the same way I want the matrix to be indexed, this wouldn't be so hard. But lines(n) contains whitespace and newlines. What am I doing wrong?

Answer

You can do this quite easily in a few simple steps:

  1. First, break the input into a List of lines
  2. Then break each line into a List of Strings
  3. Then convert each String in the list to an Int
  4. Finally, fold the resulting rows into a List of matrices (using a simple state machine)
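Steps 1 to 3 can be tried on their own. The sketch below (the object and value names are illustrative, not part of the answer) also shows why getLines.mkString from the question makes the array hard to fill: mkString with no separator concatenates the lines, so the row boundaries disappear.

```scala
import scala.io.Source

object ParsingSteps {
  val text = "3\n1 2 1\n1 2 2\n1 2 3"

  // Step 1: one element per line
  val lines: List[String] = Source.fromString(text).getLines().toList

  // Steps 2 and 3: split each line on whitespace, then convert each piece to Int
  val rows: List[Array[Int]] = lines.map(_.trim.split("\\s+").map(_.toInt))

  // For contrast: mkString with no separator glues the lines together,
  // so the row boundaries are lost
  val glued: String = lines.mkString

  def main(args: Array[String]): Unit = {
    rows.foreach(row => println(row.mkString(" ")))
    println(glued) // prints 31 2 11 2 21 2 3
  }
}
```

Keeping the lines as separate elements is what makes row i of the file line up with row i of the matrix.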

The state machine is quite simple.

  1. It first reads the number of lines in the next matrix and memorizes it
  2. It then reads in that number of lines to the current matrix
  3. After it has read the memorized number of lines, it adds the current matrix to the list of read matrices and goes back to step 1
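To watch the three states in action, here is a stripped-down, hypothetical version of the transition logic run with scanLeft, which keeps every intermediate state instead of only the last one. It mirrors the Command type in the answer's code, simplified so that rows are only counted, not collected.

```scala
object StateTrace {
  sealed trait Command
  case object ReadLength extends Command
  case class ReadLines(remaining: Int) extends Command

  // One transition of the state machine
  def step(c: Command, row: Array[Int]): Command = c match {
    case ReadLength   => ReadLines(row(0)) // step 1: memorize the number of rows
    case ReadLines(1) => ReadLength        // step 3: last row read, matrix complete
    case ReadLines(i) => ReadLines(i - 1)  // step 2: consume one more row
  }

  // A 2-row matrix followed by a 1-row matrix
  val rows = List(Array(2), Array(1, 2), Array(3, 4), Array(1), Array(5))

  // The initial state plus the state after each row
  val trace: List[Command] = rows.scanLeft(ReadLength: Command)(step)

  def main(args: Array[String]): Unit = trace.foreach(println)
}
```

The trace starts at ReadLength, moves to ReadLines(2) and ReadLines(1) while the first matrix is read, and returns to ReadLength each time a matrix is complete.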

The code will look something like this:

    import scala.io.Source

    // Sample input; you would probably use Source.fromFile(...) instead
    def input = Source.fromString(
       """|3
          |1 2 1
          |1 2 2
          |1 2 3
          |4
          |1 2 3 1
          |1 2 3 2
          |1 2 3 3
          |1 2 3 4""".stripMargin)

    type Matrix = List[Array[Int]]

    // What the parser expects to read next
    sealed trait Command
    case object ReadLength extends Command                       // a line holding the row count
    case class ReadLines(i: Int, matrix: Matrix) extends Command // i more rows of the current matrix

    // Current command plus the matrices completed so far (most recent first)
    case class State(c: Command, l: List[Matrix])

    val parsedMatrixes = input.getLines()
      .map(_.trim)
      .filter(_.nonEmpty)                // skip blank lines, e.g. at the end of the file
      .map(_.split("\\s+").map(_.toInt)) // tolerate repeated spaces or tabs
      .foldLeft(State(ReadLength, Nil)) {
        case (State(ReadLength, matrixes), line) =>
          State(ReadLines(line(0), Nil), matrixes)
        case (State(ReadLines(1, currentMatrix), matrixes), line) =>
          State(ReadLength, (line :: currentMatrix).reverse :: matrixes)
        case (State(ReadLines(i, currentMatrix), matrixes), line) =>
          State(ReadLines(i - 1, line :: currentMatrix), matrixes)
      }.l.reverse

This gives the following result:

    parsedMatrixes: List[Matrix] =
    List(
      List(Array(1, 2, 1),
           Array(1, 2, 2),
           Array(1, 2, 3)),
      List(Array(1, 2, 3, 1),
           Array(1, 2, 3, 2),
           Array(1, 2, 3, 3),
           Array(1, 2, 3, 4)))
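Note that the fold-based code reads the leading number as a row count, while the question describes it as the width, with every matrix having the same height (four rows in the sample, matching the 0 to 3 loop). Under that reading no state machine is needed: each matrix is exactly height + 1 lines, so the lines can be taken in fixed-size groups. The sketch below assumes a fixed height of 4 (the Height constant is an assumption taken from the sample; FixedHeightParser is a hypothetical name) and fills a preallocated array with the question's own nested loop. It has no error handling either: an incomplete final group throws an exception.

```scala
object FixedHeightParser {
  // Assumption from the question's sample and its "0 to 3" loop: every matrix has 4 rows
  val Height = 4

  // Each matrix occupies Height + 1 lines: one width line, then Height rows
  def parse(lines: Iterator[String]): List[Array[Array[Int]]] =
    lines.map(_.trim).filter(_.nonEmpty)
      .grouped(Height + 1)
      .map { block =>
        val n = block.head.toInt // the width header
        val rows = block.tail.map(_.split("\\s+").map(_.toInt))
        val matrix = Array.ofDim[Int](Height, n)
        // The question's nested loop, with "..." filled in from the split rows
        for (i <- 0 until Height; j <- 0 until n)
          matrix(i)(j) = rows(i)(j)
        matrix
      }
      .toList
}
```

You would call it as FixedHeightParser.parse(Source.fromFile(f).getLines()), passing the line iterator directly instead of joining it with mkString.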

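Please be aware that the fold-based code above is not a final solution: it has no error handling (a non-numeric token throws a NumberFormatException, and a matrix cut short at the end of the input is silently dropped), and it does not free its resources (the Source is never closed).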
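One way to address both points, sketched under the assumption of Scala 2.13 or later (for scala.util.Using and toIntOption): read the file inside Using so the Source is always closed, and report malformed input as a Left instead of throwing. This version swaps the fold for plain recursion with splitAt and keeps the answer's convention that the leading number is the row count. SafeMatrixParser, parseLines and parseFile are hypothetical names.

```scala
import scala.annotation.tailrec
import scala.io.Source
import scala.util.{Failure, Success, Try, Using}

object SafeMatrixParser {
  type Matrix = List[Array[Int]]

  // Parses blocks of "row count, then that many rows"; malformed input becomes a Left
  def parseLines(lines: List[String]): Either[String, List[Matrix]] = {
    @tailrec
    def loop(rest: List[String], acc: List[Matrix]): Either[String, List[Matrix]] =
      rest match {
        case Nil => Right(acc.reverse)
        case header :: tail =>
          header.toIntOption match {
            case None => Left(s"expected a row count, got: $header")
            case Some(n) if n <= 0 => Left(s"row count must be positive, got: $n")
            case Some(n) =>
              val (block, remaining) = tail.splitAt(n)
              if (block.length < n) Left(s"expected $n rows, found ${block.length}")
              else
                // toInt throws on a non-numeric token; Try turns that into a Failure
                Try(block.map(_.split("\\s+").map(_.toInt))) match {
                  case Failure(e) => Left(s"bad row: ${e.getMessage}")
                  case Success(m) => loop(remaining, m :: acc)
                }
          }
      }
    loop(lines.map(_.trim).filter(_.nonEmpty), Nil)
  }

  // Using closes the Source even when reading fails; a missing file also ends up as a Left
  def parseFile(path: String): Either[String, List[Matrix]] =
    Using(Source.fromFile(path))(_.getLines().toList).toEither.left
      .map(e => s"could not read $path: ${e.getMessage}")
      .flatMap(parseLines)
}
```

A call such as SafeMatrixParser.parseFile("matrices.log") (the file name is only an example) then returns either the parsed matrices or a message describing the first problem found. Reading the whole file into a List keeps the Source's lifetime confined to the Using block, at the cost of holding all lines in memory.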
