Knows Not Much - 11 months ago

Scala: processing a file in batches

I have a flat file that contains several million lines like the one below:

59, 254, 2016-09-09T00:00, 1, 6, 3, 40, 18, 0

I want to process this file in batches of x rows at a time, so I wrote this code:

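import scala.io.Source

// Each batch of x lines becomes one string: "(line1),(line2),..."
def func(x: Int) = {
  for {
    batches <- Source.fromFile("./foo.txt").getLines().sliding(x, x)
  } yield batches.map("(" + _ + ")").mkString(",")
}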

This code produces exactly the output I want: called with x = 2, the function walks through the entire file, taking 2 rows at a time and batching them into one string per batch, like this:

(59, 828, 2016-09-09T00:00, 0, 8, 2, 52, 0, 0),(59, 774, 2016-09-09T00:00, 0, 10, 2, 51, 0, 0)
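A minimal, self-contained sketch of the working version, assuming the batching logic is pulled out into a hypothetical helper batchLines that takes any Iterator[String]. In real use it could be fed Source.fromFile("./foo.txt").getLines(); here it gets in-memory rows (the two from the output above plus the one from the top of the question) so it runs without a file.

```scala
object BatchDemo {
  // Hypothetical helper: turns each group of `size` lines into one string
  // of the form "(line1),(line2),...". sliding(size, size) mirrors the
  // question's code: non-overlapping windows of `size` lines.
  def batchLines(lines: Iterator[String], size: Int): Iterator[String] =
    for {
      batch <- lines.sliding(size, size)
    } yield batch.map("(" + _ + ")").mkString(",")

  def main(args: Array[String]): Unit = {
    // In-memory rows standing in for the contents of ./foo.txt.
    val sample = Iterator(
      "59, 828, 2016-09-09T00:00, 0, 8, 2, 52, 0, 0",
      "59, 774, 2016-09-09T00:00, 0, 10, 2, 51, 0, 0",
      "59, 254, 2016-09-09T00:00, 1, 6, 3, 40, 18, 0"
    )
    // Prints one line per batch.
    batchLines(sample, 2).foreach(println)
  }
}
```

With a batch size of 2 and three rows, the first two rows are joined into one string and the third row ends up alone in a final, shorter batch.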

But when I see Scala pros write code, everything happens inside the for comprehension and they just yield the last value from it.

So, in order to be a Scala pro, I changed my code to:

for {
  batches <- Source.fromFile("./foo.txt").getLines().sliding(2, 2)
  line <- batches.map("(" + _ + ")").mkString(",")
} yield line

This produces one character per line instead of the output I expected. Why did the behavior change so completely? On reading, the two versions look the same to me.

Answer by dhg

In the line line <- batches.map("(" + _ + ")").mkString(","), the right-hand side produces a String (the result of mkString), and the <- makes the for comprehension loop over that string. In your first version the mkString result was the yielded value, so each batch produced one string; as a generator, it adds an inner loop over the string instead. When you iterate over a string, the individual items are characters, so in your case line is a Char. What you actually want is not to iterate over that string but to bind the whole string to the name line, which you can do by replacing the <- with =: line = batches.map("(" + _ + ")").mkString(",").
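To see the difference concretely, here is a small sketch comparing the two forms on made-up in-memory batches. The object and method names are hypothetical, and batches stands in for the sliding(2, 2) iterator over the file; it is a def so each call gets a fresh iterator, since an Iterator can only be consumed once.

```scala
object ForDesugarDemo {
  // Two made-up batches of two rows each, standing in for
  // Source.fromFile(...).getLines().sliding(2, 2).
  def batches: Iterator[Seq[String]] =
    Iterator(Seq("a", "b"), Seq("c", "d"))

  // `line <- ...` is a generator. This desugars to
  //   batches.flatMap(batch => batch.map(...).mkString(",").map(line => line))
  // so it loops over the String and each `line` is a single Char.
  def withGenerator: List[Char] =
    (for {
      batch <- batches
      line <- batch.map("(" + _ + ")").mkString(",")
    } yield line).toList

  // `line = ...` is a value definition: it binds the whole String once per
  // batch, so `yield line` produces one String per batch.
  def withBinding: List[String] =
    (for {
      batch <- batches
      line = batch.map("(" + _ + ")").mkString(",")
    } yield line).toList
}
```

Here withBinding gives one string per batch, while withGenerator gives every character of those strings as separate elements, which is the one-character-per-line output from the question.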

By the way, sliding(2,2) can be more clearly written as grouped(2).
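A quick sketch checking that suggestion: when the step equals the size, sliding(n, n) and grouped(n) on an Iterator produce the same non-overlapping groups, including a shorter last group when the element count is not a multiple of n. The helper names are hypothetical; the groups are converted to List only to make them easy to compare.

```scala
object GroupedDemo {
  // Non-overlapping windows via sliding, with step equal to the window size.
  def viaSliding(xs: List[Int], n: Int): List[List[Int]] =
    xs.iterator.sliding(n, n).map(_.toList).toList

  // The same grouping expressed with grouped(n).
  def viaGrouped(xs: List[Int], n: Int): List[List[Int]] =
    xs.iterator.grouped(n).map(_.toList).toList
}
```

For batch processing the shorter final group matters: the last batch written out may hold fewer than n rows.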