stack0114106 - 2 months ago
Scala Question

Scala regex: single pattern to match when one or multiple records are present

I need a Scala solution: a single regex pattern that matches the first record entry, which spans multiple lines, whether the input holds one record or several. Each record always starts with the word RECORD.

Scenario 1


==================================================
RECORD-1

    "FOO BAR"

    "ID-100"

    "TOY"

==================================================


Scenario 2


==================================================
RECORD-1

    "FOO BAR"

    "ID-100"

    "TOY"

RECORD-2

    "X BAR"

    "ID-200"

    "DOLL"

RECORD-3

    "Y BAR"

    "ID-400"

    "STATUE"

==================================================


In both scenarios, I need the first record (the one containing "FOO BAR") to be extracted using Scala code. REPL solutions are especially welcome.
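Since the question asks for a single pattern, here is a minimal regex sketch. It assumes the whole input is held in one string (for example read with `scala.io.Source.fromFile(...).mkString`) and that a record ends either where the next line starting with `RECORD` begins or at the `=====` separator line. The names `FirstRecordRegex`, `firstRecord`, `block` and the sample strings are hypothetical. `(?s)` lets `.` cross newlines, and the lazy `.*?` followed by a lookahead stops at the earliest record boundary, so one pattern covers both scenarios.

```scala
import scala.util.matching.Regex

object FirstRecordRegex {
  // Hypothetical pattern (not from the original answer):
  //   (?s)          DOTALL, so '.' also matches newlines
  //   RECORD.*?     the first "RECORD" header, then as little text as possible
  //   (?=\n\s*(?:RECORD|={3,})|\z)
  //                 stop before the next RECORD line, the ==== separator,
  //                 or the end of the input
  val FirstRecord: Regex = """(?s)RECORD.*?(?=\n\s*(?:RECORD|={3,})|\z)""".r

  // Returns the text of the first record, or None when there is no RECORD header.
  def firstRecord(text: String): Option[String] = FirstRecord.findFirstIn(text)

  // Sample inputs rebuilt from the question's two scenarios.
  private val sep = "=" * 50
  private def block(id: String, fields: String*): Seq[String] =
    Seq(id) ++ fields.flatMap(f => Seq("", "    \"" + f + "\"")) ++ Seq("")

  val scenario1: String =
    (Seq(sep) ++ block("RECORD-1", "FOO BAR", "ID-100", "TOY") ++ Seq(sep)).mkString("\n")

  val scenario2: String =
    (Seq(sep) ++
      block("RECORD-1", "FOO BAR", "ID-100", "TOY") ++
      block("RECORD-2", "X BAR", "ID-200", "DOLL") ++
      block("RECORD-3", "Y BAR", "ID-400", "STATUE") ++
      Seq(sep)).mkString("\n")

  def main(args: Array[String]): Unit = {
    println(firstRecord(scenario1))
    println(firstRecord(scenario2))
  }
}
```

Both calls should return the same `Some(...)` holding only the RECORD-1 block. If the file uses Windows (`\r\n`) line endings, the match keeps a trailing `\r`, which `.map(_.trim)` removes.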

Answer

You could start by simply capturing everything up to the second record.

scala> val firstRec = io.Source.fromFile("records.txt").getLines().takeWhile(_ != "RECORD-2")
firstRec: Iterator[String] = non-empty iterator
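The `takeWhile(_ != "RECORD-2")` call relies on a second record existing; in Scenario 1 it simply runs on to the closing `=====` line. A slightly more general line-based sketch, assuming every record header starts with `RECORD` and the block ends at a `=` separator line, skips to the first header and stops at the next header or separator. `FirstRecordLines` and `firstRecordLines` are hypothetical names.

```scala
object FirstRecordLines {
  // Returns the lines of the first record: its RECORD header plus every following
  // line up to (not including) the next RECORD header or "=" separator line.
  def firstRecordLines(lines: Iterator[String]): List[String] = {
    // Skip the leading separator (and anything else) until the first header.
    val fromFirst = lines.dropWhile(line => !line.startsWith("RECORD"))
    if (!fromFirst.hasNext) Nil
    else {
      val header = fromFirst.next()
      header :: fromFirst
        .takeWhile(line => !line.startsWith("RECORD") && !line.startsWith("="))
        .toList
    }
  }

  def main(args: Array[String]): Unit = {
    // Shortened Scenario 2 (RECORD-3 omitted); for a real file, pass
    // scala.io.Source.fromFile("records.txt").getLines() instead.
    val lines = Iterator(
      "=" * 50, "RECORD-1", "", "    \"FOO BAR\"", "", "    \"ID-100\"", "", "    \"TOY\"", "",
      "RECORD-2", "", "    \"X BAR\"", "", "    \"ID-200\"", "", "    \"DOLL\"", "", "=" * 50)
    firstRecordLines(lines).foreach(println)
  }
}
```

Because the stop condition looks for any header or separator rather than the literal `RECORD-2`, the same call handles Scenario 1 and Scenario 2.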

From there you can trim the unwanted parts from the record (headers, separator lines, blank lines, and so on).
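A minimal trimming sketch, assuming each value sits on its own line wrapped in double quotes as in the question: trim every line, drop blanks, separators and the RECORD header, then strip the surrounding quotes. The first remaining value is "FOO BAR". `TrimRecord` and `fields` are hypothetical names.

```scala
object TrimRecord {
  // Turns the raw lines of one record into its field values: drops the RECORD
  // header, blank lines and "=" separators, then strips the surrounding quotes.
  def fields(recordLines: Seq[String]): Seq[String] =
    recordLines
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("RECORD") && !line.startsWith("="))
      .map(line => line.stripPrefix("\"").stripSuffix("\""))

  def main(args: Array[String]): Unit = {
    // Raw lines as captured for Scenario 1 (takeWhile keeps both separator lines).
    val captured = Seq("=" * 50, "RECORD-1", "", "    \"FOO BAR\"", "", "    \"ID-100\"", "", "    \"TOY\"", "", "=" * 50)
    val values = fields(captured)
    println(values)            // List(FOO BAR, ID-100, TOY)
    println(values.headOption) // Some(FOO BAR)
  }
}
```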