dreddy dreddy - 1 month ago 11
Scala Question

Scala regex to match tab-separated words from a string

I'm trying to match the following string

"name type this is a comment"


Name and type are definitely there.
Comment may or may not exist.
I'm trying to capture these in the variables n, t and c.

val nameTypeComment = """^(\w+\s+){2}(?:[\w+\s*)*\(\,\,]+)"""
str match { case nameType(n, t, c) => print(n,t,c) }


This is what I have, but it doesn't seem to be working. Any help is appreciated.

val nameType = """^(\w+)\s+([\w\)\(\,]+)""".r


However, this works for strings that contain only a name and a type, with no comment. The comment is a group of words that may or may not be present.

Answer

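Note that the regex ^(\w+\s+){2}(?:[\w+\s*)*\(\,\,]+) contains only one capturing group, (\w+\s+), while the match block expects three (n, t and c). Because that group is repeated with {2}, it also keeps only its last repetition, so the name is never captured separately from the type.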

The ^(\w+)\s+([\w\)\(\,]+) only contains 2 capturing groups: (\w+) and ([\w\)\(\,]+).
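A quick way to check these counts is to ask the compiled pattern itself. The following is a minimal sketch; the object name GroupCount and the helper groups are made up for illustration, while groupCount() is the standard java.util.regex.Matcher method:

```scala
object GroupCount {
  // Number of capturing groups in a pattern, read from java.util.regex.Matcher.
  // Non-capturing groups such as (?:...) are not counted.
  def groups(pattern: String): Int =
    pattern.r.pattern.matcher("").groupCount()

  val fromQuestion1: Int = groups("""^(\w+\s+){2}(?:[\w+\s*)*\(\,\,]+)""") // 1
  val fromQuestion2: Int = groups("""^(\w+)\s+([\w\)\(\,]+)""")            // 2
  val threeGroups: Int   = groups("""(\w+)\s+(\w+)\s+(.*)""")              // 3
}
```

A case clause such as case nameType(n, t, c) can only succeed when the regex yields exactly three groups, so the first two patterns can never bind n, t and c.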

To make your code work, you need to define 3 capturing groups. The question does not make the separators clear, so I will assume that the first two fields each consist of one or more alphanumeric or underscore characters and are separated by one or more whitespace characters (\s also matches tabs). The comment is everything after the first two fields.

Then, use

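val s = "name     type       this comment a comment"
// Three capturing groups: name, type, and the rest of the line as the comment
val nameType = """(\w+)\s+(\w+)\s+(.*)""".r
s match {
    // Explicit tuple, so the output is (name,type,this comment a comment)
    case nameType(n, t, c) => print((n, t, c))
    case _ => print("NONE")
}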
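Note that this pattern requires whitespace after the type, so a line with only a name and a type falls through to case _. Since the question says the comment may be missing, one possible variant makes the trailing part optional. This is a sketch under the same assumption that name and type are \w+ tokens; the names NameTypeComment, nameTypeOptComment and parse are made up for illustration:

```scala
import scala.util.matching.Regex

object NameTypeComment {
  // The whitespace plus comment is wrapped in an optional non-capturing group,
  // so the pattern still has exactly three capturing groups.
  val nameTypeOptComment: Regex = """(\w+)\s+(\w+)(?:\s+(.*))?""".r

  // Returns (name, type, optional comment), or None if the line does not match.
  def parse(line: String): Option[(String, String, Option[String])] =
    line match {
      // A group that did not participate in the match is bound to null,
      // so it is wrapped in Option to turn null into None.
      case nameTypeOptComment(n, t, c) => Some((n, t, Option(c)))
      case _                           => None
    }
}
```

With this variant, NameTypeComment.parse("name type") yields the name and type with no comment, while a line with trailing words yields them as the comment.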


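Note that the pattern string has to be turned into a Regex object: pay attention to the .r after the pattern assigned to nameType. Your nameTypeComment definition is missing the .r, and your match block refers to nameType rather than nameTypeComment.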

Note that a Regex used as a pattern inside match is anchored by default, meaning it must match the whole input string, so the start-of-string anchor ^ can be omitted.
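To illustrate the anchoring, here is a small sketch that compares the default behavior with Regex.unanchored, which matches anywhere in the input instead. The object and method names are made up for illustration:

```scala
object Anchoring {
  val word = """(\w+)""".r

  // Default: the whole input must match the pattern.
  def whole(s: String): Option[String] = s match {
    case word(w) => Some(w)
    case _       => None
  }

  // unanchored: the first match found anywhere in the input is used.
  val wordAnywhere = word.unanchored

  def anywhere(s: String): Option[String] = s match {
    case wordAnywhere(w) => Some(w)
    case _               => None
  }
}
```

Here Anchoring.whole("name type") is None because the string is more than a single word, whereas Anchoring.anywhere("name type") is Some("name").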

Also, it is a good idea to add case _ to define the behavior when no match is found; without it, a non-matching string throws a scala.MatchError at runtime.
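As a rough sketch of the difference, the two hypothetical helpers below use the same pattern with and without a default case (the names MissingDefault, unsafeName and safeName are made up for illustration):

```scala
object MissingDefault {
  val nameType = """(\w+)\s+(\w+)\s+(.*)""".r

  // No default case: throws scala.MatchError when the line does not match.
  def unsafeName(line: String): String = line match {
    case nameType(n, _, _) => n
  }

  // Default case: a non-matching line is reported as None instead.
  def safeName(line: String): Option[String] = line match {
    case nameType(n, _, _) => Some(n)
    case _                 => None
  }
}
```

Returning an Option is only one way to handle the no-match case; printing a message, as in the answer's code, works just as well.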