user3350744 - 1 year ago
Scala Question

How to modify a regular expression to match the beginning and end of a line zero or one time?

I want to create a regex with 3 capturing groups to extract information for the 4 possible cases below:

val line1 = ""
val line2 = "#"
val line3 = " # Comment"
val line4 = "# # Comment"

val m = lineRegex.findFirstMatchIn(line2).get

line2.substring(m.start(1), m.end(1)) // Should be # or ""
line2.substring(m.start(2), m.end(2)) // Should be
line2.substring(m.start(3), m.end(3)) // Should be # Comment or ""

I came up with:

val lineRegex = """(^#?).*(?:127\.0\.0\.1)\s+((?!-)[A-Za-z0-9-\.]{1,63}(?<!-)\.+[A-Za-z]{2,6})\s+(#?.*)""".r

But it does not match line1 or line2. What do I need to change to make it work for all 4 possible cases?

Answer Source

Either make the last group optional (i.e. \s+(#?.*) -> (?:\s+(#?.*))?) or just use a * quantifier on the last \s, so that the trailing whitespace and comment may be absent:

val lineRegex = """(^#?).*(?:127\.0\.0\.1)\s+((?!-)[A-Za-z0-9-\.]{1,63}(?<!-)\.+[A-Za-z]{2,6})\s*(#?.*)""".r
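
To see how the \s* variant behaves, here is a minimal sketch that runs it against four made-up lines. The object name HostsLineDemo, the helper parse, and the sample strings (including the hostname example.com) are assumptions for illustration, not the asker's data.

```scala
object HostsLineDemo {
  // The regex from the answer, with \s* before the last group.
  val lineRegex =
    """(^#?).*(?:127\.0\.0\.1)\s+((?!-)[A-Za-z0-9-\.]{1,63}(?<!-)\.+[A-Za-z]{2,6})\s*(#?.*)""".r

  // Hypothetical helper: returns (group 1, group 2, group 3),
  // or None when the line does not match at all.
  def parse(line: String): Option[(String, String, String)] =
    lineRegex.findFirstMatchIn(line).map(m => (m.group(1), m.group(2), m.group(3)))

  def main(args: Array[String]): Unit = {
    // Made-up inputs covering the four shapes: with/without a leading #,
    // with/without a trailing comment.
    val samples = Seq(
      "127.0.0.1 example.com",
      "#127.0.0.1 example.com",
      "127.0.0.1 example.com # Comment",
      "#127.0.0.1 example.com # Comment"
    )
    samples.foreach(s => println(parse(s)))
  }
}
```

With \s* the third group always participates in the match, so it comes back as an empty string rather than null when there is no comment.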


A version with the optional group requires a null check for Group 3, since that group is null when it does not participate in the match:

val lineRegex = """(^#?).*(?:127\.0\.0\.1)\s+((?!-)[A-Za-z0-9-\.]{1,63}(?<!-)\.+[A-Za-z]{2,6})(?:\s+(#?.*))?""".r
//                                                                                            ^^^^^^^^^^^^^^
val m = lineRegex.findFirstMatchIn(line2).get
if (m.group(3) != null) println(m.group(3))

NOTE: You may print/use m.group(n) directly; there is no need to take the substring.
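
As an alternative to the explicit null check, the optional-group version can wrap Group 3 in an Option. This is only a sketch: the object name OptionalGroupDemo, the helper comment, and the sample lines are assumptions for illustration.

```scala
object OptionalGroupDemo {
  // The regex from the answer, with the last group made optional.
  val lineRegex =
    """(^#?).*(?:127\.0\.0\.1)\s+((?!-)[A-Za-z0-9-\.]{1,63}(?<!-)\.+[A-Za-z]{2,6})(?:\s+(#?.*))?""".r

  // Hypothetical helper: Option(...) turns a null group into None, so the
  // result is None both when the line does not match and when the
  // optional group did not participate in the match.
  def comment(line: String): Option[String] =
    lineRegex.findFirstMatchIn(line).flatMap(m => Option(m.group(3)))

  def main(args: Array[String]): Unit = {
    // Made-up inputs; the hostname is illustrative only.
    println(comment("#127.0.0.1 example.com"))
    println(comment("127.0.0.1 example.com # Comment"))
  }
}
```

Callers can then use getOrElse or pattern matching on the Option instead of comparing against null.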