user3350744 - 3 months ago
Scala Question

How to modify a regular expression to match the beginning and end of a line zero or one time?

I want to create a regex with 3 capturing groups to extract information for the 4 possible cases below:

val line1 = "127.0.0.1 ssl.google-analytics.com"
val line2 = "#127.0.0.1 ssl.google-analytics.com"
val line3 = "127.0.0.1 ssl.google-analytics.com # Comment"
val line4 = "#127.0.0.1 ssl.google-analytics.com # Comment"

val m = lineRegex.findFirstMatchIn(line2).get

line2.substring(m.start(1), m.end(1)) // Should be # or ""
line2.substring(m.start(2), m.end(2)) // Should be ssl.google-analytics.com
line2.substring(m.start(3), m.end(3)) // Should be # Comment or ""


I came up with:

val lineRegex = """(^#?).*(?:127\.0\.0\.1)\s+((?!-)[A-Za-z0-9-\.]{1,63}(?<!-)\.+[A-Za-z]{2,6})\s+(#?.*)""".r


But it does not match line1 or line2. What do I need to change to make it work for all 4 possible cases?

Answer

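Your pattern ends with \s+(#?.*), which requires at least one whitespace character after the hostname, so lines without a trailing comment (line1 and line2) cannot match. You need to make the last group optional (i.e. \s+(#?.*) -> (?:\s+(#?.*))?) or just use a * quantifier with the last \s: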

val lineRegex = """(^#?).*(?:127\.0\.0\.1)\s+((?!-)[A-Za-z0-9-\.]{1,63}(?<!-)\.+[A-Za-z]{2,6})\s*(#?.*)""".r
                                                                                               ^^

See the regex demo and a Scala demo.
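
As a quick check of the \s* variant, here is a minimal sketch that runs it against all four sample lines. The object name HostsLineDemo and the helper parse are hypothetical names used only for illustration; the regex itself is the one shown above.

```scala
import scala.util.matching.Regex

object HostsLineDemo {
  // Regex from the answer: the trailing \s* lets a line end right after the hostname.
  val lineRegex: Regex =
    """(^#?).*(?:127\.0\.0\.1)\s+((?!-)[A-Za-z0-9-\.]{1,63}(?<!-)\.+[A-Za-z]{2,6})\s*(#?.*)""".r

  // Hypothetical helper: (leading "#" or "", hostname, trailing comment or ""),
  // or None when the line does not match at all.
  def parse(line: String): Option[(String, String, String)] =
    lineRegex.findFirstMatchIn(line).map(m => (m.group(1), m.group(2), m.group(3)))

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "127.0.0.1 ssl.google-analytics.com",
      "#127.0.0.1 ssl.google-analytics.com",
      "127.0.0.1 ssl.google-analytics.com # Comment",
      "#127.0.0.1 ssl.google-analytics.com # Comment"
    )
    // Print the three captured groups for each sample line.
    lines.foreach(l => println(parse(l)))
  }
}
```

With this variant Group 3 always participates in the match, so it is an empty string rather than null when there is no comment.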

A version with the optional group requires a null check for Group 3, since m.group(3) returns null when the optional group does not participate in the match (demo):

val lineRegex = """(^#?).*(?:127\.0\.0\.1)\s+((?!-)[A-Za-z0-9-\.]{1,63}(?<!-)\.+[A-Za-z]{2,6})(?:\s+(#?.*))?""".r
//                                                                                            ^^^^^^^^^^^^^^
val m = lineRegex.findFirstMatchIn(line2).get
//...
if (m.group(3) != null) println(m.group(3)) 

NOTE: You may print/use m.group(N) directly, no need to get the substring.
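
If you prefer to avoid the explicit null check, one option is to wrap the group in Option, which turns null into None. The sketch below assumes the optional-group regex from above; the object name OptionalCommentDemo and the helper comment are hypothetical names for illustration.

```scala
import scala.util.matching.Regex

object OptionalCommentDemo {
  // Optional-group variant from the answer: Group 3 is null when no comment follows.
  val lineRegex: Regex =
    """(^#?).*(?:127\.0\.0\.1)\s+((?!-)[A-Za-z0-9-\.]{1,63}(?<!-)\.+[A-Za-z]{2,6})(?:\s+(#?.*))?""".r

  // Hypothetical helper: Option(...) maps a null group to None, so callers never see null.
  def comment(line: String): Option[String] =
    lineRegex.findFirstMatchIn(line).flatMap(m => Option(m.group(3)))

  def main(args: Array[String]): Unit = {
    println(comment("#127.0.0.1 ssl.google-analytics.com"))          // no comment on this line
    println(comment("127.0.0.1 ssl.google-analytics.com # Comment")) // comment present
  }
}
```

Here m.group(3) is read directly, as the note suggests, and the Option wrapper only changes how the missing group is represented.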
