Chetan Kothari - 1 month ago
Java Question

Escaping regex in capture group

I need to extract part of a string.

Say the string is:

"this is a string "xyz" "


What I want to extract from it is the substring:

xyz


The problem is that I have two variants of this string:

"this is a string "xyz" "
"this is a string - "


From these I want to extract:

xyz or -


I've tried this extractor:

".*((?:")[^"]*(?:")|-).*".r


This extracts - correctly, and it extracts the quoted string as well, but it does not exclude the quotes.
The results I get for the two strings above are:

"xyz" instead of xyz
- as expected


Thanks in advance.

Answer

Use look-around, i.e. replace:

(?:")[^"]*(?:")

with

(?<=")[^"]*(?=")

(?<=") is a positive look-behind: it checks that the previous character is a " without including it in the match.
(?=") is a positive look-ahead: it checks that the next character is a " without including it in the match.
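
To make the difference concrete, here is a minimal Java sketch. The class name, the helper method, and the sample input are invented for illustration; only the two patterns come from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundDemo {
    // Returns the first match of regex in input, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "say \"xyz\" now"; // hypothetical sample text

        // Non-capturing groups still consume the quotes, so they stay in the match.
        System.out.println(firstMatch("(?:\")[^\"]*(?:\")", input)); // "xyz"

        // Look-behind and look-ahead only assert the quotes, so they are left out.
        System.out.println(firstMatch("(?<=\")[^\"]*(?=\")", input)); // xyz
    }
}
```

A non-capturing group only avoids creating a numbered group; it does not remove its text from the enclosing match, which is why the quotes showed up in the original result.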

If you're searching for this inside a larger string, you may also want to replace .* with its lazy form, .*?. The lazy form matches as few characters as possible, whereas .* matches as many as possible. As an example, given abbbaabbba:

a.*a  finds abbbaabbba as one match
a.*?a finds abbba and abbba separately
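
The same comparison can be reproduced in Java with a small sketch like the following. The class and method names are hypothetical; the input and both patterns are the ones from the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyVsLazy {
    // Collects every non-overlapping match of regex in input, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "abbbaabbba";
        System.out.println(allMatches("a.*a", input));  // [abbbaabbba]
        System.out.println(allMatches("a.*?a", input)); // [abbba, abbba]
    }
}
```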

These changes introduce a new problem, though: a look-around can be satisfied by one of the outer quotes, whereas a normal match that consumes the quote couldn't. Replacing each .* with .+ should prevent this (assuming that is also valid for the - match; the quoted match should behave the same because of the look-around).

Final regex:

".+((?<=")[^"]*(?=")|-).+"

I'm not sure what the .r was for.
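
As a rough check, the final regex can be run in Java against the two sample strings from the question. This sketch assumes the outer quotes are literally part of both the input and the pattern, which is how the answer reads them; the class, constant, and method names are made up for illustration. It also runs the look-around version that still uses .*, to show the stray match described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteExtractor {
    // Final regex from the answer, with the outer quotes treated as literal characters.
    static final String FINAL_REGEX = "\".+((?<=\")[^\"]*(?=\")|-).+\"";

    // Same look-around group, but still wrapped in .* on both sides.
    static final String STAR_REGEX = "\".*((?<=\")[^\"]*(?=\")|-).*\"";

    // Returns capture group 1 if the whole input matches regex, otherwise null.
    static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String quoted = "\"this is a string \"xyz\" \"";
        String dashed = "\"this is a string - \"";

        System.out.println(extract(FINAL_REGEX, quoted)); // xyz
        System.out.println(extract(FINAL_REGEX, dashed)); // -

        // With .* the group is satisfied by the space between the last two quotes.
        System.out.println("[" + extract(STAR_REGEX, quoted) + "]"); // [ ]
    }
}
```

The .+ after the group forces at least one character between the captured text and the closing quote, which is what rules out the stray match for these inputs.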
