Croutonix - 26 days ago
Java Question

Match Lua multiline strings and comments with Regex

I have a Lua editor in which I implemented syntax highlighting. I use regexes to match Lua constructs such as strings, comments, tokens, and numbers. The whole thing is written in Java and uses Java regexes. I had trouble with two things:

Multiline strings - Lua multiline strings start and end with double square brackets, [[ and ]]. Everything between is the string, and there can even be nested multiline strings. You can see what I made here; the regex is
\[\[((?>[^\[\[\]\]]|(?R))*\]\])
and it works. It's similar to what you can see on this page under the match balanced constructs section. It finds expressions with equal amounts of [[ and ]]. The thing is, recursion is not supported by the Java regex engine. How can I replace it with something supported?

Multiline comments - Lua multiline comments start with --[====[ and end with ]====]. A comment ends only when the closing bracket has as many equal signs as the opening bracket. There can be any number of equal signs, from zero upward. I made this regex
--\[\[((.|\n)*?)\]\]
but it only works for the --[[ comment ]] pattern and does not support --[==[ comment ]==]. Maybe I could count the equal signs matched at the opening bracket and then match the same number at the closing bracket. Is this possible in Java regex? How?

Answer

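Try this. It captures the run of equal signs in group 1 and uses the backreference \1 to require the same run in the closing bracket: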

--\[(=*)\[(.|\n)*?\]\1\]
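
As a rough illustration of how the pattern above could be used from Java, here is a minimal sketch. The class name LuaCommentMatcher and the method findComments are hypothetical names chosen for this example; only java.util.regex is assumed. Note that every backslash in the regex has to be doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original editor.
public class LuaCommentMatcher {
    // "--[", then any number of '=' captured as group 1, then "[";
    // a lazily matched body; then "]", the same '=' run (\1), and "]".
    private static final Pattern LONG_COMMENT =
            Pattern.compile("--\\[(=*)\\[(.|\\n)*?\\]\\1\\]");

    /** Returns the full text of each long comment found in the source. */
    public static List<String> findComments(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = LONG_COMMENT.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The "]]" inside the first comment must not close it,
        // because the opening bracket used two equal signs.
        String lua = "x = 1 --[==[ a ]] still inside ]==] y = 2 --[[ b ]]";
        for (String comment : findComments(lua)) {
            System.out.println(comment);
        }
    }
}
```

With this input the matcher should report two comments: the level-2 comment, which runs through the inner ]] up to ]==], and the plain --[[ b ]] comment.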

Multiline string literals work exactly the same way, but without the leading -- (the equal signs are now group 2, hence \2):

\[((=*)\[(.|\n)*?)\]\2\]
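
A possible variation for the string case, sketched under some assumptions: instead of (.|\n)*?, this version uses .*? with Pattern.DOTALL, so the body also spans \r\n line endings (by default . in Java does not match \r). Alternations like (.|\n)* are also commonly reported to trigger StackOverflowError on long inputs in Java, which the DOTALL form may help avoid. The grouping is simplified so that the equal signs are group 1 and the content is group 2. The class name LuaLongStringMatcher and the method firstLongString are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original editor.
public class LuaLongStringMatcher {
    // Assumption: DOTALL with ".*?" stands in for "(.|\n)*?" from the answer.
    // Group 1 is the run of '=' in the opening bracket, group 2 the content.
    private static final Pattern LONG_STRING =
            Pattern.compile("\\[(=*)\\[(.*?)\\]\\1\\]", Pattern.DOTALL);

    /** Returns the content of the first long string in source, or null if none. */
    public static String firstLongString(String source) {
        Matcher m = LONG_STRING.matcher(source);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // The "]]" in the body does not end the string: the opener has one '='.
        String lua = "s = [=[line one\r\nhas ]] inside]=]";
        System.out.println(firstLongString(lua));
    }
}
```

Since this pattern also matches the bracket part of a --[[ ... ]] comment, a highlighter would probably want to apply the comment pattern first and skip ranges it has already claimed.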