Dark Knight - 2 months ago
Java Question

Java Regex: Split based on non-word characters except for apostrophe

I'm trying to split a string on whitespace and on non-word characters, keeping each non-word character as its own token, but without splitting on apostrophes.

I've managed to split on whitespace and keep the non-word characters, but I can't figure out how to exclude apostrophes from the non-word characters.

This is my current Regex...

str.split("\\s|(?=\\W)");


...which when run on this code sample:

program p;
begin
write('x');
end.


...produces this result:

program
p
;
begin

write
(
'x <!-- This is the problem.
'
)
;
end
.


That is almost correct, but my goal is to leave the apostrophes attached to the text they enclose, so that this is the result:

program
p
;
begin

write
(
'x' <!-- This is the wanted result.
)
;
end
.


UPDATE

As suggested, I've tried:

str.split("\\s|(?=\\W)(?<=\\W)");


This almost works, but it no longer splits off all of the special characters correctly:

program
p;
begin
write(
'x'
)
;
end.
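
The pattern (?=\W)(?<=\W) only matches at a position that has a non-word character on both sides, which is why "p;" and "write(" stay glued together: there is a word character on one side of those positions. A possible split-only variant is sketched below. It assumes that "special character" means anything that is neither a word character, nor whitespace, nor an apostrophe, and it splits just before and just after each such character. The class and method names are hypothetical and not from the original post, and the sketch will still break up a quoted literal that contains spaces or punctuation.

```java
import java.util.Arrays;
import java.util.List;

public class SplitSketch {
    // Hypothetical delimiter, an assumption for illustration only.
    // [^\w\s'] means "punctuation other than the apostrophe".
    private static final String DELIMITER =
            "\\s+"                          // consume runs of whitespace
            + "|(?<=\\S)(?=[^\\w\\s'])"     // zero-width: just before such a character
            + "|(?<=[^\\w\\s'])(?=\\S)";    // zero-width: just after such a character

    static List<String> tokens(String source) {
        // trim() avoids an empty first token when the input starts with whitespace
        return Arrays.asList(source.trim().split(DELIMITER));
    }

    public static void main(String[] args) {
        String source = "program p;\nbegin\nwrite('x');\nend.";
        tokens(source).forEach(System.out::println);
    }
}
```

The (?<=\S) and (?=\S) guards keep the zero-width alternatives from matching next to whitespace, which would otherwise add empty strings to the result.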

vks
Answer

You can split on this.

\s|('[^']*')|(?=\W)

See demo.

https://regex101.com/r/mL7eL6/1
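
One caveat when using this from Java: String.split discards whatever the delimiter pattern matches, capture groups included, so passing the expression to split as-is would drop the quoted 'x' from the result instead of keeping it. A way to apply the same idea is to match the tokens rather than split on the gaps. The following is a minimal sketch of that swapped-in technique using Matcher.find; the class name, method name, and token pattern are assumptions for illustration and are not taken from the answer or the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenSketch {
    // Hypothetical token pattern; alternatives are tried left to right:
    //   '[^']*'   a whole single-quoted literal, kept as one token
    //   \w+       a run of word characters
    //   [^\w\s]   any other single non-whitespace character
    private static final Pattern TOKEN = Pattern.compile("'[^']*'|\\w+|[^\\w\\s]");

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group()); // whitespace between tokens is simply skipped
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("program p;\nbegin\nwrite('x');\nend.").forEach(System.out::println);
    }
}
```

Because the quoted literal is matched as a unit, spaces or punctuation inside the quotes stay in the same token. An unterminated quote falls through to the last alternative and comes out as a lone apostrophe token.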