PirateApp - 3 months ago
Java Question

Regular expression: get an array from a captured group in Java

I have some text in a particular format, shown below.
Each block starts with a line beginning with a + followed by a space and some text.
That line is followed by one or more consecutive lines that start with a minus sign, @, % or *, then a space and some text. I would like to use a regular expression to capture each block below separately.

+ you rock
- I rock and rule.

+ you rule
- I rock and rule.
- That is a perfect artificial entity.

+ you made a mistake
- That is impossible. I never make mistakes.
- I guess so, something must have gone wrong.


Desired output

Block 1
+ you rock
- I rock and rule.

Block 2
+ you rule
- I rock and rule.
- That is a perfect artificial entity.

This is my current regular expression:

(^\+.*$)(?:\r?\n)(?:(^[-%@\*].*$)(?:\r?\n)?)+


In the above expression, group 1, (^\+.*$), captures the statement following a +, and group 2, (^[-%@\*].*$), captures the second part. Notice, though, that there may be more than one statement that starts with a -.

When I run the following loop in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(contents);
while (matcher.find()) {
    // This gives me the line that starts with +
    System.out.println(matcher.group(1));
    // This ONLY gives me the last line starting with -; how do I get all of them?
    System.out.println(matcher.group(2));
}


How do I get all the statements that have a minus sign in front of them as an array?
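For context on why group 2 returns only one line: in java.util.regex, a capturing group inside a repeated construct such as (?:...)+ keeps only the text from its last iteration, and earlier iterations are not stored anywhere. The sketch below runs the question's regex on the second sample block; the class name LastCaptureDemo, the method group2OfFirstMatch and the sample string are hypothetical, chosen only to illustrate the effect.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastCaptureDemo {
    // Same regex as in the question: group 2 sits inside a repeated (?:...)+
    static final Pattern BLOCK = Pattern.compile(
            "(^\\+.*$)(?:\\r?\\n)(?:(^[-%@\\*].*$)(?:\\r?\\n)?)+", Pattern.MULTILINE);

    /** Returns what group 2 holds after the first match, or null if nothing matches. */
    static String group2OfFirstMatch(String contents) {
        Matcher matcher = BLOCK.matcher(contents);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample: one block with two response lines
        String contents = "+ you rule\n- I rock and rule.\n- That is a perfect artificial entity.\n";
        // Prints only the last response line; "- I rock and rule." is overwritten
        System.out.println(group2OfFirstMatch(contents));
    }
}
```

The whole block still matches as one unit; only the record of what group 2 captured on earlier repetitions is lost, which is why a second matching step is needed to collect every line.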

Answer

Using the regular expression ^\+[^+]* with the m (multiline) and g (global) modifiers gives you the needed result.
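Java has no g flag: the m modifier corresponds to Pattern.MULTILINE, and g corresponds to calling Matcher.find() in a loop. A minimal sketch of the answer's regex in Java, assuming the input is already in a single String (the class name BlockSplitter and the method blocks are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockSplitter {
    // The answer's regex; MULTILINE plays the role of the m modifier
    private static final Pattern BLOCK = Pattern.compile("^\\+[^+]*", Pattern.MULTILINE);

    /** Returns every block: a '+' line plus everything up to the next '+'. */
    static List<String> blocks(String contents) {
        List<String> result = new ArrayList<>();
        Matcher matcher = BLOCK.matcher(contents);
        // Calling find() repeatedly plays the role of the g modifier
        while (matcher.find()) {
            // [^+]* also consumes newlines, so trim the blank lines between blocks
            result.add(matcher.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String contents = String.join("\n",
                "+ you rock",
                "- I rock and rule.",
                "",
                "+ you rule",
                "- I rock and rule.",
                "- That is a perfect artificial entity.",
                "",
                "+ you made a mistake",
                "- That is impossible. I never make mistakes.",
                "- I guess so, something must have gone wrong.");
        List<String> blocks = blocks(contents);
        for (int i = 0; i < blocks.size(); i++) {
            System.out.println("Block " + (i + 1));
            System.out.println(blocks.get(i));
            System.out.println();
        }
    }
}
```

Because [^+]* stops at the next + character, a response line that itself contains a + would cut its block short, so this approach relies on + appearing only at the start of statement lines.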

https://regex101.com/r/bH1aQ9/1

On your test data, the result is three matches, each starting with a + character.

The idea is to treat all your lines as one big string and split it into chunks that start with a + and contain no other + characters.
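To get the array the question asks for, each block can be matched a second time with a pattern for the response lines, collecting every match with find() instead of relying on a repeated capturing group. A sketch under these assumptions: response lines start with -, %, @ or * (as described in the question), the Block class and parse method are hypothetical names, and \R (a Java 8+ linebreak matcher) is used to find the end of the + line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockParser {
    // Step 1: the answer's regex, one match per "+" block
    private static final Pattern BLOCK = Pattern.compile("^\\+[^+]*", Pattern.MULTILINE);
    // Step 2 (assumption): a response line starts with -, %, @ or *
    private static final Pattern RESPONSE = Pattern.compile("^[-%@*].*$", Pattern.MULTILINE);

    /** Hypothetical holder for one block: the "+" line and its response lines. */
    static final class Block {
        final String statement;
        final String[] responses;

        Block(String statement, String[] responses) {
            this.statement = statement;
            this.responses = responses;
        }
    }

    static List<Block> parse(String contents) {
        List<Block> result = new ArrayList<>();
        Matcher blockMatcher = BLOCK.matcher(contents);
        while (blockMatcher.find()) {
            String block = blockMatcher.group();
            // The "+" line is everything up to the first line break
            String statement = block.split("\\R", 2)[0];
            // Collect every response line in this block, not just the last one
            List<String> responses = new ArrayList<>();
            Matcher responseMatcher = RESPONSE.matcher(block);
            while (responseMatcher.find()) {
                responses.add(responseMatcher.group());
            }
            result.add(new Block(statement, responses.toArray(new String[0])));
        }
        return result;
    }

    public static void main(String[] args) {
        String contents = "+ you rule\n- I rock and rule.\n- That is a perfect artificial entity.\n";
        for (Block b : parse(contents)) {
            System.out.println(b.statement);
            System.out.println(Arrays.toString(b.responses));
        }
    }
}
```

Splitting the work into two passes keeps each regex simple: the first pass finds block boundaries, and the second pass returns one match per response line, so nothing is overwritten the way a repeated group's capture is.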