zeroworld - 1 month ago
Java Question

Java lookbehind to split by greedy quantifier expressions

I wrote the following expression to split a string after every x words (3, for instance), each followed by a space. My problem is that I need to keep the entire content, but I cannot find a way to use lookbehind or a similar construct to accomplish this in Java.

Does anyone have experience with this?

int ngramm_length = 3; // number of words per group (3 for instance)
String text = "Hello my name is Tom and i love playing football";
String regex = "([a-zA-Z0-9öÖäÄüÜß]+\\s){" + ngramm_length + "}";
System.out.println(regex);
String[] ngramms = text.split(regex);


The result is 4 tokens, but only the last one still has any content: split discards everything the regex matches, so the first three tokens are empty strings. I would like to get:


1: Hello my name
2: is Tom and
3: i love playing
4: football
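A minimal sketch reproducing the problem, assuming the question's regex and a group size of 3 (the class and method names are illustrative). `String.split` treats every match as a delimiter and drops it, so the three matched word groups disappear and only the unmatched tail "football" survives:

```java
import java.util.Arrays;

public class SplitDemo {
    // The question's approach: each n-word group is used as a split delimiter.
    static String[] splitByNgramRegex(String text, int ngramm_length) {
        String regex = "([a-zA-Z0-9öÖäÄüÜß]+\\s){" + ngramm_length + "}";
        return text.split(regex);
    }

    public static void main(String[] args) {
        String text = "Hello my name is Tom and i love playing football";
        // The matches "Hello my name ", "is Tom and " and "i love playing " are consumed
        // as delimiters, so the pieces before and between them are empty strings.
        System.out.println(Arrays.toString(splitByNgramRegex(text, 3)));
        // prints [, , , football]
    }
}
```

This is why a matching loop (`Matcher.find()`) that collects the matches, rather than `split`, is the natural fit here.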



Java code, with the fallback quantifier built dynamically:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Ngrams {
    public static void main(String[] args) {
        int length = 3; //2
        // Joins 1, 2, ..., length-1 with commas: "1" for length 2, "1,2" for length 3.
        String dynamic_length = "";
        for (int i = 1; i < length; i++) {
            dynamic_length += i;

            if (i + 1 < length) {
                dynamic_length += ",";
            }
        }

        // Exactly `length` words, or else the fallback quantifier for the last, shorter chunk.
        final String regex = "([a-zA-Z0-9öÖäÄüÜß]+\\s){" + length + "}|([a-zA-Z0-9öÖäÄüÜß]+\\s){" + dynamic_length + "}";
        final String string = "Hello my name is Tom and i love playing football\n\n";

        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(string);
        int count = 0;
        while (matcher.find()) {
            ++count;
            System.out.println("match:" + count + " " + matcher.group(0));
        }
    }
}


It is not dynamic: it only works for lengths 2 and 3. That's my problem with it, or am I missing something?
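A small check of the loop from the code above, copied into a hypothetical helper `buildRange`, shows where it breaks. The loop lists 1 to length-1 separated by commas instead of building a `min,max` range. Length 2 gives `{1}` and length 3 gives `{1,2}`, which are both valid, while length 1 gives `{}` and length 4 gives `{1,2,3}`, which `Pattern.compile` rejects with a `PatternSyntaxException`:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RangeCheck {
    static final String WORD = "[a-zA-Z0-9öÖäÄüÜß]+\\s";

    // Same loop as in the question: joins 1, 2, ..., length-1 with commas.
    static String buildRange(int length) {
        String dynamic_length = "";
        for (int i = 1; i < length; i++) {
            dynamic_length += i;
            if (i + 1 < length) {
                dynamic_length += ",";
            }
        }
        return dynamic_length;
    }

    // True if the question's combined regex compiles for this length.
    static boolean compiles(int length) {
        String regex = "(" + WORD + "){" + length + "}|(" + WORD + "){" + buildRange(length) + "}";
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (int length = 1; length <= 4; length++) {
            System.out.println(length + " -> {" + buildRange(length) + "} compiles: " + compiles(length));
        }
    }
}
```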

For x > 1 I can use:

final String regex = "([a-zA-Z0-9öÖäÄüÜß]+\\s){" + length + "}|([a-zA-Z0-9öÖäÄüÜß]+\\s){1," + (length - 1) + "}";


For x = 1 I can use:

final String regex = "([a-zA-Z0-9öÖäÄüÜß]+\\s){" + length + "}|([a-zA-Z0-9öÖäÄüÜß]+\\s){1}";
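The `{n}` alternative is tried before the `{1,n-1}` one, so the alternation should behave the same as a single greedy range `{1,n}`. Greedy repetition also tries n words first and only backs off to fewer when fewer remain. That range is valid for every n ≥ 1, which removes the special case for x = 1. Below is a minimal sketch (the class and method names are illustrative). Like the patterns above, it only matches words followed by whitespace, so the input keeps the trailing `\n\n`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NgramChunks {
    // Greedy {1,length}: takes `length` words when available, otherwise as many as remain.
    static List<String> ngrams(String text, int length) {
        if (length < 1) {
            throw new IllegalArgumentException("length must be >= 1");
        }
        Pattern pattern = Pattern.compile("([a-zA-Z0-9öÖäÄüÜß]+\\s){1," + length + "}");
        Matcher matcher = pattern.matcher(text);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group(0).trim()); // drop the whitespace each match ends with
        }
        return result;
    }

    public static void main(String[] args) {
        String string = "Hello my name is Tom and i love playing football\n\n";
        for (int length = 1; length <= 4; length++) {
            System.out.println(length + ": " + ngrams(string, length));
        }
    }
}
```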

Answer

You can try this:

([a-zA-Z0-9öÖäÄüÜß]+\s){3}|([a-zA-Z0-9öÖäÄüÜß]+\s){1,2}

Explanation: look at the match information box in the link.

Java code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NgramDemo {
    public static void main(String[] args) {
        // Three words if available, otherwise one or two (the final, shorter chunk).
        final String regex = "([a-zA-Z0-9öÖäÄüÜß]+\\s){3}|([a-zA-Z0-9öÖäÄüÜß]+\\s){1,2}";
        final String string = "Hello my name is Tom and i love playing football\n\n";

        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(string);
        int count = 0;
        while (matcher.find()) {
            ++count;
            System.out.println("match:" + count + " " + matcher.group(0));
        }
    }
}

As per your comment:

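If you want n blocks per match, use the following, and make sure n > 1: for n = 1 the range {1,0} is rejected by Java's regex engine, so use the single-word pattern from the question for that case.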

([a-zA-Z0-9öÖäÄüÜß]+\s){n}|([a-zA-Z0-9öÖäÄüÜß]+\s){1,n-1}


Sample output

    match:1 Hello my name 
    match:2 is Tom and 
    match:3 i love playing 
    match:4 football
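The answer's pattern requires every word, including the last one, to be followed by whitespace. That is presumably why the sample string ends in `\n\n`, and it is why each printed match keeps its trailing space or newline. Without it, the final "football" in the question's original text would not be matched. The sketch below is an alternative technique, not the answer's method. It matches one non-whitespace run followed by at most n-1 further whitespace-separated runs, `\S+(?:\s+\S+){0,n-1}`. For n = 1 the quantifier is `{0,0}`, which Java accepts, so it works for any n ≥ 1. Note that `\S` also accepts punctuation, unlike the answer's explicit letter class. The helper name `chunk` is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordChunks {
    // One word, then up to n-1 more words, each preceded by whitespace.
    // No trailing whitespace is required, so the last word is matched too.
    static List<String> chunk(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        Matcher matcher = Pattern.compile("\\S+(?:\\s+\\S+){0," + (n - 1) + "}").matcher(text);
        List<String> chunks = new ArrayList<>();
        while (matcher.find()) {
            chunks.add(matcher.group());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "Hello my name is Tom and i love playing football";
        List<String> chunks = chunk(text, 3);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("match:" + (i + 1) + " " + chunks.get(i));
        }
    }
}
```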