suziz - 1 month ago
Java Question

Can't get my regex to work correctly

I have some data separated by commas, but commas that appear between double quotes ("") must not split the data.

So "A,B" should stay one token, "A,B", while A,B should be split into "A" and "B".

The trouble is that when there are several commas in a row, the empty fields are ignored: A,,B is split into "A", "B", but I need "A", "", "B".

This is my code:

ArrayList<String> tokens = new ArrayList<String>();
// Group 1: a quoted field (without its quotes); group 2: a run of non-comma characters
String regex = "\"([^\"]*)\"|([^,]+)";
Matcher m = Pattern.compile(regex).matcher(line);
while (m.find()) {
    if (m.group(1) != null) {
        tokens.add(m.group(1));
    }
    else {
        tokens.add(m.group(2));
    }
}
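To reproduce the behaviour described above, here is a minimal self-contained harness around the same loop. The class name OriginalRegexDemo and the tokenize helper are illustrative only, not part of the question. With this pattern, the empty spot between the two commas never produces a match, so it is silently dropped:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalRegexDemo {
    // Hypothetical helper wrapping the loop from the question
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        String regex = "\"([^\"]*)\"|([^,]+)";
        Matcher m = Pattern.compile(regex).matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                tokens.add(m.group(1)); // quoted field, quotes stripped
            } else {
                tokens.add(m.group(2)); // unquoted run of non-comma characters
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("A,,B"));       // prints [A, B]
        System.out.println(tokenize("\"A,B\",C"));  // prints [A,B, C]
    }
}
```

The quoted branch works as intended; only the empty field is lost.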


The first group works, but I can't get the second one, ([^,]+) (one or more characters other than a comma), to also match nothing and return an empty string. Is that even possible?
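A tempting fix is to change + to * so the second branch can match nothing. The sketch below (class name StarQuantifierDemo is hypothetical) shows why that alone does not work: [^,]* also matches the empty string right after each non-empty token and at the very end of the input, so for A,,B it yields five tokens instead of three:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StarQuantifierDemo {
    // Same loop as in the question, but with [^,]* instead of [^,]+ (hypothetical variant)
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        String regex = "\"([^\"]*)\"|([^,]*)";
        Matcher m = Pattern.compile(regex).matcher(line);
        while (m.find()) {
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Wanted [A, , B]; the empty matches at positions 1, 2 and 4 add extra tokens
        System.out.println(tokenize("A,,B")); // prints [A, , , B, ]
    }
}
```

So an empty match has to be restricted to the places where an empty field really is, which is what a lookaround-based branch does.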

Answer

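You just need to add another branch to your alternation, (?<=,)(?=,), which matches the empty position between two consecutive commas: the lookbehind requires a comma just before the current position and the lookahead requires one just after it.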

String line = "A,,B";
ArrayList<String> tokens = new ArrayList<String>();
String regex = "\"([^\"]*)\"|[^,]+|(?<=,)(?=,)";   // <= No need for Group 2
Matcher m = Pattern.compile(regex).matcher(line);
while (m.find()) {
    if (m.group(1) != null) {
        tokens.add(m.group(1));
    } 
    else {
        tokens.add(m.group(0)); // <= Note that we can grab the whole match here
    }
}
System.out.println(tokens); // prints [A, , B]

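The answer's snippet is a fragment; below is a self-contained sketch of the same idea (class and method names are illustrative). It also shows one limitation: (?<=,)(?=,) needs a comma on both sides, so an empty field at the very start or end of a line, as in ,A,,B, is not reported. The WITH_EDGES pattern is a hypothetical extension, not part of the original answer, that lets the empty branch also match at the start or end of the input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyFieldDemo {
    // The answer's pattern: quoted field | non-comma run | empty spot between two commas
    static final Pattern BETWEEN_COMMAS =
            Pattern.compile("\"([^\"]*)\"|[^,]+|(?<=,)(?=,)");

    // Hypothetical extension: the empty branch may also start at the beginning of
    // the input and end at the end of it, so leading/trailing empty fields are kept
    static final Pattern WITH_EDGES =
            Pattern.compile("\"([^\"]*)\"|[^,]+|(?:^|(?<=,))(?=,|$)");

    static List<String> tokenize(Pattern p, String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = p.matcher(line);
        while (m.find()) {
            // Group 1 is the quoted content; otherwise take the whole match
            tokens.add(m.group(1) != null ? m.group(1) : m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(BETWEEN_COMMAS, "A,,B"));   // prints [A, , B]
        System.out.println(tokenize(BETWEEN_COMMAS, ",A,,B,")); // prints [A, , B]
        System.out.println(tokenize(WITH_EDGES, ",A,,B,"));     // prints [, A, , B, ]
    }
}
```

Because the non-empty branches come first in the alternation and the empty branch requires a comma (or the end of input) right after it, the empty branch never competes with a real token.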