Cyborgz - 5 months ago
Java Question

Java Reg Exp for Word not followed by another word

Basically, I am writing a program in Java where I have to categorize a String into one of three buckets.


  • Category 1 - String with both 'AND' and 'AND NOT'

  • Category 2 - String with 'AND NOT' only

  • Category 3 - String with 'AND' only



I need a regex that matches a string only when every AND is followed by NOT; otherwise the string should be skipped.

A AND B AND NOT C - Fail
A AND B AND C - Fail
A AND NOT B AND NOT C - Pass


Below is a sample code snippet:

public static void main(String[] args) {
    String X = "A AND B AND C AND D AND NOT E";
    String Y = "A AND NOT C ";
    String Z = "A AND B AND D";
    ArrayList<String> sampleString = new ArrayList<String>(Arrays.asList(X, Y, Z));

    // Category 1 - String with both 'AND' and 'AND NOT'
    // Category 2 - String with 'AND NOT' only
    // Category 3 - String with 'AND' only

    for (String s : sampleString) {
        if (s.contains("AND") && s.contains("NOT")) {
            System.out.println("Category 1 -" + s);
        }
        // This condition is invalid. I need a regex here: only an AND
        // followed by NOT should count; otherwise skip.
        if (s.contains("AND NOT") && !s.contains("AND")) {
            System.out.println("Category 2 - " + s);
        }
        if (s.contains("AND") && !s.contains("NOT")) {
            System.out.println("Category 3 - " + s);
        }
    }
}


OUTPUT -

Category 1 -A AND B AND C AND D AND NOT E
Category 1 -A AND NOT C
Category 3 - A AND B AND D


I looked at several similar regex questions, but none of them solved my problem. I tried the following:

String regex="AND(?!\\s+NOT)";

public static void main(String args[]) {
    String x = "A AND B AND C AND NOT D";
    String regex = "AND(?!\\s+NOT)";
    if (Pattern.compile(regex).matcher(x).find()) {
        System.out.println("X MATCHED");
    }
}
//Returns - X MATCHED


Any help would be much appreciated!

Answer

The following regex find() loop determines the category and returns 0 if the input matches none of the listed categories.

private static int categorize(String input) {
    // Group 1 captures the optional " NOT" that may follow AND.
    Matcher m = Pattern.compile("(?i)\\bAND(\\s+NOT)?\\b").matcher(input);
    boolean foundAndNot = false, foundAnd = false;
    while ((! foundAndNot || ! foundAnd) && m.find())
        if (m.start(1) != -1)
            foundAndNot = true;
        else
            foundAnd = true;
    // 1 = both, 2 = 'AND NOT' only, 3 = 'AND' only, 0 = neither
    return (foundAndNot ? (foundAnd ? 1 : 2)
                        : (foundAnd ? 3 : 0));
}

The left side of the && condition in the while loop is just a short-circuit, to exit the loop early if both are found.

The (?i) in the regex makes the match case-insensitive, which a plain contains() check cannot do because contains() is always case-sensitive.

The m.start(1) != -1 check is to see if the capture group matched, i.e. to see if the match included the NOT word.
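To make the capture-group check concrete, here is a minimal sketch that reuses the same pattern and labels each match as either "AND" or "AND NOT". The class name GroupCheckDemo and the helper describe() are made up for this illustration and are not part of the answer's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCheckDemo {
    // Same pattern as in categorize(): group 1 is the optional NOT part.
    private static final Pattern AND_PATTERN =
            Pattern.compile("(?i)\\bAND(\\s+NOT)?\\b");

    // Hypothetical helper: lists every match, in order, as "AND" or "AND NOT".
    static String describe(String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = AND_PATTERN.matcher(input);
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            // start(1) is -1 (and group(1) is null) when group 1 took no part in the match.
            sb.append(m.start(1) != -1 ? "AND NOT" : "AND");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Mixed case still matches because of (?i).
        System.out.println(describe("A and B And Not C")); // prints AND, AND NOT
        // \b keeps AND from matching inside longer words, so this prints an empty line.
        System.out.println(describe("BAND ANDROID"));
    }
}
```

Checking m.group(1) != null would work equally well; start(1) simply avoids creating the substring.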

TEST

System.out.println(categorize("A AND B AND NOT C"));     // prints 1
System.out.println(categorize("A AND B AND C"));         // prints 3
System.out.println(categorize("A AND NOT B AND NOT C")); // prints 2
System.out.println(categorize("A OR B OR NOT C"));       // prints 0
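As an alternative sketch, the negative lookahead from the question can also be made to work. On its own, AND(?!\s+NOT) only answers "is there at least one AND that is not followed by NOT", which is why it matched "A AND B AND C AND NOT D". Pairing it with a second pattern for AND NOT gives the same categories as the loop above. The class name LookaheadCategorizer is an assumption for this example, and the word boundaries are added on the assumption that AND and NOT should only match as whole words.

```java
import java.util.regex.Pattern;

class LookaheadCategorizer {
    // AND followed by whitespace and the whole word NOT.
    private static final Pattern AND_NOT =
            Pattern.compile("(?i)\\bAND\\s+NOT\\b");
    // AND as a whole word that is not followed by whitespace and NOT.
    private static final Pattern PLAIN_AND =
            Pattern.compile("(?i)\\bAND\\b(?!\\s+NOT\\b)");

    // Returns 1 for both, 2 for 'AND NOT' only, 3 for 'AND' only, 0 for neither.
    static int categorize(String input) {
        boolean foundAndNot = AND_NOT.matcher(input).find();
        boolean foundAnd = PLAIN_AND.matcher(input).find();
        if (foundAndNot && foundAnd) return 1;
        if (foundAndNot) return 2;
        if (foundAnd) return 3;
        return 0;
    }

    public static void main(String[] args) {
        // The three sample strings from the question.
        System.out.println(categorize("A AND B AND C AND D AND NOT E")); // prints 1
        System.out.println(categorize("A AND NOT C "));                  // prints 2
        System.out.println(categorize("A AND B AND D"));                 // prints 3
    }
}
```

This version scans the input twice instead of once, so the single find() loop is the better choice for long inputs; the two-pattern form is mainly easier to read.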