Chaklader Chaklader - 1 month ago 6
Java Question

How to correct the regex to find exact word match without being case sensitive?

I have a private method that I'm testing, shown below:

private boolean containsExactDrugName(String testString, String drugName) {

    Matcher m = Pattern.compile("\\b(?:" + drugName + ")\\b|\\S+", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(testString);
    ArrayList<String> results = new ArrayList<>();

    while (m.find()) {
        results.add(m.group());
    }

    boolean found = results.contains(drugName);
    return found;
}


The method takes a text String and a medication name and returns a boolean. I need the match to be case-insensitive, but the last assertion of the test is failing. The test is provided below:

@Test
public void test_getRiskFactors_givenTextWith_Orlistat_Should_Not_Find_Medication() throws Exception {

    String drugName = "Orlistat";
    assertEquals("With Orlistat", true, containsExactDrugName("The patient is currently being treated with Orlistat", drugName));
    assertEquals("With Orlistattesee", false, containsExactDrugName("The patient is currently being treated with Orlistattesee", drugName));
    assertEquals("With abcOrlistat", false, containsExactDrugName("The patient is currently being treated with abcOrlistat", drugName));
    assertEquals("With orlistat", true, containsExactDrugName("The patient is currently being treated with orlistat", drugName));
}


In the last assertion the drug name is in lower case (orlistat), but it still needs to match the provided parameter, Orlistat. I used Pattern.CASE_INSENSITIVE, however it's not working. How do I write the code properly?

Answer

The problem isn't mainly in your regular expression; it's in the containsExactDrugName method itself. You're doing case-insensitive matching to find the drugName within the larger string, but then you look for an exact match of the drugName within the resulting list of matched strings:

results.contains(drugName)

This check is not only redundant (since the regex already did the work of finding the matches), it's actively breaking your function, because once again you're checking for an exact, case-sensitive match. Simply get rid of that:

private boolean containsExactDrugName(String testString, String drugName) {

    Matcher m = Pattern.compile("\\b(?:" + drugName + ")\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(testString);
    List<String> results = new ArrayList<>();

    while (m.find()) {
        results.add(m.group());
    }

    return !results.isEmpty();
}

Actually, since you're not keeping track of the number of times you've found drugName, the entire list is pointless, and you can simplify your method to:

private boolean containsExactDrugName(String testString, String drugName) {

    Matcher m = Pattern.compile("\\b(?:" + drugName + ")\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(testString);

    return m.find();
}
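
For anyone who wants to try this outside the test class, here is a minimal self-contained sketch of the simplified method run against the four strings from the question. The class name DrugNameMatcher and the main method are just scaffolding for illustration. The Pattern.quote call is my own addition, on the assumption that the drug name should always be treated as literal text rather than as a regex:

```java
import java.util.regex.Pattern;

final class DrugNameMatcher {

    // Returns true if drugName occurs in text as a whole word, ignoring case.
    // Pattern.quote is an addition (not in the code above): it makes any
    // regex metacharacters in drugName match literally.
    static boolean containsExactDrugName(String text, String drugName) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(drugName) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String prefix = "The patient is currently being treated with ";
        String[] words = {"Orlistat", "Orlistattesee", "abcOrlistat", "orlistat"};
        for (String word : words) {
            // Prints true, false, false, true for the four words, in order.
            System.out.println(word + " -> " + containsExactDrugName(prefix + word, "Orlistat"));
        }
    }
}
```

Note that \b only behaves as a "whole word" boundary when the drug name starts and ends with a word character (letter, digit or underscore), so names with punctuation at the edges would need a different boundary check.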

Edit - Your regex is also too permissive. It's matching on \\S+, which means any sequence of one or more non-whitespace characters. I'm not sure why you included that, but it's causing your regex to match things that are not the drugName. Remove the |\\S+ section of the expression.
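
To see what the |\\S+ alternative does, here is a rough sketch that lists every match for both versions of the pattern on a shortened input. The class AlternationDemo and the helper allMatches are hypothetical names used only for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationDemo {

    // Collects every match of regex in text, using the same flags as the question.
    static List<String> allMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "treated with orlistat";

        // With the \S+ alternative every token is a match.
        System.out.println(allMatches("\\b(?:Orlistat)\\b|\\S+", text)); // [treated, with, orlistat]

        // Without it, only the drug name is matched.
        System.out.println(allMatches("\\b(?:Orlistat)\\b", text));      // [orlistat]
    }
}
```

In both cases the matched text is "orlistat" as it appears in the input, which is why the case-sensitive results.contains("Orlistat") check in the original method returns false.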
