Vipul Behl Vipul Behl - 2 months ago 8
Java Question

How to use regular expression while searching in HashSet

I am writing a Java program in which I need to search a particular word from a Set. The word that has to be searched is something like ("wo.d") where '.' can be replaced by any other alphabet. I am using regex to match such type of word cases.

This is what I have so far

HashSet<String> words = new HashSet<String>();//this set is already populated
String word = "t.st";
if(word.contains(".")){
Pattern p = Pattern.compile(word);
Matcher m;
boolean match = false;
for(String setWord : words){
m = p.matcher(setWord);
if(m.matches())
match = true;
}
if(match)
System.out.println("Its a match");
else
System.out.println("Its not a match");
}
else{
System.out.println("The word does not contain regex do other stuff");
}


The code above works but is not efficient because it is being called many times in a second. So it produces a lag in the program.

Answer

You need to stop iterating as soon as you get a match, so assuming that you use Java 8, your for loop could be rewritten efficiently as next:

boolean match = words.stream().anyMatch(w -> p.matcher(w).matches());

You could also parallelize the research using parallelStream() instead of stream() especially if your Set has a lot of words.

If you don't use Java 7, it could still be done using FluentIterable from Google Guava but without the ability to parallelize the research unfortunately.

boolean match = FluentIterable.from(words).anyMatch(
    new Predicate<String>() {
        @Override
        public boolean apply(@Nullable final String w) {
            return p.matcher(w).matches();
        }
    }
);

But in your case, I don't believe that using FluentIterable can be more interesting than simply adding a break when you get a match, as it will still be easier to read and maintain

if (p.matcher(setWord).matches()) {
    match = true;
    break;
}

So, if you really need to use a regular expression and you cannot use Java 8, your best option is to use break as described above, there is no magic trick to consider.


Assuming that you will only have one character to replace, it could be done using startsWith(String) and endsWith(String) which will always be much faster than a regular expression. Something like this:

// Your words should be in a TreeSet to be already sorted alphabetically 
// in order to get a match as fast as possible
Set<String> words = new TreeSet<String>(); //this set is already populated
int index = word.indexOf('.');
if (index != -1) {
    String prefix = word.substring(0, index);
    String suffix = word.substring(index + 1);
    boolean match = false;
    for (String setWord : words){
        // From the fastest to the slowest thing to check 
        // to get the best possible performances
        if (setWord.length() == word.length() 
            && setWord.startsWith(prefix) 
            && setWord.endsWith(suffix)) {
            match = true;
            break;
        }
    }
    if(match)
        System.out.println("Its a match");
    else
        System.out.println("Its not a match");
}
else {
    System.out.println("The word does not contain regex do other stuff");
}
Comments