Chance Chance - 1 month ago 5
Java Question

Java regex doesn't count all matches in a given file

As the title says, I have written Java code to count all the matches in a given file using a regular expression. When I run the code, the count it prints is different from the number of matches in the file. It works perfectly only if I put each string on its own line. Here's my code:

This is the method that should count:

private static int countOccurrences(String path, String regex) {
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher;
    int count = 0;
    try {
        BufferedReader br = new BufferedReader(new FileReader(path));
        String line;
        while ((line = br.readLine()) != null) {
            matcher = pattern.matcher(line);
            if (matcher.find())
                count++;
        }
        br.close();
    } catch (Exception e) {
        e.printStackTrace();
    }

    return count;
}


Here's the code using that method:

String regex = "(00966|\\+966)\\d{9}";
int countNumbers = countOccurrences(fileContainsNumbers, regex);


Here's the file I read from:


Lorem Ipsum is simply dummy +966111111111 text of the printing and
typesetting industry.+966222222222 Lorem Ipsum has been the industry's
standard dummy text ever +966333333333 since the 1500s, when an
unknown printer took a galley of type and scrambled +966444444444
+96645789541063 it to make a type specimen book. +966569874514 It has survived not only five centuries, but also the leap into electronic
typesetting, remaining +966569874514 essentially unchanged. It was
popularised +966569874514 in the 1960s with the release of Letraset
sheets containing Lorem Ipsum passages, and more recently with desktop
publishing software like Aldus PageMaker +966555555555 including
versions of Lorem Ipsum.

Answer

You are using the find() method in the wrong way. Instead of

if (matcher.find())
  count++;

you should do

while (matcher.find())
  count++;

You see, a single line can contain multiple matches. Each call to find() looks for the next match, starting where the previous match ended, so it can return true several times for the same line. With if, you call it only once per line, so you stop counting after the first true!
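To see the difference in isolation, here is a minimal sketch that counts matches in a single string, with no file involved. The class name MatchCounter and the helper countMatches are made up for this illustration; the regex and the sample line are taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
class MatchCounter {

    // Counts every non-overlapping match of regex in text.
    static int countMatches(String text, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        // Each successful find() resumes after the previous match,
        // so the loop runs once per match rather than once per line.
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String regex = "(00966|\\+966)\\d{9}";
        // Part of one line from the question's sample file.
        String line = "scrambled +966444444444 +96645789541063 it to make "
                + "a type specimen book. +966569874514 It has survived";
        System.out.println(countMatches(line, regex)); // prints 3
    }
}
```

Note that the longer number +96645789541063 is counted as well: the pattern matches its first twelve digits, because nothing in the regex requires the number to end after nine digits. With if instead of while, the same line would contribute only 1 to the total.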

In other words: if you want to count all matches on each line, then don't stop counting after the first match on a line!
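Putting it together, the whole method could look roughly like the sketch below. This is one possible version, not the only way to write it. Beyond the if/while fix it makes a few changes that are my own assumptions rather than part of the question: the class name OccurrenceCounter is invented, the file is read as UTF-8, the reader is opened in try-with-resources so it is closed even when reading fails, and the IOException is passed to the caller instead of being printed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, used only for this illustration.
class OccurrenceCounter {

    // Counts all matches of regex in the file at path, line by line.
    static int countOccurrences(String path, String regex) throws IOException {
        Pattern pattern = Pattern.compile(regex);
        int count = 0;
        // Assumption: the file is UTF-8 encoded.
        try (BufferedReader br =
                Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher matcher = pattern.matcher(line);
                // Count every match on the line, not just the first one.
                while (matcher.find()) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

It is called the same way as before, for example countOccurrences(fileContainsNumbers, regex), except that the caller now has to handle or declare IOException.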