burzakovskiy burzakovskiy - 2 years ago 272
Java Question

How to calculate syllables in text with regex and Java

I have text as a

and need to calculate number of syllables in each word. I've tried to split all text into array of words and than processed each word separately. I used regular expressions for that. But pattern for syllables doesn't work as it should. Please advice how to change it to calculate correct number of syllables. My initial code.

public int getNumSyllables()
String[] words = getText().toLowerCase().split("[a-zA-Z]+");
int count=0;
List <String> tokens = new ArrayList<String>();
for(String word: words){
tokens = Arrays.asList(word.split("[bcdfghjklmnpqrstvwxyz]*[aeiou]+[bcdfghjklmnpqrstvwxyz]*"));
count+= tokens.size();

return count;

Answer Source

This question is from a Java Course of UCSD, am I right?

I think you should provide enough information for this question, so that it won't confused people who want to offer some help. And here I have my own solution, which already been tested by the test case from the local program, also the OJ from UCSD.

You missed some important information about the definition of syllable in this question. Actually I think the key point of this problem is how should you deal with the e. For example, let's say there is a combination of te. And if you put te in the middle of a word, of course it should be counted as a syllable; However if it's at the end of a word, the e should be thought as a silent e in English, so it should not be thought as a syllable.

That's it. And I would like to write down my thought with some pseudo code:

  if(last character is e) {
        if(it is silent e at the end of this word) {
           remove the  silent e;
           count the rest part as regular;
        } else {
  } else {
        count it as regular;

You may find that I am not only using regex to deal with this problem. Actually I have thought about it: can this question really be done only using regex? My answer is: nope, I don't think so. At least now, with the knowledge UCSD gives us, it's too difficult to do that. Regex is a powerful tool, it can map the desired characters very fast. However regex is missing some functionality. Take the te as example again, regex won't be able to think twice when it is facing the word like teate (I made up this word just for example). If our regex pattern would count the first te as syllable, then why the last te not?

Meanwhile, UCSD actually have talked about it on the assignment paper:

If you find yourself doing mental gymnastics to come up with a single regex to count syllables directly, that's usually an indication that there's a simpler solution (hint: consider a loop over characters--see the next hint below). Just because a piece of code (e.g. a regex) is shorter does not mean it is always better.

The hint here is that, you should think this problem together with some loop, combining with regex.

OK, I should finally show my code now:

protected int countSyllables(String word)
    // TODO: Implement this method so that you can call it from the 
    // getNumSyllables method in BasicDocument (module 1) and 
    // EfficientDocument (module 2).
    int count = 0;
    word = word.toLowerCase();

    if (word.charAt(word.length()-1) == 'e') {
        if (silente(word)){
            String newword = word.substring(0, word.length()-1);
            count = count + countit(newword);
        } else {
    } else {
        count = count + countit(word);
    return count;

private int countit(String word) {
    int count = 0;
    Pattern splitter = Pattern.compile("[^aeiouy]*[aeiouy]+");
    Matcher m = splitter.matcher(word);

    while (m.find()) {
    return count;

private boolean silente(String word) {
    word = word.substring(0, word.length()-1);

    Pattern yup = Pattern.compile("[aeiouy]");
    Matcher m = yup.matcher(word);

    if (m.find()) {
        return true;
    } else
        return false;

You may find that besides from the given method countSyllables, I also create two additional methods countit and silente. countit is for counting the syllables inside the word, silente is trying to figure it out that is this word end with a silent e. And it should also be noticed that the definition of not silent e. For example, the should be consider not silent e, while ate is considered silent e.

And here is the status my code has already passed the test, from both local test case and OJ from UCSD:

from local test case

And from OJ the test result:

from Coursera OJ

P.S: It should be fine to use something like [^aeiouy] directly, because the word is parsed before we call this method. Also change to lowercase is necessary, that would save a lot of work dealing with the uppercase. What we want is only the number of syllables. Talking about number, an elegant way is to define count as static, so the private method could directly use count++ inside. But now it's fine.

Feel free to contact me if you still don't get the method of this question :)

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download