Saad Saad - 1 month ago
Java Question

string processing regex

I am writing a program that reads input from a file and builds sentences from the words. I inspect each word to check whether it ends with one of the sentence terminators, which are:


  • period (.)

  • exclamation mark (!)

  • and question mark (?)



to decide whether I should create a new instance of my sentence object.
This is what I have come up with so far:

ArrayList<Sentence2> sentences = new ArrayList<>();
String wordsJoin = "";
int numOfWords = 0;
try{
input = new BufferedReader(new FileReader("final.txt"));
strLine = input.readLine();
while(strLine != null){
String[] tokens = strLine.split("\\s+");
for (int i = 0; i < tokens.length; i++){
String s = tokens[i];
if(s.charAt(s.length()-1) != '.' ||s.charAt(s.length()-1) !='?' ||s.charAt(s.length()-1) != '!'){
wordsJoin += tokens[i] + " ";
numOfWords += tokens.length;
}else{
sentences.add(new Sentence2(wordsJoin,numOfWords));


}
}
strLine = input.readLine();
}


The problem is that I am getting an out-of-bounds exception. The stack trace is:


Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.charAt(String.java:658)


In short, my program reads the input and checks whether the last character of each word is a sentence terminator. If it is, I create an instance of the sentence class, which holds the sentence and the number of words in that sentence.

Some of the text from the file that I need to process is here:

Text that follows is based on the Wikipedia page on cryptography!
Cryptography is the practice and study of hiding information. In modern times, cryptography is considered to be a branch of both mathematics and computer science and is affiliated closely with information theory, computer security, and engineering. Cryptography is used in applications present in technologically advanced societies; examples include the security of ATM cards, computer passwords, and electronic commerce, which all depend on cryptography.

I really need help with this, please. I have been going over it for quite some time now.

Answer

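Splitting on "\\s+" is the right way to get every word, and your code already does that. The exception most likely comes from empty tokens: a blank line, or a line that starts with whitespace, produces an empty String from split("\\s+"), so s.length() - 1 is -1 and charAt throws. Your condition is also always true, because no character can be equal to '.', '?' and '!' at the same time; compare with == instead, and reset wordsJoin and numOfWords after each sentence.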
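As a quick check of where an index of -1 can come from, here is a minimal sketch of how split("\\s+") treats a blank line and a line with leading whitespace. The class name SplitDemo and the helper endsSentence are made up for this illustration and are not part of the question's code.

```java
class SplitDemo {
    // Hypothetical helper: true when the token is non-empty and ends with '.', '!' or '?'.
    static boolean endsSentence(String token) {
        if (token.isEmpty()) {
            return false; // guard: charAt(token.length() - 1) would be charAt(-1)
        }
        char last = token.charAt(token.length() - 1);
        return last == '.' || last == '!' || last == '?';
    }

    public static void main(String[] args) {
        // A blank line still yields one token, and that token is the empty String.
        String[] blank = "".split("\\s+");
        System.out.println(blank.length + " token(s), first is empty: " + blank[0].isEmpty());

        // Leading whitespace yields an empty first token before the real words.
        String[] indented = "  hiding information.".split("\\s+");
        System.out.println(indented.length + " token(s), first is empty: " + indented[0].isEmpty());

        // The guarded check is safe for every token, including the empty one.
        for (String token : indented) {
            System.out.println("\"" + token + "\" ends a sentence: " + endsSentence(token));
        }
    }
}
```

Any loop that calls charAt(s.length() - 1) on these tokens needs an isEmpty() check on the token itself, not on the whole line.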

import java.util.ArrayList;

public class Main {
    public static void main(String... args) {
        ArrayList<Sentence2> sentences = new ArrayList<>();
        String wordsJoin = "";
        int numOfWords = 0;

        String strLine = "It will be split? Sentence by sentence? Sure!";

        String[] tokens = strLine.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String s = tokens[i];
            // A blank line or leading whitespace produces an empty token; skip it
            // so that charAt(s.length() - 1) is never called with index -1.
            if (s.isEmpty()) {
                continue;
            }

            wordsJoin += s + " ";
            numOfWords++; // count this word only, not the whole token array

            char last = s.charAt(s.length() - 1);
            if (last == '.' || last == '?' || last == '!') {
                sentences.add(new Sentence2(wordsJoin, numOfWords));
                wordsJoin = "";
                numOfWords = 0;
            }
        }

        for (Sentence2 sentence2 : sentences) {
            System.out.println(sentence2.wordsJoin + " " + sentence2.numOfWords);
        }
    }

    public static class Sentence2 {
        private String wordsJoin;
        private int numOfWords;

        public Sentence2(String wordsJoin, int numOfWords) {
            this.wordsJoin = wordsJoin;
            this.numOfWords = numOfWords;
        }
    }
}
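The example above works on a hard-coded String, while the question reads from a file. The following is only a sketch of how the same loop might be wrapped around a BufferedReader, assuming that a sentence may continue across line breaks and that trailing words without a terminator should still be kept. The names SentenceReader, Sentence and readSentences are hypothetical; Sentence stands in for the question's Sentence2.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SentenceReader {
    // Hypothetical stand-in for the question's Sentence2 class.
    static class Sentence {
        final String text;
        final int numOfWords;

        Sentence(String text, int numOfWords) {
            this.text = text;
            this.numOfWords = numOfWords;
        }
    }

    // Reads all lines from the source and groups the words into sentences.
    // The accumulators live outside the line loop so a sentence can span lines.
    static List<Sentence> readSentences(Reader source) {
        List<Sentence> sentences = new ArrayList<>();
        StringBuilder words = new StringBuilder();
        int numOfWords = 0;
        try (BufferedReader input = new BufferedReader(source)) {
            String line;
            while ((line = input.readLine()) != null) {
                for (String token : line.split("\\s+")) {
                    if (token.isEmpty()) {
                        continue; // blank line or leading whitespace
                    }
                    if (words.length() > 0) {
                        words.append(' ');
                    }
                    words.append(token);
                    numOfWords++;
                    char last = token.charAt(token.length() - 1);
                    if (last == '.' || last == '!' || last == '?') {
                        sentences.add(new Sentence(words.toString(), numOfWords));
                        words.setLength(0);
                        numOfWords = 0;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumption: leftover words without a terminator still form a sentence.
        if (numOfWords > 0) {
            sentences.add(new Sentence(words.toString(), numOfWords));
        }
        return sentences;
    }

    public static void main(String[] args) throws IOException {
        // Pass a file name such as final.txt as the first argument;
        // without one, a short excerpt of the question's text is used.
        String sample = "Text that follows is based on the Wikipedia page on cryptography!\n"
                + "Cryptography is the practice and study of hiding information.\n";
        Reader source = args.length > 0 ? new FileReader(args[0]) : new StringReader(sample);
        for (Sentence s : readSentences(source)) {
            System.out.println(s.numOfWords + ": " + s.text);
        }
    }
}
```

The try-with-resources block closes the reader even if reading fails, and a StringBuilder avoids building a new String for every word.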