Saad Saad - 1 month ago 11
Java Question

Java Processing input from a file

I am working through a past sample final exam. One question asks me to read input from a file and process it into words. The end of a sentence is marked by any word that ends with one of the three characters . ? !

I was able to write code for this, but I can only split the input into sentences, using the Scanner class and useDelimiter. What I want instead is to process the input word by word: if a word ends in one of the sentence separators above, I stop adding words to the current Sentence object.
Any help would be appreciated, as I am learning this on my own. This is what I came up with:

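// Needs java.io.File, java.util.Scanner and java.util.StringTokenizer.
// line, currentChar, the counters and the sentences list are declared elsewhere.
File file = new File("finalq4.txt");
Scanner scanner = new Scanner(file);
scanner.useDelimiter("[.?!]"); // each token is the text between two terminators
while (scanner.hasNext()) {
    line = scanner.next();
    line = line.replaceAll("\\r?\\n", " ");
    line = line.trim();
    if (line.isEmpty()) {
        continue; // consecutive terminators such as "....." yield empty tokens
    }
    sentCount++;
    StringTokenizer tokenizer = new StringTokenizer(line, " ");
    int wordsInSentence = tokenizer.countTokens();
    wordsCount += wordsInSentence; // running total for the whole file
    sentences.add(new Sentence(line, wordsInSentence)); // per-sentence count
    // Walk the sentence itself, so the index always matches the string being read.
    for (int i = 0; i < line.length(); i++) {
        currentChar = line.charAt(i);
        if (Character.isLetter(currentChar)) {
            lettersCount++;
        }
    }
}
scanner.close();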


In this code I split the input into sentences with useDelimiter, then count the words and letters of the entire file and store each sentence in a Sentence class.
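
The delimiter-based splitting described above can be tried in isolation. The following is only a sketch: the class name DelimiterDemo, the method splitOnTerminators and the sample string are made up for illustration. It suggests one way the counts can drift: with a single-character delimiter such as [.?!], Scanner may return empty tokens when terminators appear back to back, as in the "....." at the end of the sample input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // Hypothetical helper: splits text on '.', '?' or '!' and returns
    // every token the Scanner produces, trimmed of surrounding whitespace.
    static List<String> splitOnTerminators(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("[.?!]");
            while (scanner.hasNext()) {
                tokens.add(scanner.next().trim());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Brackets make empty tokens visible in the output.
        for (String token : splitOnTerminators("Hi there! All depend on cryptography..... Done.")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Any token printed as [] would be counted as a sentence by a loop that increments its counter on every call to next(), so skipping empty tokens is probably needed with this approach.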

If I want to split this into words instead, how can I do that without using the Scanner class?

Some of the input from the file that I have to process is here:


Text that follows is based on the Wikipedia page on cryptography!

Cryptography is the practice and study of hiding information. In modern times,
cryptography is considered to be a branch of both mathematics and computer
science, and is affiliated closely with information theory, computer security, and
engineering. Cryptography is used in applications present in technologically
advanced societies; examples include the security of ATM cards, computer
passwords, and electronic commerce, which all depend on cryptography.....


I can elaborate further if the question needs more explanation.

What I want to be able to do is keep adding words to the Sentence object and stop when a word ends in one of the sentence separators above, then read the next word and keep adding words until I hit another separator.

Answer

I have tried several techniques on this question, and the approach above was one of them. I was also able to solve it with another approach that does not use the Scanner class. This one was more accurate and gave me the exact expected output, whereas with the approach above I was off by a few words and letters.

// Needs java.io.BufferedReader, java.io.FileReader and java.io.IOException.
// input, strLine, the counters and the sentences list are declared elsewhere.
try {
        input = new BufferedReader(new FileReader("file.txt"));
        strLine = input.readLine();
        while (strLine != null) {
            // Split the line into whitespace-separated tokens.
            String[] tokens = strLine.split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                String s = tokens[i];
                // Blank lines and leading whitespace produce an empty token; skip it.
                if (s.isEmpty()) {
                    continue;
                }
                wordsJoin += s + " ";
                wordCount++; // one more word in the current sentence

                // Count only the ASCII letters in this token.
                lettersCount += s.replaceAll("[^a-zA-Z]", "").length();

                // A token ending in '.', '?' or '!' closes the current sentence.
                char last = s.charAt(s.length() - 1);
                if (last == '.' || last == '?' || last == '!') {
                    sentences.add(new Sentence(wordsJoin, wordCount));
                    sentCount++;
                    numOfWords += countWords(wordsJoin); // countWords is a helper not shown here
                    wordsJoin = "";
                    wordCount = 0;
                }
            }
            strLine = input.readLine();
        }
        input.close();
} catch (IOException e) {
        e.printStackTrace();
}

This might be useful for anyone working on the same problem, or anyone who just needs an idea of how to count letters, words and sentences in a text file.
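
For anyone who wants to run the word-by-word idea on its own, here is a minimal self-contained sketch. It is not the exam's reference solution: the class TextStats, its fields and the nested Sentence class are hypothetical stand-ins (the real Sentence class is not shown in this post), and it reads from any Reader so it can be tried on a string as well as a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TextStats {
    // Hypothetical stand-in for the Sentence class used in the post.
    static class Sentence {
        final String text;
        final int wordCount;
        Sentence(String text, int wordCount) {
            this.text = text;
            this.wordCount = wordCount;
        }
    }

    final List<Sentence> sentences = new ArrayList<>();
    int wordCount;   // words in the whole input
    int letterCount; // letters in the whole input

    static TextStats analyze(Reader source) {
        TextStats stats = new TextStats();
        StringBuilder current = new StringBuilder(); // words of the sentence being built
        int wordsInCurrent = 0;
        try (BufferedReader input = new BufferedReader(source)) {
            String line;
            while ((line = input.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    if (token.isEmpty()) {
                        continue; // blank line
                    }
                    if (current.length() > 0) {
                        current.append(' ');
                    }
                    current.append(token);
                    wordsInCurrent++;
                    stats.wordCount++;
                    for (int i = 0; i < token.length(); i++) {
                        if (Character.isLetter(token.charAt(i))) {
                            stats.letterCount++;
                        }
                    }
                    // A word ending in '.', '?' or '!' closes the sentence.
                    char last = token.charAt(token.length() - 1);
                    if (last == '.' || last == '?' || last == '!') {
                        stats.sentences.add(new Sentence(current.toString(), wordsInCurrent));
                        current.setLength(0);
                        wordsInCurrent = 0;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumption: trailing words with no terminator are counted as words
        // but are not stored as a sentence.
        return stats;
    }

    public static void main(String[] args) {
        TextStats s = analyze(new StringReader("Is it safe? Yes,\nit is."));
        System.out.println(s.sentences.size() + " sentences, "
                + s.wordCount + " words, " + s.letterCount + " letters");
        // prints "2 sentences, 6 words, 15 letters"
    }
}
```

To read the exam file instead, pass new FileReader("file.txt") to analyze. Because the sentence state lives outside the per-line loop, a sentence that wraps across lines is still collected as one sentence.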