Saprativa Bhattacharjee - 4 months ago
Java Question

OpenNLP Sentence Detection API for entire text file

Here is code that uses the OpenNLP Sentence Detector API on a single String:

package opennlp;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

public class SentenceDetector {

    public static void main(String[] args) throws IOException {
        // Load the sentence model; try-with-resources closes the stream.
        // A failed load now propagates instead of leaving the model null.
        SentenceModel model;
        try (InputStream modelIn = new FileInputStream("en-sent.zip")) {
            model = new SentenceModel(modelIn);
        }

        SentenceDetectorME sentenceDetector = new SentenceDetectorME(model);
        String[] sentences = sentenceDetector.sentDetect(" First sentence. Second sentence.");

        // Print each detected sentence on its own line.
        for (String str : sentences) {
            System.out.println(str);
        }
    }
}


Now my question is how do I pass an entire text file and perform sentence detection instead of a single string?

Answer

The simple way is to read the whole file into a string and pass it in the usual way. The following method reads a file's contents as a string:

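import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Reads the whole file into a single String.
// Note: FileReader decodes using the platform default charset.
public String readFileToString(String pathToFile) throws IOException {
    StringBuilder strFile = new StringBuilder();
    // try-with-resources closes the reader even if read() throws.
    try (BufferedReader reader = new BufferedReader(new FileReader(pathToFile))) {
        char[] buffer = new char[512];
        int num;
        while ((num = reader.read(buffer)) != -1) {
            // Append only the characters actually read in this pass.
            strFile.append(buffer, 0, num);
        }
    }
    return strFile.toString();
}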
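
To tie this back to the question, here is a minimal sketch of the whole flow: read the file, then hand the text to the detector. It uses `java.nio.file.Files` instead of a manual read loop and assumes the file is UTF-8. The class name `FileSentences` and the method name `detectInFile` are made up for illustration. The detector is passed in as a `Function<String, String[]>` so that the snippet compiles without OpenNLP on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.Function;

public class FileSentences {

    // Reads the whole file as UTF-8 (an assumption about the file's encoding)
    // and passes the resulting text to the supplied sentence detector.
    public static String[] detectInFile(String pathToFile,
                                        Function<String, String[]> detector) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(pathToFile));
        String text = new String(bytes, StandardCharsets.UTF_8);
        return detector.apply(text);
    }
}
```

With the `sentenceDetector` object from the question, the call would look something like `String[] sentences = FileSentences.detectInFile("input.txt", sentenceDetector::sentDetect);`, where `input.txt` is a placeholder path. This holds the entire file in memory, which should be fine for ordinary text files but may not suit very large ones.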