Kong Kong - 3 months ago 29
Java Question

Stanford CoreNLP: input with one sentence per line

I'm using the Stanford NLP tool for college work. The parser ends a sentence at every period, but I also need it to end a sentence at every line break, that is, at each '\n' character. On the command line you can use the "-sentences" option, but so far I have not found an equivalent option for code.

The setOptionFlags method of LexicalizedParser did not work either.

Answer

Here is some sample code to elaborate on Gabor's answer:

import java.io.*;
import java.util.*;
import java.nio.file.Paths;
import java.nio.file.Files;
import java.nio.charset.StandardCharsets;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*;
import edu.stanford.nlp.ling.CoreAnnotations.*;
import edu.stanford.nlp.util.*;

public class ParserExample {

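    public static void main(String[] args) throws IOException {
        // Read the whole input file (one sentence per line) as UTF-8 text.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Annotation document = new Annotation(text);
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
        // Treat every newline as a sentence break, in addition to the usual punctuation.
        props.setProperty("ssplit.newlineIsSentenceBreak", "always");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);
        // Print each detected sentence followed by its parse tree.
        for (CoreMap sentence : document.get(SentencesAnnotation.class)) {
            System.out.println(sentence.get(TextAnnotation.class));
            System.out.println(sentence.get(TreeAnnotation.class));
        }
    }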

}

args[0] should be the path to your file, with one sentence per line.

You will need to download Stanford CoreNLP 3.5.2 from http://nlp.stanford.edu/software/corenlp.shtml and put the jars from the download on your classpath.

You can set other options for the parser with props.setProperty().

If your file has exactly one sentence per line and you want to split only on newlines, ignoring periods, you can instead use

props.setProperty("ssplit.eolonly", "true");
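To keep the two splitting modes apart, the choice can be wrapped in a small helper. This is only a sketch: the class name SplitConfig and the method propsFor are hypothetical names made up for illustration, and only the property keys and values are taken from the answer above. It uses nothing but java.util.Properties, so it runs without the CoreNLP jars.

```java
import java.util.Properties;

public class SplitConfig {

    // Hypothetical helper (not part of CoreNLP): builds the pipeline
    // properties for one of the two sentence-splitting modes.
    public static Properties propsFor(boolean newlinesOnly) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
        if (newlinesOnly) {
            // Split only at line ends: each line becomes one sentence.
            props.setProperty("ssplit.eolonly", "true");
        } else {
            // Split at the usual punctuation and also at every newline.
            props.setProperty("ssplit.newlineIsSentenceBreak", "always");
        }
        return props;
    }

    public static void main(String[] args) {
        // Show the settings produced for each mode.
        propsFor(true).list(System.out);
        propsFor(false).list(System.out);
    }
}
```

The returned object can be passed to new StanfordCoreNLP(props) exactly as in ParserExample. The mode with newlineIsSentenceBreak matches the original question, where periods and newlines should both end a sentence.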