I have files I'm feeding to CoreNLP's sentiment tagger. I have already broken the files up into individual sentences, so I want one tag returned per file. How can I make the java command return a single tag?
The command looks like this:
java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin
Annotation pipeline timing information:
TokenizerAnnotator: 0.0 sec.
WordsToSentencesAnnotator: 0.0 sec.
TOTAL: 0.0 sec. for 8 tokens at 296.3 tokens/sec.
Pipeline setup: 0.0 sec.
Total time for StanfordCoreNLP pipeline: 8.7 sec.
C:\stanford-corenlp-full-2015-04-20>java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin
Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.4 sec].
Adding annotator sentiment
Reading in text from stdin.
Please enter one sentence per line.
Processing will end when EOF is reached.
Computer is fun. Not too fun.
Computer is fun Not too fun.
It looks like there was a bug in SentimentPipeline: it shouldn't split sentences within a line when you use the -stdin option. I have fixed that now, but unless you compile your own version, this won't help you until we release the next version of CoreNLP.
But there is also an alternative (and presumably better) way to get sentiment labels for sentences using a CoreNLP pipeline.
The following command runs the same code as your command, but it also lets you specify more options (including the -ssplit.eolonly option) for the individual annotators.
java -cp "*" -mx5g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators "tokenize,ssplit,parse,sentiment" -ssplit.eolonly
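If the list of options grows, the same settings can also be kept in a properties file and passed to the pipeline with -props. This is only a minimal sketch: the file name sentiment.properties is a placeholder of mine, not something from your setup.

```properties
# sentiment.properties (hypothetical file name, chosen for illustration)
# Same annotators as in the command above.
annotators = tokenize, ssplit, parse, sentiment
# Treat every line of the input as exactly one sentence.
ssplit.eolonly = true
```

Assuming your sentences are in a file called input.txt (also a placeholder), one per line, it would then be run as: java -cp "*" -mx5g edu.stanford.nlp.pipeline.StanfordCoreNLP -props sentiment.properties -file input.txt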