Tyler Rinker - 2 months ago
Java Question

Stanford CoreNLP sentiment without splitting sentences

I have files that I'm feeding to CoreNLP's sentiment tagger. I have already broken the files up into individual sentences, so I want one tag returned per file. How can I make the java command return a single tag?

The command looks like this:

java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin
and outputs as follows:

Annotation pipeline timing information:
TokenizerAnnotator: 0.0 sec.
WordsToSentencesAnnotator: 0.0 sec.
TOTAL: 0.0 sec. for 8 tokens at 296.3 tokens/sec.
Pipeline setup: 0.0 sec.
Total time for StanfordCoreNLP pipeline: 8.7 sec.

C:\stanford-corenlp-full-2015-04-20>java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin
Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.4 sec].
Adding annotator sentiment
Reading in text from stdin.
Please enter one sentence per line.
Processing will end when EOF is reached.

Computer is fun. Not too fun.
Positive
Neutral


How could I make the output a single tag similar to what I did below by removing the punctuation:

Computer is fun Not too fun.
Positive


It seems I should be able to do this easily, since there is the -ssplit.isOneSentence option and, to my understanding, the sentiment tagger uses ssplit, but I don't know how to rework my command to incorporate it (I have read the command-line documentation).

Answer

It looks like there was a bug in SentimentPipeline: it shouldn't split sentences within a line when you use the -stdin option. I have fixed that now, but unless you compile your own version, this won't help you until we release the next version of CoreNLP.

But there is also an alternative (and presumably better) way to get sentiment labels for sentences using a CoreNLP pipeline.

The following command runs the same code as your command but at the same time it allows you to specify more options (including the -ssplit.eolonly option) for the individual annotators.

java -cp "*" -mx5g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators "tokenize,ssplit,parse,sentiment" -ssplit.eolonly
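If you would rather call the pipeline from Java than from the command line, the same options can be expressed as properties. The following is only a minimal sketch: the class name OneSentencePerLineProps and the build() helper are made up for illustration, and it assumes that a value-less flag such as -ssplit.eolonly corresponds to the property value "true". The CoreNLP call itself is left in a comment so that the sketch compiles without the CoreNLP jars.

```java
import java.util.Properties;

// Hypothetical helper class; the name is not part of CoreNLP.
class OneSentencePerLineProps {

    // Builds a property set mirroring the command-line flags shown above.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,parse,sentiment");
        // Assumption: the bare -ssplit.eolonly flag is equivalent to "true".
        // Sentences are then split only at line breaks, one sentence per line.
        props.setProperty("ssplit.eolonly", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        System.out.println("annotators = " + props.getProperty("annotators"));
        System.out.println("ssplit.eolonly = " + props.getProperty("ssplit.eolonly"));
        // With the CoreNLP jars on the classpath, these properties could be
        // handed to the pipeline constructor:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

Keeping the options in a Properties object means that one place controls both the annotator list and the sentence-splitting behaviour, which is what the command-line flags do as well.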