Betafish - 1 year ago
Java Question

Programmatically training NER Model using .prop file

I have been trying to train my NER model using a property file, as shown in the tutorial here LINK. I am using the same prop file, but I fail to understand how to do it programmatically.

props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment, regexner");
props.setProperty("ner.model", "resources/NER.prop");

The prop file is shown below:

# location of the training file
trainFile = nerTEST.tsv
# location where you would like to save (serialize) your
# classifier; adding .gz at the end automatically gzips the file,
# making it smaller, and faster to load
serializeTo = resources/ner-model.ser.gz

# structure of your training file; this tells the classifier that
# the word is in column 0 and the correct answer is in column 1
map = word=0,answer=1

# This specifies the order of the CRF: order 1 means that features
# apply at most to a class pair of previous class and current class
# or current class and next class.

# these are the features we'd like to train with
# some are discussed below, the rest can be
# understood by looking at NERFeatureFactory
# word character ngrams will be included up to length 6 as prefixes
# and suffixes only
# the last 4 properties deal with word shape features

Error : invalid stream header: 23206C6F
Caused by: Couldn't load classifier from resources/NER.prop

From another question on SO, I understand that the model file is normally provided directly. But how can we do that with a property file?

Answer Source

You should run this command from the command line:

java -cp "*" edu.stanford.nlp.ie.crf.CRFClassifier -prop NER.prop

If you want to run this in Java code, you could do something like this:

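String[] args = new String[]{"-prop", "NER.prop"};
edu.stanford.nlp.ie.crf.CRFClassifier.main(args); // trains and writes the model named by serializeTo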

The .prop file specifies the settings for training your model; it is not a model itself. Your code sets ner.model to the .prop file, so the pipeline tries to deserialize it as a classifier, which is causing the error.
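The header value in the error message supports this reading. As a small sketch using only the JDK (the class name StreamHeaderCheck is made up for illustration), decoding 23206C6F as ASCII yields the first four characters of the prop file's first comment line, whereas a Java-serialized stream would begin with the bytes AC ED 00 05:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of CoreNLP: inspects the header reported in the error.
class StreamHeaderCheck {

    // Java serialization streams start with magic AC ED followed by version 00 05.
    static final int JAVA_SERIAL_HEADER = 0xACED0005;

    // Decodes a hex string (two hex digits per byte) into ASCII text.
    static String hexToAscii(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // True if the 8-digit hex header matches the Java serialization header.
    static boolean isJavaSerialHeader(String hex) {
        return (int) Long.parseLong(hex, 16) == JAVA_SERIAL_HEADER;
    }

    public static void main(String[] args) {
        System.out.println(hexToAscii("23206C6F"));         // prints "# lo"
        System.out.println(isJavaSerialHeader("23206C6F")); // prints "false"
        System.out.println(isJavaSerialHeader("ACED0005")); // prints "true"
    }
}
```

The "# lo" is the start of "# location of the training file", so the loader was reading the text of NER.prop rather than a serialized classifier.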

Doing either will generate the final model at resources/ner-model.ser.gz. Set ner.model to that path instead of to NER.prop.
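If you would rather not hard-code the model path twice, one option is to read serializeTo from the training prop file and reuse it for ner.model. The following is only a sketch using java.util.Properties: the class and method names are hypothetical, the annotator list is copied from the question, and building the actual pipeline from the returned properties is left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper: derives pipeline properties from the training prop file.
class ModelPathFromProp {

    // Reads the training properties and points ner.model at the serialized model.
    static Properties pipelineProps(Reader trainingProp) throws IOException {
        Properties training = new Properties();
        training.load(trainingProp);

        String modelPath = training.getProperty("serializeTo");
        if (modelPath == null) {
            throw new IllegalArgumentException("training properties have no serializeTo entry");
        }

        Properties props = new Properties();
        props.setProperty("annotators",
                "tokenize, ssplit, pos, lemma, ner, parse, sentiment, regexner");
        // The model file, not the .prop file, is what the NER annotator should load.
        props.setProperty("ner.model", modelPath.trim());
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Inline stand-in for NER.prop; in practice pass a FileReader for the real file.
        String propText = "trainFile = nerTEST.tsv\n"
                + "serializeTo = resources/ner-model.ser.gz\n"
                + "map = word=0,answer=1\n";
        Properties props = pipelineProps(new StringReader(propText));
        System.out.println(props.getProperty("ner.model")); // prints "resources/ner-model.ser.gz"
    }
}
```

This only works after training has run, since the file named by serializeTo must exist before the pipeline tries to load it.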