StepTNT - 2 months ago
Java Question

"TokenStream contract violation: close() call missing" when calling addDocument

I'm using Lucene's features to build a simple way to match similar words within a text.

My idea is to have an Analyzer running on my text to provide a TokenStream, and for each token I run a FuzzyQuery to see if I have a match in my index. If not, I just index a new Document containing just the new unique word.
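
For reference, the query for each token is just a FuzzyQuery built from the term, roughly like this (the edit distance of 2 is illustrative):

// Illustrative: fuzzy match of the current token against the word field
Query query = new FuzzyQuery(new Term(TEXT_FIELD_NAME, word), 2);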

Here's what I'm getting, though:

Exception in thread "main" java.lang.IllegalStateException: TokenStream contract violation: close() call missing
at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:411)
at org.apache.lucene.analysis.standard.StandardAnalyzer$1.setReader(StandardAnalyzer.java:111)
at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:165)
at org.apache.lucene.document.Field.tokenStream(Field.java:568)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:708)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:417)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:373)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:478)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1562)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1307)
at org.myPackage.MyClass.addToIndex(MyClass.java:58)


Relevant code here:

// Set up the TokenStream based on a StandardAnalyzer
TokenStream tokenStream = analyzer.tokenStream(TEXT_FIELD_NAME, new StringReader(input));
tokenStream = new StopFilter(tokenStream, EnglishAnalyzer.getDefaultStopSet());
tokenStream = new ShingleFilter(tokenStream, 3);
CharTermAttribute charTerm = tokenStream.addAttribute(CharTermAttribute.class);
tokenStream.reset();
...
// Iterate and process each token from the stream
while (tokenStream.incrementToken()) {
    processWord(charTerm.toString());
}
...
// Processing a word means looking for a similar one inside the index
// and, if none is found, adding this one to the index
void processWord(String word) {
    ...
    if (DirectoryReader.indexExists(index)) {
        reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs searchResults = searcher.search(query, 1);
        if (searchResults.totalHits > 0) {
            Document foundDocument = searcher.doc(searchResults.scoreDocs[0].doc);
            super.processWord(foundDocument.get(TEXT_FIELD_NAME));
        } else {
            addToIndex(word);
        }
    } else {
        addToIndex(word);
    }
    ...
}
...
// Create a new Document to index the provided word
void addToIndex(String word) throws IOException {
    Document newDocument = new Document();
    newDocument.add(new TextField(TEXT_FIELD_NAME, new StringReader(word)));
    indexWriter.addDocument(newDocument);
    indexWriter.commit();
}


The exception seems to say that I should close the TokenStream before adding things to the index, but that doesn't really make sense to me: how are the index and the TokenStream related? I mean, the index just receives a Document containing a String; the fact that the String came from a TokenStream should be irrelevant.

Any hint on how to solve this?

Answer

The problem lies in your reuse of the same analyzer that the IndexWriter is trying to use. You have a TokenStream open from that analyzer, and then you try to index a document. That document needs to be analyzed, but the analyzer reuses its TokenStream components internally, finds its old TokenStream still open, and throws the exception.
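
For context, every TokenStream consumer is expected to follow the same lifecycle: reset(), incrementToken() until it returns false, end(), then close(). Here is a minimal sketch of a correctly closed stream, reusing your TEXT_FIELD_NAME, input, and processWord names; collecting the tokens first means the analyzer is free again by the time anything is indexed:

// Consume the stream completely, then close it before touching the index.
// TokenStream is Closeable, so try-with-resources handles close().
List<String> tokens = new ArrayList<>();
try (TokenStream stream = analyzer.tokenStream(TEXT_FIELD_NAME, new StringReader(input))) {
    CharTermAttribute charTerm = stream.addAttribute(CharTermAttribute.class);
    stream.reset();                  // mandatory before the first incrementToken()
    while (stream.incrementToken()) {
        tokens.add(charTerm.toString());
    }
    stream.end();                    // consume the end-of-stream state
}                                    // close() runs here, releasing the analyzer for reuse
for (String token : tokens) {
    processWord(token);              // safe: the analyzer's stream is already closed
}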

To fix it, you could create a new, separate analyzer for processing and testing the string, instead of reusing the one the IndexWriter is using.
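
A minimal sketch of that approach; the variable names are mine, and it assumes a Lucene version where IndexWriterConfig takes just an analyzer. The key point is that the instance handed to the IndexWriter is never used to tokenize your input:

// One analyzer belongs exclusively to the IndexWriter...
Analyzer indexAnalyzer = new StandardAnalyzer();
IndexWriter indexWriter = new IndexWriter(index, new IndexWriterConfig(indexAnalyzer));

// ...and a second, independent instance tokenizes the input text, so the
// stream it has open can never collide with the one addDocument needs.
Analyzer textAnalyzer = new StandardAnalyzer();
TokenStream tokenStream = textAnalyzer.tokenStream(TEXT_FIELD_NAME, new StringReader(input));

Alternatively, fully consuming the first stream and calling end() and close() on it (as in the sketch above) before calling addDocument also satisfies the contract.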