ali ali - 1 month ago 8
Java Question

How to get all terms for a Lucene field in Lucene 4

I'm trying to update my code from Lucene 3.4 to 4.1. I figured out the changes except one. I have code which needs to iterate over all term values for one field. In Lucene 3.1 there was an IndexReader#terms() method providing a TermEnum, which I could iterate over. This seems to have changed in Lucene 4.1, and even after several hours of searching the documentation I have not been able to figure out how to do it. Can someone please point me in the right direction?

Thanks.

For all who want the direct answer, this is the relevant part of the migration guide:


How you obtain the enums has changed. The primary entry point is the Fields class. If you know your reader is a single-segment reader, do this:

Fields fields = reader.fields();
if (fields != null) {
...
}


If the reader might be multi-segment, you must do this:

Fields fields = MultiFields.getFields(reader);
if (fields != null) {
...
}


The fields may be null (e.g. if the reader has no fields).

Note that the MultiFields approach entails a performance hit on MultiReaders, as it must merge terms/docs/positions on the fly. It's generally better to instead get the sequential readers (use oal.util.ReaderUtil) and then step through those readers yourself, if you can (this is how Lucene drives searches).

If you pass a SegmentReader to MultiFields.getFields it will simply return reader.fields(), so there is no performance hit in that case.

Once you have a non-null Fields you can do this:

Terms terms = fields.terms("field");
if (terms != null) {
...
}


The terms may be null (e.g. if the field does not exist).

Once you have a non-null terms you can get an enum like this:

TermsEnum termsEnum = terms.iterator(null); // Lucene 4.x takes a TermsEnum to reuse; null means none


The returned TermsEnum will not be null.

You can then .next() through the TermsEnum.

Answer

Please follow the Lucene 4 migration guide:

How you obtain the enums has changed. The primary entry point is the Fields class. If you know your reader is a single-segment reader, do this:

Fields fields = reader.fields();
if (fields != null) {
  ...
}

If the reader might be multi-segment, you must do this:

Fields fields = MultiFields.getFields(reader);
if (fields != null) {
  ...
}

The fields may be null (e.g. if the reader has no fields).

Note that the MultiFields approach entails a performance hit on MultiReaders, as it must merge terms/docs/positions on the fly. It's generally better to instead get the sequential readers (use oal.util.ReaderUtil) and then step through those readers yourself, if you can (this is how Lucene drives searches).

If you pass a SegmentReader to MultiFields.getFields it will simply return reader.fields(), so there is no performance hit in that case.
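
To make the cost of the merged view more concrete, here is a library-free sketch of the kind of on-the-fly merge described above: each segment exposes its terms in sorted order, and a priority queue repeatedly picks the smallest current term across segments, skipping duplicates. This is only an illustration in plain Java, not Lucene's actual implementation; the names `TermMergeSketch` and `mergeSortedTerms` are hypothetical, and Strings stand in for Lucene's byte-based terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of Lucene.
final class TermMergeSketch {

    // One cursor per segment: its current term plus the remaining terms.
    private static final class Cursor implements Comparable<Cursor> {
        String current;
        final Iterator<String> rest;

        Cursor(Iterator<String> terms) {
            this.rest = terms;
            this.current = terms.next();
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    // Merges per-segment term lists (each already sorted) into one
    // sorted list without duplicates.
    static List<String> mergeSortedTerms(List<List<String>> segments) {
        PriorityQueue<Cursor> queue = new PriorityQueue<Cursor>();
        for (List<String> segment : segments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment.iterator()));
            }
        }
        List<String> merged = new ArrayList<String>();
        while (!queue.isEmpty()) {
            // Every step pays a queue operation; this is the extra work
            // that a single-segment reader avoids.
            Cursor smallest = queue.poll();
            String term = smallest.current;
            if (merged.isEmpty() || !merged.get(merged.size() - 1).equals(term)) {
                merged.add(term);
            }
            if (smallest.rest.hasNext()) {
                smallest.current = smallest.rest.next();
                queue.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("apple", "cherry"),
            Arrays.asList("banana", "cherry", "date"));
        System.out.println(mergeSortedTerms(segments)); // [apple, banana, cherry, date]
    }
}
```

With a single segment the queue never holds more than one cursor, which mirrors why passing a SegmentReader incurs no extra cost.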

Once you have a non-null Fields you can do this:

Terms terms = fields.terms("field");
if (terms != null) {
  ...
}

The terms may be null (e.g. if the field does not exist).

Once you have a non-null terms you can get an enum like this:

TermsEnum termsEnum = terms.iterator(null); // Lucene 4.x takes a TermsEnum to reuse; null means none

The returned TermsEnum will not be null.

You can then .next() through the TermsEnum.
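
Putting the quoted steps together, a complete loop might look like the sketch below. It assumes Lucene 4.x on the classpath, where `Terms.iterator` takes a `TermsEnum` to reuse (pass `null` if you have none) and `next()` returns `null` once the terms are exhausted. The class name `ListFieldTerms` and the two command-line arguments (index directory and field name) are placeholders, not anything from the migration guide.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

// Placeholder class name; assumes Lucene 4.x.
public class ListFieldTerms {
    public static void main(String[] args) throws IOException {
        String indexPath = args[0]; // directory containing the index
        String fieldName = args[1]; // field whose terms should be listed

        try (Directory dir = FSDirectory.open(new File(indexPath));
             IndexReader reader = DirectoryReader.open(dir)) {

            // Merged view across all segments (simple, but slower on large indexes).
            Fields fields = MultiFields.getFields(reader);
            if (fields == null) {
                return; // the reader has no fields
            }

            Terms terms = fields.terms(fieldName);
            if (terms == null) {
                return; // the field does not exist
            }

            TermsEnum termsEnum = terms.iterator(null);
            BytesRef term;
            // next() returns null when there are no more terms.
            while ((term = termsEnum.next()) != null) {
                System.out.println(term.utf8ToString() + "\t" + termsEnum.docFreq());
            }
        }
    }
}
```

The `BytesRef` returned by `next()` may be reused by the enum, so copy it (for example with `BytesRef.deepCopyOf`) if you need to keep terms beyond the current iteration.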
