Wasi Ahmad - 17 days ago
Java Question

What is the difference between Field and StringField in Lucene?

I am searching for exactly matching document titles in a Lucene index. To accomplish that, I have the following two alternative approaches for creating the fields of the documents to be indexed.

Approach 1:

FieldType _contentFieldType = new FieldType();
_contentFieldType.setIndexed(true);
_contentFieldType.setStored(true);

Document doc = new Document();
doc.add(new Field("content", getContent(), _contentFieldType));
writer.addDocument(doc);


Approach 2:

Document doc = new Document();
doc.add(new StringField("content", getContent(), Store.YES));
writer.addDocument(doc);


Then I create a query using TermQuery and search the Lucene index, but I don't get any results if I use the first approach. The second approach works fine for me.

Query query = new TermQuery(new Term(searchQuery.fields().get(0), searchQuery.queryText()));
indexSearcher.search(query, Math.max(1, collector.getTotalHits()));


Examples of document titles: each title is actually the topic of the document, given as a hierarchical path of topics.

Top/Arts/Animation/Audio
Top/Arts/Animation/Collectibles
Top/Arts/Animation/Stop-Motion
Top/Arts/Animation/Festivals
Top/Arts/Animation/News_and_Media
Top/Arts/Animation/Chats_and_Forums
Top/Arts/Animation/Training
Top/Arts/Animation/Voice_Actors
Top/Arts/Animation/Artists


Say I want to search for Top/Arts/Animation/Training. I need exact string matching, so I have used TermQuery.

I read the documentation and learned about Field and StringField. As I understand it, StringField is indexed but not analyzed if Store.YES is passed as a parameter. But my question is: since I am using both setIndexed(true) and setStored(true) for the Field in approach 1, why am I not getting the same result from approach 1? Is it because something additional is executed when I use Field, or is it because of the use of TermQuery? What is the main thing that makes these two approaches different? Please help me understand the difference between them.

Thanks!

Answer

Here is what I think is happening.

In your first approach the field is analyzed at index time, and the analyzer you used lowercases the input tokens.

So, for example, Top/Arts/Animation/Training will be indexed as follows (depending on the analyzer, it may also be split into separate tokens at the slashes):

top/arts/animation/training

Now, when you search with a TermQuery, it looks for the exact string, i.e. Top/Arts/Animation/Training, which won't match anything because the terms in the index are lowercase.

Let's talk about the second approach. Since you used StringField, the field is not analyzed and its value is indexed as is, i.e. your index contains the following in the StringField case:

Top/Arts/Animation/Training

So when you search using TermQuery, this matches because the value was indexed as is.
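
The two cases above can be imitated without Lucene at all. The following is only a toy sketch in plain Java, not Lucene code: the class ToyTermIndex and its analyze method are hypothetical stand-ins, and the analyzer is assumed to split on non-alphanumeric characters and lowercase each token (a real analyzer may behave differently).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: this is NOT Lucene code. It imitates how an analyzed field
// and an un-analyzed field end up with different terms in the index.
class ToyTermIndex {
    private final Set<String> terms = new HashSet<>();

    // Hypothetical stand-in for an analyzer (an assumption, not a Lucene class):
    // splits on non-alphanumeric characters and lowercases each token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^A-Za-z0-9]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Imitates approach 1: the value passes through the analyzer first.
    void addAnalyzed(String value) {
        terms.addAll(analyze(value));
    }

    // Imitates approach 2: the whole value becomes one single term.
    void addVerbatim(String value) {
        terms.add(value);
    }

    // Imitates an exact term lookup: the query text is not analyzed.
    boolean containsTerm(String term) {
        return terms.contains(term);
    }

    public static void main(String[] args) {
        String title = "Top/Arts/Animation/Training";

        ToyTermIndex analyzed = new ToyTermIndex();
        analyzed.addAnalyzed(title);

        ToyTermIndex verbatim = new ToyTermIndex();
        verbatim.addVerbatim(title);

        System.out.println(analyzed.containsTerm(title));      // false
        System.out.println(analyzed.containsTerm("training")); // true
        System.out.println(verbatim.containsTerm(title));      // true
    }
}
```

In this toy model the exact title is never a term of the analyzed index, only its lowercased pieces are, which is why an exact lookup misses it.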

To get results with the first approach:

Use a QueryParser, with the same analyzer that was used at indexing time, to construct the query instead of using TermQuery.
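
The idea behind that advice is that the query text goes through the same analysis as the indexed text, so both sides end up with the same terms. Here is a rough sketch of that idea in plain Java. It is not Lucene code: QueryTimeAnalysisSketch, analyze and matchesAfterAnalysis are hypothetical names, and the analyzer is again assumed to split on non-alphanumeric characters and lowercase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only, NOT Lucene code: shows why analyzing the query with the
// same analyzer as the index makes the two sides comparable.
class QueryTimeAnalysisSketch {

    // Hypothetical stand-in for the shared analyzer (an assumption):
    // splits on non-alphanumeric characters and lowercases each token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^A-Za-z0-9]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // True when the indexed value and the query text produce the same
    // token sequence after both are run through the same analysis.
    static boolean matchesAfterAnalysis(String indexedValue, String queryText) {
        return analyze(indexedValue).equals(analyze(queryText));
    }

    public static void main(String[] args) {
        String indexed = "Top/Arts/Animation/Training";

        // Same title as the query: matches once both sides are analyzed.
        System.out.println(matchesAfterAnalysis(indexed, "Top/Arts/Animation/Training")); // true

        // A different title still does not match.
        System.out.println(matchesAfterAnalysis(indexed, "Top/Arts/Animation/Artists"));  // false

        // Side effect: case and separators no longer matter.
        System.out.println(matchesAfterAnalysis(indexed, "top arts animation training")); // true
    }
}
```

Note the last line: once both sides are analyzed, the match is no longer a strict exact-string match. If strict matching is what you need, the un-analyzed field from the second approach is the closer fit.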

See my other answer for the difference between TermQuery and QueryParser:

What is the difference between TermQuery and QueryParser in Lucene 6.0?