Santiago Gil - 28 days ago
Java Question

How to override Similarity in a single field in Lucene?

I am using version 4.4 of Apache Lucene.

My system indexes a collection of documents into three different fields: the title, description and author(s) of the documents.

I want a document to score higher the more often a query term occurs in it. However, when the term is in the author field, I want the match to act as a "boolean": it should add the same score whether the term appears once or several times. For example, if three authors of a document have the surname "Smith", only one match should be counted.

For this, I have found the following code, which overrides the term frequency:

Similarity sim = new DefaultSimilarity() {
    @Override
    public float tf(float freq) {
        return freq == 0 ? 0 : 1;
    }
};
searcher.setSimilarity(sim);


However, this overrides the term frequency for all three fields. How can I override it for the author field only?

Answer

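You can extend PerFieldSimilarityWrapper and return a different Similarity per field, where CustomAuthorSimilarity is a DefaultSimilarity subclass with the tf override from the question, like this: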

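import org.apache.lucene.search.similarities.DefaultSimilarity;
import org.apache.lucene.search.similarities.PerFieldSimilarityWrapper;
import org.apache.lucene.search.similarities.Similarity;

public class MyCustomSimilarity extends PerFieldSimilarityWrapper {
    // Build each Similarity once and reuse it; get() may be called many times.
    private final Similarity defaultSim = new DefaultSimilarity();
    private final Similarity authorSim = new CustomAuthorSimilarity();

    @Override
    public Similarity get(String fieldName) {
        return "author".equals(fieldName) ? authorSim : defaultSim;
    }

    // Boolean term frequency: one or many matches in the field score the same.
    static class CustomAuthorSimilarity extends DefaultSimilarity {
        @Override
        public float tf(float freq) {
            return freq == 0 ? 0 : 1;
        }
    }
}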
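
To see what this per-field dispatch does to the term-frequency factor, here is a minimal plain-Java sketch with no Lucene dependency. It assumes the default tf is sqrt(freq), as in DefaultSimilarity, and that only the "author" field is overridden. The names PerFieldTfDemo, tfFor, DEFAULT_TF and BOOLEAN_TF are hypothetical and are not part of Lucene.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-in for per-field similarity selection; not Lucene's API.
final class PerFieldTfDemo {
    // Assumed default: tf = sqrt(freq), as in Lucene's DefaultSimilarity.
    static final DoubleUnaryOperator DEFAULT_TF = Math::sqrt;
    // Boolean tf: 1 if the term occurs at all, otherwise 0.
    static final DoubleUnaryOperator BOOLEAN_TF = freq -> freq == 0 ? 0.0 : 1.0;

    // Fields whose tf differs from the default; every other field falls through.
    private static final Map<String, DoubleUnaryOperator> OVERRIDES = new HashMap<>();
    static {
        OVERRIDES.put("author", BOOLEAN_TF);
    }

    // Pick the tf function by field name, mirroring what get(fieldName) does.
    static double tfFor(String field, double freq) {
        return OVERRIDES.getOrDefault(field, DEFAULT_TF).applyAsDouble(freq);
    }

    public static void main(String[] args) {
        System.out.println(tfFor("title", 4));   // prints 2.0
        System.out.println(tfFor("author", 3));  // prints 1.0
        System.out.println(tfFor("author", 0));  // prints 0.0
    }
}
```

In Lucene itself, the wrapper would be installed the same way as in the question, with searcher.setSimilarity(new MyCustomSimilarity()). Three "Smith" authors then contribute the same tf factor as one, while title and description keep the default behaviour.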