William Dunne - 2 months ago
Java Question

Correct usage of boolean logic in Lucene

I apologise for the question, but it's left me a little baffled.

I have a set of address objects, and I am trying to find the relevant ones with a query that would (in pseudocode) look something like this:

SELECT
WHERE
Fuzzy(addr1, "address line 1") // = true
AND
(Fuzzy(addr2, "address line 2") OR
Fuzzy(addrcity, "address city") OR
//all the other address fields
)


Essentially, I want to bring back all entities where address line one is a rough match and at least one of the other parts of the address is also a fuzzy match.

I have verified that the data is there via this query:

Query toRun = new FuzzyQuery(new Term("addr1", getLineOne()));


This returns the expected document, with all of the correct fields.

My code is below:

public List<Address> search() {
    List<Address> results = new ArrayList<>();

    BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();
    queryBuilder.setMinimumNumberShouldMatch(2);

    BooleanQuery.Builder subQueryBuilder = new BooleanQuery.Builder();
    subQueryBuilder.setMinimumNumberShouldMatch(1);

    if(!getLineOne().equals("")) {
        Query query = new FuzzyQuery(new Term("addr1", getLineOne()));
        queryBuilder.add(query, BooleanClause.Occur.MUST);
    }

    if(!getLineTwo().equals("")) {
        Query query = new FuzzyQuery(new Term("addr2", getLineTwo()));
        subQueryBuilder.add(query, BooleanClause.Occur.SHOULD);
    }
    if(!getCity().equals("")) {
        Query query = new FuzzyQuery(new Term("addrcity", getCity()));
        subQueryBuilder.add(query, BooleanClause.Occur.SHOULD);
    }
    if(!getCounty().equals("")) {
        Query query = new FuzzyQuery(new Term("addrcounty", getCounty()));
        subQueryBuilder.add(query, BooleanClause.Occur.SHOULD);
    }
    if(!getCountry().equals("")) {
        Query query = new FuzzyQuery(new Term("addrcountry", getCountry()));
        subQueryBuilder.add(query, BooleanClause.Occur.SHOULD);
    }
    if(!getPostcode().equals("")) {
        Query query = new FuzzyQuery(new Term("addrpostcode", getPostcode()));
        subQueryBuilder.add(query, BooleanClause.Occur.SHOULD);
    }

    queryBuilder.add(subQueryBuilder.build(), BooleanClause.Occur.MUST);

    try {
        Query toRun = queryBuilder.build();

        List<Document> searchResults = SearchEngine.getInstance(SEARCH_ENGINE)
                .performSearch(toRun, 50);

        searchResults.forEach(result -> {
            results.add(new Address(result));
        });
    } catch (IOException e) {
        e.printStackTrace();
    }

    return results;
}


When the object is supplied with a line one, a line two and a country, this produces a query that looks like this in text form:

(+addr1:address line 1~2 +((addr2:address line 2~2 addrcountry:romania~2)~1))~2

This returns nothing, even though the single-field query above finds the document.

Where am I going wrong in my logic?

Answer

You need to get rid of the first setMinimumNumberShouldMatch call.

setMinimumNumberShouldMatch specifies how many SHOULD clauses must match. Your outer queryBuilder contains only MUST clauses and no SHOULD clauses, so it can never match two of them, and you get no results.

You could just delete both of the setMinimumNumberShouldMatch lines and have a query that works correctly, because a BooleanQuery made up only of SHOULD clauses already requires at least one of them to match. Alternatively, you could keep the minimum-should-match logic and simplify to a single BooleanQuery, like this:

public List<Address> search() {
    List<Address> results = new ArrayList<>();

    BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();
    queryBuilder.setMinimumNumberShouldMatch(1);

    if(!getLineOne().equals("")) {
        //This is a MUST clause, and so doesn't factor into the minimumShouldMatch
        Query query = new FuzzyQuery(new Term("addr1", getLineOne()));
        queryBuilder.add(query, BooleanClause.Occur.MUST);
    }

    if(!getLineTwo().equals("")) {
        Query query = new FuzzyQuery(new Term("addr2", getLineTwo()));
        queryBuilder.add(query, BooleanClause.Occur.SHOULD);
    }
    if(!getCity().equals("")) {
        Query query = new FuzzyQuery(new Term("addrcity", getCity()));
        queryBuilder.add(query, BooleanClause.Occur.SHOULD);
    }
    if(!getCounty().equals("")) {
        Query query = new FuzzyQuery(new Term("addrcounty", getCounty()));
        queryBuilder.add(query, BooleanClause.Occur.SHOULD);
    }
    if(!getCountry().equals("")) {
        Query query = new FuzzyQuery(new Term("addrcountry", getCountry()));
        queryBuilder.add(query, BooleanClause.Occur.SHOULD);
    }
    if(!getPostcode().equals("")) {
        Query query = new FuzzyQuery(new Term("addrpostcode", getPostcode()));
        queryBuilder.add(query, BooleanClause.Occur.SHOULD);
    }

    try {
        Query toRun = queryBuilder.build();

        List<Document> searchResults = SearchEngine.getInstance(SEARCH_ENGINE)
                .performSearch(toRun, 50);

        searchResults.forEach(result -> {
            results.add(new Address(result));
        });
    } catch (IOException e) {
        e.printStackTrace();
    }

    return results;
}
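
To see why the original query matches nothing while both suggested fixes work, here is a minimal sketch of the MUST / SHOULD / minimum-should-match rules as described above. It is a toy model in plain Java, not Lucene's implementation: ToyBooleanQuery, field(), the three builder methods and the sample address values are all hypothetical names introduced for illustration, and a case-insensitive equality check stands in for FuzzyQuery.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class BooleanClauseModel {

    enum Occur { MUST, SHOULD }

    /** Toy stand-in for a boolean query over a document of field -> value. NOT Lucene code. */
    static final class ToyBooleanQuery implements Predicate<Map<String, String>> {
        private final List<Predicate<Map<String, String>>> must = new ArrayList<>();
        private final List<Predicate<Map<String, String>>> should = new ArrayList<>();
        private int minimumNumberShouldMatch = 0;

        ToyBooleanQuery add(Predicate<Map<String, String>> clause, Occur occur) {
            (occur == Occur.MUST ? must : should).add(clause);
            return this;
        }

        ToyBooleanQuery setMinimumNumberShouldMatch(int min) {
            this.minimumNumberShouldMatch = min;
            return this;
        }

        @Override
        public boolean test(Map<String, String> doc) {
            if (must.isEmpty() && should.isEmpty()) {
                return false; // a query with no clauses matches nothing
            }
            for (Predicate<Map<String, String>> clause : must) {
                if (!clause.test(doc)) {
                    return false; // every MUST clause has to match
                }
            }
            long matched = should.stream().filter(clause -> clause.test(doc)).count();
            int required = minimumNumberShouldMatch;
            if (required == 0 && must.isEmpty()) {
                required = 1; // SHOULD-only query: at least one clause has to match
            }
            // Only SHOULD clauses are counted here; MUST clauses never contribute.
            return matched >= required;
        }
    }

    /** Stand-in for a FuzzyQuery on one field (assumption: exact match, ignoring case). */
    static Predicate<Map<String, String>> field(String name, String value) {
        return doc -> value.equalsIgnoreCase(doc.get(name));
    }

    /** SHOULD clauses for the optional address parts (hypothetical sample values). */
    static ToyBooleanQuery others() {
        return new ToyBooleanQuery()
                .add(field("addr2", "flat 2"), Occur.SHOULD)
                .add(field("addrcountry", "romania"), Occur.SHOULD);
    }

    /** Shape of the question's query: the outer query demands 2 SHOULD matches but has none. */
    static ToyBooleanQuery original() {
        return new ToyBooleanQuery()
                .setMinimumNumberShouldMatch(2)
                .add(field("addr1", "1 high street"), Occur.MUST)
                .add(others().setMinimumNumberShouldMatch(1), Occur.MUST);
    }

    /** First fix: same nesting, both minimum-should-match calls deleted. */
    static ToyBooleanQuery nestedWithoutMinimum() {
        return new ToyBooleanQuery()
                .add(field("addr1", "1 high street"), Occur.MUST)
                .add(others(), Occur.MUST);
    }

    /** Second fix: one query, one MUST clause, at least one SHOULD clause required. */
    static ToyBooleanQuery flattened() {
        return new ToyBooleanQuery()
                .setMinimumNumberShouldMatch(1)
                .add(field("addr1", "1 high street"), Occur.MUST)
                .add(field("addr2", "flat 2"), Occur.SHOULD)
                .add(field("addrcountry", "romania"), Occur.SHOULD);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("addr1", "1 High Street");
        doc.put("addr2", "Flat 2");
        doc.put("addrcountry", "Romania");

        System.out.println("original:             " + original().test(doc));             // false
        System.out.println("nestedWithoutMinimum: " + nestedWithoutMinimum().test(doc)); // true
        System.out.println("flattened:            " + flattened().test(doc));            // true
    }
}
```

In this model the original shape fails for every document, because the outer query has zero SHOULD clauses to count towards its minimum of two. Both fixed shapes accept a document that matches line one plus at least one other field, and reject a document that matches line one alone.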