eliasah - 2 months ago
Java Question

How can I configure my index to use BM25 in Elasticsearch using the Java API?

I'm trying to migrate from a MySQL database to Elasticsearch so that I can run full-text search with BM25 similarity on each field. I'm using Java to fetch entries from MySQL and add them to the Elasticsearch index.

I'm building the index with the Java index API, but I can't figure out how to set BM25 similarity on my fields.

I have a products table in MySQL, and in Elasticsearch I use dev as the index with products as its index type.

The original products table contains the following fields:


  • id

  • title

  • description



You can find the code on my GitHub if you'd like to take a look.
It's a forked project that I've configured with Maven integration.

Any suggestions or help are welcome. Thanks!

Answer

I found the answer to my question.

Here is the code:

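import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;

Settings settings = ImmutableSettings
            .settingsBuilder()
            .put("cluster.name", "es_cluster_name")
            // Define similarity module settings under the name "custom"
            .put("similarity.custom.type", "BM25")
            // Lucene's BM25 defaults are k1 = 1.2 and b = 0.75;
            // b is normally kept within [0, 1]
            .put("similarity.custom.k1", 2.0f)
            .put("similarity.custom.b", 1.5f)
            .build();

// Create the client from the settings defined above
Client client = new TransportClient(settings);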

It seems that you can define the similarity modules you wish to use in the Settings before you instantiate your Client.
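To make the role of the k1 and b values in the snippet above more concrete, here is a minimal plain-Java sketch of a BM25 term score. It does not call Elasticsearch or Lucene. The class name `Bm25Sketch`, its method names, and the sample numbers are my own for illustration, and I am assuming the smoothed idf form log(1 + (N - n + 0.5) / (n + 0.5)). The real implementation also stores document lengths in a compressed form, so actual scores will differ slightly.

```java
/** Plain-Java BM25 term score for illustration; not Elasticsearch or Lucene code. */
final class Bm25Sketch {

    /** Smoothed inverse document frequency: log(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * BM25 score of one term in one document.
     * k1 controls how quickly repeated occurrences of the term saturate;
     * b controls how strongly the document length is normalized (0 = not at all).
     */
    static double score(double tf, double docLen, double avgDocLen,
                        long docCount, long docFreq, double k1, double b) {
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        return idf(docCount, docFreq) * (tf * (k1 + 1.0)) / (tf + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        // Hypothetical corpus: 1000 documents, the term occurs in 10 of them,
        // average field length 100 tokens.
        double shortDoc = score(3, 50, 100, 1000, 10, 1.2, 0.75);
        double longDoc = score(3, 500, 100, 1000, 10, 1.2, 0.75);
        System.out.println("short document: " + shortDoc);
        System.out.println("long document:  " + longDoc);
    }
}
```

With b = 0 the document length has no effect on the score, and for any k1 the score stays below idf * (k1 + 1) no matter how often the term occurs.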

Here is the list of similarity modules currently supported by Elasticsearch: default, BM25, DFR, IB, LMDirichlet and LMJelinekMercer. You can specify which one to use in the Settings as follows:

   .put("similarity.custom.type", "..." )

Each similarity has its own parameters, which you will also want to configure in order to use it properly.
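As a rough illustration of how the type and its parameters share the same key prefix, here is a small plain-Java sketch that builds the "similarity.<name>.<parameter>" entries for a named similarity. The class `SimilaritySettingsSketch` and the method `similaritySettings` are hypothetical helpers, not part of the Elasticsearch API, and the parameter values shown are only examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds similarity setting keys; not an Elasticsearch class. */
final class SimilaritySettingsSketch {

    /**
     * Returns entries of the form "similarity.<name>.type" and
     * "similarity.<name>.<parameter>", in insertion order.
     */
    static Map<String, String> similaritySettings(String name, String type,
                                                  Map<String, String> params) {
        Map<String, String> out = new LinkedHashMap<>();
        String prefix = "similarity." + name + ".";
        out.put(prefix + "type", type);
        for (Map.Entry<String, String> e : params.entrySet()) {
            out.put(prefix + e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Example values only: BM25 takes k1 and b, LMDirichlet takes mu.
        Map<String, String> bm25 = new LinkedHashMap<>();
        bm25.put("k1", "1.2");
        bm25.put("b", "0.75");
        Map<String, String> dirichlet = new LinkedHashMap<>();
        dirichlet.put("mu", "2000");

        for (Map.Entry<String, String> e : similaritySettings("custom", "BM25", bm25).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
        for (Map.Entry<String, String> e : similaritySettings("custom", "LMDirichlet", dirichlet).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Each resulting entry can then be passed to the settings builder with put(key, value), in the same way as the BM25 lines in the answer.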

Note: Code tested on Elasticsearch 1.1.0.