Sujit Sujit - 2 months ago 27
Scala Question

Can't search in Lucene 6.2 using Scala

I'm trying to index data from MySQL (read via Slick in Scala) using Lucene 6.2. Here is the code:

package oc.api.services

/**
* Created by sujit on 9/7/16.
*/
import org.apache.lucene.document._
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.index._
import org.apache.lucene.search.IndexSearcher
import java.io.{File, IOException}
import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.event.{Logging, LoggingAdapter}
import akka.stream.ActorMaterializer
import oc.api.utils.{Config, DatabaseService}
import org.apache.lucene.analysis.core.KeywordAnalyzer
import org.apache.lucene.index.IndexWriterConfig.OpenMode
import org.apache.lucene.queryparser.classic.{MultiFieldQueryParser, QueryParser}
import org.apache.lucene.store.FSDirectory

import scala.concurrent.ExecutionContext

class Indexer extends Config {
  implicit val actorSystem = ActorSystem()
  implicit val executor: ExecutionContext = actorSystem.dispatcher
  implicit val log: LoggingAdapter = Logging(actorSystem, getClass)
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  val databaseService = new DatabaseService(jdbcUrl, dbUser, dbPassword)

  val notesService = new NotesService(databaseService)

  def setIndex = {
    val IndexStoreDir = Paths.get("/var/www/html/LuceneIndex")
    val analyzer = new StandardAnalyzer()
    val writerConfig = new IndexWriterConfig(analyzer)
    writerConfig.setOpenMode(OpenMode.CREATE)
    writerConfig.setRAMBufferSizeMB(500)
    val directory = FSDirectory.open(IndexStoreDir)
    var writer = new IndexWriter(directory, writerConfig)
    val notes = notesService.getNotes() //Gets all notes from slick. Data is coming in getNotes()
    var doc = new Document()
    var count = 0

    val stringType = new FieldType()
    notes.map(_.foreach{
      case(note) =>
        doc = new Document()

        var field = new TextField("id", note.title, Field.Store.YES)
        doc.add(field)

        field = new TextField("title", note.title, Field.Store.YES)
        doc.add(field)

        field = new TextField("teaser", note.teaser, Field.Store.YES)
        doc.add(field)

        field = new TextField("description", note.description, Field.Store.YES)
        doc.add(field)

        writer.addDocument(doc)
        writer.commit()
    })
    //
  }

  def search(keyword: String) = {
    val IndexStoreDir = Paths.get("/var/www/html/LuceneIndex")
    var directoryReader = DirectoryReader.open(FSDirectory.open(IndexStoreDir))
    val analyzer = new StandardAnalyzer()

    val searcher = new IndexSearcher(directoryReader)
    val fieldsToSearch = Array("title", "teaser", "description")

    val mqp = new MultiFieldQueryParser(fieldsToSearch,analyzer) //QueryParser("title", analyzer) //MultiFieldQueryParser(filesToSearch,analyzer)
    val query = mqp.parse(keyword)

    val hits = searcher.search(query,500)
    val scoreDoc = hits.scoreDocs
    scoreDoc.foreach( docs => {
      val doc = searcher.doc(docs.doc)
      println("***** Document Found: ")
      println("***** Title: ")
      println(doc.get("title"))
      println("***** Teaser: ")
      println(doc.get("teaser"))
      println("***** Description: ")
      println(doc.get("description"))
    })
    println("****** Results Found: " + hits.totalHits)
  }

}

object Indexer extends App {
  val index = new Indexer
  //index.setIndex
  index.search("Donec")
}


The setIndex function works as expected and writes the index to the given path. But when I search the index for a keyword, it returns 0 results. Is there a mistake in the search function? How can this be resolved?

Answer

The main reason here is probably a mismatch of your analyzers. You use the KeywordAnalyzer for indexing, which does not analyze at all: each field value is indexed as a single, unmodified token. For search, you use the StandardAnalyzer. In your example, the query "Donec" will be parsed and analyzed to title:donec, as if you'd used new TermQuery(new Term("title", "donec")). This will only match documents whose title is exactly donec, since you've used the keyword analyzer at index time. You should use the same analyzer for indexing and for searching.
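
To see the effect of an analyzer mismatch in isolation, here is a minimal sketch against Lucene 6.2 (it assumes lucene-core, lucene-analyzers-common and lucene-queryparser are on the classpath). The object name AnalyzerMismatchDemo, the countHits helper, the in-memory RAMDirectory and the sample title "Donec sed odio" are all made up for illustration and are not part of the question's code.

```scala
import org.apache.lucene.analysis.Analyzer
import org.apache.lucene.analysis.core.KeywordAnalyzer
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.document.{Document, Field, TextField}
import org.apache.lucene.index.{DirectoryReader, IndexWriter, IndexWriterConfig}
import org.apache.lucene.queryparser.classic.QueryParser
import org.apache.lucene.search.IndexSearcher
import org.apache.lucene.store.RAMDirectory

object AnalyzerMismatchDemo {
  // Hypothetical helper: indexes one sample document with `indexAnalyzer`,
  // then searches the "title" field with a StandardAnalyzer-based parser.
  def countHits(indexAnalyzer: Analyzer, keyword: String): Int = {
    val directory = new RAMDirectory() // in-memory index, for the demo only
    val writer = new IndexWriter(directory, new IndexWriterConfig(indexAnalyzer))
    val doc = new Document()
    doc.add(new TextField("title", "Donec sed odio", Field.Store.YES))
    writer.addDocument(doc)
    writer.close() // close() commits pending changes by default

    val reader = DirectoryReader.open(directory)
    try {
      val searcher = new IndexSearcher(reader)
      // StandardAnalyzer lowercases the query, so "Donec" becomes title:donec
      val query = new QueryParser("title", new StandardAnalyzer()).parse(keyword)
      searcher.search(query, 10).totalHits // an Int in Lucene 6.x
    } finally reader.close()
  }

  def main(args: Array[String]): Unit = {
    // KeywordAnalyzer indexes the whole title as the single token "Donec sed odio"
    println(countHits(new KeywordAnalyzer(), "Donec"))  // 0
    // StandardAnalyzer indexes the tokens donec, sed, odio
    println(countHits(new StandardAnalyzer(), "Donec")) // 1
  }
}
```

Only the index-time analyzer differs between the two calls, which is why the sketch points at the analyzer pairing rather than at the search code itself.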

Another possible cause, and I can only guess here: notesService.getNotes() might return a Future[_] (or a similar asynchronous type), given that it involves Slick. If it does, the documents are added inside the call to .map(), which is scheduled to run once the future has resolved. The writer.commit() call, however, happens in the calling thread, likely before all documents have been added, so you should move the commit into the map callback as well.
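
The ordering problem can be reproduced without Lucene or Slick. The sketch below uses only the Scala standard library. FakeWriter is a hypothetical stand-in for an IndexWriter that just records the order of calls, and commitOutside / commitInside are illustrative names, not anything from the question's code. A Promise is used so the future is guaranteed to complete only after the method has returned.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object CommitOrderingDemo {
  // Hypothetical stand-in for an IndexWriter: records calls in order.
  final class FakeWriter {
    private val events = ArrayBuffer.empty[String]
    def addDocument(title: String): Unit = synchronized { events += s"add:$title" }
    def commit(): Unit = synchronized { events += "commit" }
    def log: List[String] = synchronized { events.toList }
  }

  // Commit on the calling thread: it does not wait for `notes` to resolve.
  def commitOutside(notes: Future[Seq[String]], writer: FakeWriter): Future[Unit] = {
    val done = notes.map(_.foreach(writer.addDocument))
    writer.commit()
    done
  }

  // Commit inside the callback: it runs after every document was added.
  def commitInside(notes: Future[Seq[String]], writer: FakeWriter): Future[Unit] =
    notes.map { ns =>
      ns.foreach(writer.addDocument)
      writer.commit()
    }

  // Runs one variant with a future that resolves only after the call returned.
  def run(f: (Future[Seq[String]], FakeWriter) => Future[Unit]): List[String] = {
    val writer = new FakeWriter
    val promise = Promise[Seq[String]]()
    val done = f(promise.future, writer)
    promise.success(Seq("a", "b"))
    Await.result(done, 5.seconds)
    writer.log
  }

  def main(args: Array[String]): Unit = {
    println(run(commitOutside)) // List(commit, add:a, add:b)
    println(run(commitInside))  // List(add:a, add:b, commit)
  }
}
```

In the first variant the commit is recorded before any document is added, which corresponds to an index that looks empty to a reader opened right afterwards. A single commit after the loop inside the callback is usually enough.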
