hackingchemist hackingchemist - 4 months ago 39
Java Question

Lucene - Creating an Index using FSDirectory

first time posting; long time reader. I apologize a head of time if this was already asked here (I'm new to lucene as well!). I've done a lot of research and wasn't able to find a good explanation/example for my question.

First of all, I've used IKVM.NET to convert lucene 4.9 java to include in my .net application. I chose to do this so I was able to use the most recent version of lucene. No issues.

I am trying to create a basic example to start to learn lucene and to apply it to my app. I've done countless google searches and read lots of articles, apache's website, etc. My code follows mostly the example here: http://www.lucenetutorial.com/lucene-in-5-minutes.html

My question is, I don't believe I want to use RAMDirectory.. right? Since I will be indexing a database and allowing users to search it via the website. I opted for using FSDirectory because I didn't think it should be all stored in memory.

When the IndexWriter is created it is creating new files each time(.cfe, .cfs, .si, segments.gen, write.lock, etc.) It seems to me you would create these files once and then use them until the index needs to be rebuilt?

So how do I create an IndexWriter with out recreating the index files?

Code:

StandardAnalyzer analyzer;
Directory directory;
protected void Page_Load(object sender, EventArgs e)
{
var version = org.apache.lucene.util.Version.LUCENE_CURRENT;
analyzer = new StandardAnalyzer(version);

if(directory == null){ directory= FSDirectory.open(new java.io.File(HttpContext.Current.Request.PhysicalApplicationPath + "/indexes"));
}

IndexWriterConfig config = new IndexWriterConfig(version, analyzer);

//i found setting the open mode will overwrite the files but still creates new each time
config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

IndexWriter w = new IndexWriter(directory, config);
addDoc(w, "test", "1234");
addDoc(w, "test1", "1234");
addDoc(w, "test2", "1234");
addDoc(w, "test3", "1234");
w.close();

}


private static void addDoc(IndexWriter w, String _keyword, String _keywordid)
{
Document doc = new Document();
doc.add(new TextField("Keyword", _keyword, Field.Store.YES));
doc.add(new StringField("KeywordID", _keywordid, Field.Store.YES));
w.addDocument(doc);
}

protected void searchButton_Click(object sender, EventArgs e)
{
String querystr = "";
String results="";


querystr = searchTextBox.Text.ToString();

Query q = new QueryParser(org.apache.lucene.util.Version.LUCENE_4_0, "Keyword", analyzer).parse(querystr);

int hitsPerPage = 100;
DirectoryReader reader = DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);

TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;

if (hits.Length == 0)
{
label.Text = "Nothing was found.";
}
else
{
for (int i = 0; i < hits.Length; ++i)
{
int docID = hits[i].doc;
Document d = searcher.doc(docID);

results += "<br />" + (i + 1) + ". " + d.get("KeywordID") + "\t" + d.get("Keyword") + " Hit Score: " + hits[i].score.ToString() + "<br />";

}
label.Text = results;
reader.close();
}
}

Answer

Yes, RAMDirectory is great for quick, on-the-fly tests and tutorials, but in production you will usually want to store your index on the file system through an FSDirectory.

The reason it's rewriting the index every time you open the writer is that you are setting the OpenMode to IndexWriterConfig.OpenMode.CREATE. CREATE means you want to remove any existing index at that location, and start from scratch. You probably want IndexWriterConfig.OpenMode.CREATE_OR_APPEND, which will open an existing index if one is found.


One minor note:

You shouldn't use LUCENE_CURRENT (deprecated), use a real version instead. You are also using LUCENE_4_0 in your QueryParser. Neither of these will probably cause any major problems, but good to be consistent anyway.