Java Question

How can I learn *practical* natural language processing?

I have some background in Java, Pascal, Perl, SQL, and R, and I would like to find the path of least resistance from that background to automated or semi-automated extraction of concepts from text, turning the result into something statistically analyzable. I am willing to learn new languages if needed. I imagine I will need to perform some NLP tasks on a few thousand pages of text, particularly part-of-speech (POS) tagging, identification of noun phrases, and word sense disambiguation. The latter, I believe, may require semi-supervised machine learning for accuracy.

My question is: where should I start learning practical NLP? Taking a course or reading NLP books seems to involve far more detail about how NLP tasks are carried out than I need right now. I just need to know what each task does, how accurate it is, and what alternatives there are. Jumping straight into an existing NLP framework, on the other hand, tends to get me stuck. I've used GATE for POS tagging, but the output was either in XML, which I have no idea how to process further, or in PostgreSQL, which was a bear to manipulate with SQL to generate statistical data. Also, at the time, GATE had no good method for extracting word sense.

Answer

NLTK is the way to go for you. :)
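
To give a feel for it, here is a minimal sketch of noun-phrase chunking and tag counting with NLTK. The sentence is tagged by hand (Penn Treebank tags) so that it runs without downloading any models; on real text you would get the same kind of list from `nltk.pos_tag(nltk.word_tokenize(text))` once the required NLTK data is installed. The chunk grammar is a toy pattern I made up for illustration, not a recommended one.

```python
from collections import Counter

import nltk

# Hand-tagged tokens (an assumption for this sketch, so no model download
# is needed). A real pipeline would produce this list with nltk.pos_tag.
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"), (".", "."),
]

# Toy grammar: optional determiner, any adjectives, then one or more nouns.
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
tree = chunker.parse(tagged)

# Collect the words of every NP subtree as a plain string.
noun_phrases = [
    " ".join(word for word, _tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]

# Tag frequencies: plain Python data that is easy to dump to CSV for R.
tag_counts = Counter(tag for _word, tag in tagged)

print(noun_phrases)
print(tag_counts.most_common())
```

Because the results are ordinary Python lists and counters rather than XML or database rows, writing them out with the standard `csv` module and loading them into R for the statistics should be straightforward.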

Also, if you are interested in implementing algorithms like LDA, LSA, etc., I would recommend going with gensim.

