Nikhil Raghavendra - 6 months ago
Python Question

Tokenize a paragraph into sentences and then into words in NLTK

I am trying to input an entire paragraph into my word processor to be split into sentences first and then into words.

I tried the following code, but it does not work:

#text is the paragraph input
sent_text = sent_tokenize(text)
tokenized_text = word_tokenize(sent_text.split)
tagged = nltk.pos_tag(tokenized_text)
print(tagged)


However, this gives me errors. How do I tokenize a paragraph into sentences and then into words?

Answer

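You probably intended to loop over sent_text. sent_tokenize returns a list of sentence strings, a list has no split attribute, and word_tokenize expects a single string, so each sentence has to be tokenized on its own: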

import nltk

sent_text = nltk.sent_tokenize(text) # this gives us a list of sentences
# now loop over each sentence and tokenize it separately
for sentence in sent_text:
    tokenized_text = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokenized_text)
    print(tagged)
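
If you would rather keep the tokens grouped by sentence than print them inside the loop, the same idea can be wrapped in a small helper. This is only a sketch: tokenize_paragraph and tag_paragraph are hypothetical names, not part of NLTK, and the optional sent_tokenize/word_tokenize parameters are an assumption added here so the helper can be tried with stand-in tokenizers. By default it falls back to nltk.sent_tokenize and nltk.word_tokenize, and tagging uses nltk.pos_tag_sents.

```python
def tokenize_paragraph(text, sent_tokenize=None, word_tokenize=None):
    """Return a list of sentences, each one a list of word tokens."""
    if sent_tokenize is None or word_tokenize is None:
        # Import NLTK only when its tokenizers are actually needed.
        import nltk
        sent_tokenize = sent_tokenize or nltk.sent_tokenize
        word_tokenize = word_tokenize or nltk.word_tokenize
    # Split into sentences first, then tokenize each sentence separately.
    return [word_tokenize(sentence) for sentence in sent_tokenize(text)]


def tag_paragraph(text):
    """POS-tag every sentence; returns one list of (word, tag) pairs per sentence."""
    import nltk
    # pos_tag_sents tags a list of already tokenized sentences in one call.
    return nltk.pos_tag_sents(tokenize_paragraph(text))
```

Calling tag_paragraph(text) needs the NLTK tokenizer and tagger data to be installed via nltk.download(...); the exact resource names depend on your NLTK version.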