Bharath Shetty, 4 years ago
Python Question

Finding similar text in a string in Python

I have a .txt file containing the following text:

Table of Contents

Preface 1

Chapter 1: Tokenizing Text and WordNet Basics 7

Tokenizing text into sentences 8

Tokenizing sentences into words 10

Tokenizing sentences using regular expressions 12

If the string I have is:

input = "Tokenzing sentence using expressions"

I thought of using the first and last words to extract the sentence, but there are a lot of repetitions.

So what's the best way to get this output?

Tokenizing sentences using regular expressions

Answer Source

If you are prepared to preprocess your chapter headings, stripping page numbers and prefixes such as "Chapter 1:", this:

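import difflib

# Chapter headings with page numbers and the "Chapter 1:" prefix already removed
contents = ["Tokenizing Text and WordNet Basics",
            "Tokenizing text into sentences",
            "Tokenizing sentences into words",
            "Tokenizing sentences using regular expressions"]
input_str = "Tokenzing sentence using expressions"
# n=1 returns at most one result: the closest match above the default cutoff
print(difflib.get_close_matches(input_str, contents, n=1))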

will give you this output:

['Tokenizing sentences using regular expressions']
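
The preprocessing step could be sketched roughly as follows. This is only a minimal sketch, assuming each heading has an optional "Chapter N:" prefix and a trailing page number, as in the sample from the question. The names raw_lines and clean_heading are hypothetical helpers for illustration, not part of difflib.

```python
import difflib
import re

# Sample lines as they might be read from the text file (copied from the question).
raw_lines = [
    "Table of Contents",
    "Preface 1",
    "Chapter 1: Tokenizing Text and WordNet Basics 7",
    "Tokenizing text into sentences 8",
    "Tokenizing sentences into words 10",
    "Tokenizing sentences using regular expressions 12",
]


def clean_heading(line):
    """Hypothetical helper: drop a leading 'Chapter N:' prefix and a trailing page number."""
    line = re.sub(r"^Chapter\s+\d+:\s*", "", line.strip())
    return re.sub(r"\s+\d+$", "", line)


# Skip blank lines and clean every remaining heading.
contents = [clean_heading(line) for line in raw_lines if line.strip()]

input_str = "Tokenzing sentence using expressions"
# Ask for the single closest heading; an empty list means nothing was similar enough.
matches = difflib.get_close_matches(input_str, contents, n=1)
print(matches)
```

get_close_matches also accepts a cutoff argument (a float between 0 and 1, default 0.6). Raising it makes the match stricter, and lowering it tolerates more typos at the risk of false positives.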