Matt - 3 months ago
Python Question

A Python library that accepts some text and replaces phone numbers, names, and so on with tokens

I need a Python library that accepts some text and replaces phone numbers, names, and so on with tokens. Example:

Input: Please call Robert on 0430013454 to discuss this further.

Output: Please call NAME on PHONE to discuss this further.

In other words, I need to take any sentence, run the program on it, and have it replace anything that looks like a name, phone number, or other identifier with a token, e.g. NAME or PHONE. The token is just placeholder text that replaces the info so that it is no longer displayed.

Must be Python 2.7 compatible. Does anybody know how this could be done?

Cheers!

Answer

I'm not really sure about name recognition. However, if you know the names you're looking for, it would be easy: keep a list of those names, check whether each one is in the string, and if so replace it using the string's replace method. If the names are arbitrary, you could look into NLTK; I think it has some named entity recognition, though I really don't know anything about it.

As for phone numbers, that's easy. You can split the string into a list of words and check whether any element consists only of digits. You could even check the length to make sure it's 10 digits (I'm assuming all numbers will be 10 digits, based on your example).
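A rough sketch of the known-names idea, in case it helps. KNOWN_NAMES and replace_names are made-up names for illustration, and the list contents are just placeholders. It uses re.sub with word boundaries instead of plain replace, which is a small departure from the suggestion above, so that a name such as Robert is not replaced inside a longer word such as Roberta.

```python
import re

# Hypothetical list of names to redact; fill in your own.
KNOWN_NAMES = ['Robert', 'Alice']


def replace_names(text, names=KNOWN_NAMES, token='NAME'):
    """Replace each whole-word occurrence of a known name with token."""
    if not names:
        return text
    # re.escape guards against names containing regex metacharacters.
    alternatives = '|'.join(re.escape(name) for name in names)
    pattern = r'\b(?:' + alternatives + r')\b'
    return re.sub(pattern, token, text)


# Parenthesized single-argument print works on Python 2.7 and Python 3.
print(replace_names('Please call Robert on 0430013454 to discuss this further.'))
```

This only catches names that are in the list and spelled with the same capitalization; anything else would need real named entity recognition.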

Something like this...

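example_input = 'Please call Robert on 0430013454 to discuss this further.'

# Split the sentence into words on single spaces.
new_list = example_input.split(' ')

# Replace every word that consists only of digits with the PHONE token.
for pos, word in enumerate(new_list):
    if word.isdigit():
        new_list[pos] = 'PHONE'

example_output = ' '.join(new_list)

# Parenthesized single-argument print works on Python 2.7 and Python 3.
print(example_output)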

This would be the output: 'Please call Robert on PHONE to discuss this further.'

If you want to make sure the number is exactly 10 digits long, the if statement would be something like: if word.isdigit() and len(word) == 10:
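One limitation of splitting on spaces is that a number followed directly by punctuation, such as '0430013454.', is not all digits and would be missed. A possible alternative is a regular expression. The sketch below assumes, as above, that a phone number is exactly 10 consecutive digits; PHONE_RE and replace_phones are hypothetical names, and real-world numbers with spaces, dashes, or country codes would need a broader pattern.

```python
import re

# Assumption: a phone number is exactly 10 consecutive digits that are
# not part of a longer run of digits.
PHONE_RE = re.compile(r'(?<!\d)\d{10}(?!\d)')


def replace_phones(text, token='PHONE'):
    """Replace every standalone 10-digit run in text with token."""
    return PHONE_RE.sub(token, text)


# Parenthesized single-argument print works on Python 2.7 and Python 3.
print(replace_phones('Call 0430013454. Or try 0430013455, thanks.'))
```

This prints 'Call PHONE. Or try PHONE, thanks.' and leaves longer digit runs, such as an 11-digit number, untouched.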
