Yoe Pass - 7 months ago
Python Question

Stemming of words using Python

I have two files, root_words.txt and affix_words.txt. I would like to find each root word inside the words in affix_words.txt and label every letter of the affixed word: the letter immediately before the root becomes "E", the letter immediately after the root becomes "B", and every other letter, including the letters of the root itself, becomes "I".

For example:

root_words.txt:

read
xxxx

affix_words.txt:

reading
aaaxxxxyyy

The output that I want is:

r e a d i n g<TAB>I I I I B I I
a a a x x x x y y y<TAB>I I E I I I I B I I


I tried to match root_words.txt against affix_words.txt using the Linux command:

fgrep -f "root_words.txt" "affix_words.txt"


But how do I replace the root words with the "I" character?

Answer

You could use this simple approach:

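with open("root_words.txt") as rfile, open("affix_words.txt") as afile:
    # Pair the n-th root word with the n-th affixed word, line by line
    for rword, aword in zip(rfile, afile):
        rword, aword = rword.strip(), aword.strip()
        try:
            # index() raises ValueError if the root is not inside the word
            rw_start = aword.index(rword)
            rw_end = rw_start + len(rword)
            result = " ".join("E" if n == rw_start - 1 else
                              "B" if n == rw_end else
                              "I" for n, letter in enumerate(aword))
        except ValueError:
            result = "NOT FOUND!"
        print("root: '{}', affixed: '{}', stemmed: '{}'".format(rword, aword, result))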

Example:

root_words.txt:

read
vote
like

affix_words.txt:

reading
upvote
unlikely

Output:

root: 'read', affixed: 'reading', stemmed: 'I I I I B I I'
root: 'vote', affixed: 'upvote', stemmed: 'I E I I I I'
root: 'like', affixed: 'unlikely', stemmed: 'I E I I I I B I'

See this code running on ideone.com
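
If the two files are not aligned line by line, a variation on the same idea is to try every root against every affixed word and print the result in the tab-separated layout from the question. The sketch below is only one way to do that, under a few assumptions: the helper names label_word, format_line and stem_files are made up for illustration, longer roots are tried first (an arbitrary tie-breaking choice), only the first occurrence of a root is labelled, and words that contain no root are skipped.

```python
def label_word(word, roots):
    """Return a list of I/E/B labels for word, or None if no root occurs in it."""
    # Assumption: prefer longer roots so the most specific match wins.
    for root in sorted(roots, key=len, reverse=True):
        start = word.find(root)          # -1 when the root is absent
        if start == -1:
            continue
        end = start + len(root)
        labels = ["I"] * len(word)
        if start > 0:
            labels[start - 1] = "E"      # letter just before the root
        if end < len(word):
            labels[end] = "B"            # letter just after the root
        return labels
    return None


def format_line(word, labels):
    """Spaced letters, a tab, then the spaced labels."""
    return " ".join(word) + "\t" + " ".join(labels)


def stem_files(root_path, affix_path):
    """Print one labelled line for each affixed word that contains a root."""
    with open(root_path) as rfile:
        roots = [line.strip() for line in rfile if line.strip()]
    with open(affix_path) as afile:
        for line in afile:
            word = line.strip()
            if not word:
                continue
            labels = label_word(word, roots)
            if labels is not None:
                print(format_line(word, labels))
```

Calling stem_files("root_words.txt", "affix_words.txt") with the two example files from the question should print the two lines the question asks for. Because every root is tested against every word, this is slower than the line-by-line version when both files are large.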
