F.Kncja - 10 days ago
Python Question

How to clean Python file (numbers, patterns)?

I have a long text formatted this way:

2689 3015 worth n
1095 9183 worth prep
4659 1314 worthwhile a
4503 1394 worthy a
36 272345 would modal
3404 2077 wound n
4789 1263 wound v
3174 2319 wrap v
4257 1508 wrist n
223 41497 write v
1329 7309 writer n
1939 4727 writing n
2483 3390 written a
723 14274 wrong a
5771 930 wrong adv
5544 995 wrong n
5774 929 x-ray n
4424 1426 yacht n
1510 6360 yard n
5354 1056 yarn n


My question is: how do I delete all the numbers from such a file and leave only the words that are n, v, a and adv?

When I succeeded in removing the numbers and adding the lines from the file to a string, I got confused about how to use regex in that case to keep only the words I want. The result should be:

worth
worthwhile
..


So, without the word-type ending.

Should I try to obtain this by pasting those words into a .txt file?
How would you do that?

Answer

Do you even need regexes here? If the words can't contain spaces, you can just split each line on whitespace, take the third and fourth fields (the word and its type), and print the word only when the type is one you want, e.g.

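with open('inputfile.txt') as inf, open('wordsonly.txt', 'w') as outf:
    for line in inf:
        fields = line.split()
        # Skip blank or malformed lines (expected: number, number, word, type)
        if len(fields) < 4:
            continue
        word, wordtype = fields[2:4]
        # Keep only nouns, verbs, adjectives and adverbs
        if wordtype in ('a', 'n', 'v', 'adv'):
            print(word, file=outf)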
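
Since the question asked about regex specifically, here is a rough sketch of how the same filter might look with the `re` module. The pattern and the names `LINE_RE` and `extract_words` are my own illustration, and it assumes every line is exactly two numbers, one word without spaces, and one type tag:

```python
import re

# Two numeric columns, then the word (captured), then a wanted type tag.
# Lines with any other tag (prep, modal, ...) simply fail to match.
LINE_RE = re.compile(r"^\s*\d+\s+\d+\s+(\S+)\s+(n|v|a|adv)\s*$")


def extract_words(lines):
    """Return the words from lines whose type tag is n, v, a or adv."""
    words = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            words.append(match.group(1))
    return words


sample = [
    "2689 3015 worth n",
    "1095 9183 worth prep",
    "4659 1314 worthwhile a",
    "36 272345 would modal",
    "5771 930 wrong adv",
]
print(extract_words(sample))  # ['worth', 'worthwhile', 'wrong']
```

For a format this regular, the split version is probably easier to read. The regex mainly helps if you also want to reject lines that don't fit the expected shape.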