erupti0n erupti0n - 1 month ago 9
Python Question

Gathering words with a set length of letters

Good afternoon lads,

Yesterday morning I was thinking: "Is there a way to group words with a given length in Python?"

So I started working on this function:

def lenght_words(a, b, text):
    returnlist = []


In the returnlist I want the words whose length satisfies:

a <= length <= b

So I was thinking:


  1. Split the text into lines, so the function works on each line separately

  2. Remove the punctuation from each line

  3. If a line contains words of the right length, the function must put them in the returnlist with a space between each word (e.g. 'cat dog'); otherwise it puts ''



I know there is the splitlines() method, but I don't know how to use it (even after reading about it).

Here is an example of how the function should work:

function(6,7,'All in the golden afternoon\nFull leisurely we glide;\nFor both our oars, with little skill,\nBy little arms are plied.')


this function should separate the lines:

All in the golden afternoon

Full leisurely we glide;

For both our oars, with little skill,

By little arms are plied.

--> delete the punctuation and return:

['golden','','little','little']


I know I have to append the words to the return list, but I don't know how to proceed.

Thank you guys for your time.

cheers and have a good day :))

Answer

You could write a list comprehension like this:

[token for token in s.split(" ") if a <= len(token) <= b]

It returns every token in s (str), split on single spaces, whose length is between a (int) and b (int), inclusive. An example of how to use it:

s = 'All in the golden afternoon\nFull leisurely we glide;'
s += '\nFor both our oars, with little skill,\nBy little arms are plied.'
a = 6
b = 7
result = [token for token in s.split(" ") if a <= len(token) <= b]

where result would be:

['golden', 'little', 'little', 'plied.']

To get rid of the punctuation, just add

import string
s = "".join([char for char in s if char not in string.punctuation])

above the last line (the one that builds result). The result is then:

['golden', 'little', 'little']
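
One caveat: s.split(" ") splits only on single spaces, so words on either side of a newline stay glued together as one token (e.g. 'afternoon\nFull'). Here is a minimal sketch of a variant that splits on any whitespace instead, using split() with no argument and str.translate to drop the punctuation. The name words_in_range is just a placeholder I made up, not something from your question:

```python
import string


# Hypothetical helper name, chosen for illustration only.
def words_in_range(a, b, s):
    # Drop every ASCII punctuation character in one pass.
    cleaned = s.translate(str.maketrans("", "", string.punctuation))
    # split() with no argument splits on any run of whitespace,
    # including newlines and repeated spaces.
    return [token for token in cleaned.split() if a <= len(token) <= b]


s = ('All in the golden afternoon\nFull leisurely we glide;\n'
     'For both our oars, with little skill,\nBy little arms are plied.')
print(words_in_range(6, 7, s))  # ['golden', 'little', 'little']
```

Without the punctuation step, split() would also pick up 'glide;' here, since the semicolon counts towards its length.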

Hope this works for you!

EDIT:

If you want to search the different lines separately, I would suggest a solution like this:

import string


def split_by_line_and_find_words_with_length(a, b, s):
    # store result
    result = []

    # separate string lines
    lines = s.splitlines()

    for line in lines:
        # remove punctuation
        cleaned = "".join([char for char in line if char not in string.punctuation])

        # find words with length between a and b (inclusive)
        find = [token for token in cleaned.split(" ") if a <= len(token) <= b]

        # add empty string to result if no match
        if not find:
            find.append("")

        # add any findings to result
        result += find

    return result

With your example string and preferred word lengths this would return ['golden', '', 'little', 'little'].
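
Your question also asked for several matches on the same line to end up in one entry, separated by a space (e.g. 'cat dog'), whereas the function above adds each match as its own entry. Here is a possible variant, assuming one entry per line is what you want. The name length_words_per_line is made up for this sketch:

```python
import string


# Hypothetical name; returns exactly one entry per line of text.
def length_words_per_line(a, b, text):
    result = []
    for line in text.splitlines():
        # remove punctuation from this line
        cleaned = "".join(ch for ch in line if ch not in string.punctuation)
        # keep the words whose length is between a and b (inclusive)
        matches = [w for w in cleaned.split() if a <= len(w) <= b]
        # " ".join([]) is '', so a line without a match yields ''
        result.append(" ".join(matches))
    return result


print(length_words_per_line(3, 3, 'the cat, the dog!\nelephants'))
# ['the cat the dog', '']
```

For your example string with lengths 6 and 7 it gives the same ['golden', '', 'little', 'little'], because no line has more than one match.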
