Kiper Kiper - 1 month ago 4
Python Question

searching a file for words from a list

I am trying to search a file for words that are stored in a separate list.
Every word that is found should be stored in another list, and that list is returned at the end.

The code looks like this:

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
                 "html", "css", "jquery", "linux", "windows"]
    with open("C:\Users\Vadim\Desktop\Python\New_cvs\\" + file, 'r') as file1:
        for line in file1:
            for word in line.split():
                matching = [s for s in qualities if word.lower() in s]
                if matching is not None:
                    education.append(matching)
    return education

First, it returns a list with a bunch of empty lists in it. Does that mean my comparison isn't working?

The result (scanning 4 files):

"C:\Program Files (x86)\Python2\python.exe" C:/Users/Vadim/PycharmProjects/TestFiles/
[[], [], [], [], [], [], [], [], [], ['java', 'javascript']]
[[], [], [], [], [], [], [], [], [], ['pascal']]
[[], [], [], [], [], [], [], [], [], ['linux']]
[[], [], [], [], [], [], [], [], [], [], ['c#']]

Process finished with exit code 0

The input file contains:

Name: Some Name
Phone: 1234567890

Second issue: each file contains 3 different skills, but the function finds only 1 or 2. Is that also a bad comparison, or do I have a different error here?

I would expect the result to be a list of just the found skills, without the empty entries, and to include all the skills in the file, not just some of them.

Thanks a lot!

Update: the function now finds all the skills when I use word.split(', '),
but if I want it to be more universal, what would be a good way to find those skills when I don't know exactly what will separate them?

THANKS you all, it was very helpful!!


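You get empty lists because a list comprehension always returns a list, never None, so the check matching is not None is always true, even when nothing matched. An empty list is falsy, so you might want to change the condition to the following: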

if matching:
    # do your stuff
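
A minimal sketch of the difference between the two checks, using a shortened qualities list and a word taken from the sample input (both picked here only for illustration):

```python
# A list comprehension with no hits yields an empty list, not None.
matching = [s for s in ["python", "java"] if "phone:" in s]

print(matching is not None)  # True: the original check always passes
print(bool(matching))        # False: an empty list is falsy
```

With if matching: the empty results are skipped instead of being stored.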

It seems that you're checking whether each word is a substring of the strings in the qualities list, which might not be what you want: that is why a word like java matches both java and javascript. If you want to keep the words on a line that appear in the qualities list, you might want to change your list comprehension to:

words = line.split()
match = [word for word in words if word.lower() in qualities]
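
Putting the two fixes together, a possible sketch of the whole search could look like the following. The name find_skills is hypothetical, and the function takes an iterable of lines rather than a file name (an open file object works too) so it can be tried without the original files. The "Skills:" line in the usage example is an assumption, since the sample input above does not show it.

```python
QUALITIES = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c",
             "javascript", "pascal", "html", "css", "jquery", "linux", "windows"]


def find_skills(lines, qualities=QUALITIES):
    """Return the known skills found in lines, in order of first appearance."""
    known = set(qualities)  # exact-match lookup instead of substring tests
    found = []
    for line in lines:
        for word in line.split():
            token = word.lower()
            # Keep only exact matches, and store each skill once.
            if token in known and token not in found:
                found.append(token)
    return found


# Hypothetical input lines; the "Skills:" line is made up for this example.
print(find_skills(["Name: Some Name", "Skills: Java SQL Linux"]))
# ['java', 'sql', 'linux']
```

Because this still splits on whitespace only, a token such as "java," with a trailing comma would not match.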

If you need to split on both commas and spaces, you might want to look into regular expressions. See Python - Split Strings with Multiple Delimiters.
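
As a rough sketch of that idea, re.split can break a line on any run of commas, semicolons, or whitespace. The helper name tokenize and the choice of delimiters are assumptions; extend the character class if your files use other separators.

```python
import re


def tokenize(line):
    """Split a line on runs of commas, semicolons, or whitespace; lowercase the tokens."""
    # The filter drops empty strings left by leading or trailing delimiters.
    return [t.lower() for t in re.split(r"[,;\s]+", line) if t]


# Hypothetical line mixing several separators.
print(tokenize("Skills: Java, C#,C++ ;  SQL"))
# ['skills:', 'java', 'c#', 'c++', 'sql']
```

The resulting tokens can then be compared against the qualities list with an exact in test. Splitting on delimiters rather than on word characters keeps names such as c# and c++ intact.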