MissComputing - 4 months ago
Python Question

Strip whitespace and newlines when reading from a file

I have the following code, which successfully strips end-of-line characters when reading from a file, but leaves the leading and trailing whitespace around each field untouched (I want the spaces between words to be kept!)

What is the best way to achieve this? (Note: this is a specific example, so it is not a duplicate of the general questions about stripping strings.)

My code (try it with the test input "Mr Moose", which is not found; if you enter "Mr Moose " with a space after Moose, it works):

# A COMMON ERROR is leaving in blank spaces and then finding you cannot work with the data in the way you want!

"""Try the following program with the input: Mr Moose
...it doesn't work...
but if you try "Mr Moose " (with a space after Moose), it works!
So how do I remove both newlines AND leading and trailing spaces when reading
from a file into a list? Note that the spaces between words must remain.
"""

alldata=[]
col_num=0
teacher_names=[]
delimiter=":"

with open("teacherbook.txt") as f:
    for line in f.readlines():
        alldata.append((line.strip()))
    print(alldata)


print()
print()

for x in alldata:
    teacher_names.append(x.split(delimiter)[col_num])

teacher=input("Enter teacher you are looking for:")
if teacher in teacher_names:
    print("found")
else:
    print("No")


Desired output when printing the list alldata:

['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']


That is, remove all leading and trailing whitespace at the start and end of each line, and before and after the delimiter. The spaces between words, such as in "Mr Moose", must be kept.

Contents of teacherbook.txt:

Mr Moose : Maths
Mr Goose: History
Mrs Congenelipilling: English


Thanks in advance

Answer Source

You could use a regex:

>>> import re
>>> txt='''\
... Mr Moose : Maths
... Mr Goose: History
... Mrs Congenelipilling: English'''
>>> [re.sub(r'\s*:\s*', ':', line).strip() for line in txt.splitlines()]
['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']

So your code becomes:

import re
col_num=0
teacher_names=[]
delimiter=":"

with open("teacherbook.txt") as f:
    # Collapse whitespace around the delimiter, then strip both ends of the line
    alldata=[re.sub(r'\s*{}\s*'.format(delimiter), delimiter, line).strip() for line in f]

print(alldata)

for x in alldata:
    teacher_names.append(x.split(delimiter)[col_num])
print(teacher_names)

Prints:

['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']
['Mr Moose', 'Mr Goose', 'Mrs Congenelipilling']

The key part is the regex substitution:

re.sub(r'\s*{}\s*'.format(delimiter), delimiter, line)

\s*    zero or more whitespace characters before the delimiter
{}     placeholder that str.format fills with the delimiter
\s*    zero or more whitespace characters after the delimiter
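One caveat with the pattern above: the delimiter is interpolated into the regex as-is, so a delimiter that happens to be a regex metacharacter (such as | or .) would change what the pattern matches. A minimal sketch of one way around that, assuming a hypothetical helper named normalize_line (not part of the original code) that escapes the delimiter with re.escape first:

```python
import re

def normalize_line(line, delimiter=":"):
    """Hypothetical helper: drop whitespace around the delimiter and at both ends."""
    # re.escape makes the delimiter match literally, even if it is a metacharacter.
    pattern = r"\s*{}\s*".format(re.escape(delimiter))
    # A function replacement avoids backslash processing in the replacement string.
    return re.sub(pattern, lambda _match: delimiter, line).strip()

print(normalize_line("  Mr Moose : Maths \n"))
print(normalize_line("Mr Goose |  History\n", delimiter="|"))
```

With the ":" delimiter from the question the escaping makes no difference; it only matters if the delimiter is later changed.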



For a solution that does not need the re module, I would use str.partition to get the left-hand and right-hand sides of the delimiter, then strip the whitespace as needed:

delimiter=":"
alldata=[]
with open("teacherbook.txt") as f:
    for line in f:
        # strip() removes the newline and any leading/trailing whitespace
        lh,sep,rh=line.strip().partition(delimiter)
        alldata.append(lh.rstrip() + sep + rh.lstrip())

This produces the same output.
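Another option, sketched here on the assumption that every line holds delimiter-separated fields, is to split first and strip each field, and to strip the user's input before the lookup as well. The names parse_line, sample_lines, and is_known_teacher are illustrative only, and sample_lines stands in for the contents of teacherbook.txt so the block runs without the file:

```python
def parse_line(line, delimiter=":"):
    """Hypothetical helper: split on the delimiter and strip each field."""
    return [field.strip() for field in line.split(delimiter)]

# Stand-in for the lines of teacherbook.txt (assumed sample data).
sample_lines = [
    "Mr Moose : Maths\n",
    "Mr Goose: History\n",
    "Mrs Congenelipilling: English\n",
]

# Skip blank lines, then parse every remaining line into stripped fields.
rows = [parse_line(line) for line in sample_lines if line.strip()]
alldata = [":".join(row) for row in rows]
teacher_names = {row[0] for row in rows}  # a set gives quick membership tests

def is_known_teacher(name):
    """Strip the query too, so 'Mr Moose ' and 'Mr Moose' both match."""
    return name.strip() in teacher_names

print(alldata)
print(is_known_teacher("Mr Moose"))
print(is_known_teacher(" Mr Moose "))
```

Stripping the typed name as well means a stray space in the input no longer decides whether the teacher is found.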