manu sharma - 4 months ago
Python Question

Python: parse all files in a folder

I am trying to parse all the files in a folder with a Python loop and then store the results in a dataframe. I am using the following script:

path='C:\\Users\\manusharma\\Training'

for filename in os.listdir(path):
    tree = ET.parse(filename)
    a = ET.tostring(tree.getroot(), encoding='utf-8', method='text')
    c = a.replace('\n', '')
    df = df.append({'text': c, 'type': 'abc'}, ignore_index=True)


and my folder contains the following files:

abc1.xml
abc2.xml
abc3.xml
abc4.xml
abc5.xml


Every time I run my code, it shows this error:

IOError: [Errno 2] No such file or directory: 'abc1'


even though the file is there. Where am I making a mistake? I appreciate any help.

Answer

os.listdir() returns only filenames (not full paths), so ET.parse(filename) looks for each file in the current working directory rather than in path.
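If you would rather keep os.listdir(), one possible fix is to join each bare filename back onto the folder before parsing. This is only a sketch: the function name xml_texts_via_listdir is made up for illustration, and encoding='unicode' assumes Python 3, where it makes ET.tostring() return a str instead of bytes.

```python
import os
import xml.etree.ElementTree as ET


def xml_texts_via_listdir(folder):
    """Return the text content of every .xml file in `folder` (hypothetical helper)."""
    texts = []
    for filename in sorted(os.listdir(folder)):
        if not filename.endswith('.xml'):
            continue  # skip non-XML entries such as temp files
        # os.listdir() gave us a bare name, so prepend the folder
        full_path = os.path.join(folder, filename)
        tree = ET.parse(full_path)
        # encoding='unicode' yields a str, so str.replace works on it
        text = ET.tostring(tree.getroot(), encoding='unicode', method='text')
        texts.append(text.replace('\n', ''))
    return texts
```

Calling xml_texts_via_listdir(path) with the folder from the question should then open each file by its full path instead of looking in the working directory.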

You can use glob.glob(path + '/*.xml') instead of os.listdir(path). It returns each match with the directory prefixed, and it skips files that do not end in .xml.

Demo:

In [111]: path = 'd:/temp/xml'

In [112]: os.listdir(path)
Out[112]: ['1.xml', '2.xml', '3.xml', 'bla.tmp']

In [113]: glob.glob(path + '/*.xml')
Out[113]: ['d:/temp/xml\\1.xml', 'd:/temp/xml\\2.xml', 'd:/temp/xml\\3.xml']
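Applied to the loop from the question, the glob approach might look like the sketch below. The helper name collect_rows and its label parameter are assumptions for illustration, and encoding='unicode' again assumes Python 3. The rows are collected in a plain list rather than appended to a dataframe one at a time.

```python
import glob
import os
import xml.etree.ElementTree as ET


def collect_rows(folder, label='abc'):
    """Parse every .xml file in `folder` into {'text', 'type'} dicts (hypothetical helper)."""
    rows = []
    # glob returns paths that already include the folder, so they can be opened directly
    for xml_path in sorted(glob.glob(os.path.join(folder, '*.xml'))):
        root = ET.parse(xml_path).getroot()
        text = ET.tostring(root, encoding='unicode', method='text')
        rows.append({'text': text.replace('\n', ''), 'type': label})
    return rows
```

Assuming pandas is installed, df = pd.DataFrame(collect_rows(path)) should then build the dataframe in one step. That also avoids DataFrame.append, which was removed in pandas 2.0.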