RustyShackleford - 1 year ago
Python Question

How to open folder and place text files in dataframe and rename dataframe based on file name?

I am trying to open a folder which has multiple text files and put each file in its own dataframe and name each dataframe by the filename.

My code so far recognizes the 5 files in the folder, but it does not put each file's data into a dataframe named after the file. Could someone show me how to do this?

import os
import pandas as pd
import pypyodbc

loc = 'D:/filepath to folder with files'
filelist = os.listdir(loc)
#print (len((pd.concat([pd.read_csv(item, names=[item[:-4]]) for item in filelist],axis=1))))

data = []
path = loc
# os.listdir returns bare file names, so join them with the folder path
files = [f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))]
for f in files:
    with open(os.path.join(path, f), 'r') as myfile:
        pass  # the file is opened, but nothing is read into data

df = pd.DataFrame(data)
print (df.shape)

Thank you in advance.

How the data in the files looks:

0010010000013 1 CITY OF HOUSTON 1.000
0010020000001 1 CURRENT OWNER 1.000
0010020000003 1 MILBY CHARLES FAMILY PTNSH 1.000
0010020000004 1 FEAGIN MICHAEL RYAN TRUST 1.000
0010020000013 1 BUFFALO BAYOU PARTNERSHIP 1.000
0010020000015 1 BUFFALO BAYOU PARTNERSHIP 1.000
0010020000023 1 CITY OF HOUSTON 1.000
0010020000024 1 LUISA MILBY FEAGIN 2007 TRUST 1.000
0010030000001 1 BUFFALO BAYOU PARTNERSHIP 1.000

Final answer

dfs = {os.path.basename(f): pd.read_csv(f, sep='\t', header=None,encoding='cp037',error_bad_lines=False) for f in glob.glob('D:/TX/Houston_County/Real_acct_owner/*.txt')}
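A note for readers on current pandas: the `error_bad_lines` argument was deprecated in pandas 1.3 and removed in 2.0, and `on_bad_lines='skip'` is its replacement. Below is a minimal sketch of the same idea, assuming pandas 1.3 or later. The function name `read_all` and its `pattern` and `encoding` parameters are illustrative and not part of the original post, and the encoding defaults to UTF-8 here rather than the `cp037` used above.

```python
import glob
import os

import pandas as pd


def read_all(pattern, encoding="utf-8"):
    """Map each matching file's base name to a DataFrame of its contents.

    Hypothetical helper: assumes tab-separated files without a header row.
    """
    return {
        os.path.basename(path): pd.read_csv(
            path,
            sep="\t",
            header=None,
            encoding=encoding,
            on_bad_lines="skip",  # drop rows with too many fields instead of raising
        )
        for path in glob.glob(pattern)
    }
```

Calling `read_all('D:/TX/Houston_County/Real_acct_owner/*.txt', encoding='cp037')` should then correspond to the line above, with each dataframe available under its file name as the dictionary key.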

Answer Source

Something like this should create a dict where each key (= filename) holds the dataframe with the respective file's contents.

filedfs = {}
for f in files:
    # sep='\t' and header=None assume tab-separated files with no header row
    filedfs[f] = pd.read_csv(os.path.join(loc, f), sep='\t', header=None)
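For reference, here is a self-contained sketch of the loop approach. It assumes the files are tab-separated `.txt` files with no header row, as the `sep='\t'` in the final answer suggests. The helper name `load_folder`, the choice to key each dataframe by the file name without its extension, and reading every column as text are illustrative choices rather than part of the original answer.

```python
import os

import pandas as pd


def load_folder(folder, sep="\t"):
    """Return {file name without extension: DataFrame} for each .txt file in folder."""
    frames = {}
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        # os.listdir returns bare names, so test the joined path, not the name alone.
        if not os.path.isfile(full_path) or not name.lower().endswith(".txt"):
            continue
        key = os.path.splitext(name)[0]
        # dtype=str keeps leading zeros in account numbers such as 0010010000013.
        frames[key] = pd.read_csv(full_path, sep=sep, header=None, dtype=str)
    return frames
```

With this, `load_folder(loc)` gives one dataframe per file, looked up by name, which is usually easier to work with than creating a separately named variable for each file.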

Or, as a one-liner as proposed by @MaxU:

dfs = {os.path.basename(f): pd.read_csv(f, delim_whitespace=True, header=None) for f in glob.glob('c:/data/*.csv')}