RustyShackleford - 4 months ago
Python Question

How to open folder and place text files in dataframe and rename dataframe based on file name?

I am trying to open a folder which has multiple text files and put each file in its own dataframe and name each dataframe by the filename.

My code so far recognizes the 5 files in the folder, but it does not put the data from each file into a dataframe named after that file. Could someone show me how to do this?

code:
import os
import pandas as pd
import pypyodbc

loc = 'D:/filepath to folder with files'
os.chdir(loc)
filelist = os.listdir()
#print (len((pd.concat([pd.read_csv(item, names=[item[:-4]]) for item in filelist],axis=1))))

data = []
path = loc
files = [f for f in os.listdir(path) if os.path.isfile(f)]
for f in files:
    with open(f,'r') as myfile:
        data.append(myfile.read())

df = pd.DataFrame(data)
print (df.shape)


Thank you in advance.

-edit-
How the data in the files looks:

0010010000013 1 CITY OF HOUSTON 1.000
0010020000001 1 CURRENT OWNER 1.000
0010020000003 1 MILBY CHARLES FAMILY PTNSH 1.000
0010020000004 1 FEAGIN MICHAEL RYAN TRUST 1.000
0010020000013 1 BUFFALO BAYOU PARTNERSHIP 1.000
0010020000015 1 BUFFALO BAYOU PARTNERSHIP 1.000
0010020000016 1 USRP PAC LP SPAGHETTI WAREHOUSE 1.000
0010020000023 1 CITY OF HOUSTON 1.000
0010020000024 1 LUISA MILBY FEAGIN 2007 TRUST 1.000
0010030000001 1 BUFFALO BAYOU PARTNERSHIP 1.000


-edit-
Final answer

dfs = {os.path.basename(f): pd.read_csv(f, sep='\t', header=None,encoding='cp037',error_bad_lines=False) for f in glob.glob('D:/TX/Houston_County/Real_acct_owner/*.txt')}

Answer

Something like this should create a dict where each key (= filename) holds the dataframe with the respective file's contents.

filedfs = {}
for f in files:
    filedfs[f] = pd.read_csv(os.path.join(loc, f))

Or, as a one-liner as proposed by @MaxU (this also needs import glob):

dfs = {os.path.basename(f): pd.read_csv(f, delim_whitespace=True, header=None) for f in glob.glob('c:/data/*.csv')}
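For reference, here is a slightly fuller sketch of the same dict-of-dataframes idea. It assumes the files are tab-separated with no header row, as in the asker's final answer. The function name load_text_files, the demo file names, and the choice of dtype=str (to keep leading zeros in the account numbers) are my own assumptions, not something from the original posts.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_text_files(folder, pattern="*.txt", sep="\t"):
    """Return {file name without extension: DataFrame} for each matching file."""
    frames = {}
    for path in sorted(Path(folder).glob(pattern)):
        # dtype=str keeps leading zeros in ID-like columns such as 0010010000013
        frames[path.stem] = pd.read_csv(path, sep=sep, header=None, dtype=str)
    return frames


# Small demo with two throwaway files (hypothetical names, tab-separated rows)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "owners_a.txt").write_text(
        "0010010000013\t1\tCITY OF HOUSTON\t1.000\n"
        "0010020000001\t1\tCURRENT OWNER\t1.000\n"
    )
    Path(tmp, "owners_b.txt").write_text(
        "0010020000013\t1\tBUFFALO BAYOU PARTNERSHIP\t1.000\n"
    )
    dfs = load_text_files(tmp)

print(sorted(dfs))            # the dict keys are the file names
print(dfs["owners_a"].shape)  # each value is that file's dataframe
```

Each dataframe is then available by name, e.g. dfs["owners_a"]. A dict is usually easier to work with than trying to create one variable per file, since you can loop over dfs.items() when you need to process all of them.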