Python Question

Concatenating multiple files of different shapes throws error: ValueError: Shape of passed values is (88, 57915), indices imply (88, 57906)

I have multiple CSV files which I need to concatenate into a single data frame; in the end, I should have a DataFrame with 88 columns. The length of each of the 88 files is different.

So I tried the following Python script:

import glob
import os
import pandas as pd

file_names = []
data_frames = []
path = r'/path/*'
all_files = glob.glob(os.path.join(path, "*.tsv"))
for filename in all_files:
    name = os.path.basename(os.path.normpath(filename))
    file_names.append(name)
    # read each TSV with the first column as the index and keep only column 7
    df = pd.read_csv(filename, header=None, sep='\t', index_col=0)[[7]]
    df.rename(columns={7: name}, inplace=True)
    data_frames.append(df)

combined = pd.concat(data_frames, axis=1)


Since the data files are of different lengths, the above script throws an error as follows:

ValueError: Shape of passed values is (88, 57915), indices imply (88, 57906)


I would like to know how to concatenate multiple files of different sizes into a single data frame in this case. Any suggestions would be great.
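
For reference, this kind of shape mismatch usually comes from duplicated index values in one of the per-file frames (for example, a repeated ID in column 0), not from the files having different lengths. A minimal sketch with made-up data that triggers the same class of failure:

import pandas as pd

# Two small frames; the first has a duplicated index label ('g2').
a = pd.DataFrame({'file_a': [1.0, 2.0, 3.0]}, index=['g1', 'g2', 'g2'])
b = pd.DataFrame({'file_b': [4.0, 5.0]}, index=['g1', 'g2'])

# Column-wise concat has to align the two indices; the duplicated label makes
# that alignment ambiguous, so pandas raises an error (the exact message
# depends on the pandas version).
combined = pd.concat([a, b], axis=1)

Different lengths by themselves are not the problem (rows missing from one frame would simply become NaN); it is the repeated index values that break the column-wise alignment.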

Answer

OK, so after long trial and error, I thankfully found a solution by using pivot tables. The solution is as follows.

Start by collecting the files to read:

import glob
import os
import pandas as pd

data_frames = []
path = r'/home/alva/projects/VBT_project/StringTie_e/results/ballgown/*'
all_files = glob.glob(os.path.join(path, "*.tsv"))

Add the file name as an extra column to each dataframe, turning each one into a melted dataframe with three columns, ['Gene ID', 'value', 'Sample_name']:

for filename in all_files:
    name = os.path.basename(os.path.normpath(filename))
    # read each TSV with the first column as the index and keep only the 'value' column
    df = pd.read_csv(filename, sep='\t', index_col=0)[['value']]
    #df.rename(columns={'FPKM': name}, inplace=True)
    df['Sample_name'] = name.replace('.bam.tsv', '')
    data_frames.append(df.reset_index())

Then I concatenated the melted dataframes into one long frame and used pivot_table to reshape it as I needed, as follows:

combined = pd.concat(data_frames)
matrix = combined.pivot_table(index='Gene ID', columns='Sample_name', values='value')

This gives the dataframe with 88 columns, one per sample file.
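
To make the reshape concrete, here is a minimal sketch on made-up data (the gene IDs and sample names below are hypothetical), showing how the melted frame turns into one column per sample:

import pandas as pd

# Hypothetical melted frame: one row per (gene, sample) pair, mirroring
# the three columns built in the loop above.
combined = pd.DataFrame({
    'Gene ID':     ['g1', 'g2', 'g1', 'g2', 'g3'],
    'Sample_name': ['s1', 's1', 's2', 's2', 's2'],
    'value':       [1.0, 2.0, 3.0, 4.0, 5.0],
})

# pivot_table spreads each sample into its own column; a gene missing from
# a sample shows up as NaN, and duplicated (gene, sample) pairs are averaged
# by the default aggfunc, so files of different lengths combine cleanly.
matrix = combined.pivot_table(index='Gene ID', columns='Sample_name', values='value')
print(matrix)

Genes that are absent from a given sample simply become NaN in that sample's column, which is why the different file lengths no longer cause an alignment error.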

Thanks for all the suggestions!
