Gravel Gravel - 1 year ago 163
Python Question

python dataframe converts integer to float

With my code, I combine multiple files into one dataframe and convert the NaN values to zero. In the code, I combine two columns (genome and contig) into a new column (source), but somewhere along the way the contig column is converted from an integer to a float. My input file looks like this:

AAA 1 345
AAB 2 344


The output is now like:

AAA_1.0 345
AAB_2.0 344


And I want to have it like

AAA_1 345
AAB_2 344


Since my code is very long, I cannot post the whole code and all example files on this site, but the part of my code where this probably happens is as follows. I hope that this will be enough for someone to see what the problem is.

import pandas as pd

#import contig length
df5bb = pd.read_csv('count_contiglength.out', header=None, delim_whitespace=True, names = ["genome", "contig", "contig_length"])
df5bb['source'] = df5bb.genome.astype(str).str.cat(df5bb.contig.astype(str), sep='_')
df5bb = df5bb.set_index('source')
df5b = pd.merge(df5a, df5bb, how='outer')
df5b['source'] = df5b.genome.astype(str).str.cat(df5b.contig.astype(str), sep='_')

nan_cols = df5b.columns[df5b.isnull().any(axis=0)]
for col in nan_cols:
    df5b[col] = df5b[col].fillna(0).astype(int)

#import contigIDnumbers
df5cc = pd.read_csv('contigID.out', header=None, delim_whitespace=True, names = ["genome", "contig", "contigID"])
df5cc['source'] = df5cc.genome.astype(str).str.cat(df5cc.contig.astype(str), sep='_')
df5cc = df5cc.set_index('source')
df5c = pd.merge(df5b, df5cc, how='right')
df5c['source'] = df5c.genome.astype(str).str.cat(df5c.contig.astype(str), sep='_')

Answer Source

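I think that after the merge you get at least one NaN in the contig column, which makes pandas convert the whole column from int to float.

So you need to fill and convert it again, before building source:

df5b['contig'] = df5b['contig'].fillna(0).astype(int)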
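
A minimal sketch of that fix on a toy frame, assuming the affected column is contig as in the question. The frame `merged` and its values are made up for illustration. The nullable `Int64` variant is an alternative that is not part of the answer above and assumes a reasonably recent pandas version.

```python
import pandas as pd

# Hypothetical result of a merge: contig was promoted to float because of a NaN.
merged = pd.DataFrame({
    "genome": ["AAA", "AAB", "AAC"],
    "contig": [1.0, 2.0, float("nan")],
})

# Option 1: fill NaN with 0 and cast back to int BEFORE building the source column.
as_int = merged["contig"].fillna(0).astype(int)
source_filled = merged["genome"].str.cat(as_int.astype(str), sep="_")
print(source_filled.tolist())  # ['AAA_1', 'AAB_2', 'AAC_0']

# Option 2 (assumption: nullable integer dtype is available): keep the missing
# value as missing instead of turning it into contig 0.
as_nullable = merged["contig"].astype("Int64")
source_nullable = merged["genome"].str.cat(as_nullable.astype(str), sep="_")
print(source_nullable.tolist()[:2])  # ['AAA_1', 'AAB_2']
```

Option 1 invents a contig 0 for rows that had no contig, so it may be worth checking whether such rows should be kept at all.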

See "NA type promotions" in the pandas documentation: an integer column that contains NaN is converted to float.
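
The promotion can be reproduced with two small frames. This is only a sketch: `left` and its `coverage` column are hypothetical stand-ins for df5a, which is not shown in the question, and `right` mimics the two-line input file.

```python
import pandas as pd

# Hypothetical stand-in for df5a: it contains a genome that has no contig entry.
left = pd.DataFrame({"genome": ["AAA", "AAB", "AAC"], "coverage": [10, 20, 30]})
# Mimics count_contiglength.out from the question.
right = pd.DataFrame({
    "genome": ["AAA", "AAB"],
    "contig": [1, 2],
    "contig_length": [345, 344],
})

# With no "on" argument, merge joins on the shared column, here "genome".
merged = pd.merge(left, right, how="outer")

print(right["contig"].dtype)   # an integer dtype
print(merged["contig"].dtype)  # float64, because AAC has no contig and gets NaN

# Converting the float column to str is what produces the unwanted ".0".
source = merged.genome.astype(str).str.cat(merged.contig.astype(str), sep="_")
print(source.tolist()[:2])     # ['AAA_1.0', 'AAB_2.0']
```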
