Marcus Renno - 1 month ago
Python Question

Pandas - df.loc assigning NaN when used to replace columns based on a different df

I'm trying to assign values to some columns of one DataFrame from another DataFrame, matching the rows by a single key. I don't think the mapping is being applied correctly, because the assignment puts NaN in the columns.

I should be mapping them by 'SampleID'.

Here is the DF I want to assign values to

>>> df.loc[new_df['SampleID'].isin(pooled['SampleID']), cols]
        Volume_Received  Quantity  massug
88280               2.0      15.0     1.0
88282               3.0      55.0     5.0
88284               2.5      46.2     3.0
88286               2.0      98.0     5.0
229365              2.0       8.4     3.0
229366              3.0      15.9     3.0
229367              1.5       7.7     2.0
233666              1.5      50.8     3.0
233667              4.0      60.2     5.0


These are the new values I have for them

>>> numerical
           Volume_Received  Quantity  massug
SampleID
sample8               10.0      75.0     5.0
sample70              15.0     275.0    25.0
sample72              12.5     231.0    15.0
sample89               6.0     294.0    15.0
sample90               4.0      16.8     6.0
sample96               6.0      31.8     6.0
sample97               3.0      15.4     4.0
sample99               3.0     101.6     6.0
sample100              8.0     120.4    10.0


I'm using this command to assign the values:

df.loc[df['SampleID'].isin(pooled['SampleID']), cols] = numerical[cols]


Here pooled is basically

pooled = df[df['type'] == 'Pooled']

and cols is a list of the three columns shown above. After I run the code above, I get NaN in all the values. My assumption is that, because of the mapping, I'm asking pandas for values that don't exist, and the resulting nulls are converted to NaN.
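
The symptom can be reproduced with a small sketch. The data below is made up for illustration (three rows and a single Quantity column, not the real frames), and .loc is used because .ix is no longer available in recent pandas versions.

```python
import pandas as pd

# Hypothetical miniature of the frames in the question (made-up values).
df = pd.DataFrame(
    {
        "SampleID": ["sample8", "sample70", "sample5"],
        "type": ["Pooled", "Pooled", "Single"],
        "Quantity": [15.0, 55.0, 9.0],
    },
    index=[88280, 88282, 90000],
)
pooled = df[df["type"] == "Pooled"]
cols = ["Quantity"]

# New values, indexed by SampleID rather than by df's integer row labels.
numerical = pd.DataFrame(
    {"Quantity": [75.0, 275.0]},
    index=pd.Index(["sample8", "sample70"], name="SampleID"),
)

mask = df["SampleID"].isin(pooled["SampleID"])

# The right-hand side is aligned to the selected row labels (88280, 88282).
# Neither label exists in numerical's index, so the cells become NaN.
df.loc[mask, cols] = numerical[cols]
print(df)
```

The row that is not selected by the mask keeps its original value; only the selected cells are overwritten.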

Answer

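The indexes do not match. The rows selected from df are labelled 88280, 88282, and so on, while numerical is indexed by SampleID. When you assign a DataFrame, pandas first aligns it on index labels; no labels match here, so every cell is filled with NaN.

You can use

df.loc[df['SampleID'].isin(pooled['SampleID']), cols] = numerical[cols].values

but only if both sides have exactly the same shape and the rows are in the same order!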
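
As a rough sketch of the two options, assuming made-up data (the frame contents and the names by_position, by_label, and shuffled are illustrative only): the first strips the index and assigns by position, the second looks up each selected row's SampleID in numerical, so row order does not matter. The sketch uses .loc, since .ix was removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical miniature of the frames in the question (made-up values).
df = pd.DataFrame(
    {
        "SampleID": ["sample8", "sample70", "sample5"],
        "type": ["Pooled", "Pooled", "Single"],
        "Quantity": [15.0, 55.0, 9.0],
    },
    index=[88280, 88282, 90000],
)
cols = ["Quantity"]
numerical = pd.DataFrame(
    {"Quantity": [75.0, 275.0]},
    index=pd.Index(["sample8", "sample70"], name="SampleID"),
)
mask = df["type"] == "Pooled"

# Option 1: positional assignment. .values drops the index, so pandas has
# nothing to align on; shapes must match and rows must be in the same order.
by_position = df.copy()
by_position.loc[mask, cols] = numerical[cols].values

# Option 2: label-based lookup. Fetch the new values by each selected row's
# SampleID, so the order of numerical is irrelevant. A SampleID missing from
# numerical raises KeyError instead of silently writing the wrong value.
shuffled = numerical.iloc[::-1]  # same data, reversed row order
by_label = df.copy()
ids = by_label.loc[mask, "SampleID"]
by_label.loc[mask, cols] = shuffled.loc[ids.to_numpy(), cols].to_numpy()

print(by_position)
print(by_label)
```

Both options still assign a plain array in the end; the second just builds that array in the order of the selected rows, which avoids relying on the two frames happening to be sorted the same way.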
