helloB helloB - 3 months ago 13
Python Question

Pandas: take whichever column is not NaN

I am working with a fairly messy data set that has been individual csv files with slightly different names. It would be too onerous to rename columns in the csv file, partly because I am still discovering all the variations, so I am looking to determine, for a set of columns, in a given row, which field is not NaN and carrying that forward to a new column. Is there a way to do that?

Case in point. Let's say I have a data frame that looks like this:

Index A B
1 15 NaN
2 NaN 11
3 NaN 99
4 NaN NaN
5 12 14


Let's say my desired output from this is to create a new column C such that my data frame will look like the following:

Index A B C
1 15 NaN 15
2 NaN 11 11
3 NaN 99 99
4 NaN NaN NaN
5 12 14 12 (so giving priority to A over B)


How can I accomplish this?

Answer

For a dataframe with an arbitrary number of columns, you can back fill the rows (.bfill(axis=1)) and take the first column (.iloc[:, 0]):

df = pd.DataFrame({
    'A': [15, None, None, None, 12],
    'B': [None, 11, 99, None, 14],
    'C': [10, None, 10, 10, 10]})

df['D'] = df.bfill(axis=1).iloc[:, 0]

>>> df
    A   B   C   D
0  15 NaN  10  15
1 NaN  11 NaN  11
2 NaN  99  10  99
3 NaN NaN  10  10
4  12  14  10  12