Sevyns Sevyns - 5 months ago 39
Python Question

Coalesce values from 2 columns into a single column in a pandas dataframe

I'm looking for a method that behaves similarly to coalesce in T-SQL. I have 2 columns (column A and B) that are sparsely populated in a pandas dataframe. I'd like to create a new column using the following rules:


  1. If the value in column A is not null, use that value for the new column C

  2. If the value in column A is null, use the value in column B for the new column C



Like I mentioned, this can be accomplished in MS SQL Server via the coalesce function. I haven't found a good pythonic method for this; does one exist?

Answer

use combine_first():

In [16]: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=list('ab'))

In [17]: df.ix[::2, 'a'] = np.nan

In [18]: df
Out[18]:
     a  b
0  NaN  0
1  5.0  5
2  NaN  8
3  2.0  8
4  NaN  3
5  9.0  4
6  NaN  7
7  2.0  0
8  NaN  6
9  2.0  5

In [19]: df['c'] = df.a.combine_first(df.b)

In [20]: df
Out[20]:
     a  b    c
0  NaN  0  0.0
1  5.0  5  5.0
2  NaN  8  8.0
3  2.0  8  2.0
4  NaN  3  3.0
5  9.0  4  9.0
6  NaN  7  7.0
7  2.0  0  2.0
8  NaN  6  6.0
9  2.0  5  2.0