helloB - 2 months ago 4x

Python Question

I am working with a fairly messy data set that has been individual csv files with slightly different names. It would be too onerous to rename columns in the csv file, partly because I am still discovering all the variations, so I am looking to determine, for a set of columns, in a given row, which field is not NaN and carrying that forward to a new column. Is there a way to do that?

Case in point. Let's say I have a data frame that looks like this:

`Index A B`

1 15 NaN

2 NaN 11

3 NaN 99

4 NaN NaN

5 12 14

Let's say my desired output from this is to create a new column C such that my data frame will look like the following:

`Index A B C`

1 15 NaN 15

2 NaN 11 11

3 NaN 99 99

4 NaN NaN NaN

5 12 14 12 (so giving priority to A over B)

How can I accomplish this?

Answer

For a dataframe with an arbitrary number of columns, you can back fill the rows (`.bfill(axis=1)`

) and take the first column (`.iloc[:, 0]`

):

```
df = pd.DataFrame({
'A': [15, None, None, None, 12],
'B': [None, 11, 99, None, 14],
'C': [10, None, 10, 10, 10]})
df['D'] = df.bfill(axis=1).iloc[:, 0]
>>> df
A B C D
0 15 NaN 10 15
1 NaN 11 NaN 11
2 NaN 99 10 99
3 NaN NaN 10 10
4 12 14 10 12
```

Source (Stackoverflow)

Comments