cᴏʟᴅsᴘᴇᴇᴅ cᴏʟᴅsᴘᴇᴇᴅ - 2 months ago 22
Python Question

Merging columns of a dataframe based on a predicate in pandas

I've a dataframe that I obtained by opening a csv file and cleaning it up. The csv looks like this:

"1. Do some research on al-Khorezmi (also al-Khwarizmi), the man from whose name the word ""algorithm"" is derived. In particular, you should learn what the origins of the words ""algorithm"" and ""algebra"" have in common. ",,Understand,Procedural,Understand,Factual,Understand,
3. Write down driving directions for going from your school to your home with the precision required from an algorithm's description. ,,Apply,,Apply,Factual,Remember,
Write down a recipe for cooking your favorite dish with the precision required by an algorithm.,Factual,Apply,,,Factual,Remember,
"Design an algorithm to find all the common elements in two sorted lists of numbers. For example, for the lists 2, 5, 5, 5 and 2, 2, 3, 5, 5, 7, the output should be 2, 5, 5. What is the maximum number of comparisons your algorithm makes if the lengths of the two given lists are m and n, respectively?",Procedural,Create,,Apply,,,
"a. Find gcd(31415, 14142) by applying Euclid's algorithm.",Procedural,Apply,,Apply,,,


Loading it with this code:

df = pd.read_csv('ADA.csv', names=['Questions', 'A1', 'A2', 'B1', 'B2', 'C1','C2'])


This is what I have so far:

Questions A1 \
5 1. Do some research on al-Khorezmi (also al-Kh... NaN
6 3. Write down driving directions for going fr... NaN
7 Write down a recipe for cooking your favorite ... Factual
8 Design an algorithm to find all the common ele... Procedural
9 a. Find gcd(31415, 14142) by applying Euclid'... Procedural

A2 B1 B2 C1 \
5 Understand Procedural Understand Factual
6 Apply NaN Apply Factual
7 Apply NaN NaN Factual
8 Create NaN Apply NaN
9 Apply NaN Apply NaN

C2
5 Understand
6 Remember
7 Remember
8 NaN
9 NaN


The columns are
['Questions', 'A1', 'A2', 'B1', 'B2', 'C1', 'C2']
.

Now, what I need to do is to combine the columns
['A1', 'B1', 'C1']
together into one column
Label 1
and the columns
['A2', 'B2', 'C2']
together into another column
Label 2
based on this predicate:

Label 1 = A1 if A1 has a value else B1 if B1 has a value else C1


This will also be the same for Label 2:

Label 2 = A2 if A2 has a value else B2 if B2 has a value else C2


For the given input, I would want two columns that look like this:

Label 1 Label 2
Procedural Understand
Factual Apply
Factual Apply
Procedural Create
Procedural Apply


This is what I tried:

df['Label 1'] = df['A1'] if df['A1'] else df['B1'] if df['B1'] else df['C1']


But it throws this error:


ValueError: The truth value of a Series is ambiguous. Use a.empty,
a.bool(), a.item(), a.any() or a.all().


If I can just have someone push me in the right direction, that'd be enough. Thanks.

Answer Source

This should work:

df['Label 1'] = df['A1'] if df['A1'].notnull().all() \
                else df['B1'] if df['B1'].notnull().all() \
                else df['C1']