cᴏʟᴅsᴘᴇᴇᴅ - 1 year ago 62
Python Question

# Merging columns of a dataframe based on a predicate in pandas

I've a dataframe that I obtained by opening a csv file and cleaning it up. The csv looks like this:

``````"1. Do some research on al-Khorezmi (also al-Khwarizmi), the man from whose name the word ""algorithm"" is derived. In particular, you should learn what the origins of the words ""algorithm"" and ""algebra"" have in common. ",,Understand,Procedural,Understand,Factual,Understand,
3.  Write down driving directions for going from your school to your home with the precision required from an algorithm's description. ,,Apply,,Apply,Factual,Remember,
Write down a recipe for cooking your favorite dish with the precision required by an algorithm.,Factual,Apply,,,Factual,Remember,
"Design an algorithm to find all the common elements in two sorted lists of numbers. For example, for the lists 2, 5, 5, 5 and 2, 2, 3, 5, 5, 7, the output should be 2, 5, 5. What is the maximum number of comparisons your algorithm makes if the lengths of the two given lists are m and n, respectively?",Procedural,Create,,Apply,,,
"a.  Find gcd(31415, 14142) by applying Euclid's algorithm.",Procedural,Apply,,Apply,,,
``````

``````df = pd.read_csv('ADA.csv', names=['Questions', 'A1', 'A2', 'B1', 'B2', 'C1','C2'])
``````

This is what I have so far:

``````                                          Questions          A1  \
5     1. Do some research on al-Khorezmi (also al-Kh...         NaN
6     3.  Write down driving directions for going fr...         NaN
7     Write down a recipe for cooking your favorite ...     Factual
8     Design an algorithm to find all the common ele...  Procedural
9     a.  Find gcd(31415, 14142) by applying Euclid'...  Procedural

A2          B1          B2                    C1  \
5        Understand  Procedural  Understand               Factual
6             Apply         NaN       Apply               Factual
7             Apply         NaN         NaN               Factual
8            Create         NaN       Apply                   NaN
9             Apply         NaN       Apply                   NaN

C2
5           Understand
6             Remember
7             Remember
8                  NaN
9                  NaN
``````

The columns are
`['Questions', 'A1', 'A2', 'B1', 'B2', 'C1', 'C2']`
.

Now, what I need to do is to combine the columns
`['A1', 'B1', 'C1']`
together into one column
`Label 1`
and the columns
`['A2', 'B2', 'C2']`
together into another column
`Label 2`
based on this predicate:

``````Label 1 = A1 if A1 has a value else B1 if B1 has a value else C1
``````

This will also be the same for Label 2:

``````Label 2 = A2 if A2 has a value else B2 if B2 has a value else C2
``````

For the given input, I would want two columns that look like this:

``````Label 1        Label 2
Procedural     Understand
Factual        Apply
Factual        Apply
Procedural     Create
Procedural     Apply
``````

This is what I tried:

``````df['Label 1'] = df['A1'] if df['A1'] else df['B1'] if df['B1'] else df['C1']
``````

But it throws this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty,
a.bool(), a.item(), a.any() or a.all().

If I can just have someone push me in the right direction, that'd be enough. Thanks.

``````df['Label 1'] = df['A1'] if df['A1'].notnull().all() \