Claudio Davi - 5 months ago
Python Question

Convert Pandas DataFrame values to True/False

I have a rather big dataframe that looks a bit like this:

  | obj1  | obj2  | obj3  |
---------------------------
0 | attr1 | attr2 | attr1 |
1 | attr2 | attr3 | NaN   |
2 | attr3 | attrN | NaN   |


I'm new(ish) to pandas and I can't figure out a way to make it look like this:

      | obj1 | obj2  | obj3  |
------------------------------
attr1 | True | False | True  |
attr2 | True | False | False |
attr3 | True | False | False |


What's the most Pythonic/fastest way to go about this?

EDIT

I don't have any column in the dataframe that contains all the attributes.
I could also have an obj4 with an attribute that is not seen anywhere else.

Answer Source

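You need set_index + eq, which compares every column against obj1 row by row:

# use obj1 as the row labels, but keep it as a column as well
df = df.set_index('obj1', drop=False).rename_axis(None)
# axis=0 aligns the obj1 Series with the rows
df = df.eq(df['obj1'], axis=0)
print(df)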
       obj1   obj2   obj3
attr1  True  False   True
attr2  True  False  False
attr3  True  False  False
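
For anyone who wants to run this end to end, here is a minimal sketch. The sample frame is rebuilt from the table in the question, and the names `indexed` and `result` are my own, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question (NaN marks a missing attribute).
df = pd.DataFrame({
    "obj1": ["attr1", "attr2", "attr3"],
    "obj2": ["attr2", "attr3", "attrN"],
    "obj3": ["attr1", np.nan, np.nan],
})

# Use obj1 as the row labels but keep it as a regular column too.
indexed = df.set_index("obj1", drop=False).rename_axis(None)

# Compare every column with obj1, row by row (axis=0 aligns on the index).
result = indexed.eq(indexed["obj1"], axis=0)
print(result)
```

Note that NaN never compares equal to anything, so the missing cells in obj3 come out as False.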

Similar solution (again starting from the original df):

df = df.set_index('obj1', drop=False).rename_axis(None)
# compare against the index values instead of the obj1 column
df = df.eq(df.index.values, axis=0)
print(df)
       obj1   obj2   obj3
attr1  True  False   True
attr2  True  False  False
attr3  True  False  False

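And a numpy solution (again starting from the original df):

# [:, None] turns obj1 into a column vector so it broadcasts across all columns
df = pd.DataFrame(df.values == df['obj1'].values[:, None],
                  index=df['obj1'].values,
                  columns=df.columns)
print(df)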
       obj1   obj2   obj3
attr1  True  False   True
attr2  True  False  False
attr3  True  False  False
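
The broadcasting step is the part that tends to confuse people, so here is a small sketch of it in isolation. The arrays `grid` and `col` are hypothetical stand-ins for `df.values` and `df['obj1'].values`, typed in by hand from the question's table.

```python
import numpy as np

# Stand-in for df['obj1'].values: a 1-D array of shape (3,).
col = np.array(["attr1", "attr2", "attr3"], dtype=object)

# Stand-in for df.values: a 2-D object array of shape (3, 3).
grid = np.array([
    ["attr1", "attr2", "attr1"],
    ["attr2", "attr3", np.nan],
    ["attr3", "attrN", np.nan],
], dtype=object)

# [:, None] adds a trailing axis, giving a column vector of shape (3, 1).
col_2d = col[:, None]

# Comparing (3, 3) with (3, 1) stretches the column vector across every
# column, so each cell is compared with the obj1 value of its own row.
mask = grid == col_2d
print(col_2d.shape, mask.shape)
print(mask)
```

Without the `[:, None]`, the shape (3,) array would be matched against the last axis instead, so each cell would be compared with the obj1 value of its column position rather than its row.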

EDIT:

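Comparing against all values is not as easy. Starting again from the original df:

# all distinct non-NaN attributes across the whole frame
vals = df.stack().unique()
# one Series per column, indexed by its own values and aligned to the full attribute list
L = [pd.Series(df[x].unique(), index=df[x].unique()).reindex(index=vals) for x in df.columns]
df1 = pd.concat(L, axis=1, keys=df.columns)
print(df1)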
        obj1   obj2   obj3
attr1  attr1    NaN  attr1
attr2  attr2  attr2    NaN
attr3  attr3  attr3    NaN
attrN    NaN  attrN    NaN

# a cell is True when the column contains the attribute named by its row label
df1 = df1.eq(df1.index.values, axis=0)
print(df1)
        obj1   obj2   obj3
attr1   True  False   True
attr2   True   True  False
attr3   True   True  False
attrN  False   True  False
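
As an alternative sketch for the same result, the membership table can also be built with `isin`. This is not from the original answer: the names `vals` and `membership` are my own, and the sample frame is rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "obj1": ["attr1", "attr2", "attr3"],
    "obj2": ["attr2", "attr3", "attrN"],
    "obj3": ["attr1", np.nan, np.nan],
})

# Every distinct non-missing attribute, in order of first appearance.
vals = pd.Series(df.to_numpy().ravel()).dropna().unique()

# For each column, flag which of the attributes occur in it.
membership = pd.DataFrame(
    {col: pd.Index(vals).isin(df[col]) for col in df.columns},
    index=vals,
)
print(membership)
```

This handles an extra column such as obj4 with an attribute seen nowhere else, because `vals` is collected from the whole frame rather than from a single column.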