Marcus Renno - 12 days ago
Python Question

Add a column to a DataFrame if a modified value of another column exists in the DataFrame

I'm trying to add one column in my dataframe (DF) according to another column value and whether that value is in my DF or not.

Example:

>>> import pandas as pd
>>> d = {'one': pd.Series(['aa', 'bb', 'cc', 'aa-01', 'bb-02', 'dd'])}
>>> df = pd.DataFrame(d)
>>> df
     one
0     aa
1     bb
2     cc
3  aa-01
4  bb-02
5     dd


I would like the new column to be True when the DataFrame contains another element equal to the current element with -01 or -02 appended.

Example: in this DataFrame only the elements 'aa' and 'bb' have counterparts with an appended suffix ('aa-01' and 'bb-02'), so only 'aa' and 'bb' get the value True in the new column.

Expected result:

>>> expected_df
     one    two
0     aa   True
1     bb   True
2     cc  False
3  aa-01  False
4  bb-02  False
5     dd  False


I believe I have to use isin() with apply(), but I can't figure out a way to modify the row and use isin at the same time within the function passed as an argument to apply.

Answer

You can build a boolean mask that selects the elements carrying one of the suffixes. Then take the elements selected by the mask, split each on the character "-", keep the first part, convert the result to a list, and pass that list to isin.

mask = df['one'].str.contains('-01|-02')   # or: df['one'].str.endswith(('-01', '-02'))
# Select the suffixed values as a Series and keep the part before the '-'
df['two'] = df['one'].isin(df.loc[mask, 'one'].str.split('-').str[0].tolist())
df
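The one-liner above can be unpacked into its intermediate steps. The following is only a sketch that rebuilds the DataFrame from the question. The variable name prefixes is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'one': ['aa', 'bb', 'cc', 'aa-01', 'bb-02', 'dd']})

# Step 1: boolean mask marking the rows that contain -01 or -02.
mask = df['one'].str.contains('-01|-02')

# Step 2: take only the masked values and keep the part before the '-'.
# 'prefixes' is an illustrative name, not from the original answer.
prefixes = df.loc[mask, 'one'].str.split('-').str[0].tolist()

# Step 3: flag every row whose value is one of those prefixes.
df['two'] = df['one'].isin(prefixes)

print(mask.tolist())       # [False, False, False, True, True, False]
print(prefixes)            # ['aa', 'bb']
print(df['two'].tolist())  # [True, True, False, False, False, False]
```

Rows 3 and 4 are the only ones selected by the mask, so only their prefixes 'aa' and 'bb' end up flagged in the new column.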



A more robust approach matches the suffix only at the end of the string and strips the last three characters:

mask = df['one'].str.endswith(('-01', '-02'))
# Select the column as a Series; squeeze() would return a scalar if only one row matched
df['two'] = df['one'].isin(df.loc[mask, 'one'].str[:-3])
print(df['two'])
0     True
1     True
2    False
3    False
4    False
5    False
Name: two, dtype: bool
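To see why matching only at the end of the string is safer, here is a small sketch on hypothetical data that is not from the question. The value 'ee-015' contains '-01' without ending in it, so the str.contains mask would wrongly treat 'ee' as having a suffixed counterpart.

```python
import pandas as pd

# Hypothetical data: 'ee-015' contains '-01' but does not end with it.
s = pd.Series(['aa', 'ee', 'aa-01', 'ee-015'])

# Loose mask: matches '-01' or '-02' anywhere in the string.
loose = s.str.contains('-01|-02')
# Strict mask: matches only when the string ends with the suffix.
strict = s.str.endswith(('-01', '-02'))

two_loose = s.isin(s[loose].str.split('-').str[0])
two_strict = s.isin(s[strict].str[:-3])

print(two_loose.tolist())   # [True, True, False, False]
print(two_strict.tolist())  # [True, False, False, False]
```

With the strict mask only 'aa' is flagged, which is the intended result for this data.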