Kevin - 7 months ago

Python - Pandas DataFrame - set flag at intersecting indices to 1 and the rest to 0

My problem: I want to subset a pandas DataFrame and set a flag column to 1 for the rows in that subset, with every other row set to 0.

Here is what I have:

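import pandas as pd

# Build a Name / P-Val table from the pVals dict, sorted by ascending p-value
sorted_pVals = pd.DataFrame(list(pVals.items()), columns=['Name', 'P-Val'])
sorted_pVals = sorted_pVals.sort_values("P-Val")
sorted_pVals = sorted_pVals.reset_index(drop=True)
# Initialise every row's flag to 0
sorted_pVals['Flag'] = 0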

listOfGenesInBoth = list(set(GeneSet2).intersection(sorted_pVals['Name'].tolist()))
sorted_pVals[sorted_pVals.Name.isin(listOfGenesInBoth)]
        Name     P-Val  Flag
24    L49229  0.000006     0
131   L49219  0.000157     0
474   M19045  0.003021     0
561  X140081  0.004169     0


When I do:

sorted_pVals[sorted_pVals.Name.isin(listOfGenesInBoth)]['Flag'] = 1


The values still remain 0. How can I set Flag to 1 for the rows whose Name is in listOfGenesInBoth?
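A minimal, self-contained reproduction of this behaviour, assuming small hypothetical stand-ins for pVals and GeneSet2 (the real data is not shown). Indexing with a boolean mask returns a new DataFrame, so the chained assignment writes Flag into that temporary copy and sorted_pVals itself is left unchanged.

```python
import warnings

import pandas as pd

# Hypothetical stand-ins for the question's pVals dict and GeneSet2 list
pVals = {"L49229": 0.000006, "L49219": 0.000157, "M19045": 0.003021, "Z99999": 0.5}
GeneSet2 = ["L49229", "M19045", "NOT_IN_DATA"]

sorted_pVals = pd.DataFrame(list(pVals.items()), columns=["Name", "P-Val"])
sorted_pVals = sorted_pVals.sort_values("P-Val").reset_index(drop=True)
sorted_pVals["Flag"] = 0

mask = sorted_pVals["Name"].isin(GeneSet2)

with warnings.catch_warnings():
    # pandas warns about chained assignment here (SettingWithCopyWarning or
    # ChainedAssignmentError, depending on the version); silenced for the demo
    warnings.simplefilter("ignore")
    sorted_pVals[mask]["Flag"] = 1  # assigns into a temporary copy

print(sorted_pVals["Flag"].tolist())  # [0, 0, 0, 0]: the original is untouched
```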


In R I would do something like:

df$Flag[df$genes %in% GenesVec] <- 1
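For comparison, the nearest pandas counterpart of R's %in% is Series.isin, which returns a boolean Series aligned with the DataFrame's rows. A sketch on hypothetical data: converting that mask with astype(int) yields 1 for matching rows and 0 for the rest in one step, so neither a separate Flag initialisation nor a precomputed intersection is needed.

```python
import pandas as pd

# Hypothetical data; the column names follow the question
sorted_pVals = pd.DataFrame(
    {"Name": ["L49229", "L49219", "M19045", "Z99999"],
     "P-Val": [0.000006, 0.000157, 0.003021, 0.5]}
)
GeneSet2 = ["L49229", "M19045", "NOT_IN_DATA"]

# Same idea as R's  df$genes %in% GenesVec
in_both = sorted_pVals["Name"].isin(GeneSet2)

# True -> 1, False -> 0
sorted_pVals["Flag"] = in_both.astype(int)
print(sorted_pVals)
```

Names in GeneSet2 that never appear in the Name column (here "NOT_IN_DATA") simply have no effect.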

Answer

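Use .loc, passing a boolean mask (the rows whose Name is in listOfGenesInBoth) as the row selector and 'Flag' as the column selector, then assign 1. Your original attempt uses chained indexing: sorted_pVals[mask] returns a new DataFrame (a copy), so ['Flag'] = 1 modifies that copy rather than sorted_pVals, which is why the flags stay 0. pandas typically warns about this kind of chained assignment. With .loc, the selection and the assignment happen in a single indexing operation on sorted_pVals itself.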

sorted_pVals.loc[sorted_pVals.Name.isin(listOfGenesInBoth), 'Flag'] = 1
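A self-contained check of the .loc fix, using hypothetical sample data in place of pVals and GeneSet2. Because the mask and the column label go to .loc together, the assignment is applied to sorted_pVals directly rather than to an intermediate copy.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's pVals / GeneSet2
sorted_pVals = pd.DataFrame(
    {"Name": ["L49229", "L49219", "M19045", "Z99999"],
     "P-Val": [0.000006, 0.000157, 0.003021, 0.5]}
)
sorted_pVals["Flag"] = 0
listOfGenesInBoth = ["L49229", "M19045"]

# Row selector (boolean mask) and column selector in one .loc call
sorted_pVals.loc[sorted_pVals.Name.isin(listOfGenesInBoth), "Flag"] = 1
print(sorted_pVals)
```

Rows not in listOfGenesInBoth keep the 0 they were initialised with.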