Carlos Arronte Bello Carlos Arronte Bello - 2 months ago 14
Python Question

remove specific rows in dataframe with pandas

i need some help from all of you
I'm working with a data form from excel, so basically now i have something like this.

csr id ac otc tm lease maint
1 456 b 0 0 0 0
1 543 a 0 1 1 0
1 435 e 0 0 0 0
2 123 w 1 1 1 1
2 123 g 0 0 0 0
3 987 j 0 0 0 0
4 258 k 1 1 1 1
4 258 m 0 0 0 0


So i need to delete the rows with zero in 'otc' 'tm' 'lease' 'maint' columns. i do something like this

df = pd.read_excel(xlsx,'Sheet1')
df_zero = df[(df['OTC'] == 0) & (df['TM'] == 0) & (df['Lease'] == 0) & (df['Maint'] == 0) & (df['Support'] == 0) & (df['Other'] == 0)]


In this way i open the file and save in df_zero all the rows that contain zero in the specific column. Then

df1 = df_zero.loc[:, 'CSR']


Basically this save in df1 the CSR number for the rows with zeros in the specific columns, like this

csr
1
1
2
3
4


So i think ok i do this and problem resolved.

for n1 in df1:
df = df[df.CSR != n1]


But the problem here is, as you can see in the CSR 1, we have 3 different rows, if i run that 'for', i will delete the 3 of them, i just need to remove the ones that have zeros in the specific columns ('otc' 'tm' 'lease' 'maint').

I think in a 'for' for be moving in the CSR and another one to be moving in 'otc' if the value that i found is zero move to 'tm'(in the same row) check for zero, then to 'lease' and 'maint' in the same row, if any of this columns is not zero, jump to the next CSR. In this example. We will remove the CSR 1, because all of them ('otc' 'tm' 'lease' 'maint') are zero, then jump to the next CSR, again 1, but in this case we have zero in 'otc' but 1 in 'tm', so we have to jump to the next CSR is again 1 but all of the columns ('otc' 'tm' 'lease' 'maint') are in zero so we remove the row, and continue in this way until the last CSR...

I think that could work but i'm having some problems to implement that, or maybe any of you have a better idea. Thanks and sorry for bad english

Answer

You can also extract the four columns that you are interested and count how many zeros it has for each row and create logical vector for indexing:

df[(df[['otc', 'tm', 'lease', 'maint']] == 0).sum(axis = 1) < 4]

#  csr   id  ac  otc    tm  lease   maint
# 1  1  543   a    0     1      1       0
# 3  2  123   w    1     1      1       1
# 6  4  258   k    1     1      1       1
Comments