mobcity zkore mobcity zkore - 3 months ago 9
Python Question

How to write python script to read two excel worksheets and delete any repeats in one if both have values?

I have 2 worksheets in excel. They both contain 3 columns a,b and c. I need to delete any row in worksheet 1 if the data items for columns a,b,c are the same between the two worksheets. How would I do this using the Pandas python library?

import pandas as pd
ws1 = pd.read_excel(pathname/worksheet1.xlsx)
ws2 = pd.read_excel(pathname/worksheet2.xlsx)


Basically
worksheet1
looks something like this (dummy numbers assume they're different in actual data):

a b c d e f
1 2 3 4 4 4
1 2 3 4 4 4
1 2 3 4 4 4
1 2 3 4 4 4
1 2 3 4 4 4


worksheet2
looks something like this:

a b f d e c
1 2 4 4 4 3
1 2 4 4 4 3
1 2 4 4 4 3
1 2 4 4 4 3
1 2 4 4 4 3


I have to check columns a,b and c in
worksheet1
and if the same data shows up in
worksheet2
, I would delete that row in
worksheet1
.

For example, in
worksheet1
the values 1,2 and 3 are returned for columns a,b and c. I need to check if 1,2 and 3 show up in columns a,b and c in
worksheet2
(located differently). If they do show up in
worksheet2
, I need to delete the row in
worksheet1
with the values 1,2 and 3.

Answer

Try this (assuming that worksheets list1 and list 2 - two separate excel files):

df1 = pd.read_excel('/path/to/file_name1.xlsx')
df2 = pd.read_excel('/path/to/file_name2.xlsx')

df1 = df1[~df1.email.isin(df2.email)]

The third line of code removes those rows from df1 which are found in the df2 (assuming that the column name is email in both DFs)