equanimity equanimity - 2 months ago 8
Python Question

Unable to subset Pandas dataframe

I have the following data in a data frame named

in_file
:

Client Value_01 Value_02 Date
ABC 100 500 2016-09-01T
ABC 14 90 2016-09-02T
DEF 95 1000 2016-09-01T
DEF 200 600 2016-09-02T
GHI 75 19 2016-09-01T
GHI 300 700 2016-09-02T
JKL 50 02 2016-09-01T
JKL 400 800 2016-09-02T


I subset the data frame with the following (which we'll call 'subset 1'):

df_01 = in_file.loc[(in_file.Date == '2016-09-01T') & (in_file.Client <> 'ABC') & (in_file.Client <> 'DEF')].sort_values('Value_01', ascending=False)


and I get back:

Client Value_01 Value_02 Date
GHI 75 19 2016-09-01T
JKL 50 02 2016-09-01T


Then, I attempt to subset the data frame with the following (which we'll call 'subset 2'):

df_02 = in_file.loc[(in_file.Date == '2016-09-01T') & (in_file.Client == 'ABC') & (in_file.Client == 'DEF')].sort_values('Value_01', ascending=False)


With 'subset 2', I get back an empty data frame. But, I was expecting to see the following:

Client Value_01 Value_02 Date
ABC 100 500 2016-09-01T
DEF 95 1000 2016-09-01T


Does anyone know why the 'subset 2' code is not returning the data frame that I expect?

Thanks in advance.

Answer

including isin():

In [28]: in_file.loc[(in_file.Date == '2016-09-01T') & in_file.Client.isin(['ABC', 'DEF'])].sort_values('Value_01', ascending=False)
Out[28]:
  Client  Value_01  Value_02         Date
0    ABC       100       500  2016-09-01T
2    DEF        95      1000  2016-09-01T

excluding:

In [29]: in_file.loc[(in_file.Date == '2016-09-01T') & (~in_file.Client.isin(['ABC', 'DEF']))].sort_values('Value_01', ascending=False)
Out[29]:
  Client  Value_01  Value_02         Date
4    GHI        75        19  2016-09-01T
6    JKL        50         2  2016-09-01T

Or bit slower, but much nicer query() method:

In [30]: in_file.query("Date == '2016-09-01T' and Client in ['ABC', 'DEF']")
Out[30]:
  Client  Value_01  Value_02         Date
0    ABC       100       500  2016-09-01T
2    DEF        95      1000  2016-09-01T

In [31]: in_file.query("Date == '2016-09-01T' and Client not in ['ABC', 'DEF']")
Out[31]:
  Client  Value_01  Value_02         Date
4    GHI        75        19  2016-09-01T
6    JKL        50         2  2016-09-01T