user3447653 user3447653 - 5 months ago 11
Python Question

Filter a pandas dataframe based on a derived field

I have the below query that I execute and store in a pandas dataframe.

SELECT
mn as MachineName,
dt as DateTime,
REGEXP_EXTRACT(path, 'Name:([\\s\\S\\w\\W]*?)Domain:') AS Name1,
FROM TABLE DataGallery


Output of the query is:

MachineName DateTime Name1
GOG1 2016-12-13 12:14 PI1
GOG1 2016-12-14 13:12 PI2


I am trying to filter the dataframe based on df["Name1"] = "PI1". But this does not seem to work out. I tried the same for direct fields from table like "MachineName" and "DateTime", it works perfectly fine.

Not sure whether this filtering will not work for derived fields based on REGEXP_EXTRACT. Any help would be appreciated.

Answer

IIUC use boolean indexing if need filter df:

df = pd.DataFrame({'Name1': {0: 'PI1', 1: 'PI2'}, 
                   'DateTime': {0: '2016-12-13 12:14', 1: '2016-12-14 13:12'}, 
                   'MachineName': {0: 'GOG1', 1: 'GOG1'}}
                   ,columns=['MachineName','DateTime','Name1'])

print (df)
  MachineName          DateTime Name1
0        GOG1  2016-12-13 12:14   PI1
1        GOG1  2016-12-14 13:12   PI2

print (df.Name1 == 'PI1')
0     True
1    False
Name: Name1, dtype: bool

print (df[df.Name1 == 'PI1'])
  MachineName          DateTime Name1
0        GOG1  2016-12-13 12:14   PI1