FunnyChef FunnyChef - 1 month ago 8
Python Question

Extract only certain rows using value_coutns

I have simple code that reads a CSV file and gives me a list of

value_counts
but I want to extract only certain rows from the
value_counts
results. Any suggestions how to do this?

value_count results look like:

domain_a.com 79
domain_b.de 51
domain_c.de 44
domain_d.com 43
domain_e.com 38


What I would like to be able to do is search through the results and return only those rows that match a certain domain name:

desired results:

domain_a.com 79
domain_c.de 44
domain_e.com 38


Code so far:

import pandas as pd

# read csv into the data dataframe
allData = r'/downloads/data/latest/export-2016-09-30-2039-55502fd6.csv'
tickets_df = pd.read_csv((allData),parse_dates=['Created at'],index_col='Created at')
tickets_df.fillna(0,inplace=True)

# Use 2016 data ony
tickets_2016_df = (tickets_df.loc['2016-01-01':'2016-10-20'])

org_counts = tickets_2016_df['Requester domain'].value_counts()
print (org_counts)

Answer

you can convert your series to DataFrame and then use .query() method:

In [120]: org_counts
Out[120]:
domain_a.com    79
domain_b.de     51
domain_c.de     44
domain_d.com    43
domain_e.com    38
Name: val, dtype: int64

In [121]: org_counts.to_frame('count').query("index in ['domain_a.com','domain_c.de','domain_e.com']")
Out[121]:
              count
domain_a.com     79
domain_c.de      44
domain_e.com     38

or using Index.isin() method and boolean indexing:

In [122]: domains = ['domain_a.com','domain_c.de','domain_e.com']

In [123]: org_counts[org_counts.index.isin(domains)]
Out[123]:
domain_a.com    79
domain_c.de     44
domain_e.com    38
Name: val, dtype: int64