Joe_ft Joe_ft - 24 days ago 6
Python Question

keep rows that have data in list of columns python

How can I select rows that contain data in a specific list of columns and drop the ones that have no data at all in those specific columns?

This is the code that I have so far:

VC_sub_selection = final[final['VC'].isin(['ACTIVE', 'SILENT']) & final['Status'].isin(['Test'])]

data_usage_months = list(data_usage_res.columns)


This is an example of the data set

item VC Status Jun 2016 Jul 2016
1 Active Test Nan 1.0
2 Silent Test Nan Nan
3 Active Test 2.0 3.0
4 Silent Test 5.0 Nan


What I would like to achieve is that item 1,3,4 will stay in the data set and that item 2 will be deleted. So the condition that applies is: if all months are
Nan
than drop row.

Thank you,

Jeroen

Answer

Though Nickil's solution answers the question, it does not take into account that more date columns may be added later. Hence, using the index position of a column might not be sufficient in future situations.

The solution presented below does not use the index, rather it uses a regex to find the date columns:

import pandas as pd
import re

# item    VC     Status     Jun 2016   Jul 2016
# 1       Active Test       Nan        1.0
# 2       Silent Test       Nan        Nan
# 3       Active Test       2.0        3.0
# 4       Silent Test       5.0        Nan

df = pd.DataFrame({'item': [1,2,3,4],
                   'VC': ['Active', 'Silent', 'Active', 'Silent'],
                   'Status': ['Test'] * 4,
                   'Jun 2016': [None, None, 2.0, 5.0],
                   'Jul 2016': [1.0, None, 3.0, None]})

regex_pattern = r'[a-zA-Z]{3}\s\d{4}'

date_cols = list(filter(lambda x: re.search(regex_pattern, x), df.columns.tolist()))

df_res = df.dropna(subset=date_cols, how='all')

#     Jul 2016  Jun 2016 Status      VC  item
# 0       1.0       NaN   Test  Active     1
# 2       3.0       2.0   Test  Active     3
# 3       NaN       5.0   Test  Silent     4