Nizag Nizag -4 years ago 124
Python Question

Preserving Cases while Column Matching from List in Pandas

I have a dataframe with strings with varying cases for column names as well as a list of lowercase strings.

a = [5, 5, 4, 6]
b = [4, 4, 4, 4]
c = [6, 8, 2, 3]
d = [8, 6, 4, 3]

df = pd.DataFrame({'a': a,
'B_B': b,
'CC_Cc': c,
'Dd_DdDd': d})

cols = ['b_b', 'cc_cc', 'dd_dddd']


I want to select the columns in df that match the strings in cols while preserving the cases of the columns in df. I've been able to match the column names by making them all lowercase, but I'm not sure how save the original cases of the dataframe columns.

In this case I would want to create a new dataframe with only the columns in df from keep cols, but with their original cases. How would I go about doing this?

Desired output:

B_B CC_Cc Dd_DdDd
0 4 6 8
1 4 8 6
2 4 2 4
3 4 3 3

Answer Source

You can use str.lower() to convert the column names to lower case, then construct a logical series with isin method to select the columns; the column names will not be altered in this way:

df.loc[:, df.columns.str.lower().isin(cols)]

enter image description here

An alternative method would be to use filter function, in regex specify a modifier (?i) to ignore case:

df.filter(regex="(?i)" + "|".join(cols))

enter image description here

Notice this regex method also matches column names that contain the pattern in cols list, if you want a exact match ignoring cases, you can add word boundaries in:

df.filter(regex="(?i)\\b"+"\\b|\\b".join(cols)+"\\b")
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download