Amelio Vazquez-Reina - 9 months ago
Python Question

Selecting/Excluding sets of columns in Pandas

I would like to create views or dataframes from an existing dataframe based on column selections.

For example, I would like to create a dataframe df2 from a dataframe df1 that holds all columns from it except two of them. I tried doing the following, but it didn't work:

import numpy as np
import pandas as pd

# Create a dataframe with columns A,B,C and D
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

# Try to create a second dataframe df2 from df with all columns except 'B' and 'D'
my_cols = set(df.columns)

# This returns an error ("unhashable type: set")
df2 = df[my_cols]

What am I doing wrong? Perhaps more generally, what mechanisms does pandas have to support the selection and exclusion of arbitrary sets of columns from a dataframe?


You can either drop the columns you do not need or select the ones you need:

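    # Using DataFrame.drop: drop by position (columns 1 and 2, i.e. 'B' and 'C').
    # drop returns a new dataframe, so df itself is left unchanged.
    df1 = df.drop(df.columns[[1, 2]], axis=1)

    # Drop by name
    df1 = df.drop(['B', 'C'], axis=1)

    # Select the ones you want (column labels are case-sensitive)
    df1 = df[['A', 'D']]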
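
Applied to the case in the question (everything except 'B' and 'D'), the same two approaches might look like the sketch below. The names `excluded`, `keep`, `df3` and `df4` are only illustrative, and `drop(columns=...)` assumes a reasonably recent pandas (0.21 or later). Note that a set is not accepted as a column indexer, so the column labels are passed as a list or an Index here.

```python
import numpy as np
import pandas as pd

# Same setup as in the question
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

# Columns to leave out (illustrative name)
excluded = ['B', 'D']

# Option 1: drop by name; returns a new dataframe and leaves df unchanged
df2 = df.drop(columns=excluded)

# Option 2: build a list of the labels to keep, preserving the original order
keep = [c for c in df.columns if c not in excluded]
df3 = df[keep]

# Option 3: Index.difference gives the remaining labels (sorted by default)
df4 = df[df.columns.difference(excluded)]
```

Options 1 and 2 keep the remaining columns in their original order, while option 3 sorts the labels, which only matters if the column order is significant for you.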