yogastar123 - 19 days ago
Python Question

Pandas select test dataframe columns using training dataframe columns

I have a training dataframe that has been cleaned and has a subset of variables that the original test dataframe had. I'd like to create a new test dataframe that retains only the columns the training dataframe has.

For example,

train.columns=['A','D','E','G']

test.columns=['A','B','C','D','E','F','G']


How do I create a new test_df that keeps only the columns in train.columns?

Answer

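Provided every column in the training DataFrame also exists in the test DataFrame under the same name, you can select those columns from the test DataFrame by passing the training DataFrame's columns attribute to the [] indexing operator. If any training column is missing from the test DataFrame, the selection raises a KeyError.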

Here is a working example:

>>> import pandas as pd
>>> train = pd.DataFrame([[0,1,2,3]],columns=['A','D','E','G'])
>>> train
   A  D  E  G
0  0  1  2  3

>>> test = pd.DataFrame([[0,1,2,3,4,5,6]],columns=['A','B','C','D','E','F','G'])
>>> test
   A  B  C  D  E  F  G
0  0  1  2  3  4  5  6

>>> test_df = test[train.columns]
>>> test_df
   A  D  E  G
0  0  3  4  6
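
If the test DataFrame might be missing some of the training columns, the plain selection above fails. The sketch below shows two possible ways around that. The data is made up for illustration: this test frame deliberately has no 'G' column, and the names missing, shared and aligned are not from the question. Treat it as one option, not the only approach.

```python
import pandas as pd

train = pd.DataFrame([[0, 1, 2, 3]], columns=["A", "D", "E", "G"])
# Illustrative test frame that lacks column "G" (an assumption for this sketch).
test = pd.DataFrame([[0, 1, 2, 3, 4, 5]], columns=["A", "B", "C", "D", "E", "F"])

# Training columns that the test frame does not have.
# test[train.columns] would raise a KeyError because of these.
missing = train.columns.difference(test.columns)

# Option 1: keep only the columns both frames share, in train's order.
shared = test[train.columns[train.columns.isin(test.columns)]]

# Option 2: force train's exact column layout; absent columns are filled with NaN.
aligned = test.reindex(columns=train.columns)

print(list(missing))
print(shared)
print(aligned)
```

Option 1 drops the missing column silently, while option 2 keeps the column count identical to train, which matters if a fitted model expects the same number of features. Which one is appropriate depends on how the missing values should be handled downstream.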