yogastar123 - 1 year ago
Python Question

Pandas select test dataframe columns using training dataframe columns

I have a training dataframe that has been cleaned and has a subset of variables that the original test dataframe had. I'd like to create a new test dataframe that retains only the columns the training dataframe has.

For example,



How do I create a new test_df that keeps only the columns listed in train.columns?

Answer Source

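Assuming every column in the training DataFrame also exists under the same name in the test DataFrame, you can select those columns from the test DataFrame by passing the training DataFrame's DataFrame.columns property to the [] indexing syntax. The result keeps the columns in the training DataFrame's order.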

Here is a working example:

>>> import pandas as pd
>>> train = pd.DataFrame([[0, 1, 2, 3]], columns=['A', 'D', 'E', 'G'])
>>> train
   A  D  E  G
0  0  1  2  3

>>> test = pd.DataFrame([[0, 1, 2, 3, 4, 5, 6]], columns=['A', 'B', 'C', 'D', 'E', 'F', 'G'])
>>> test
   A  B  C  D  E  F  G
0  0  1  2  3  4  5  6

>>> test_df = test[train.columns]
>>> test_df
   A  D  E  G
0  0  3  4  6
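
The selection above relies on every training column being present in the test DataFrame. In recent pandas versions, indexing with a label that is missing raises a KeyError. If that can happen in your data, the following is a minimal sketch of two ways to handle it. The test frame here is a made-up variation that lacks column 'G', and the names shared, test_shared and test_aligned are illustrative only.

```python
import pandas as pd

train = pd.DataFrame([[0, 1, 2, 3]], columns=['A', 'D', 'E', 'G'])
# Hypothetical test frame that is missing column 'G'
test = pd.DataFrame([[0, 1, 2, 3, 4, 5]], columns=['A', 'B', 'C', 'D', 'E', 'F'])

# Option 1: keep only the columns found in both frames, in train's order
shared = [col for col in train.columns if col in test.columns]
test_shared = test[shared]

# Option 2: force the exact column layout of train;
# columns absent from test are created and filled with NaN
test_aligned = test.reindex(columns=train.columns)

print(test_shared)
print(test_aligned)
```

Option 1 silently drops the missing column, while Option 2 keeps the same shape as the training data, which is usually what a fitted model expects. Which one is appropriate depends on how you want missing features to be treated.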