I have a training dataframe that has been cleaned and has a subset of variables that the original test dataframe had. I'd like to create a new test dataframe that retains only the columns the training dataframe has.
If the training and test DataFrames have columns with the same names, you can simply select those columns from the test DataFrame by passing the DataFrame.columns property of the training DataFrame to the indexing operator ([]).
Here is a working example:
$ import pandas as pd
$ train = pd.DataFrame([[0,1,2,3]],columns=['A','D','E','G'])
$ train
   A  D  E  G
0  0  1  2  3
$ test = pd.DataFrame([[0,1,2,3,4,5,6]],columns=['A','B','C','D','E','F','G'])
$ test
   A  B  C  D  E  F  G
0  0  1  2  3  4  5  6
$ test_df = test[train.columns]
$ test_df
   A  D  E  G
0  0  3  4  6
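
The selection above relies on every training column also existing in the test DataFrame; in recent pandas versions, indexing with a list that contains a missing label raises a KeyError. If that is not guaranteed, something like the following sketch may help. The test frame here is hypothetical (column 'E' was dropped on purpose to show the case), and the names `shared`, `test_shared`, and `test_aligned` are made up for illustration.

```python
import pandas as pd

train = pd.DataFrame([[0, 1, 2, 3]], columns=['A', 'D', 'E', 'G'])

# Hypothetical test frame that lacks column 'E' (an assumption for this sketch).
test = pd.DataFrame([[0, 1, 2, 3, 5, 6]], columns=['A', 'B', 'C', 'D', 'F', 'G'])

# Option 1: keep only the training columns that are present in test,
# preserving the training column order.
shared = [col for col in train.columns if col in test.columns]
test_shared = test[shared]

# Option 2: keep every training column; columns absent from test
# are created and filled with NaN.
test_aligned = test.reindex(columns=train.columns)

print(test_shared)
print(test_aligned)
```

Which option fits depends on what happens downstream: a model fitted on the training columns will usually expect all of them, so the reindex variant makes the gap visible instead of silently dropping a column.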