Apratim Bhattacharya - 1 month ago
Python Question

Reindexing data frame Pandas

I am trying to split a data set for training and testing using Pandas.

import pandas as pd

testing = pd.read_csv("housingdata.csv", header=None)
train = testing.sample(frac=0.6)
train.reindex()
test = testing.loc[~testing.index.isin(train.index)]
print(train)
print(test)


When I print the data, I get

         0     1     2  3      4
9  0.17004  12.5  7.87  0  0.524
1  0.02731   0.0  7.07  0  0.469
5  0.02985   0.0  2.18  0  0.458
3  0.03237   0.0  2.18  0  0.458
7  0.14455  12.5  7.87  0  0.524
6  0.08829  12.5  7.87  0  0.524

         0     1     2  3      4
0  0.00632  18.0  2.31  0  0.538
2  0.02729   0.0  7.07  0  0.469
4  0.06905   0.0  2.18  0  0.458
8  0.21124  12.5  7.87  0  0.524


As you can see, the row indices are shuffled. How do I re-index the rows in both data sets so that they are numbered consecutively?

This, however, does not change global settings. E.g.,

train.iloc[0,4]


gives 0.524

Answer

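As @EdChum's comments point out, it's not exactly clear what behavior you're looking for. Note that reindex() conforms a frame to a set of labels you pass in and returns a new object, so calling train.reindex() and discarding the result leaves train unchanged. If all you want is to give both new dataframes a fresh index running 0, 1, 2, ..., n-1, you can use reset_index():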

train.reset_index(inplace=True, drop=True)
test.reset_index(inplace=True, drop=True)
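
For reference, here is a minimal end-to-end sketch of the split followed by the index reset. It assumes a small made-up frame in place of housingdata.csv, and the random_state argument is only there to make the sample repeatable; neither comes from the question. It also assigns the result of reset_index() back rather than using inplace=True, which is equivalent here.

```python
import pandas as pd

# Hypothetical stand-in for housingdata.csv: ten rows, two integer-named columns.
data = pd.DataFrame({0: [x / 10 for x in range(10)], 1: list(range(10))})

# Sample 60% of the rows for training (random_state is an added assumption).
train = data.sample(frac=0.6, random_state=0)
# Every row whose label was not sampled goes to the test set.
test = data.loc[~data.index.isin(train.index)]

# drop=True discards the old labels instead of keeping them as a new column.
train = train.reset_index(drop=True)
test = test.reset_index(drop=True)

print(train.index.tolist())
print(test.index.tolist())
```

After this, train.loc[0] and train.iloc[0] both refer to the first row of train, since the labels and the positions now coincide.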