Gabriel - 13 days ago
Python Question

# Random Forest with bootstrap = False in scikit-learn python

What does RandomForestClassifier() do if we choose bootstrap = False?

According to the definition in this link

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier

bootstrap : boolean, optional (default=True) Whether bootstrap samples
are used when building trees.

I am asking because I want to apply a Random Forest to a time series: train on a rolling window covering dates (t-n) to t, then predict date (t+k). I would like to know whether the following is what happens when we choose True or False:

1) If `bootstrap = True`, the training samples can come from any day and use any number of features. For example, the model could take samples from day (t-15), day (t-19) and day (t-35), each with randomly chosen features, and then predict the output for date (t+1).

2) If `bootstrap = False`, it uses all the samples and all the features from date (t-n) to t for training, so it actually respects the date order (meaning it uses t-35, t-34, t-33, and so on up to t-1), and then predicts the output for date (t+1).

If this is how bootstrap works, I would be inclined to use `bootstrap = False`. Otherwise it would be a bit strange (think of financial series) to ignore the returns of consecutive days and jump from day t-39 to t-19 and then to day t-15 to predict day t+1. We would be missing all the information between those days.

So... is this how Bootstrap works?

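The benefit of random forests comes from building a large variety of trees by sampling both observations and features. The `bootstrap` parameter controls how observations are sampled. With `bootstrap = True`, each tree is trained on a sample of rows drawn with replacement. With `bootstrap = False`, the scikit-learn documentation states that the whole dataset is used to build each tree, so the only remaining randomness comes from the feature sampling. In neither case does the forest use the order of the rows: each row is treated as an independent sample, so neither setting makes the model "respect" the dates.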
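
As a rough illustration of what the scikit-learn documentation describes for `bootstrap` (with `False`, the whole dataset is used for each tree), here is a standard-library sketch of which row indices a single tree might see. The helper `rows_for_tree` is hypothetical and is not part of scikit-learn; it only mimics the row-sampling idea.

```python
import random

def rows_for_tree(n_rows, bootstrap, rng):
    """Return the row indices one tree would be trained on (illustrative only)."""
    if bootstrap:
        # Draw n_rows indices with replacement: some rows repeat, others are left out.
        return [rng.randrange(n_rows) for _ in range(n_rows)]
    # No bootstrap: the tree gets every row exactly once.
    return list(range(n_rows))

rng = random.Random(0)
n_rows = 1000

boot = rows_for_tree(n_rows, bootstrap=True, rng=rng)
full = rows_for_tree(n_rows, bootstrap=False, rng=rng)

# Share of distinct rows each tree sees.
unique_share_boot = len(set(boot)) / n_rows  # typically close to 0.63
unique_share_full = len(set(full)) / n_rows  # every row is present
print(unique_share_boot, unique_share_full)
```

Either way the tree receives an unordered set of rows, so any time dependence has to be encoded in the features themselves (for example, lagged returns as columns).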
You control how many features are sampled by setting `max_features`, either to a fraction of the features (a float) or to an integer count. The features are drawn anew at each split, and this is a parameter you would typically tune to find the best value.
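
A minimal sketch of how these two parameters might be set together, assuming scikit-learn and NumPy are installed. The data is synthetic and the 150/50 split is only a stand-in for one step of a rolling window; the candidate `max_features` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 200 rows, 8 features, label depends on the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hypothetical single window: train on the first 150 rows, evaluate on the last 50.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

scores = {}
for mf in (0.25, 0.5, 1.0):
    clf = RandomForestClassifier(
        n_estimators=50,
        bootstrap=False,   # each tree is fit on all 150 training rows
        max_features=mf,   # float: fraction of features considered at each split
        random_state=0,
    )
    clf.fit(X_train, y_train)
    scores[mf] = clf.score(X_test, y_test)  # accuracy on the held-out rows

pred = clf.predict(X_test)
print(pred.shape)
print(scores)
```

In practice you would repeat this over many windows and pick `max_features` by the average out-of-window score rather than from a single split.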