zkurtz - 10 months ago

Python Question

Every time I've used `xgboost`, its log has reported "0 pruned nodes", no matter what settings I try. For example:

```python
import pandas as pd
from sklearn import datasets
import xgboost as xgb

iris = datasets.load_iris()
dtrain = xgb.DMatrix(iris.data, label=iris.target)
params = {'max_depth': 10, 'min_child_weight': 0, 'gamma': 0, 'lambda': 0, 'alpha': 0}
bst = xgb.train(params, dtrain)
```

The output includes a long list of statements like

`[11:08:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 16 extra nodes, 0 pruned nodes, max_depth=5`

I've played with several combinations of tuning parameters but I always get this "0 pruned nodes" message. How can I generate a situation where I get some pruned nodes?

Answer

You will get pruned nodes using **regularization**! Use the `gamma` parameter!

The objective function contains two parts: training loss and regularization. Regularization in XGBoost is controlled by three parameters: `alpha`, `lambda`, and `gamma` (doc):

- `alpha` [default=0]: L1 regularization term on weights. Increasing this value makes the model more conservative.
- `lambda` [default=1]: L2 regularization term on weights. Increasing this value makes the model more conservative.
- `gamma` [default=0]: minimum loss reduction required to make a further partition on a leaf node of the tree. The larger it is, the more conservative the algorithm will be. Range: [0, ∞].
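Concretely, a sketch of the per-tree regularization term as given in the XGBoost docs (T is the number of leaves, w the leaf weights):

```latex
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} |w_j|
```

A split is only kept when its loss reduction exceeds `gamma`, which is exactly the mechanism behind pruning.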

`alpha` and `lambda` are just L1 and L2 penalties on the weights and should not affect pruning.

BUT `gamma` is THE parameter to tune to get pruned nodes: increase it and more nodes will be pruned. Note that its effective scale depends on the objective function, so values as high as 10000 or more may be needed before any nodes get pruned. Tuning `gamma` is also a great way to make XGBoost converge: once all the nodes of each new tree are pruned, the training and testing scores stop changing in subsequent iterations. In the end it is a great switch to control overfitting!

See Introduction to Boosted Trees for the exact definition of `gamma`.

Source (Stackoverflow)