zkurtz - 4 months ago
Python Question

Output something other than '0 pruned nodes'

Every time I've used xgboost (not only with Python), the training messages always include "0 pruned nodes" on each line. For example:

import pandas as pd
from sklearn import datasets
import xgboost as xgb
iris = datasets.load_iris()
dtrain = xgb.DMatrix(iris.data, label = iris.target)
params = {'max_depth': 10, 'min_child_weight': 0, 'gamma': 0, 'lambda': 0, 'alpha': 0}
bst = xgb.train(params, dtrain)

The output includes a long list of statements like

[11:08:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 16 extra nodes, 0 pruned nodes, max_depth=5

I've played with several combinations of tuning parameters but I always get this "0 pruned nodes" message. How can I generate a situation where I get some pruned nodes?


You will get pruned nodes by using regularization: use the gamma parameter!

The objective function contains two parts: training loss and regularization. Regularization in XGBoost is controlled by three parameters: alpha, lambda and gamma (doc):

alpha [default=0] L1 regularization term on weights, increase this value will make model more conservative.

lambda [default=1] L2 regularization term on weights, increase this value will make model more conservative.

gamma [default=0] minimum loss reduction required to make a further partition on a leaf node of the tree. The larger gamma is, the more conservative the algorithm will be. Range: [0,∞]
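For reference, the formulas from Introduction to Boosted Trees make gamma's role in pruning explicit. The regularization term for a tree with T leaves and leaf weights w_j is

```latex
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2
```

and the gain of splitting a leaf into left and right children (with gradient sums $G_L, G_R$ and Hessian sums $H_L, H_R$) is

```latex
\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma
```

A split is only kept when Gain > 0, so a larger gamma raises the bar on the loss reduction a split must achieve, and splits that fall short get pruned.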

alpha and lambda are just L1 and L2 penalties on the weights and should not affect pruning.

BUT gamma is THE parameter to tune to get pruned nodes: increase it. Note that the required value depends on the objective function, and values as high as 10000 or more can be needed before any nodes are pruned. Tuning gamma is great because it makes XGBoost converge, meaning that after a certain number of iterations the training and testing scores stop changing in subsequent iterations (all the nodes of the new trees are pruned). In the end it is a great switch to control overfitting!

See Introduction to Boosted Trees to get the exact definition of gamma.