Chris Parry - 1 year ago 127

Python Question

I have been studying this example of stacking. In this case, each set of K-folds produces one column of data, and this is repeated for each classifier. I.e: the matrices for blending are:

`dataset_blend_train = np.zeros((X.shape[0], len(clfs)))`

`dataset_blend_test = np.zeros((X_submission.shape[0], len(clfs)))`

I need to stack predictions from a multiclass problem (probabilities for 15 different classes per sample). This will produce an n*15 matrix for each clf.

Should these matrices just be concatenated horizontally? Or should they be combined in some other way, before logistic regression is applied? Thanks.


Answer Source

You can adapt the code to the multi-class problem in two ways:

- Concatenate the probabilities horizontally, i.e. you will need to create:
`dataset_blend_train = np.zeros((X.shape[0], len(clfs)*numOfClasses))`

`dataset_blend_test = np.zeros((X_submission.shape[0], len(clfs)*numOfClasses))`

- Instead of using probabilities, use the class prediction for the base models. That way you keep the arrays the same size: instead of `predict_proba`, you just use `predict`.

I have used both successfully, but which works better may depend on the dataset.
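The first option above can be sketched as follows. This is a minimal illustration, not the original code: the dataset, the two base classifiers, and the fold count are all placeholders, and `X_submission` here is just a stand-in for a held-out test set. Each base classifier gets its own block of `num_classes` columns, filled with out-of-fold `predict_proba` outputs, and the test-set probabilities are averaged over the folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Toy multiclass data (placeholder for the real problem).
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)
X_submission = X[:50]  # stand-in for the held-out test set

clfs = [RandomForestClassifier(n_estimators=50, random_state=0),
        GaussianNB()]
num_classes = len(np.unique(y))

# One block of num_classes columns per base classifier.
dataset_blend_train = np.zeros((X.shape[0], len(clfs) * num_classes))
dataset_blend_test = np.zeros((X_submission.shape[0], len(clfs) * num_classes))

skf = StratifiedKFold(n_splits=5)
for j, clf in enumerate(clfs):
    cols = slice(j * num_classes, (j + 1) * num_classes)
    test_fold_sum = np.zeros((X_submission.shape[0], num_classes))
    for train_idx, holdout_idx in skf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        # Out-of-fold probabilities become level-two training features.
        dataset_blend_train[holdout_idx, cols] = clf.predict_proba(X[holdout_idx])
        test_fold_sum += clf.predict_proba(X_submission)
    # Average the test-set probabilities over the folds.
    dataset_blend_test[:, cols] = test_fold_sum / skf.get_n_splits()

# Level-two (blending) model: logistic regression on the stacked features.
blender = LogisticRegression(max_iter=1000).fit(dataset_blend_train, y)
predictions = blender.predict(dataset_blend_test)
```

For the second option, the only change is that each classifier contributes a single column of `predict` labels instead of a `num_classes`-wide block, so the blend matrices keep the original `(n, len(clfs))` shape.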
