GKS GKS - 4 months ago 6

Error in appending matrices in Python

I have a set of features and labels for 6 different weeks, stored in the variables FEATURES_DATA and TARGET respectively.

What I want to do is train a decision tree on a growing set of features and labels: train on the first week of data and test on the second week, then train on the first two weeks and test on the third week, and so on.
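The growing-window schedule can be enumerated with a small generator. This is a minimal sketch in Python 3; expanding_window_splits is a hypothetical helper that yields week indices only, not the data itself.

```python
def expanding_window_splits(n_weeks):
    """Yield (train_weeks, test_week): train on weeks 0..t-1, test on week t."""
    for test_week in range(1, n_weeks):
        yield list(range(test_week)), test_week


# With 6 weeks there are 5 train/test rounds.
for train_weeks, test_week in expanding_window_splits(6):
    print(train_weeks, "->", test_week)
```

With 6 weeks this yields 5 rounds; week 0 is never used for testing because no earlier data precedes it.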

To give an idea of my dataset:

print np.asarray(FEATURES_DATA).shape
print np.asarray(FEATURES_DATA[0][0]).shape
print ""
print FEATURES_DATA[0]


outputs:

(6L, 1L)
(463511L, 40L)

[ array([[3, 3, 3, ..., 7, 7, 7],
[3, 3, 3, ..., 7, 7, 7],
[3, 3, 3, ..., 7, 7, 7],
...,
[2, 2, 2, ..., 6, 6, 6],
[2, 2, 2, ..., 6, 6, 6],
[2, 2, 2, ..., 6, 6, 6]], dtype=uint8)]


Here is the main code:

import numpy as np
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

features = np.asarray(FEATURES_DATA)
labels = np.asarray(TARGET)
for i in xrange(5):
    Xtrain = np.concatenate(features[:i][0])
    print Xtrain.shape
    Ytrain = np.concatenate(labels[:i][0])
    Xtest = FEATURES_DATA[i+1][0]
    Ytest = TARGET[i+1][0]
    clf_DT = DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=5000)
    clf_DT.fit(Xtrain, Ytrain)


I get the following error on the Xtrain concatenation line:

---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-5-5d87466a6a03> in <module>()
6
7 for i in xrange(5):
----> 8 Xtrain = np.concatenate(features[:i][0])
9 print Xtrain.shape
10 Ytrain = np.concatenate(labels[:i][0])

IndexError: index 0 is out of bounds for axis 0 with size 0


Any help? Thanks
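A minimal sketch with small synthetic data (made-up row counts, not the real dataset) reproduces this error and shows a second problem with the same expression: even when i > 0, features[:i][0] picks row 0 of the slice, i.e. only the first week. Taking column 0 of the first i + 1 rows, features[:i + 1, 0], gives one matrix per week instead. The sketch uses Python 3 and builds the 6x1 object array explicitly, because recent NumPy releases (1.24 and later) raise a ValueError when np.asarray is given ragged nested sequences without dtype=object.

```python
import numpy as np

# Hypothetical stand-in for np.asarray(FEATURES_DATA): a 6x1 object array whose
# cells are (n_rows, 40) matrices; row counts are made up, and every entry of
# week w's matrix is set to w so the weeks are easy to tell apart.
week_rows = [5, 7, 6, 4, 8, 3]
features = np.empty((6, 1), dtype=object)
for w, n in enumerate(week_rows):
    features[w, 0] = np.full((n, 40), w, dtype=np.uint8)

# i = 0: the slice features[:0] has no rows, so [0] raises the same IndexError.
try:
    features[:0][0]
except IndexError as err:
    print("i = 0:", err)

# i = 2: features[:2][0] is row 0 of the slice, i.e. week 0 only.
only_week0 = np.concatenate(list(features[:2][0]))
print(only_week0.shape)  # (5, 40)

# Column 0 of rows 0..i yields one matrix per week, which concatenate stacks.
i = 2
stacked = np.concatenate(list(features[:i + 1, 0]))
print(stacked.shape)  # (18, 40), i.e. 5 + 7 + 6 rows
```

In the question's loop, which tests on week i + 1, the training set would be features[:i + 1, 0] (weeks 0..i); features[:i] would leave week i out.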

Answer

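I got the solution to my problem. At i = 0 the slice features[:0] is empty, so indexing it with [0] fails. Initializing an empty matrix with 40 columns and concatenating each week's features onto it solves the problem: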

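Xtrain = np.empty(shape=[0, 40])  # start with 0 rows and 40 feature columns
for i in xrange(5):
    # append week i's (n_rows, 40) feature matrix below the rows collected so far
    Xtrain = np.concatenate((Xtrain, FEATURES_DATA[i][0]))
    print Xtrain.shape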

which gives the output

(463511L, 40L)
(955280L, 40L)
(1502984L, 40L)
(1969719L, 40L)
(2569141L, 40L)
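A sketch of the full growing-window loop built on the same empty-matrix idea, written in Python 3 with small synthetic stand-ins for FEATURES_DATA and TARGET (random values, made-up row counts; TARGET[i][0] is assumed to be a 1-D label vector). One deliberate change: the empty starting matrix uses dtype=np.uint8. np.empty defaults to float64, so concatenating the uint8 features onto it upcasts the whole training set to float64, roughly 8 times the memory (about 0.8 GB instead of 0.1 GB for the 2569141 x 40 matrix above). The decision tree fit from the question is left as a comment so the sketch runs without scikit-learn.

```python
import numpy as np

# Hypothetical stand-ins for FEATURES_DATA / TARGET: 6 weeks, each wrapped in a
# one-element list, with made-up row counts and random values.
rng = np.random.default_rng(0)
week_rows = [5, 7, 6, 4, 8, 3]
FEATURES_DATA = [[rng.integers(0, 8, size=(n, 40), dtype=np.uint8)] for n in week_rows]
TARGET = [[rng.integers(0, 2, size=n)] for n in week_rows]

# Start empty with the same dtypes as the data so concatenation does not upcast.
Xtrain = np.empty((0, 40), dtype=np.uint8)
Ytrain = np.empty(0, dtype=TARGET[0][0].dtype)

rounds = []
for i in range(len(FEATURES_DATA) - 1):
    # Grow the training set by week i, then test on the following week.
    Xtrain = np.concatenate((Xtrain, FEATURES_DATA[i][0]))
    Ytrain = np.concatenate((Ytrain, TARGET[i][0]))
    Xtest, Ytest = FEATURES_DATA[i + 1][0], TARGET[i + 1][0]
    # clf_DT.fit(Xtrain, Ytrain) and an evaluation on (Xtest, Ytest) would go here.
    rounds.append((Xtrain.shape, Xtest.shape))
    print(i, Xtrain.shape, Xtest.shape)
```

Each round copies the accumulated matrix once; with only a handful of weeks that cost is small compared with fitting the tree.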