GKS - 7 months ago 24

Python Question

I have a set of features and labels for 6 different weeks, stored in the variables `FEATURES_DATA` and `TARGET`.

I want to train a decision tree on a growing window of features and labels: train on the first week and test on the second, then train on the first two weeks and test on the third, and so on.

To give an idea about my dataset:

```
print np.asarray(FEATURES_DATA).shape
print np.asarray(FEATURES_DATA[0][0]).shape
print ""
print FEATURES_DATA[0]
```

outputs:

```
(6L, 1L)
(463511L, 40L)

[ array([[3, 3, 3, ..., 7, 7, 7],
       [3, 3, 3, ..., 7, 7, 7],
       [3, 3, 3, ..., 7, 7, 7],
       ...,
       [2, 2, 2, ..., 6, 6, 6],
       [2, 2, 2, ..., 6, 6, 6],
       [2, 2, 2, ..., 6, 6, 6]], dtype=uint8)]
```

Here is the main code:

```
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

features = np.asarray(FEATURES_DATA)
labels = np.asarray(TARGET)

for i in xrange(5):
    Xtrain = np.concatenate(features[:i][0])
    print Xtrain.shape
    Ytrain = np.concatenate(labels[:i][0])
    Xtest = FEATURES_DATA[i+1][0]
    Ytest = TARGET[i+1][0]
    clf_DT = DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=5000)
    clf_DT.fit(Xtrain, Ytrain)
```

I get the following error on the `Xtrain` line:

```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-5-5d87466a6a03> in <module>()
      6 
      7 for i in xrange(5):
----> 8     Xtrain = np.concatenate(features[:i][0])
      9     print Xtrain.shape
     10     Ytrain = np.concatenate(labels[:i][0])

IndexError: index 0 is out of bounds for axis 0 with size 0
```

Any help? Thanks

Answer

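I found the solution to my problem. The slice `features[:i]` is empty when `i` is 0, so indexing it with `[0]` raises the `IndexError`; for larger `i` it would also pick only the first week instead of all weeks up to `i`. Initializing an empty matrix and concatenating one week onto it per iteration solves both issues: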

```
import numpy as np

# Start from an empty (0, 40) matrix and append one week per iteration.
Xtrain = np.empty(shape=[0, 40])
for i in xrange(5):
    Xtrain = np.concatenate((Xtrain, FEATURES_DATA[i][0]))
    print Xtrain.shape
```

which gives the output:

```
(463511L, 40L)
(955280L, 40L)
(1502984L, 40L)
(1969719L, 40L)
(2569141L, 40L)
```
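The snippet above only grows `Xtrain`. The full train-on-past, test-on-next loop from the question could be sketched as below. This is a minimal sketch, not a definitive implementation: it is written for Python 3 (`range`, `print()`), the helper name `expanding_window_splits` is made up for illustration, and the toy arrays with invented row counts stand in for the real data. It assumes the per-week arrays have been unwrapped into plain lists first, e.g. `[week[0] for week in FEATURES_DATA]`, and that `TARGET` has the same layout.

```python
import numpy as np


def expanding_window_splits(weekly_features, weekly_labels):
    """Yield (X_train, y_train, X_test, y_test) for each expanding window.

    Step k trains on weeks 0..k-1 and tests on week k (hypothetical helper).
    """
    for k in range(1, len(weekly_features)):
        # Slicing the list with [:k] keeps every week up to k, so the
        # concatenation stacks all of them along axis 0.
        X_train = np.concatenate(weekly_features[:k])
        y_train = np.concatenate(weekly_labels[:k])
        yield X_train, y_train, weekly_features[k], weekly_labels[k]


# Toy stand-in for the real data: 6 weeks, 40 features, made-up row counts.
rng = np.random.default_rng(0)
rows_per_week = [5, 7, 4, 6, 3, 8]
weekly_features = [
    rng.integers(0, 8, size=(n, 40), dtype=np.uint8) for n in rows_per_week
]
weekly_labels = [rng.integers(0, 2, size=n) for n in rows_per_week]

train_shapes = []
for X_train, y_train, X_test, y_test in expanding_window_splits(
    weekly_features, weekly_labels
):
    # A classifier would be fitted on (X_train, y_train) and
    # evaluated on (X_test, y_test) at this point.
    train_shapes.append(X_train.shape)
    print(X_train.shape, X_test.shape)
```

Inside the loop, the `DecisionTreeClassifier` from the question would be fitted with `clf_DT.fit(X_train, y_train)` and scored on `X_test`, `y_test`. One side effect of concatenating the week arrays directly is that the result keeps their `uint8` dtype, whereas starting from `np.empty(shape=[0, 40])` upcasts everything to `float64`, which may matter for memory with millions of rows.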