Ajay H - 1 year ago 66
Python Question

# sklearn's predict is predicting values out of range

In my program, I process a number of brain scans taken as a time series: one 40 x 64 x 64 image every 2.5 seconds. Each image therefore has 163,840 'voxels' (3D pixels; 40 * 64 * 64), each of which is a 'feature' of that image sample.
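As a quick check of the feature count, here is a minimal sketch of how one 40 x 64 x 64 volume maps onto a flat feature vector. The row-major ordering and the helper name `flat_index` are assumptions for illustration, not something taken from the original program.

```python
# Dimensions of one scan, taken from the description above.
depth, height, width = 40, 64, 64

# Each voxel becomes one feature once the volume is flattened.
n_features = depth * height * width
print(n_features)  # 163840


def flat_index(z, y, x):
    """Row-major position of voxel (z, y, x) in the flattened vector (assumed layout)."""
    return (z * height + y) * width + x


print(flat_index(0, 0, 0))     # 0
print(flat_index(39, 63, 63))  # 163839
```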

Because the number of features is so ridiculously high, I thought of using Principal Component Analysis (PCA) for dimensionality reduction, followed by Recursive Feature Elimination (RFE).
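The posted code applies RFE directly and does not show the PCA step, so the following is only a rough sketch of what "PCA, then RFE" could look like in scikit-learn. The random data, the sizes (60 samples, 500 features, 20 components, 5 selected features) and the use of a linear `SVC` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Hypothetical stand-in data: 60 samples, 500 features (far fewer than 163,840).
n_samples, n_voxels = 60, 500
y = np.repeat([0, 1], n_samples // 2)
X = rng.randn(n_samples, n_voxels)
X[y == 1, :20] += 2.0  # make the first 20 "voxels" informative

pipeline = make_pipeline(
    PCA(n_components=20),  # 500 features -> 20 principal components
    RFE(SVC(kernel="linear"), n_features_to_select=5, step=0.05),
)
pipeline.fit(X, y)

# RFE delegates predict() to the estimator refit on the selected components.
labels = pipeline.predict(X[:5])
print(labels)
```

Note that RFE here ranks principal components rather than individual voxels, which may or may not be what is wanted.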

There are 9 classes to predict, so this is a multi-class classification problem. Below, I convert the 9-class problem into a set of binary classification problems, one per pair of classes, and store the fitted models in a list `models`.

``````
from sklearn.svm import SVR
from sklearn.feature_selection import RFE

models = []
model_count = 0

# one model per pair of classes (i, j) with i < j
for i in range(0,DS.nClasses):
    for j in range(i+1,DS.nClasses):

        binary_subset = sample_classes[i] + sample_classes[j]

        print 'length of combined = %d' % len(binary_subset)
        X,y = zip(*binary_subset)
        print 'y = ',y

        estimator = SVR(kernel="linear")
        rfe = RFE(estimator , step=0.05)
        rfe = rfe.fit(X, y)

        #save the model
        models.append(rfe)
        model_count = model_count + 1
        print '%d model fitting complete!' % model_count
``````
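For reference, the nested loops visit every unordered pair of classes once. A small sketch of the same enumeration with `itertools.combinations`, using `n_classes = 9` as a stand-in for `DS.nClasses`:

```python
from itertools import combinations

n_classes = 9  # from the post; plays the role of DS.nClasses

# Same pairs as the nested loops: every (i, j) with i < j.
pairs = list(combinations(range(n_classes), 2))

print(len(pairs))  # 36 = 9 * 8 / 2
print(pairs[:3])   # [(0, 1), (0, 2), (0, 3)]
print(pairs[-1])   # (7, 8)
```

So 36 models are fitted, which matches the 36 `label` lines printed per test sample further down.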

Now loop through these models and make predictions.

``````
predictions = []
for X,y in test_samples:

    for mod in models:
        #X = mod.transform(X)
        label = mod.predict(X.reshape(1,-1)) #Something goes wrong here

        print 'label is type',type(label),' and value ',label

    # (the vote-counting code that sets `prediction` is not shown in this post)
    predictions.append(prediction)
    print "We predicted %d , actual is %d" % (prediction,y)
``````
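The vote-counting step is not shown in the post, so the following is only a hypothetical sketch of how pairwise outputs are commonly combined by majority vote. The function name `tally_votes` is made up, and the sketch assumes every pairwise model returns an integer class id.

```python
def tally_votes(pair_predictions, n_classes):
    """Count one vote per pairwise model and return (votes, winning class).

    pair_predictions: iterable of integer class ids, one per pairwise model.
    """
    votes = [0] * n_classes
    for label in pair_predictions:
        votes[label] += 1
    # Ties resolve to the lowest class id because max() keeps the first maximum.
    winner = max(range(n_classes), key=lambda c: votes[c])
    return votes, winner


# Hypothetical outputs of the three models for a 3-class problem:
# (0 vs 1) -> 1, (0 vs 2) -> 2, (1 vs 2) -> 1
votes, winner = tally_votes([1, 2, 1], n_classes=3)
print(votes, winner)  # [0, 2, 1] 1
```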

The labels should be integers from 0 to 8, indicating the 9 possible outcomes. I'm printing the label values, and this is what I get:

``````
label is type <type 'numpy.ndarray'>  and value  [ 0.87011103]
label is type <type 'numpy.ndarray'>  and value  [ 2.09093105]
label is type <type 'numpy.ndarray'>  and value  [ 1.96046739]
label is type <type 'numpy.ndarray'>  and value  [ 2.73343935]
label is type <type 'numpy.ndarray'>  and value  [ 3.60415663]
label is type <type 'numpy.ndarray'>  and value  [ 6.10577602]
label is type <type 'numpy.ndarray'>  and value  [ 6.49922691]
label is type <type 'numpy.ndarray'>  and value  [ 8.35338294]
label is type <type 'numpy.ndarray'>  and value  [ 1.29765466]
label is type <type 'numpy.ndarray'>  and value  [ 1.60883217]
label is type <type 'numpy.ndarray'>  and value  [ 2.03839272]
label is type <type 'numpy.ndarray'>  and value  [ 2.03794106]
label is type <type 'numpy.ndarray'>  and value  [ 2.58830013]
label is type <type 'numpy.ndarray'>  and value  [ 3.28811133]
label is type <type 'numpy.ndarray'>  and value  [ 4.79660621]
label is type <type 'numpy.ndarray'>  and value  [ 2.57755697]
label is type <type 'numpy.ndarray'>  and value  [ 2.72263461]
label is type <type 'numpy.ndarray'>  and value  [ 2.58129428]
label is type <type 'numpy.ndarray'>  and value  [ 3.96296151]
label is type <type 'numpy.ndarray'>  and value  [ 4.80280219]
label is type <type 'numpy.ndarray'>  and value  [ 7.01768046]
label is type <type 'numpy.ndarray'>  and value  [ 3.3720926]
label is type <type 'numpy.ndarray'>  and value  [ 3.67517869]
label is type <type 'numpy.ndarray'>  and value  [ 4.52089242]
label is type <type 'numpy.ndarray'>  and value  [ 4.83746684]
label is type <type 'numpy.ndarray'>  and value  [ 6.76557315]
label is type <type 'numpy.ndarray'>  and value  [ 4.606097]
label is type <type 'numpy.ndarray'>  and value  [ 6.00243346]
label is type <type 'numpy.ndarray'>  and value  [ 6.59194317]
label is type <type 'numpy.ndarray'>  and value  [ 7.63559593]
label is type <type 'numpy.ndarray'>  and value  [ 5.8116106]
label is type <type 'numpy.ndarray'>  and value  [ 6.37096926]
label is type <type 'numpy.ndarray'>  and value  [ 7.57033285]
label is type <type 'numpy.ndarray'>  and value  [ 6.29465433]
label is type <type 'numpy.ndarray'>  and value  [ 7.91623641]
label is type <type 'numpy.ndarray'>  and value  [ 7.79524801]
Votes Array =  [ 1.  3.  8.  5.  5.  1.  7.  5.  1.]
We predicted 2 , actual is 8
``````

I don't understand why the label values are floating-point numbers; they should be integers from 0 to 8.

I loaded the data correctly. Something goes wrong while executing `predict()`, but I still can't figure out what.
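One thing worth checking: the models above wrap `SVR`, which is scikit-learn's support vector regressor, and a regressor's `predict()` returns a continuous value rather than a class id. A minimal sketch of the difference against the classifier counterpart `SVC`, using made-up toy data and arbitrary labels 3 and 7 (none of it comes from the original data set):

```python
import numpy as np
from sklearn.svm import SVC, SVR

# Toy binary problem: two well-separated clusters labelled 3 and 7
# (arbitrary class ids standing in for one pair of the 9 classes).
X_train = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
                    [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]])
y_train = np.array([3, 3, 3, 7, 7, 7])

x_new = np.array([[1.5, 1.4]])  # a sample lying between the two clusters

reg = SVR(kernel="linear").fit(X_train, y_train)
clf = SVC(kernel="linear").fit(X_train, y_train)

reg_pred = reg.predict(x_new)  # continuous value, not restricted to {3, 7}
clf_pred = clf.predict(x_new)  # always one of the training labels

print("SVR:", reg_pred)
print("SVC:", clf_pred)
```

If that is the cause here, swapping the estimator passed to `RFE` for a linear `SVC` would be the first thing to try; a linear `SVC` exposes the `coef_` attribute that `RFE` needs for ranking features.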