Ajay H - 1 month ago

sklearn pipeline model predicting the same result for every input

In the program, I am scanning a number of brain samples taken as a time series of 40 x 64 x 64 images, one every 2.5 seconds. The number of 'voxels' (3D pixels) in each image is thus 40 * 64 * 64 = 163,840, and each voxel is a 'feature' of an image sample.
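scikit-learn estimators expect a 2-D `(n_samples, n_features)` array, so each volume has to become one row of voxel features. A minimal sketch, assuming the scans are held in a NumPy array of shape `(n_samples, 40, 64, 64)` (the `volumes` array here is a hypothetical placeholder filled with zeros):

```python
import numpy as np

# Hypothetical stand-in for the scanner data: 5 samples, each a 40 x 64 x 64 volume.
n_samples = 5
volumes = np.zeros((n_samples, 40, 64, 64), dtype=np.float32)

# Flatten every volume into one row; -1 lets NumPy infer 40 * 64 * 64 columns.
X = volumes.reshape(n_samples, -1)

print(X.shape)  # (5, 163840)
```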

I thought of using Recursive Feature Elimination (RFE), followed by Principal Component Analysis (PCA) for dimensionality reduction, because the number of features is ridiculously high.

There are 9 classes to predict, so this is a multiclass classification problem. Starting with RFE:

from sklearn.feature_selection import RFE
from sklearn.svm import SVC

estimator = SVC(kernel='linear')
rfe = RFE(estimator, n_features_to_select=20000, step=0.05)
rfe = rfe.fit(X_train, y_train)
X_best = rfe.transform(X_train)
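As far as I can tell from scikit-learn's `RFE`, a float `step` between 0 and 1 is a fraction of the original number of features, rounded down, that is removed in every round, and the estimator is refit on the surviving features each round (plus one final fit on the selected ones). The plain-Python sketch here uses a hypothetical helper, `rfe_iterations`, that only mirrors that bookkeeping (it does not call scikit-learn) to estimate how many linear-SVC fits `step=0.05` implies for 163,840 voxels:

```python
def rfe_iterations(n_features, n_features_to_select, step):
    """Count elimination rounds of an RFE-style loop (a sketch, not scikit-learn itself)."""
    # A float step in (0, 1) is taken as a fraction of the ORIGINAL feature count, rounded down.
    if 0.0 < step < 1.0:
        step = int(max(1, step * n_features))
    remaining = n_features
    rounds = 0
    while remaining > n_features_to_select:
        # Each round refits the estimator on the remaining features, then drops
        # the lowest-ranked ones, never going below n_features_to_select.
        remaining -= min(step, remaining - n_features_to_select)
        rounds += 1
    return step, rounds


step, rounds = rfe_iterations(40 * 64 * 64, 20000, 0.05)
print(step, rounds)  # 8192 18
```

Under that assumption each round drops 8,192 voxels, and the selection takes 18 rounds, each training a 9-class linear SVC on between 163,840 and 24,576 features.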


Now perform PCA:

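import numpy as np
from numpy.linalg import svd
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

X_best = scale(X_best)

def get_optimal_number_of_components():
    # n_samples x n_samples Gram matrix; its nonzero eigenvalues match the feature covariance's
    cov = np.dot(X_best, X_best.transpose()) / float(X_best.shape[0])
    U, s, v = svd(cov)
    print 'Shape of S = ', s.shape

    S_nn = sum(s)

    # smallest number of components that explains at least 99% of the variance
    for num_components in range(0, s.shape[0]):
        temp_s = s[0:num_components]
        S_ii = sum(temp_s)
        if (1 - S_ii / float(S_nn)) <= 0.01:
            return num_components

    return s.shape[0]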

n_comp = get_optimal_number_of_components()
print 'optimal number of components = ', n_comp

pca = PCA(n_components = n_comp)
pca = pca.fit(X_best)
X_pca_reduced = pca.transform(X_best)
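scikit-learn's `PCA` can also choose the component count itself: passing a float between 0 and 1 as `n_components` (with `svd_solver='full'`) keeps the smallest number of components whose cumulative explained variance exceeds that fraction, which is roughly the 99% rule `get_optimal_number_of_components` implements by hand. A sketch on random stand-in data (`X_demo` is hypothetical, not the brain scans):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

rng = np.random.RandomState(0)
# Hypothetical stand-in for X_best: 60 samples x 500 standardized features.
X_demo = scale(rng.randn(60, 500))

# A float n_components in (0, 1) keeps enough components to explain
# more than that fraction of the total variance.
pca = PCA(n_components=0.99, svd_solver='full').fit(X_demo)
print(pca.n_components_, pca.explained_variance_ratio_.sum())

# Cross-check against the cumulative explained variance of a full decomposition.
full = PCA(svd_solver='full').fit(X_demo)
cumulative = np.cumsum(full.explained_variance_ratio_)
print(np.searchsorted(cumulative, 0.99, side='right') + 1)
```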


Train an SVM on the reduced dataset:

svm = SVC(kernel='linear',C=1,gamma=0.0001)
svm = svm.fit(X_pca_reduced,y_train)


Now transform the test set with the fitted RFE and PCA, and make the predictions:

X_rfe = rfe.transform(X_test)
X_pca = pca.transform(X_rfe)

predictions = svm.predict(X_pca)

print 'predictions = ',predictions
print 'actual = ',y_test
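One thing worth checking in the code above: `scale(X_best)` standardizes the training features with their own mean and standard deviation, but `X_test` goes through `rfe.transform` and `pca.transform` without that scaling, so the SVM is scored on test inputs that are not on the scale it was trained on, which can on its own lead to predictions that barely depend on the input. A minimal sketch, assuming the same order of steps (RFE, then standardization, then PCA keeping 99% of the variance, then a linear SVC), that wraps them in a scikit-learn `Pipeline` so the scaler fitted on the training data is reused on the test data. The data is synthetic and far smaller than the real 163,840 voxels; whether this alone fixes the real model is untested.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the voxel data.
X, y = make_classification(n_samples=300, n_features=200, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = make_pipeline(
    RFE(SVC(kernel='linear'), n_features_to_select=50, step=0.05),
    StandardScaler(),                          # statistics come from the training data only
    PCA(n_components=0.99, svd_solver='full'),
    SVC(kernel='linear', C=1),
)

# fit() runs every step on the training data; predict() reuses the fitted
# RFE mask, scaler statistics and PCA projection on the test data.
pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)
print(predictions[:10])
print('accuracy =', np.mean(predictions == y_test))
```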


But I always get the same prediction for every sample!

predictions = [2 2 2 2 2 2 2 2 2 2 2 2 2] #Why is it the same?!!
actual =  [[0]
[0]
[6]
[8]
[4]
[5]
[0]
[6]
[2]
[3]
[0]
[5]
[6]]


I made sure to use a linear kernel. I also tried modifying C and gamma with values like (1, 0.001), (1, 0.0001), (10, 0.001), (10, 0.0001), ... but I get the same output.
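A detail about that grid: in scikit-learn's `SVC`, `gamma` is the kernel coefficient for the 'rbf', 'poly' and 'sigmoid' kernels, so with `kernel='linear'` only `C` changes the model. A quick sketch on synthetic data (from `make_classification`, not the brain scans) showing that two linear SVCs differing only in `gamma` produce the same decision values:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data; the real inputs would be the RFE/PCA-reduced scans.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

# Identical linear SVCs except for gamma.
a = SVC(kernel='linear', C=1, gamma=0.001).fit(X, y)
b = SVC(kernel='linear', C=1, gamma=0.0001).fit(X, y)

# The linear kernel does not use gamma, so the fitted models agree.
print(np.allclose(a.decision_function(X), b.decision_function(X)))  # True
```

So the (C, gamma) pairs above effectively only varied C.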

Is there something I'm missing here?

EDIT 1

I executed

print svm.decision_function(X_pca)


and the output was :

[[ -4.79479982e+02 -8.01563453e+02 -9.91453849e+02 -1.34641884e+02
-1.02315530e+03 -2.88297991e+02 -8.41843812e+02 4.79807826e+02
-3.50485820e+02 -2.31081776e+02 -1.42555136e+02 -4.79034448e+02
-6.93029988e+01 -7.34288793e+02 -5.49271317e+01 1.98108304e+02
4.80257991e+02 2.46835670e+02 2.90045437e+02 1.53261114e+02
9.15742824e+02 -1.28387833e+01 -3.05045240e+02 -2.19268988e+01
-2.24896384e+02 1.44501465e+02 -9.17438352e+01 -6.96148972e+01
-1.15785658e+02 9.53878230e+01 1.79823987e+01 8.05242433e+01
8.33917960e+02 -1.69686889e+01 9.85949158e+01 2.68935397e+02]
[ -4.62804973e+02 -8.26454112e+02 -9.83214857e+02 -1.43367127e+02
-1.03538041e+03 -2.86397664e+02 -8.47539241e+02 4.63709033e+02
-3.52018380e+02 -2.49936725e+02 -1.43734219e+02 -4.79498907e+02
-6.93338619e+01 -7.51141272e+02 -5.30999658e+01 1.95687050e+02
4.69206888e+02 2.46530774e+02 2.92047409e+02 1.47934614e+02
9.27901865e+02 -1.21801344e+01 -2.99530129e+02 -2.03238750e+01
-2.26862390e+02 1.47692745e+02 -8.81396485e+01 -6.41692405e+01
-1.14247569e+02 1.01567350e+02 1.87874038e+01 6.90549126e+01
8.41984280e+02 -2.04488188e+01 1.00839951e+02 2.75459577e+02]
[ -4.49255185e+02 -7.89243063e+02 -9.78820180e+02 -1.29050171e+02
-1.01784356e+03 -3.23431625e+02 -7.98795796e+02 4.93058279e+02
-3.64674793e+02 -2.46545700e+02 -1.66933546e+02 -4.84571326e+02
-9.93316258e+01 -7.36182373e+02 -6.23110881e+01 2.08061873e+02
4.28119725e+02 2.75927668e+02 2.36425246e+02 1.69950273e+02
9.50488041e+02 -3.17986619e+01 -3.03656967e+02 -4.78710028e+01
-2.20752797e+02 1.36973850e+02 -5.31583763e+01 -1.08205173e+02
-7.94698530e+01 1.37320498e+02 -2.31183352e+01 8.41399154e+01
8.26408412e+02 1.30471236e+01 1.48266050e+02 2.55914495e+02]
[ -4.80424764e+02 -8.07660826e+02 -9.91911478e+02 -1.35981428e+02
-1.02923114e+03 -2.93372818e+02 -8.47420541e+02 4.60149182e+02
-3.48333176e+02 -2.37654055e+02 -1.39277819e+02 -4.78486235e+02
-6.83571401e+01 -7.34632739e+02 -5.73953318e+01 1.95508198e+02
4.80569807e+02 2.37500896e+02 2.89038289e+02 1.49855773e+02
9.09217973e+02 -1.04236971e+01 -3.02128880e+02 -2.16485093e+01
-2.23313869e+02 1.43686084e+02 -9.74071814e+01 -7.22417410e+01
-1.19091495e+02 8.94390723e+01 1.97000084e+01 8.08496457e+01
8.39105553e+02 -1.82282013e+01 9.82685256e+01 2.67791421e+02]
[ -4.74406707e+02 -8.18308535e+02 -9.76126419e+02 -1.74849565e+02
-1.02784293e+03 -2.96842934e+02 -8.42749406e+02 4.83769137e+02
-3.59483221e+02 -2.24264385e+02 -1.61995143e+02 -4.78030614e+02
-8.02309023e+01 -7.54316452e+02 -5.43436450e+01 2.05876768e+02
4.33470519e+02 2.67598191e+02 2.75764466e+02 1.53323191e+02
9.45967383e+02 -2.93192233e+01 -3.04615693e+02 -3.20731950e+01
-2.42783848e+02 1.40891844e+02 -6.13739832e+01 -6.15060481e+01
-9.51924850e+01 1.35666499e+02 2.41364468e+00 6.39635318e+01
8.37881867e+02 -1.03313421e+01 1.19234038e+02 2.76305651e+02]
[ -4.84321668e+02 -8.07444080e+02 -1.01507160e+03 -1.28529685e+02
-1.05601843e+03 -2.99493242e+02 -8.41745493e+02 4.75608122e+02
-3.37295601e+02 -2.49242183e+02 -1.30463265e+02 -4.74284269e+02
-6.05670230e+01 -7.34447396e+02 -4.01117838e+01 1.80948824e+02
4.80450158e+02 2.19859113e+02 2.94798893e+02 1.35958067e+02
9.13259527e+02 -3.52105914e-01 -2.92301811e+02 -1.24432589e+01
-2.13204265e+02 1.64167920e+02 -1.02951065e+02 -7.04800774e+01
-1.31293866e+02 9.12032854e+01 2.67291593e+01 7.78485633e+01
8.74745197e+02 -2.50250734e+01 9.69993408e+01 2.83018293e+02]
[ -4.68184798e+02 -7.85221871e+02 -9.98980941e+02 -1.08799100e+02
-1.02080996e+03 -2.87470373e+02 -8.29552725e+02 4.99360929e+02
-3.31724034e+02 -2.56603688e+02 -1.24320652e+02 -4.60348857e+02
-6.21852802e+01 -7.31782526e+02 -2.56669989e+01 1.74050279e+02
4.74370392e+02 2.26812613e+02 2.78945379e+02 1.29667612e+02
9.21512986e+02 3.74936721e+00 -2.77509203e+02 -1.34603952e+01
-2.12032693e+02 1.72842580e+02 -9.71967056e+01 -8.19354011e+01
-1.32985460e+02 9.55148610e+01 1.66381043e+01 5.88073445e+01
8.62770538e+02 -2.37682031e+01 1.06714435e+02 2.94158166e+02]
[ -4.63681347e+02 -8.22291452e+02 -9.98021515e+02 -1.54810425e+02
-1.03372001e+03 -3.34322759e+02 -8.34407336e+02 4.71050572e+02
-3.69327864e+02 -2.40580250e+02 -1.65003310e+02 -4.88818830e+02
-9.73775374e+01 -7.51246204e+02 -6.69606962e+01 2.13573607e+02
4.49817824e+02 2.79532473e+02 2.41873397e+02 1.69963589e+02
9.53153717e+02 -2.88140674e+01 -3.13030733e+02 -4.54555034e+01
-2.32589565e+02 1.36869994e+02 -6.33773098e+01 -1.06164181e+02
-8.91557438e+01 1.24881490e+02 -1.94528381e+01 7.98035685e+01
8.22835959e+02 8.75642083e+00 1.43002335e+02 2.61562868e+02]
[ -4.77620825e+02 -8.40698094e+02 -1.01067455e+03 -1.56851274e+02
-1.05031578e+03 -3.14666532e+02 -8.46541414e+02 4.61714738e+02
-3.60822150e+02 -2.44485564e+02 -1.53420660e+02 -4.85710648e+02
-7.77752216e+01 -7.55747678e+02 -5.87745617e+01 2.04601581e+02
4.68781099e+02 2.63234873e+02 2.86306284e+02 1.58817281e+02
9.43249321e+02 -1.87631625e+01 -3.06321663e+02 -2.78828679e+01
-2.27554363e+02 1.46508283e+02 -7.88844807e+01 -7.41051812e+01
-1.05094485e+02 1.12231546e+02 7.97692231e+00 7.67304852e+01
8.43518403e+02 -1.12844915e+01 1.13370158e+02 2.70797472e+02]
[ -4.91420429e+02 -7.90722180e+02 -1.05615447e+03 -1.20351520e+02
-1.04098604e+03 -2.92426682e+02 -8.45105853e+02 4.78228854e+02
-3.10412377e+02 -2.77543578e+02 -1.09733119e+02 -4.40834428e+02
-4.35168704e+01 -7.29088994e+02 -6.64581241e+00 1.48560861e+02
4.74565890e+02 2.07485677e+02 2.99817382e+02 1.09936148e+02
9.03346951e+02 2.26102442e+01 -2.45854761e+02 8.31279855e+00
-1.92441568e+02 2.03079787e+02 -1.05267244e+02 -6.41835912e+01
-1.49582656e+02 8.73008441e+01 3.36913246e+01 5.11061286e+01
8.79159912e+02 -3.85152954e+01 9.08938445e+01 3.04037825e+02]
[ -4.85998114e+02 -7.83944995e+02 -9.68132304e+02 -1.54631678e+02
-1.01186983e+03 -2.80419560e+02 -8.72211797e+02 4.97352635e+02
-3.56256101e+02 -2.23204297e+02 -1.55355470e+02 -4.80882457e+02
-7.86287112e+01 -7.58318471e+02 -5.10727433e+01 2.08265151e+02
4.49457388e+02 2.65764723e+02 2.72435473e+02 1.53296624e+02
9.44654406e+02 -2.50922419e+01 -3.17539501e+02 -3.16241295e+01
-2.51387679e+02 1.38109115e+02 -6.97122491e+01 -6.59836763e+01
-1.03441764e+02 1.19472073e+02 3.60256872e+00 6.22040523e+01
8.19929661e+02 -1.26581261e+01 1.12555974e+02 2.80480600e+02]
[ -4.70876215e+02 -7.87431621e+02 -9.96007256e+02 -1.30872700e+02
-1.03175439e+03 -2.94238915e+02 -8.36753617e+02 4.77420371e+02
-3.38091939e+02 -2.44272006e+02 -1.35130348e+02 -4.72973924e+02
-6.19636207e+01 -7.37123284e+02 -4.28620473e+01 1.80929974e+02
4.67912162e+02 2.22731582e+02 2.93578369e+02 1.34101279e+02
9.04139841e+02 -3.91744880e+00 -2.88182153e+02 -1.22493089e+01
-2.15621705e+02 1.59580065e+02 -9.57584381e+01 -6.41773592e+01
-1.28168370e+02 9.42107498e+01 2.61332125e+01 7.00130475e+01
8.58092989e+02 -2.62818439e+01 9.40455319e+01 2.82505159e+02]
[ -4.70908104e+02 -8.29375323e+02 -9.93882131e+02 -1.47050049e+02
-1.03443155e+03 -3.28570789e+02 -8.31014742e+02 4.92865993e+02
-3.70050739e+02 -2.35488125e+02 -1.63833070e+02 -4.86930191e+02
-9.74429858e+01 -7.48852374e+02 -6.17719584e+01 2.13942179e+02
4.52542022e+02 2.83202323e+02 2.43990105e+02 1.72094231e+02
9.65225890e+02 -2.92801036e+01 -3.13220814e+02 -4.60705452e+01
-2.32787033e+02 1.38783264e+02 -6.23061347e+01 -1.05977672e+02
-8.75333469e+01 1.31424380e+02 -1.99414766e+01 7.97712157e+01
8.30620576e+02 9.19139268e+00 1.44727040e+02 2.65196706e+02]]


So the values differ (though only slightly) for every sample, and I assume the model is doing something. I just don't know what's wrong.
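Each row of that output has 36 values, which matches a one-vs-one decision function: with 9 classes there are 9 × 8 / 2 = 36 pairwise classifiers, and `SVC.predict` returns the class that wins the most pairwise votes. Older scikit-learn releases returned this one-vs-one layout by default; current releases default to `decision_function_shape='ovr'` (one column per class). A sketch on synthetic 9-class data (not the brain scans) comparing the two layouts:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 9-class stand-in data with 20 features.
X, y = make_classification(n_samples=450, n_features=20, n_informative=10,
                           n_classes=9, random_state=0)

ovo = SVC(kernel='linear', decision_function_shape='ovo').fit(X, y)
ovr = SVC(kernel='linear', decision_function_shape='ovr').fit(X, y)

print(ovo.decision_function(X[:3]).shape)  # (3, 36): one column per pair of classes
print(ovr.decision_function(X[:3]).shape)  # (3, 9): one column per class
print(ovo.coef_.shape)                     # (36, 20): one weight vector per pair
```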

Answer

If class 2 is far more likely to occur than any of the other classes, and the features are not informative enough to make strong distinctions between classes, then the model will always predict class 2. You can use svm.decision_function(X_pca) to see the scores for each sample (per class, or per pair of classes for a one-vs-one SVC). If these are all the same, then something is wrong. You could also look at svm.coef_: if all the coefficients are 0, then the model isn't doing anything.
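The checks suggested above can be sketched as follows. The data is synthetic and deliberately imbalanced (via `make_classification`'s `weights`), and the labels are stored as a column vector like the `y_test` printed in the question, so `ravel()` turns them into the 1-D array that `np.bincount` and scikit-learn expect. `class_weight='balanced'` is an existing `SVC` option that reweights classes inversely to their frequency; it is one thing to try if a single class dominates, not a guaranteed fix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical imbalanced data; labels kept as a column vector like the question's y_test.
X_train, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                                 n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)
y_train = y.reshape(-1, 1)

# 1. Class balance: counts and fractions per class.
labels = y_train.ravel()
counts = np.bincount(labels)
print(counts, counts / counts.sum())

# 2. Are the learned weights all zero? (coef_ is only available for the linear kernel.)
svm = SVC(kernel='linear', C=1).fit(X_train, labels)
print(np.abs(svm.coef_).max())

# 3. Reweight classes inversely to their frequency and look at the predicted class counts.
balanced = SVC(kernel='linear', C=1, class_weight='balanced').fit(X_train, labels)
print(np.bincount(balanced.predict(X_train), minlength=len(counts)))
```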