In the program, I am scanning a number of brain samples taken as a time series of 40 x 64 x 64 images, one every 2.5 seconds. The number of 'voxels' (3D pixels) in each image is thus ~164,000 (40 * 64 * 64 = 163,840), each of which is a 'feature' for an image sample.
I thought of using Principal Component Analysis (PCA) because of the ridiculously high number of features compared to the number of samples.
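For reference, the volumes are flattened into a 2-D (samples x features) matrix along these lines (the array here is random placeholder data standing in for the actual scans):

import numpy as np

rng = np.random.default_rng(0)
volumes = rng.standard_normal((10, 40, 64, 64))   # placeholder: 10 scans of 40 x 64 x 64 voxels
X = volumes.reshape(volumes.shape[0], -1)         # shape (10, 163840): one row of voxel features per scan

From there, my code to choose the number of components looks like this: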
from numpy.linalg import svd

def get_optimal_number_of_components():
    # Gram matrix X X^T / n: with far fewer samples than features this stays
    # small, and its singular values give the per-component variances
    cov = np.dot(X, X.transpose()) / float(X.shape[0])
    U, s, v = svd(cov)
    S_nn = sum(s)
    # smallest number of components leaving <= 1% of the variance unexplained
    for num_components in range(1, s.shape[0] + 1):
        S_ii = sum(s[0:num_components])
        if (1 - S_ii / float(S_nn)) <= 0.01:
            return num_components
    return s.shape[0]
from sklearn.preprocessing import scale

# Standardize the voxel features (zero mean, unit variance)
X = scale(X)
n_comp = get_optimal_number_of_components()
print('optimal number of components =', n_comp)
from sklearn.decomposition import PCA

pca = PCA(n_components=n_comp)
X_new = pca.fit_transform(X)
# Cumulative variance explained (in percent)
var1 = np.cumsum(np.round(pca.explained_variance_ratio_, decimals=4) * 100)

import matplotlib.pyplot as plt

plt.plot(var1)
plt.title('Principal Component Analysis for Feature Selection')
plt.ylabel('Percentage of variance explained')
plt.xlabel('Number of components considered')
plt.show()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_new, y, test_size=0.10, random_state=42)
from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression()
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[('pca', pca), ('logistic', classifier)])
pipe.set_params(pca__n_components=n_comp).fit(X_train, y_train)
print('score =', pipe.score(X_test, y_test))
While I can't immediately find any error, you should test how the error behaves as you increase the number of components. Maybe there is some low-variance information missing that would give the logistic regression the edge it needs? The 99% threshold in PCA is more a guideline than a hard rule.
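A minimal sketch of such a sweep, reusing the X and y from your snippet (the component grid is arbitrary, and each value must stay below your sample count; cross-validation gives a more stable picture than a single 10% test split):

from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# arbitrary grid; cap the values at min(n_samples, n_features)
for n in (10, 20, 50, 100):
    pipe = Pipeline([('pca', PCA(n_components=n)),
                     ('logistic', LogisticRegression(max_iter=1000))])
    scores = cross_val_score(pipe, X, y, cv=5)
    print(n, 'components -> mean CV accuracy:', scores.mean())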
Other things you could try: instead of PCA, just remove every feature with zero (or very low) variance. DTI data often has features that never change and are therefore completely unnecessary for classification.
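scikit-learn's VarianceThreshold does exactly this (the threshold value below is just an example):

from sklearn.feature_selection import VarianceThreshold

# drop voxels whose variance across samples is (near) zero
selector = VarianceThreshold(threshold=1e-8)
X_reduced = selector.fit_transform(X)
print(X.shape, '->', X_reduced.shape)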
Try finding features that correlate strongly with your target and use only those to classify.
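One way to approximate this in scikit-learn is univariate selection with an F-test rather than a raw correlation; k=500 is an arbitrary example value:

from sklearn.feature_selection import SelectKBest, f_classif

# keep the k voxels whose ANOVA F-score against the labels is highest
selector = SelectKBest(score_func=f_classif, k=500)
X_selected = selector.fit_transform(X, y)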
Always be careful not to overfit!
Correct me if I'm wrong, but PCA will produce a new set of features from the original ones.
I will try to describe it as untechnically as possible:
Yes. PCA is basically a fancy axis transformation. You do get a new set of features, but they are linear combinations of the original features, ordered so that the first component describes as much of the data's variance as possible.
The idea is that, if your data lie close to a hyperplane, PCA will rotate the coordinate system so that the hyperplane maps onto the first axes and the last ones are left nearly empty.
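A tiny illustration of that rotation with made-up 2-D data lying almost on a line:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.standard_normal(200)
# points lying noisily on the line y = 2x: effectively one-dimensional data
points = np.column_stack([t, 2 * t + 0.01 * rng.standard_normal(200)])

pca = PCA(n_components=2).fit(points)
print(pca.explained_variance_ratio_)   # ~[1.0, 0.0]: the first axis carries almost everything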
PCA is a linear dimensionality reduction, so if the true structure of the data is nonlinear, it gives worse results.
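If you suspect that, one nonlinear variant you could try is scikit-learn's KernelPCA (just a suggestion, not something I have tested on data like yours; the kernel and component count below are arbitrary):

from sklearn.decomposition import KernelPCA

kpca = KernelPCA(n_components=100, kernel='rbf')
X_kpca = kpca.fit_transform(X)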
Also, a friend of mine worked with brain data similar to yours (lots of features, very few examples), and PCA almost never helped. It might be that the significant pieces of information cannot be found because there is too much "noise" around.