
Jupyter notebook crashing for scikit TSNE dimensionality reduction

I am using a Jupyter notebook with Python 2.7 from Anaconda. I have an approximately 250,000-dimensional data set which I need to compress to n lower dimensions, and I am using scikit-learn's TSNE. When running TSNE with n=5 or n=10 it works fine, but when I go to n=50 or more, the following message is shown:

"The kernel appears to have died."

There is no error message displayed. What is the problem? Is it due to a memory overload? Should I run the code in the terminal as a script rather than in Jupyter?

My TSNE function:

from timeit import default_timer as timer  # assumed source of timer()
import sklearn.manifold

def tsne_to_n_dimensions(n):
    start = timer()
    # t-SNE: reduce diff_df to n dimensions
    print diff_df.shape
    tsne = sklearn.manifold.TSNE(n_components=n, verbose=2)
    data_nd_tsne = tsne.fit_transform(diff_df)

    # ... calculate stuff from data_nd_tsne ...
    return stuff


And diff_df is a global pandas DataFrame.

I have gone through this and this, but couldn't find a solution.
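
For reference, running the same call as a standalone script from the terminal (to see the full crash output) would look roughly like the sketch below; the file name data.csv is hypothetical, standing in for wherever diff_df actually comes from:

# run_tsne.py - minimal standalone sketch of the same call
import pandas as pd
import sklearn.manifold

diff_df = pd.read_csv('data.csv')  # hypothetical: load the data frame from disk
tsne = sklearn.manifold.TSNE(n_components=50, verbose=2)  # n=50 is the failing case
data_nd_tsne = tsne.fit_transform(diff_df)
print data_nd_tsne.shape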

Answer

I have found a solution using python-bhtsne, which is also an implementation of the Barnes-Hut t-Distributed Stochastic Neighbor Embedding approach.

It is very easy to use and even provides an option to get the same output on every run of t-SNE with the same parameters, something I could not get from the scikit implementation.

It is a Python wrapper for the original implementation by Laurens van der Maaten.

So basically, instead of the regular TSNE from scikit-learn, you will just need to do the following:

from bhtsne import tsne

# bhtsne operates on a NumPy array, so pass the DataFrame's values
# (converted to float64) and ask for n output dimensions (the wrapper
# defaults to 2)
data_nd_tsne = tsne(diff_df.values.astype('float64'), dimensions=n)
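
And to get the reproducible output mentioned above, here is a minimal sketch of a seeded call (this assumes the wrapper's dimensions, perplexity and rand_seed keyword arguments; diff_df and n are the same as in the question):

import numpy as np
from bhtsne import tsne

# float64 array input for the bhtsne wrapper
data = np.asarray(diff_df.values, dtype='float64')

# rand_seed fixes the random number generator, so repeated runs with the
# same parameters produce the same embedding
data_nd_tsne = tsne(data, dimensions=n, perplexity=30.0, rand_seed=42)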