not_a_robot - 1 year ago
Python Question

MiniBatchKMeans OverflowError: cannot convert float infinity to integer?

I am trying to find the right number of clusters, k, according to silhouette scores, using:

from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import HashingVectorizer

docs = ['hello monkey goodbye thank you', 'goodbye thank you hello', 'i am going home goodbye thanks', 'thank you very much sir', 'good golly i am going home finally']

vectorizer = HashingVectorizer()

X = vectorizer.fit_transform(docs)

for k in range(5):
    model = MiniBatchKMeans(n_clusters = k)
    model.fit(X)

And I receive this error:

Warning (from warnings module):
  File "C:\Python34\lib\site-packages\sklearn\cluster\", line 1279
    0, n_samples - 1, init_size)
DeprecationWarning: This function is deprecated. Please call randint(0, 4 + 1) instead
Traceback (most recent call last):
  File "<pyshell#85>", line 3, in <module>
  File "C:\Python34\lib\site-packages\sklearn\cluster\", line 1300, in fit
  File "C:\Python34\lib\site-packages\sklearn\cluster\", line 640, in _init_centroids
  File "C:\Python34\lib\site-packages\sklearn\cluster\", line 88, in _k_init
    n_local_trials = 2 + int(np.log(n_clusters))
OverflowError: cannot convert float infinity to integer

I know that k is an integer, so I don't know where this issue is coming from. I can run the following just fine, but I can't seem to iterate through integers in a list, even though the type of each k in the loop is the same as in:
k = 2; type(k)

model = MiniBatchKMeans(n_clusters = 2)

Even running a different model, KMeans, works fine:

>>> model = KMeans(n_clusters = 2)
KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
n_jobs=1, precompute_distances='auto', random_state=None, tol=0.0001,

Answer Source

Let's analyze your code:

  • for k in range(5) yields the following sequence:
    • 0, 1, 2, 3, 4
  • model = MiniBatchKMeans(n_clusters = k) initializes the model with n_clusters=k
  • Let's look at the first iteration:
    • n_clusters=0 is used
    • Within the initialization code (see the last frame of your traceback):
    • int(np.log(n_clusters))
    • = int(np.log(0))
    • = int(-inf)
    • ERROR: integers have no representation of infinity!
    • -> casting the floating-point value -inf to int is not possible
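The last step of that chain can be reproduced without scikit-learn. This is a minimal sketch using only the standard library: -math.inf stands in for the value that np.log(0) evaluates to, and the names neg_inf and message are just illustrative.

```python
import math

# np.log(0) evaluates to -inf (NumPy also emits a RuntimeWarning);
# -math.inf is used here as a stand-in for that value.
neg_inf = -math.inf

message = None
try:
    # Same cast as in `2 + int(np.log(n_clusters))` when n_clusters == 0.
    int(neg_inf)
except OverflowError as exc:
    message = str(exc)

print(message)  # cannot convert float infinity to integer
```

For any n_clusters >= 1 the logarithm is finite, which is why the single call with n_clusters = 2 runs without complaint.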

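Setting n_clusters=0 does not make sense! Start the loop at a valid number of clusters instead. Silhouette scores in particular need at least 2 clusters and at most n_samples - 1, so range(2, 5) fits your five documents.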
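One way the corrected loop might look is sketched below. It assumes a scikit-learn version whose silhouette_score accepts sparse input. The values n_features=2**12, n_init=3 and random_state=0 are not from the question; they are only there to keep the sketch light and repeatable. The names scores and best_k are likewise illustrative.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.metrics import silhouette_score

docs = [
    'hello monkey goodbye thank you',
    'goodbye thank you hello',
    'i am going home goodbye thanks',
    'thank you very much sir',
    'good golly i am going home finally',
]

# Assumption: n_features is reduced from the default to keep the sketch light.
X = HashingVectorizer(n_features=2**12).fit_transform(docs)
n_samples = X.shape[0]

scores = {}
# Silhouette scores are defined for 2 <= n_labels <= n_samples - 1,
# so the loop starts at 2 rather than 0.
for k in range(2, n_samples):
    model = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0)
    labels = model.fit_predict(X)
    # Skip the unlikely case where the fit collapses to a single cluster.
    if len(set(labels)) < 2:
        continue
    scores[k] = silhouette_score(X, labels)

# Pick the k with the highest silhouette score.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With only five short documents the scores themselves are not very informative; the point is the loop bounds, not the chosen k.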
