not_a_robot - 1 year ago 238

Python Question

I am trying to find the right number of clusters, `k`, for `sklearn.cluster.MiniBatchKMeans`.

    from sklearn.cluster import MiniBatchKMeans
    from sklearn.feature_extraction.text import HashingVectorizer

    docs = ['hello monkey goodbye thank you', 'goodbye thank you hello', 'i am going home goodbye thanks', 'thank you very much sir', 'good golly i am going home finally']
    vectorizer = HashingVectorizer()
    X = vectorizer.fit_transform(docs)

    for k in range(5):
        model = MiniBatchKMeans(n_clusters = k)
        model.fit(X)

And I receive this error:

    Warning (from warnings module):
      File "C:\Python34\lib\site-packages\sklearn\cluster\k_means_.py", line 1279
        0, n_samples - 1, init_size)
    DeprecationWarning: This function is deprecated. Please call randint(0, 4 + 1) instead
    Traceback (most recent call last):
      File "<pyshell#85>", line 3, in <module>
        model.fit(X)
      File "C:\Python34\lib\site-packages\sklearn\cluster\k_means_.py", line 1300, in fit
        init_size=init_size)
      File "C:\Python34\lib\site-packages\sklearn\cluster\k_means_.py", line 640, in _init_centroids
        x_squared_norms=x_squared_norms)
      File "C:\Python34\lib\site-packages\sklearn\cluster\k_means_.py", line 88, in _k_init
        n_local_trials = 2 + int(np.log(n_clusters))
    OverflowError: cannot convert float infinity to integer

I know that `type(k)` is `int`, the same as `type(2)` (checked with `k = 2; type(k)`), and passing a literal `2` runs fine:

    model = MiniBatchKMeans(n_clusters = 2)
    model.fit(X)

Even running a different `model` works:

    >>> model = KMeans(n_clusters = 2)
    >>> model.fit(X)
    KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
        n_jobs=1, precompute_distances='auto', random_state=None, tol=0.0001,
        verbose=0)


Answer Source

Let's analyze your code:

- `for k in range(5)` returns the following sequence: `0, 1, 2, 3, 4`
- `model = MiniBatchKMeans(n_clusters = k)` initializes the model with `n_clusters=k`
- Let's look at the first iteration: `n_clusters=0` is used
- Within the optimization code (look at the traceback): `int(np.log(n_clusters))`
  - = `int(np.log(0))`
  - = `int(-inf)`
  - ERROR: there is no infinity value for integers!
  - -> casting the floating-point value -inf to int is not possible!
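The chain above can be reproduced outside scikit-learn. This is only a minimal sketch of the failing cast, not the library's actual code path; the names `log_of_zero` and `message` are made up for illustration, and `np.errstate` is used merely to silence NumPy's divide-by-zero warning.

```python
import numpy as np

# np.log(0) does not raise; it returns -inf (and normally emits a RuntimeWarning).
with np.errstate(divide="ignore"):
    log_of_zero = np.log(0)

message = None
try:
    # Same cast as in the traceback: int() cannot represent infinity.
    int(log_of_zero)
except OverflowError as exc:
    message = str(exc)

print(log_of_zero)
print(message)
```

The exception text matches the last line of the traceback in the question, which points at `n_clusters=0` rather than at the type of `k`.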

Setting `n_clusters=0` does not make sense!
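A possible fix is to start the loop at a valid cluster count. The sketch below reuses the question's data and starts at 2, since a single cluster says little about how many clusters fit best. The `fitted` dictionary, `n_init=3`, and `random_state=0` are assumptions added here for reproducibility and are not part of the original code.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import HashingVectorizer

docs = ['hello monkey goodbye thank you',
        'goodbye thank you hello',
        'i am going home goodbye thanks',
        'thank you very much sir',
        'good golly i am going home finally']

X = HashingVectorizer().fit_transform(docs)

fitted = {}  # hypothetical container: maps k to its fitted model
# range(2, 5) yields 2, 3, 4 and never passes n_clusters=0.
for k in range(2, 5):
    model = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0)
    model.fit(X)
    fitted[k] = model

print(sorted(fitted))
```

The upper bound should also stay at or below the number of documents (five here), because there cannot be more clusters than samples.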
