chinu - 7 months ago 43
Java Question

# Implementation of k-means clustering algorithm

In my program I'm taking k=2 for the k-means algorithm, i.e. I want only 2 clusters.
I have implemented it in a very simple and straightforward way, but I still can't understand why my program gets into an infinite loop.
Can anyone please point out where I'm making a mistake?

For simplicity, I have hard-coded the input in the program itself.
Here is my code:

```java
import java.io.*;
import java.lang.*;

class Kmean
{
    public static void main(String args[])
    {
        int N=9;
        int arr[]={2,4,10,12,3,20,30,11,25};    // initial data
        int i,m1,m2,a,b,n=0;
        boolean flag=true;
        float sum1=0,sum2=0;
        a=arr[0];b=arr[1];
        m1=a; m2=b;
        int cluster1[]=new int[9],cluster2[]=new int[9];
        for(i=0;i<9;i++)
            System.out.print(arr[i]+ "\t");
        System.out.println();

        do
        {
            n++;
            int k=0,j=0;
            for(i=0;i<9;i++)
            {
                if(Math.abs(arr[i]-m1)<=Math.abs(arr[i]-m2))
                {
                    cluster1[k]=arr[i];
                    k++;
                }
                else
                {
                    cluster2[j]=arr[i];
                    j++;
                }
            }
            System.out.println();
            for(i=0;i<9;i++)
                sum1=sum1+cluster1[i];
            for(i=0;i<9;i++)
                sum2=sum1+cluster2[i];
            a=m1;
            b=m2;
            m1=Math.round(sum1/k);
            m2=Math.round(sum2/j);
            if(m1==a && m2==b)
                flag=false;
            else
                flag=true;

            System.out.println("After iteration "+ n +" , cluster 1 :\n");    // printing the clusters of each iteration
            for(i=0;i<9;i++)
                System.out.print(cluster1[i]+ "\t");

            System.out.println("\n");
            System.out.println("After iteration "+ n +" , cluster 2 :\n");
            for(i=0;i<9;i++)
                System.out.print(cluster2[i]+ "\t");

        }while(flag);

        System.out.println("Final cluster 1 :\n");            // final clusters
        for(i=0;i<9;i++)
            System.out.print(cluster1[i]+ "\t");

        System.out.println();
        System.out.println("Final cluster 2 :\n");
        for(i=0;i<9;i++)
            System.out.print(cluster2[i]+ "\t");
    }
}
```

You have a bunch of errors:

1. At the start of your `do` loop you should reset `sum1` and `sum2` to 0.

2. You should loop only up to `k` and `j` respectively when calculating `sum1` and `sum2` (or clear `cluster1` and `cluster2` at the start of your `do` loop).

3. In the calculation of `sum2` you accidentally use `sum1`.

When I make those fixes the code runs fine, yielding the output:

```
Final cluster 1 :
2   4   10   12  3   11  0   0   0

Final cluster 2 :
20  30  25   0   0   0   0   0   0
```
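For reference, here is a minimal sketch of how the three fixes listed above might look when applied together. It is not the exact code I ran: the class name `KMeansSketch`, the `cluster` helper, the use of `ArrayList` instead of fixed-size arrays (so no trailing zeros are printed), and the guard for an empty cluster are all my own assumptions, added for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KMeansSketch {

    // Hypothetical helper: 1-D k-means with k=2, starting from means m1 and m2.
    // Returns the two final clusters.
    static List<List<Integer>> cluster(int[] data, int m1, int m2) {
        List<Integer> c1 = new ArrayList<>();
        List<Integer> c2 = new ArrayList<>();
        boolean changed = true;

        while (changed) {
            // Fix 1 and 2: start every iteration with empty clusters and zeroed sums.
            c1.clear();
            c2.clear();
            float sum1 = 0, sum2 = 0;

            for (int x : data) {
                if (Math.abs(x - m1) <= Math.abs(x - m2)) {
                    c1.add(x);
                    sum1 += x;
                } else {
                    c2.add(x);
                    sum2 += x;   // Fix 3: sum2 accumulates into sum2, not sum1.
                }
            }

            // Assumption: if a cluster ends up empty, keep its previous mean
            // instead of dividing by zero.
            int newM1 = c1.isEmpty() ? m1 : Math.round(sum1 / c1.size());
            int newM2 = c2.isEmpty() ? m2 : Math.round(sum2 / c2.size());

            // Stop once neither mean moves any more.
            changed = (newM1 != m1) || (newM2 != m2);
            m1 = newM1;
            m2 = newM2;
        }
        return Arrays.asList(c1, c2);
    }

    public static void main(String[] args) {
        int[] arr = {2, 4, 10, 12, 3, 20, 30, 11, 25};
        // Same starting means as in the question: the first two data points.
        List<List<Integer>> result = cluster(arr, arr[0], arr[1]);
        System.out.println("Final cluster 1: " + result.get(0));
        System.out.println("Final cluster 2: " + result.get(1));
    }
}
```

Accumulating the sums inside the assignment loop sidesteps the question of how far to loop over the cluster arrays, which is where the stale values came from in the original code.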

My general advice: learn how to use a debugger. Stack Overflow is not meant for questions like this: you are expected to find your own bugs and only come here when everything else fails.