chinu chinu - 1 month ago 7
Java Question

Implementation of k-means clustering algorithm

In my program I'm using k=2 for the k-means algorithm, i.e. I want only 2 clusters.
I have implemented it in a very simple and straightforward way, but I still can't understand why my program gets into an infinite loop.
Can anyone please point out where I'm making a mistake?

For simplicity, I have hard-coded the input in the program itself.
Here is my code:

import java.io.*;
import java.lang.*;
class Kmean
{
    public static void main(String args[])
    {
        int N=9;
        int arr[]={2,4,10,12,3,20,30,11,25}; // initial data
        int i,m1,m2,a,b,n=0;
        boolean flag=true;
        float sum1=0,sum2=0;
        a=arr[0];b=arr[1];
        m1=a; m2=b;
        int cluster1[]=new int[9],cluster2[]=new int[9];
        for(i=0;i<9;i++)
            System.out.print(arr[i]+ "\t");
        System.out.println();

        do
        {
            n++;
            int k=0,j=0;
            for(i=0;i<9;i++)
            {
                if(Math.abs(arr[i]-m1)<=Math.abs(arr[i]-m2))
                {
                    cluster1[k]=arr[i];
                    k++;
                }
                else
                {
                    cluster2[j]=arr[i];
                    j++;
                }
            }
            System.out.println();
            for(i=0;i<9;i++)
                sum1=sum1+cluster1[i];
            for(i=0;i<9;i++)
                sum2=sum1+cluster2[i];
            a=m1;
            b=m2;
            m1=Math.round(sum1/k);
            m2=Math.round(sum2/j);
            if(m1==a && m2==b)
                flag=false;
            else
                flag=true;

            System.out.println("After iteration "+ n +" , cluster 1 :\n"); // printing the clusters of each iteration
            for(i=0;i<9;i++)
                System.out.print(cluster1[i]+ "\t");

            System.out.println("\n");
            System.out.println("After iteration "+ n +" , cluster 2 :\n");
            for(i=0;i<9;i++)
                System.out.print(cluster2[i]+ "\t");

        }while(flag);

        System.out.println("Final cluster 1 :\n"); // final clusters
        for(i=0;i<9;i++)
            System.out.print(cluster1[i]+ "\t");

        System.out.println();
        System.out.println("Final cluster 2 :\n");
        for(i=0;i<9;i++)
            System.out.print(cluster2[i]+ "\t");
    }
}

Answer

You have a bunch of errors:

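  1. At the start of your do loop you should reset sum1 and sum2 to 0. Otherwise the sums carry over from one iteration to the next.

  2. You should loop only up to k and j respectively when calculating sum1 and sum2 (or clear cluster1 and cluster2 at the start of your do loop). Otherwise stale values left in the arrays by earlier iterations are added to the sums.

  3. In the calculation of sum2 you accidentally use sum1: the line reads sum2=sum1+cluster2[i] where it should read sum2=sum2+cluster2[i].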

When I make those fixes the code runs fine, yielding the output:

Final cluster 1 :   
2   4   10   12  3   11  0   0   0

Final cluster 2 :
20  30  25   0   0   0   0   0   0
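
For reference, here is a minimal sketch of the program with the three fixes applied. It is not the exact code that produced the output above: the class name KMeanFixed and the helper method cluster are made up for illustration, the sums are accumulated while the points are being assigned (which has the same effect as looping only up to k and j), and the clusters are trimmed before printing, so the trailing zeros do not appear. It assumes the input has at least two values, and it keeps the old mean if a cluster ever ends up empty.

```java
import java.util.Arrays;

class KMeanFixed {

    // 1-D k-means with k = 2, seeded with the first two values of arr.
    // Returns { cluster1, cluster2 }, each trimmed to its actual size.
    static int[][] cluster(int[] arr) {
        int n = arr.length;
        int[] cluster1 = new int[n];
        int[] cluster2 = new int[n];
        int m1 = arr[0], m2 = arr[1];
        int k, j;
        boolean changed;
        do {
            // Fixes 1 and 2: every pass starts with fresh sums and counts,
            // and only values assigned in this pass are summed.
            float sum1 = 0, sum2 = 0;
            k = 0;
            j = 0;
            for (int x : arr) {
                if (Math.abs(x - m1) <= Math.abs(x - m2)) {
                    cluster1[k++] = x;
                    sum1 += x;
                } else {
                    cluster2[j++] = x;
                    sum2 += x; // Fix 3: add to sum2, not sum1
                }
            }
            int oldM1 = m1, oldM2 = m2;
            // Assumption: keep the previous mean if a cluster is empty,
            // which avoids dividing by zero.
            if (k > 0) m1 = Math.round(sum1 / k);
            if (j > 0) m2 = Math.round(sum2 / j);
            changed = (m1 != oldM1) || (m2 != oldM2);
        } while (changed);
        return new int[][] { Arrays.copyOf(cluster1, k), Arrays.copyOf(cluster2, j) };
    }

    public static void main(String[] args) {
        int[] arr = {2, 4, 10, 12, 3, 20, 30, 11, 25};
        int[][] result = cluster(arr);
        System.out.println("Final cluster 1 : " + Arrays.toString(result[0]));
        System.out.println("Final cluster 2 : " + Arrays.toString(result[1]));
    }
}
```

On the data from the question the means settle at 7 and 25 after five passes, and the two clusters hold the same values as in the output above.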

My general advice: learn how to use a debugger. Stack Overflow is not meant for questions like this: it is expected that you can find your own bugs and only come here when everything else fails...