R. Williams R. Williams - 1 month ago 7
C++ Question

Functions from cmath generate run-time infinite loop (No such file or directory)

I am writing a program for class that reads in a set of ordered pairs from a file and implements the k-means algorithm to identify data clusters. This involves the use of the distance formula, which requires the calculation of a square root. As such, I have included the cmath library, and used the sqrt() function, and everything compiles correctly. However, at run-time, the program generates an infinite loop, which I have determined in gdb is caused by the use of the sqrt function, which causes gdb to generate the line "w_sqrt.c: No such file or directory." I have a similar unresolved problem with the rand() function from the cstdlib library, however I am unable to replicate it until the sqrt() problem is resolved.

Note: This is my first time posting a question on StackOverflow so I apologize in advance if I have overlooked any conventions or rule of posting here.

Additional note: Please limit feedback to that directly related to the problem I have described above. For reasons of academic honesty, I am not seeking advice on the implementation of the k-means function, I am simply trying to get the program to run so that, if there are further problems, I can identify and solve them myself.

Thank you in advance.

#include <iostream>
#include <fstream>
#include <cstdlib>
#include <cmath>
using namespace std;

const int ARRAY_MAX = 1000;
//DECIMAL_MULT = 10^(required number of decimal places)
const int DECIMAL_MULT = 10000;

struct node{
double xVal;
double yVal;
char symbol;
int clusterIndex;
};

//Returns the distance between the two input nodes
double getDistance(node A, node B) {
double xSquare = ((B.xVal - A.xVal) * (B.xVal - A.xVal));
double ySquare = ((B.yVal - A.yVal) * (B.yVal - A.yVal));
double result = sqrt(xSquare + ySquare);
return result;
}

int main(){
int numNodes = 0;
char temp[ARRAY_MAX];
string filename;
ifstream file;
cout << "Enter the name of the file to be read: ";
getline(cin, filename);
file.open(filename.c_str());
//Check the number of data entries present in the file
while (!(file.eof())) {
numNodes++;
file.getline(temp, ARRAY_MAX);
}
numNodes--;
file.close();
node list[numNodes];
file.open(filename.c_str());
//Build a list of data points
double xMin, xMax, yMin, yMax;
double value;
file >> value;
file.ignore();
xMin = value;
xMax = value;
list[0].xVal = value;
file >> value;
file.ignore();
yMin = value;
yMax = value;
list[0].yVal = value;
for (int i = 1; i < numNodes; i++) {
file >> value;
file.ignore();
list[i].xVal = value;
if (value < xMin) (xMin = value);
if (value > xMax) (xMax = value);
file >> value;
file.ignore();
list[i].yVal = value;
if (value < yMin) (yMin = value);
if (value > yMax) (yMax = value);
}
//Prompt user for number of clusters and symbol to be used for each
int numClusters;
cout << "Please enter the number of clusters to be analyzed: ";
cin >> numClusters;
cin.ignore();
node centerList[numClusters];
char usedSymbols[numClusters];
char entry;
bool validEntry;
for (int i = 0; i < numClusters; i++) {
centerList[i].clusterIndex = i;
do {
validEntry = true;
cout << "Please enter the character representing "
<< "cluster " << i + 1 << ": ";
cin >> entry;
cin.ignore();
for (int j = 0; j < i; j++) {
if (centerList[j].symbol == entry){
cout << "Character has already been "
<< "used.\n";
validEntry = false;
break;
}
}
} while (!validEntry);
centerList[i].symbol = entry;
}
//Assign random starting points to cluster centers
srand(time(NULL));
int xMaxCast = xMax * DECIMAL_MULT;
int xMinCast = xMin * DECIMAL_MULT;
int yMaxCast = yMax * DECIMAL_MULT;
int yMinCast = yMin * DECIMAL_MULT;
int xRange = xMaxCast - xMinCast;
int yRange = yMaxCast - yMinCast;
int randValue;
for (int i = 0; i < numClusters; i++) {
randValue = std::rand() % xRange + xMinCast;
centerList[i].xVal = randValue / DECIMAL_MULT;
randValue = std::rand() % yRange + yMinCast;
centerList[i].yVal = randValue / DECIMAL_MULT;
}
//Determine the cluster of each node
for (int i = 0; i < numNodes; i++) {
list[i].clusterIndex = centerList[0].clusterIndex;
for (int j = 1; j < numClusters; j++) {
if (getDistance(list[i], centerList[list[i].clusterIndex])
> getDistance(list[i], centerList[j])) {
list[i].clusterIndex = j;
}
}
list[i].symbol = centerList[list[i].clusterIndex].symbol;
}
bool proceed = true;
double average;
int clusterCount;
while (proceed) {
proceed = false;
//Move each cluster center to the centroid of its currently
// assigned points; if all centers are already in the
// correct positions, discontinue this operation
for (int i = 0; i < numClusters; i++) {
average = 0;
clusterCount = 0;
for (int j = 0; j < numNodes; j++) {
if (list[j].clusterIndex == i) {
average += list[j].xVal;
clusterCount++;
}
}
average /= clusterCount;
if (centerList[i].xVal != average) {
proceed = true;
centerList[i].xVal = average;
}
average = 0;
clusterCount = 0;
for (int j = 0; j < numNodes; j++) {
if (list[j].clusterIndex == i) {
average += list[j].yVal;
clusterCount++;
}
}
average /= clusterCount;
if (centerList[i].yVal != average) {
proceed = true;
centerList[i].yVal = average;
}

}
if (proceed) {
//Update cluster assignment of each node
for (int i = 0; i < numNodes; i++) {
for (int j = 0; j < numClusters; j++) {
if (getDistance(list[i],
centerList[list[i].clusterIndex])
> getDistance(list[i],
centerList[j])) {
list[i].clusterIndex = j;
}
}
list[i].symbol =
centerList[list[i].clusterIndex].symbol;
}
}
}
}

Answer

Note that in an expression like centerList[i].xVal != average (i.e., an int is not equal to a double), both values will be compared as doubles. This ends up being a terrible way of comparing floating point numbers, and this is widely discussed.

But here's where the real problem happens. centerList[i].xVal = average; does not assign centerList[i].xVal to average but (int) average (the nearest integer to average towards zero). This leads to a condition that is infinitely repeated. If node::xVal and node::yVal were doubles or the not-equals comparison was fixed by comparing integers or the not-equals comparison was fixed by appropriately comparing floating point numbers, then the program will likely not loop indefinitely (it did not loop indefinitely for the input that I tried).

Note that using the flag -Wconversion, warnings for centerList[i].xVal = average; and centerList[i].yVal = average; appear. Additionally, -Wconversion is not covered by either -Wall or -Wextra. I have found that -Wconversion is a worthwhile flag to use when developing code for numerical algorithms even though it may produce many seemingly benign warnings. Everything is fine and dandy until a devastating implicit conversion from a float to an int causes a massive relative error.

I would hypothesize that gdb indicated sqrt when you inserted a break point because sqrt is relatively expensive and called frequently. gdb simply couldn't find the source for sqrt (maybe because the source isn't installed on your system).