R. Williams R. Williams - 1 year ago 136
C++ Question

Functions from cmath generate run-time infinite loop (No such file or directory)

I am writing a program for class that reads in a set of ordered pairs from a file and implements the k-means algorithm to identify data clusters. This involves the use of the distance formula, which requires the calculation of a square root. As such, I have included the cmath library, and used the sqrt() function, and everything compiles correctly. However, at run-time, the program generates an infinite loop, which I have determined in gdb is caused by the use of the sqrt function, which causes gdb to generate the line "w_sqrt.c: No such file or directory." I have a similar unresolved problem with the rand() function from the cstdlib library, however I am unable to replicate it until the sqrt() problem is resolved.

Note: This is my first time posting a question on StackOverflow so I apologize in advance if I have overlooked any conventions or rule of posting here.

Additional note: Please limit feedback to that directly related to the problem I have described above. For reasons of academic honesty, I am not seeking advice on the implementation of the k-means function, I am simply trying to get the program to run so that, if there are further problems, I can identify and solve them myself.

Thank you in advance.

#include <iostream>
#include <fstream>
#include <cstdlib>
#include <cmath>
using namespace std;

const int ARRAY_MAX = 1000;
//DECIMAL_MULT = 10^(required number of decimal places)
const int DECIMAL_MULT = 10000;

struct node{
double xVal;
double yVal;
char symbol;
int clusterIndex;
};

//Returns the distance between the two input nodes
double getDistance(node A, node B) {
double xSquare = ((B.xVal - A.xVal) * (B.xVal - A.xVal));
double ySquare = ((B.yVal - A.yVal) * (B.yVal - A.yVal));
double result = sqrt(xSquare + ySquare);
return result;
}

int main(){
int numNodes = 0;
char temp[ARRAY_MAX];
string filename;
ifstream file;
cout << "Enter the name of the file to be read: ";
getline(cin, filename);
file.open(filename.c_str());
//Check the number of data entries present in the file
while (!(file.eof())) {
numNodes++;
file.getline(temp, ARRAY_MAX);
}
numNodes--;
file.close();
node list[numNodes];
file.open(filename.c_str());
//Build a list of data points
double xMin, xMax, yMin, yMax;
double value;
file >> value;
file.ignore();
xMin = value;
xMax = value;
list[0].xVal = value;
file >> value;
file.ignore();
yMin = value;
yMax = value;
list[0].yVal = value;
for (int i = 1; i < numNodes; i++) {
file >> value;
file.ignore();
list[i].xVal = value;
if (value < xMin) (xMin = value);
if (value > xMax) (xMax = value);
file >> value;
file.ignore();
list[i].yVal = value;
if (value < yMin) (yMin = value);
if (value > yMax) (yMax = value);
}
//Prompt user for number of clusters and symbol to be used for each
int numClusters;
cout << "Please enter the number of clusters to be analyzed: ";
cin >> numClusters;
cin.ignore();
node centerList[numClusters];
char usedSymbols[numClusters];
char entry;
bool validEntry;
for (int i = 0; i < numClusters; i++) {
centerList[i].clusterIndex = i;
do {
validEntry = true;
cout << "Please enter the character representing "
<< "cluster " << i + 1 << ": ";
cin >> entry;
cin.ignore();
for (int j = 0; j < i; j++) {
if (centerList[j].symbol == entry){
cout << "Character has already been "
<< "used.\n";
validEntry = false;
break;
}
}
} while (!validEntry);
centerList[i].symbol = entry;
}
//Assign random starting points to cluster centers
srand(time(NULL));
int xMaxCast = xMax * DECIMAL_MULT;
int xMinCast = xMin * DECIMAL_MULT;
int yMaxCast = yMax * DECIMAL_MULT;
int yMinCast = yMin * DECIMAL_MULT;
int xRange = xMaxCast - xMinCast;
int yRange = yMaxCast - yMinCast;
int randValue;
for (int i = 0; i < numClusters; i++) {
randValue = std::rand() % xRange + xMinCast;
centerList[i].xVal = randValue / DECIMAL_MULT;
randValue = std::rand() % yRange + yMinCast;
centerList[i].yVal = randValue / DECIMAL_MULT;
}
//Determine the cluster of each node
for (int i = 0; i < numNodes; i++) {
list[i].clusterIndex = centerList[0].clusterIndex;
for (int j = 1; j < numClusters; j++) {
if (getDistance(list[i], centerList[list[i].clusterIndex])
> getDistance(list[i], centerList[j])) {
list[i].clusterIndex = j;
}
}
list[i].symbol = centerList[list[i].clusterIndex].symbol;
}
bool proceed = true;
double average;
int clusterCount;
while (proceed) {
proceed = false;
//Move each cluster center to the centroid of its currently
// assigned points; if all centers are already in the
// correct positions, discontinue this operation
for (int i = 0; i < numClusters; i++) {
average = 0;
clusterCount = 0;
for (int j = 0; j < numNodes; j++) {
if (list[j].clusterIndex == i) {
average += list[j].xVal;
clusterCount++;
}
}
average /= clusterCount;
if (centerList[i].xVal != average) {
proceed = true;
centerList[i].xVal = average;
}
average = 0;
clusterCount = 0;
for (int j = 0; j < numNodes; j++) {
if (list[j].clusterIndex == i) {
average += list[j].yVal;
clusterCount++;
}
}
average /= clusterCount;
if (centerList[i].yVal != average) {
proceed = true;
centerList[i].yVal = average;
}

}
if (proceed) {
//Update cluster assignment of each node
for (int i = 0; i < numNodes; i++) {
for (int j = 0; j < numClusters; j++) {
if (getDistance(list[i],
centerList[list[i].clusterIndex])
> getDistance(list[i],
centerList[j])) {
list[i].clusterIndex = j;
}
}
list[i].symbol =
centerList[list[i].clusterIndex].symbol;
}
}
}
}

Answer Source

Note that in an expression like centerList[i].xVal != average (i.e., an int is not equal to a double), both values will be compared as doubles. This ends up being a terrible way of comparing floating point numbers, and this is widely discussed.

But here's where the real problem happens. centerList[i].xVal = average; does not assign centerList[i].xVal to average but (int) average (the nearest integer to average towards zero). This leads to a condition that is infinitely repeated. If node::xVal and node::yVal were doubles or the not-equals comparison was fixed by comparing integers or the not-equals comparison was fixed by appropriately comparing floating point numbers, then the program will likely not loop indefinitely (it did not loop indefinitely for the input that I tried).

Note that using the flag -Wconversion, warnings for centerList[i].xVal = average; and centerList[i].yVal = average; appear. Additionally, -Wconversion is not covered by either -Wall or -Wextra. I have found that -Wconversion is a worthwhile flag to use when developing code for numerical algorithms even though it may produce many seemingly benign warnings. Everything is fine and dandy until a devastating implicit conversion from a float to an int causes a massive relative error.

I would hypothesize that gdb indicated sqrt when you inserted a break point because sqrt is relatively expensive and called frequently. gdb simply couldn't find the source for sqrt (maybe because the source isn't installed on your system).

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download