RoadRunner - 3 months ago
C Question

Grouping array of Strings C

I have an array of strings that I am trying to group into categories.

So far my code looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char *argv[]) {
char *results[] = {"Canada", "Cycling", "Canada", "Swimming", "India", "Swimming", "New Mexico",
"Cycling", "New Mexico", "Cycling", "New Mecico", "Swimming"};



int nelements, i, country_count;

nelements = sizeof(results) / sizeof(results[0]);

for (i = 0 ; i < nelements; i++) {
printf("%s\n", results[i]);
}

return 0;
}


Which prints out this:

Canada
Cycling
Canada
Swimming
India
Swimming
New Mexico
Cycling
New Mexico
Cycling
New Mexico
Swimming


But I am trying to group the sports along with respective counts with the individual countries, which I want to look like this:

Canada
Cycling 1
Swimming 1

India
Swimming 1

New Mexico
Cycling 2
Swimming 1


I am thinking of categorizing the countries with every i+2 element in the array, and using strcmp to remove the duplicate country strings, but I am not sure how to do this with the counts of the sports along with each country.

I am just not sure how to go about this. Any sort of help would be appreciated.

Answer

I would use a struct (if you are not familiar with structs, I refresh my memory with myStruct.c when needed) with two arrays as data members, like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define COUNTRY_LENGTH 15
#define MAX_SPORTS 5

enum sport_name { CYCLING, SWIMMING };

typedef struct Record {
  char country[COUNTRY_LENGTH];
  int sports[MAX_SPORTS];
} Record;

// return index of 'country' in 'array' if the 'country'
// is found inside 'array', else -1
int exists(char country[], Record* array, int size) {
    int i;
    for(i = 0; i < size; ++i)
        if(!strcmp(array[i].country, country))
            return i;
    return -1;
}

int find_sport_index(char sport[]) {
    if(!strcmp(sport, "Cycling"))
        return CYCLING;
    if(!strcmp(sport, "Swimming"))
        return SWIMMING;
    // unknown sport: the caller must check for -1 before using it as an index
    fprintf(stderr, "I couldn't find a sport index for %s\n", sport);
    return -1;
}

// returns a string literal, so the result must not be modified
const char* find_sport_string(int sport) {
    if(sport == CYCLING)
        return "Cycling";
    if(sport == SWIMMING)
        return "Swimming";
    fprintf(stderr, "I couldn't find a sport string for sport index %d\n", sport);
    return NULL;
}

int main(int argc, char *argv[]) {
    // you had a typo (New Mecico), which I corrected. You could also have used a struct here ;)
    char *results[] = {"Canada", "Cycling", "Canada", "Swimming", "India", "Swimming", "New Mexico",
                       "Cycling", "New Mexico", "Cycling", "New Mexico", "Swimming"};



    int nelements, i, j;

    nelements = sizeof(results) / sizeof(results[0]);

    const int records_size = nelements/2;

    Record record[records_size];
    for(i = 0; i < records_size; i++) {
        for(j = 0; j < COUNTRY_LENGTH; j++) 
            record[i].country[j] = 0;
        for(j = 0; j < MAX_SPORTS; j++)
            record[i].sports[j] = 0;
    }

    int country_index, records_count = 0;
    for(i = 0; i < nelements; ++i) {
        // results[i] is a country
        if(i % 2 == 0) {
            country_index = exists(results[i], record, records_size);
            if(country_index == -1) {
                country_index = records_count++;
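                // record[] was zero-filled above, so copying at most
                // COUNTRY_LENGTH - 1 bytes keeps the name null-terminated
                strncpy(record[country_index].country, results[i], COUNTRY_LENGTH - 1);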
            }
        } else {
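            // results[i] is a sport
            int sport_index = find_sport_index(results[i]);
            // skip unknown sports: indexing sports[-1] would be undefined behavior
            if(sport_index != -1)
                record[country_index].sports[sport_index]++;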
        }
    }    


    for(i = 0; i < records_size; ++i) {
        if(strlen(record[i].country)) {
            printf("%s\n", record[i].country);
            for(j = 0; j < MAX_SPORTS; j++) {
                if(record[i].sports[j] != 0) {
                    printf("    %s %d\n", find_sport_string(j), record[i].sports[j]);
                }
            }
        }    
    }

    return 0;
}

Output:

C02QT2UBFVH6-lm:~ gsamaras$ ./a.out 
Canada
    Cycling 1
    Swimming 1
India
    Swimming 1
New Mexico
    Cycling 2
    Swimming 1

The idea is that:

  1. The struct Record holds the records in the Olympics, with relevant sports.
  2. Record.country holds the name of the country (I assume it is at most 14 characters long, +1 for the null terminator, thus I defined it as 15).
  3. Record.sports is an array of size MAX_SPORTS. Its size should equal the number of sports in the Olympics, but I assumed 5. Every position of this array is a counter of the medals a country got in one sport. For example, Record.sports[1] = 2 would indicate that this country has 2 medals in Swimming. But how do I know it was Swimming? I decided a priori, as the programmer, that the first counter is connected to Cycling, the second to Swimming, and so on. I used an enum to make that more readable, instead of using magic numbers. (Note: You could use a list instead of an array, but that would be overkill for this application. But if you want to do it for fun (and because it uses a bit less memory), you can use our List (C).)
  4. You define results[] in a strange way, since you should really have used a struct for that, but I worked with your code. So I needed an array of Records, and its size has to cover the number of countries, which is at most half the size of results[]. Notice that because you defined results[] to contain implicit country-sport pairs, a division by two is enough to determine the size of the Records array.
  5. I loop over results[] to populate record[], using a counter named i in the for loop. When i is even, results[i] contains a country, else it contains a sport. I use the modulo operator (%) to determine that easily.
  6. If the country doesn't exist in record[], then I insert it, else I don't insert it again. In both cases I want to remember its index in record[], so that in the next iteration, when we process the sport, we will know which position of record[] to look into and act accordingly.
  7. Now, when I process a sport, I want to increase the counter of that sport, but only for the corresponding country (remember that I have stored the country index I had processed in the previous iteration).
  8. Then I just print, that's it! :)
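
As a rough sketch of what point 4 hints at (using a struct for the input instead of implicit pairs), something like the following could work. The names Result, Tally, SPORT_COUNT, sport_names, tally_results and print_tallies are made up for this illustration and are not part of the code above. It assumes country names fit in 14 characters and replaces the if-chain in find_sport_string with a lookup table indexed by the enum.

```c
#include <stdio.h>
#include <string.h>

#define COUNTRY_LENGTH 15

/* SPORT_COUNT is a hypothetical sentinel: it equals the number of sports */
enum sport_name { CYCLING, SWIMMING, SPORT_COUNT };

/* one explicit (country, sport) pair instead of two adjacent strings */
struct Result {
    const char *country;
    enum sport_name sport;
};

/* per-country counters, one slot per sport */
struct Tally {
    char country[COUNTRY_LENGTH];
    int sports[SPORT_COUNT];
};

/* names indexed by enum value, so printing needs no if-chain */
static const char *const sport_names[SPORT_COUNT] = { "Cycling", "Swimming" };

/* Groups 'results' by country into 'out', which has room for 'cap' entries.
   Returns the number of distinct countries stored. Results with an invalid
   sport, and new countries that no longer fit, are skipped.
   Assumption: names fit in COUNTRY_LENGTH - 1 characters; longer names are
   truncated and would then not match on later lookups. */
int tally_results(const struct Result *results, int n,
                  struct Tally *out, int cap) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        int s = (int)results[i].sport;
        if (s < 0 || s >= SPORT_COUNT)
            continue;

        /* linear search for the country among those seen so far */
        int k = 0;
        while (k < count && strcmp(out[k].country, results[i].country) != 0)
            k++;

        if (k == count) {               /* first time we see this country */
            if (count == cap)
                continue;
            memset(&out[k], 0, sizeof out[k]);
            snprintf(out[k].country, COUNTRY_LENGTH, "%s", results[i].country);
            count++;
        }
        out[k].sports[s]++;
    }
    return count;
}

/* prints each country followed by its non-zero sport counters */
void print_tallies(const struct Tally *t, int count) {
    for (int i = 0; i < count; i++) {
        printf("%s\n", t[i].country);
        for (int s = 0; s < SPORT_COUNT; s++)
            if (t[i].sports[s] != 0)
                printf("    %s %d\n", sport_names[s], t[i].sports[s]);
    }
}
```

To use it, you would declare the input as an array of struct Result, for example {"Canada", CYCLING}, {"Canada", SWIMMING} and so on, pass it to tally_results together with a struct Tally array that is large enough, and hand the returned count to print_tallies. Since each pair is explicit, the i % 2 bookkeeping goes away and a misspelled sport is caught by the compiler instead of at run time.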