Dvole Dvole - 4 months ago 6
C Question

Reading file in C

I'm reading a file in my C program and comparing every word in it with my word, which is entered via command line argument. But I get crashes, and I can't understand what's wrong. How do I track such errors? What is wrong in my case?

My compiler is clang. The code compiles fine. When running it says 'segmentation fault'.

Here is the code.

#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[])
{
char* temp = argv[1];
char* word = strcat(temp, "\n");

char* c = "abc";
FILE *input = fopen("/usr/share/dict/words", "r");

while (strcmp(word, c))
{
char* duh = fgets(c, 20, input);
printf("%s", duh);
}

if (!strcmp (word, c))
{
printf("FOUND IT!\n");
printf("%s\n%s", word, c);
}

fclose(input);
}

Answer

The issue here is that you are trying to treat strings in C as you might in another language (like C++ or Java), in which they are resizable vectors that you can easily append or read an arbitrary amount of data into.

C strings are much lower level. They are simply an array of characters (or a pointer to such an array; arrays can be treated like pointers to their first element in C anyhow), and the string is treated as all of the characters within that array up to the first null character. These arrays are fixed size; if you want a string of an arbitrary size, you need to allocate it yourself using malloc(), or allocate it on the stack with the size that you would like.

One thing here that is a little confusing is you are using a non-standard type string. Given the context, I'm assuming that's coming from your cs50.h, and is just a typedef to char *. It will probably reduce confusion if you actually use char * instead of string; using a typedef obscures what's really going on.

Let's start with the first problem.

    string word = strcat(argv[1], "\n");

strcat() appends the second string onto the first; it starts from the null terminator of the first string, and replaces that with the first character of the second string, and so on, until it reaches a null in the second string. In order for this to work, the buffer containing the first string needs to have enough room to fit the second one. If it does not, you may overwrite arbitrary other memory, which could cause your program to crash or have all kinds of other unexpected behavior.

Here's an illustration. Let's say that argv[1] contains the word hello, and the buffer has exactly as much space as it needs for this. After it is some other data; I've filled in other for the sake of example, though it won't actually be that, it could be anything, and it may or may not be important:

+---+---+---+---+---+---+---+---+---+---+---+---+
| h | e | l | l | o | \0| o | t | h | e | r | \0|
+---+---+---+---+---+---+---+---+---+---+---+---+

Now if you use strcat() to append "\n", you will get:

+---+---+---+---+---+---+---+---+---+---+---+---+
| h | e | l | l | o | \n| \0| t | h | e | r | \0|
+---+---+---+---+---+---+---+---+---+---+---+---+

You can see that we've overwritten the other data that was after hello. This may cause all kinds of problems. To fix this, you need to copy your argv[1] into a new string, that has enough room for it plus one more character (and don't forget the trailing null). You can call strlen() to get the length of the string, then add 1 for the \n, and one for the trailing null, to get the length that you need.

Actually, instead of trying to add a \n to the word you get in from the command line, I would recommend stripping off the \n from your input words, or using strncmp() to compare all but the last character (the \n). In general, it's best in C to avoid appending strings, as appending strings means you need to allocate memory and copy things around, and it can be easy to make mistakes doing so, as well as being inefficient. Higher level languages usually take care of the details for you, making it easier to append strings, though still just as inefficient.

After your edit, you changed this to:

    char* temp = argv[1];
    char* word = strcat(temp, "\n");

However, this has the same problem. A char * is a pointer to a character array. Your temp variable is just copying the pointer, not the actual value; it is still pointing to the same buffer. Here's an illustration; I'm making up addresses for the purposes of demonstration, in the real machine there will be more objects in between these things, but this should suffice for the purpose of demonstration.

+------------+---------+-------+
|    name    | address | value |
+------------+---------+-------+
| argv       |    1000 |  1004 |-------+
| argv[0]    |    1004 |  1008 | --+ <-+
| argv[1]    |    1006 |  1016 | --|---+
| argv[0][0] |    1008 |   'm' | <-+   |
| argv[0][1] |    1009 |   'y' |       |
| argv[0][2] |    1010 |   'p' |       |
| argv[0][3] |    1011 |   'r' |       |
| argv[0][4] |    1012 |   'o' |       |
| argv[0][5] |    1013 |   'g' |       |
| argv[0][6] |    1014 |     0 |       |
| argv[1][0] |    1016 |   'w' | <-+ <-+
| argv[1][1] |    1017 |   'o' |   |
| argv[1][2] |    1018 |   'r' |   |
| argv[1][3] |    1019 |   'd' |   |
| argv[1][4] |    1020 |     0 |   |
+------------+---------+-------+   |

Now when you create your temp variable, all you are doing is copying argv[1] into a new char *:

+------------+---------+-------+   | 
|    name    | address | value |   |
+------------+---------+-------+   |
| temp       |    1024 |  1016 | --+
+------------+---------+-------+

As a side note, you also shouldn't ever try to access argv[1] without checking that argc is greater than 1. If someone doesn't pass any arguments in, then argv[1] itself is invalid to access.

I'll move on to the next problem.

    string c = "abc";

    // ...

        char* duh = fgets(c, 20, input);

Here, you are referring to the static string "abc". A string that appears literally in the source, like "abc", goes into a special, read-only part of the memory of the program. Remember what I said; string here is just a way of saying char *. So c is actually just a pointer into this read-only section of memory; and it has only enough room to store the characters that you provided in the text (4, for abc and the null character terminating the string). fgets() takes as its first argument a place to store the string that it is reading, and its second the amount of space that it has. So you are trying to read up to 20 bytes, into a read-only buffer that only has room for 4.

You need to either allocate space for reading on the stack, using, for example:

char c[20];

Or dynamically, using malloc():

char *c = malloc(20);