Joshua Katz Joshua Katz - 1 month ago 8
C Question

Segfault resulting from strdup and strtok

I've been assigned a homework from my college professor and I seem to have found some strange behavior of

strtok


Basically, we have to parse a CSV file for my class, where the number of tokens in the CSV is known and the last element may have extra
","
characters.

An example of a line:

Hello,World,This,Is,A lot, of Text


Where the tokens should be output as

1. Hello
2. World
3. This
4. Is
5. A lot, of Text


For this assignment we MUST use
strtok
. Because of this I found on some other SOF post that using
strtok
with an empty string (or passing
"\n"
as the second argument) results in reading until the end of the line. This is perfect for my application since the extra commas always appear in the last element.

I've created this code which works:

#include <string.h>
#include <stdlib.h>
#include <stdio.h>

#define NUM_TOKENS 5

const char *line = "Hello,World,This,Is,Text";

char **split_line(const char *line, int num_tokens)
{
char *copy = strdup(line);

// Make an array the correct size to hold num_tokens
char **tokens = (char**) malloc(sizeof(char*) * num_tokens);

int i = 0;
for (char *token = strtok(copy, ",\n"); i < NUM_TOKENS; token = strtok(NULL, i < NUM_TOKENS - 1 ? ",\n" : "\n"))
{
tokens[i++] = strdup(token);
}

free(copy);

return tokens;
}

int main()
{
char **tokens = split_line(line, NUM_TOKENS);

for (int i = 0; i < NUM_TOKENS; i++)
{
printf("%s\n", tokens[i]);
free(tokens[i]);
}
}


Now this works and should get me full credit but I hate this ternary that shouldn't be needed:

token = strtok(NULL, i < NUM_TOKENS - 1 ? ",\n" : "\n");


I'd like to replace the method with this version:

char **split_line(const char *line, int num_tokens)
{
char *copy = strdup(line);

// Make an array the correct size to hold num_tokens
char **tokens = (char**) malloc(sizeof(char*) * num_tokens);

int i = 0;
for (char *token = strtok(copy, ",\n"); i < NUM_TOKENS - 1; token = strtok(NULL, ",\n"))
{
tokens[i++] = strdup(token);
}

tokens[i] = strdup(strtok(NULL, "\n"));

free(copy);

return tokens;
}


This tickles my fancy much nicer since it is much easier to see that there is a final case. You also get rid of the strange ternary operator.

Sadly though, this segfaults! I can't for the life of me figure out why.

Edit: Add some output examples:

[11:56:06] gravypod:test git:(master*) $ ./test_no_fault
Hello
World
This
Is
Text
[11:56:10] gravypod:test git:(master*) $ ./test_seg_fault
[1] 3718 segmentation fault (core dumped) ./test_seg_fault
[11:56:14] gravypod:test git:(master*) $

Answer

Please check the return value from strtok before you risk passing NULL to another function. Your loop is calling strtok one more time than you think.

It is more usual to use this return value to control your loop, then you are not at the mercy of your data. As for the delimitors, best to keep it simple and not try anything fancy.

char **split_line(const char *line, int num_tokens)
{
    char *copy = strdup(line);
    char **tokens = (char**) malloc(sizeof(char*) * num_tokens);
    int i = 0;
    char *token;
    char delim1[] = ",\r\n";
    char delim2[] = "\r\n";
    char *delim = delim1;                   // start with a comma in the delimiter set

    token = strtok(copy, delim);
    while(token != NULL) {                  // strtok result comtrols the loop
        tokens[i++] = strdup(token);
        if(i == NUM_TOKENS) {
            delim = delim2;                 // change the delimiters
        }
        token = strtok(NULL, delim);    
    }
    free(copy);
    return tokens;
}

Note you should also check the return values from malloc and strdup and free your memory properly

Comments