Jetmax Jetmax - 1 month ago 6
C Question

C - strtok(): split the string on '\n' but keep the delimiter

I have the following problem with my C program. Part of its functionality is to read some text, split it into sentences, and then write those sentences to a file.

I used strtok() to split the chunk of text into sentences (a sentence ends when \n occurs). However, it goes wrong when a sentence consists of nothing but the \n character, as in this chunk of text:

//////////////////////////////

Hello, this is some sample text
This is the second sentence

The sentence above is just a new line
This is the last sentence.

/////////////////////////////

The output in the file is as follows:

0 Hello, this is some sample text
1 This is the second sentence
2 The sentence above is just a new line
3 This is the last sentence.

////////////////////////////////////////////////////

whereas it should be:

0 Hello, this is some sample text
1 This is the second sentence
2
3 The sentence above is just a new line
4 This is the last sentence.

////////////////////////////////////

The file holding the strings should function as a log file. That is why I have to split the chunk of text into sentences at each \n and write an integer in front of each sentence before it goes into the file.

This is the code related to this functionality:

int counter = 0;        // Used for counting
const char s[2] = "\n"; // Used for tokenization

// ............

char *token;
token = strtok(input, s);
while (token != NULL) {
    fprintf(logs, "%d ", counter);
    fprintf(logs, "%s\n", token); // Add the newline back here, since tokenization removes it
    counter++;
    token = strtok(NULL, s);
}

// .........


Is there a way to special-case an "empty sentence" (a sentence that is just a \n character) so that it is handled properly?

Perhaps another function would work instead of strtok()?

Answer

You should probably use strstr or strchr, as the comment suggests. strtok treats a run of consecutive delimiters as a single separator, which is why the empty line never shows up as a token. If your application requires strtok for some reason, though, you can save the position where each sentence ends and use pointer arithmetic to detect that several newlines (\n) occurred in a row.

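Rough, untested example code:

int counter = 0; // Used for counting
const char *next_start; // Where the next token begins if no empty lines intervene


// ............
      next_start = input; // input must be writable: strtok overwrites delimiters
      char *token;
      token = strtok(input, "\n");
      while (token != NULL) {
        int gap;
        // Every character strtok skipped before this token was a '\n',
        // so the loop body runs once for each empty line.
        for (gap = (int)(token - next_start); gap > 0; gap--) {
          fprintf(logs, "%d \n", counter++);
        }
        fprintf(logs, "%d %s\n", counter++, token);

        // Step past the token and the single '\n' that ended it.
        next_start = token + strlen(token) + 1;
        token = strtok(NULL, "\n");
      }
      // Note: empty lines after the last sentence are not reported.

// .........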
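
To see why the loop in the question loses the empty line, it may help to compare what strtok reports with the number of '\n' characters actually present. The sketch below is only an illustration; count_tokens and count_lines are hypothetical helper names, not part of the question's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() reports.
 * Modifies text in place, so text must be writable. */
size_t count_tokens(char *text, const char *delims) {
    assert(text != NULL && delims != NULL);
    size_t n = 0;
    for (char *tok = strtok(text, delims); tok != NULL; tok = strtok(NULL, delims)) {
        n++;
    }
    return n;
}

/* Hypothetical helper: counts '\n'-terminated lines, empty ones included.
 * Does not modify text. */
size_t count_lines(const char *text) {
    assert(text != NULL);
    size_t n = 0;
    for (const char *p = strchr(text, '\n'); p != NULL; p = strchr(p + 1, '\n')) {
        n++;
    }
    return n;
}
```

For the writable buffer "a\n\nb\n", count_lines returns 3 while count_tokens returns 2, because strtok skips the whole "\n\n" run as one separator. Call count_lines first, since count_tokens overwrites the delimiters.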

EDIT: added example with strchr

Using strchr is just as easy, if not easier, especially since you only have one delimiter. The code below takes your sentences and splits them at each \n, keeping the delimiter. It just prints them, but you could easily extend it for your purposes.

#include <stdio.h>
#include <string.h>

const char *sentences = "Hello, this is some sample text\n"
                        "This is the second sentence\n"
                        "\n"
                        "The sentence above is just a new line\n"
                        "This is the last sentence.\n";

void parse(const char *input) {
  // Pointers to const char: the pointers move, the text is never modified,
  // so no cast is needed.
  const char *start = input;
  const char *end;
  unsigned count = 0;

  while ((end = strchr(start, '\n')) != NULL) {
    // Print the line together with its trailing '\n' (hence the + 1).
    printf("%u %.*s", count++, (int)(end - start + 1), start);
    start = end + 1;
  }

  // Text after the last '\n', if any, has no delimiter to keep.
  if (*start != '\0') {
    printf("%u %s\n", count, start);
  }
}

int main(void) {
  parse(sentences);
  return 0;
}
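
As one possible way to extend the strchr loop above to the log-file use case from the question, the sketch below writes to a FILE * instead of stdout and carries the running number across calls. The function name log_lines and its first_id parameter are assumptions made for illustration; they do not come from the original code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes each '\n'-terminated line of input to logs,
 * prefixed with a running number that starts at first_id. Empty lines get
 * their own number. Returns the next unused number. */
unsigned log_lines(FILE *logs, const char *input, unsigned first_id) {
    assert(logs != NULL && input != NULL);
    unsigned id = first_id;
    const char *start = input;
    const char *end;

    while ((end = strchr(start, '\n')) != NULL) {
        /* end - start excludes the '\n'; the format string adds it back. */
        fprintf(logs, "%u %.*s\n", id++, (int)(end - start), start);
        start = end + 1;
    }

    /* Trailing text with no final '\n' is logged as a last line. */
    if (*start != '\0') {
        fprintf(logs, "%u %s\n", id++, start);
    }
    return id;
}
```

With an unsigned counter, a call such as counter = log_lines(logs, input, counter); would keep the numbering continuous when the text arrives in several chunks. Unlike strtok, this leaves input unmodified, so it also works on string literals.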