AllTheTime - 3 months ago
C Question

C - Safely parse and send many strings in large endless stream

I don't have much experience with C.

I have a small C program that connects to a practically infinite text stream (25Mb/s).

I want to send each line of the stream as a separate message with zeromq.

So, I will be sending many thousands of messages per second, and before each message is sent I want to manipulate the string being sent over the socket:

Say I start with:

Quote {0.0.0.0} XXX <1>A<2>B<3>C


I want

XXX Quote <1>A<2>B<3>C


In a general sense, how can I do this safely so that I don't run into memory leaks? I would have something like this (just an example; the main function would actually be a never-ending loop with different chars):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* parse(const char* input) {

    char* output;
    char* input_copy = strdup(input);
    char* token;
    char* first;
    char* third;
    char* fourth;

    token = strtok(input_copy, " ");
    first = token;

    for (int i = 0; i < 3; i++)
    {
        token = strtok(NULL, " ");
        if (i == 1) third = token;
        if (i == 2) fourth = token;
    }

    asprintf(&output, "%s %s %s", third, first, fourth);
    return output;
    free(output);
}

int main(void)
{
    const char *a = "Quote {0.0.0.0} XXX <1>A<2>B<3>C";
    //SEND_MESSAGE(parse(a));
    return 0;
}


Would this work?

Answer

If you know (or can determine with certainty) the maximum size of each of first, second, third and fourth, you can eliminate all possibility of a memory leak by simply using a fixed-size buffer for each. You say your 25M/sec of text is broken up into lines, so presumably you are using a line-oriented input function (e.g. fgets or getline) to read from the stream. In that case, you could also just size each of the four buffers to the maximum line length to ensure your fixed buffers are adequate.

You are tokenizing into a first, second, third and fourth using a space as the delimiter, so why not use sscanf? If you want to use your parse function, just pass the buffers as parameters.
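As a rough sketch of the "pass the buffers as parameters" idea, the function below writes the reordered line into a buffer owned by the caller, so nothing is allocated and nothing can leak. The name parse_into, the return convention and the 256-byte field size are assumptions made for illustration, not part of the original code; a token longer than 255 characters would be split by sscanf, so size the fields to whatever maximum you can actually guarantee.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reorders "first second third fourth" into
 * "third first fourth", writing into a caller-supplied buffer.
 * Returns 0 on success, -1 if the line has fewer than four fields
 * or the result does not fit in out. */
int parse_into(const char *input, char *out, size_t outsz)
{
    /* 256 is an assumed per-field maximum, not a value from the question */
    char first[256], third[256], fourth[256];

    /* %*s reads and discards the second field; the widths bound each copy */
    if (sscanf(input, "%255s %*s %255s %255s", first, third, fourth) != 3)
        return -1;

    /* snprintf never writes past outsz and reports the length it needed */
    int n = snprintf(out, outsz, "%s %s %s", third, first, fourth);
    if (n < 0 || (size_t)n >= outsz)
        return -1;

    return 0;
}
```

The caller would declare one buffer outside its loop (for example char msg[1024];), call parse_into(line, msg, sizeof msg) for every line, and send msg only when the call returns 0.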

If you can determine a max and you are tokenizing on space, you could do something as simple as:

#include <stdio.h>

#define MAXC 1024

int main(void)
{
    const char *a = "Quote {0.0.0.0} XXX <1>A<2>B<3>C";
    char first[MAXC] = "",
         second[MAXC] = "",
         third[MAXC] = "",
         fourth[MAXC] = "";

    /* read a line from the stream and simply call sscanf;
       the field width (MAXC - 1) keeps each %s within its buffer */
    if (sscanf (a, " %1023s %1023s %1023s %1023s", first, second, third, fourth) == 4)
        printf ("%s %s %s\n", third, first, fourth);

    return 0;
}

(printf is used only as an example; pass the results to zeromq as required.)

Example Use/Output

$ ./bin/staticbuf
XXX Quote <1>A<2>B<3>C

(Using fixed buffers with sscanf would also have the side effect of greatly simplifying your code, and would likely speed it up quite a bit as well.)
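To show how the fixed-buffer approach might look inside the never-ending loop, here is a minimal sketch that reads lines with fgets and forwards the reordered fields. The function name forward_lines and the use of a FILE * for output are assumptions for illustration; in the real program the fprintf call would be replaced by the zeromq send. Lines longer than MAXC - 1 characters would be read in pieces by fgets, so the sketch relies on MAXC really being an upper bound.

```c
#include <stdio.h>
#include <string.h>

#define MAXC 1024   /* assumed maximum line length, as in the example above */

/* Hypothetical driver: reads lines from in, writes the reordered
 * lines to out, and returns the number of lines forwarded.
 * All storage is on the stack, so there is nothing to free. */
long forward_lines(FILE *in, FILE *out)
{
    char line[MAXC], first[MAXC], third[MAXC], fourth[MAXC];
    long sent = 0;

    while (fgets(line, sizeof line, in)) {
        /* skip the second field with %*s; malformed lines are ignored */
        if (sscanf(line, "%1023s %*s %1023s %1023s", first, third, fourth) == 3) {
            fprintf(out, "%s %s %s\n", third, first, fourth);
            sent++;
        }
    }
    return sent;
}
```

Calling forward_lines(stdin, stdout) would process whatever is piped into the program until the stream ends.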

If you can't, with any confidence, determine a max size, then you are stuck with the overhead of malloc/free (or using POSIX getline and letting it handle the allocation).
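If you do end up on the malloc/free path, one way to keep it leak-free is to make exactly one allocation per message and give it a single, obvious owner. The sketch below is an assumption-laden illustration rather than a fix of the original parse: it swaps strdup/strtok for strspn/strcspn so that no temporary copy of the input exists to be leaked, and the name parse_alloc is hypothetical. It also treats a trailing newline as a delimiter, on the assumption that lines arrive from fgets or getline.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd "third first fourth" string,
 * or NULL if the line has fewer than four fields or malloc fails.
 * The caller owns the result and must free() it after sending. */
char *parse_alloc(const char *input)
{
    const char *delim = " \n";   /* assumed delimiters: space and newline */
    const char *tok[4];
    size_t len[4];
    const char *p = input;

    /* locate the four fields in place, without modifying or copying input */
    for (int i = 0; i < 4; i++) {
        p += strspn(p, delim);        /* skip leading delimiters */
        len[i] = strcspn(p, delim);   /* length of this field */
        if (len[i] == 0)
            return NULL;
        tok[i] = p;
        p += len[i];
    }

    /* third + ' ' + first + ' ' + fourth + terminating '\0' */
    char *out = malloc(len[2] + len[0] + len[3] + 3);
    if (!out)
        return NULL;

    char *w = out;
    memcpy(w, tok[2], len[2]); w += len[2]; *w++ = ' ';
    memcpy(w, tok[0], len[0]); w += len[0]; *w++ = ' ';
    memcpy(w, tok[3], len[3]); w += len[3]; *w = '\0';

    return out;
}
```

In the send loop, the pattern would be: call parse_alloc, send the result if it is not NULL, then free it before reading the next line, so that each iteration releases everything it allocated.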