Curnelious - 4 months ago
C Question

Split string by one of few delimiters?

I have a string of the form A > B, A < B, or A==B.

Using strtok would destroy the data, and my goal is to get some kind of structure where I can examine:


  1. what kind of delimiter I had

  2. get access to both sides of it (A and B).



so:

if ( > )
do something with A and B
else if (==)
do something with A and B


I know it sounds simple, but it always turns out to be cumbersome.

EDIT:

What I did was this, which seems too long for the task:

for (int k=1;k<strlen(p);k++)
{
    char left[4]="" ;
    char right[12]="" ;

    switch(p[k])
    {
        case '>' :
        {
            long num =strstr(p,">") - p ;
            strncpy(left,p,num);
            strncpy(right,p+num+1,strlen(p)-num-1);
            break;
        }

        case '<' :
        {
            long num =strstr(p,"<") - p ;
            strncpy(left,p,num);
            strncpy(right,p+num+1,strlen(p)-num-1);
            break;
        }

        case '=' :
        {
            long num =strstr(p,"=") - p ;
            strncpy(left,p,num);
            strncpy(right,p+num+1,strlen(p)-num-1);
            break;
        }

        case '!' :
        {
            long num =strstr(p,"!") - p ;
            strncpy(left,p,num);
            strncpy(right,p+num+1,strlen(p)-num-1);
            break;
        }

        default :
        {}
    }
}

Answer

Here is a generalized procedure:

  1. For a given set of delimiters, use strstr to check whether each one appears in the input string. As a bonus, my code below allows overlapping entries such as < and <>; it checks them all and uses the longest one that matches.
  2. After determining the best delimiter to use, you have a pointer to its start. Then you can
  3. .. copy everything at its left into a left variable;
  4. .. copy the delimiter itself into a delim variable (for consistency); and
  5. .. copy everything to the right of the delimiter into a right variable.

Point 4 is 'for consistency' with the other two variables. You could also create an enumeration (LESS, EQUALS, MORE, and NOT_EQUAL in my example) and return that instead, because the set of possibilities is limited to these.
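As a rough sketch of that enumeration idea (not part of the code that follows): the names CMP_NONE, struct op_entry, ops and find_comparison are made up for illustration, and the operator set is assumed to be the same four strings used in the answer's delimiters list. It keeps the "longest match wins" rule and reports where the operator sits, so the caller can still copy the left and right sides.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the answer only suggests LESS, EQUALS, MORE, NOT_EQUAL. */
enum comparison { CMP_NONE, LESS, EQUALS, MORE, NOT_EQUAL };

struct op_entry {
    const char *text;
    enum comparison kind;
};

/* Assumed to mirror the answer's delimiters[] array. */
static const struct op_entry ops[] = {
    { "<",  LESS },
    { ">",  MORE },
    { "==", EQUALS },
    { "<>", NOT_EQUAL },
};

/* Returns the kind of the longest operator found in input, or CMP_NONE.
   On success, *pos points at the operator and *len holds its length. */
enum comparison find_comparison(const char *input, const char **pos, size_t *len)
{
    enum comparison best = CMP_NONE;
    size_t best_len = 0;
    size_t i;

    for (i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        const char *hit = strstr(input, ops[i].text);
        size_t n = strlen(ops[i].text);

        /* strictly longer only, so the first entry wins a tie */
        if (hit && n > best_len) {
            best = ops[i].kind;
            best_len = n;
            *pos = hit;
            *len = n;
        }
    }
    return best;
}
```

The caller can then switch on the returned value instead of comparing delimiter strings, which is close to the if ( > ) ... else if (==) shape asked for in the question.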

In code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

const char *delimiters[] = {
    "<", ">", "==", "<>", NULL
};

int split_string (const char *input, char **dest_left, char **dest_delim, char **dest_right)
{
    int iterator;
    int best_fit_delim;
    char *ptr;

    /* (optionally) clean whitespace at start */
    while (isspace((unsigned char)*input))
        input++;

    /* look for the longest delimiter we can find */
    best_fit_delim = -1;
    iterator = 0;
    while (delimiters[iterator])
    {
        ptr = strstr (input, delimiters[iterator]);
        if (ptr)
        {
            if (best_fit_delim == -1 || strlen(delimiters[iterator]) > strlen(delimiters[best_fit_delim]))
                best_fit_delim = iterator;
        }
        iterator++;
    }

    /* did we find anything? */
    if (best_fit_delim == -1)
        return 0;

    /* reset ptr to this found one */
    ptr = strstr (input, delimiters[best_fit_delim]);

    /* copy left hand side */
    iterator = ptr - input;
    /* clean whitespace at end */
    while (iterator > 0 && isspace((unsigned char)input[iterator-1]))
        iterator--;
    *dest_left = malloc (iterator + 1);
    memcpy (*dest_left, input, iterator);
    (*dest_left)[iterator] = 0;

    /* the delimiter itself */
    *dest_delim = malloc(strlen(delimiters[best_fit_delim])+1);
    strcpy (*dest_delim, delimiters[best_fit_delim]);

    /* update the pointer to point to *end* of delimiter */
    ptr += strlen(delimiters[best_fit_delim]);
    /* skip whitespace at start */
    while (isspace((unsigned char)*ptr))
        ptr++;

    /* copy right hand side */
    *dest_right = malloc (strlen(ptr) + 1);
    strcpy (*dest_right, ptr);

    return 1;
}

int main (void)
{
    const char *source_str = "A <> B";
    char *left, *delim, *right;

    if (!split_string (source_str, &left, &delim, &right))
    {
        printf ("invalid input\n");
    } else
    {
        printf ("left: \"%s\"\n", left);
        printf ("delim: \"%s\"\n", delim);
        printf ("right: \"%s\"\n", right);

        free (left);
        free (delim);
        free (right);
    }
    return 0;
}

resulting, for A <> B, in

left: "A"
delim: "<>"
right: "B"

The code can be a bit smaller if you only need to check your list of <, ==, and >. In that case you can use strchr to look for single characters (and if = is found, check that the next character is = as well). You can also drop the best_fit length check, as only one delimiter can fit.
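A minimal sketch of that shorter variant, assuming only <, > and == are allowed. It uses strpbrk rather than strchr, since strpbrk finds the first occurrence of any character from a set in one call; the function name find_operator is hypothetical. A lone = is treated as invalid input here, which is a choice of this sketch rather than something the answer specifies.

```c
#include <string.h>

/* Finds the first '<', '>' or "==" in input.
   Returns a pointer to the operator and stores its length (1 or 2) in *len.
   Returns NULL if there is no operator, or if the first hit is a lone '='. */
const char *find_operator(const char *input, size_t *len)
{
    const char *p = strpbrk(input, "<>=");

    if (p == NULL)
        return NULL;

    if (*p == '=') {
        if (p[1] != '=')
            return NULL;    /* a single '=' is not in the operator list */
        *len = 2;
    } else {
        *len = 1;
    }
    return p;
}
```

The left side is then the range from input up to the returned pointer, and the right side starts at the returned pointer plus *len, so the copying steps from the longer version carry over unchanged.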

The code removes whitespace only around the comparison operator. For consistency, you may want to remove all whitespace at the start and end of the input; then, invalid input can be detected by the returned left or right variables having a length of 0 (i.e., they contain only the 0 string terminator). You still need to free those zero-length strings.
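One way to sketch that clean-up is shown here; trimmed_copy and side_is_valid are hypothetical helper names, not part of the code above. The first trims both ends of a string into a fresh allocation, the second is the zero-length check on a returned side.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of s without leading or trailing
   whitespace, or NULL if malloc fails. The caller frees the result,
   even when it is an empty string. */
char *trimmed_copy(const char *s)
{
    size_t n;
    char *out;

    while (isspace((unsigned char)*s))
        s++;
    n = strlen(s);
    while (n > 0 && isspace((unsigned char)s[n - 1]))
        n--;

    out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s, n);
    out[n] = '\0';
    return out;
}

/* A side is usable only if it holds more than the string terminator. */
int side_is_valid(const char *side)
{
    return side != NULL && side[0] != '\0';
}
```

With the input trimmed before splitting, a string such as "< B" yields an empty left side, which side_is_valid rejects while the caller still frees it.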

For fun, you can add "GT","LT","GE","LE" to the delimiters and see how it does on strings such as A GT B, ALLEQUAL, and FAULTY<MATCH.