ThisIsMe ThisIsMe - 1 year ago 83
HTTP Question

regex for http header in C

I want to extract strings from an HTTP header like:

using regex. I use this pattern:
and this works well and splits
. But when I use this pattern in C, it doesn't escape
doesn't detect in C). How can I do this? Or is there a better pattern for extracting strings from an HTTP header?

Answer Source

Note that you do not need to escape a forward slash when using the POSIX regex functions in C: regcomp takes the pattern as a plain string, so there are no regex delimiters to clash with.

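All you need is to properly initialize the regmatch_t array and the size_t match count, double the backslash when you write a shorthand class such as \s inside a C string literal (\s is not part of POSIX ERE, so support for it depends on the library; a literal space is the portable choice), and pass the REG_EXTENDED flag to regcomp.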
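To see why the REG_EXTENDED flag matters, here is a minimal sketch. The helper name matches is hypothetical and not part of regex.h. Without the flag, regcomp compiles a POSIX basic regular expression, in which an unescaped + is an ordinary character rather than a quantifier.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if pattern matches text, 0 if it does not,
   and -1 if the pattern fails to compile. cflags is passed on to regcomp. */
int matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: we only need a yes/no answer, not match offsets */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, matches("[A-Za-z]+ +HTTP", "GET HTTP", REG_EXTENDED) should report a match, while the same call with 0 as the flags should not, because the basic syntax looks for literal plus signs.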

I also suggest reducing the pattern to just 3 capture groups:

const char *str_regex = "([A-Za-z]+) +(https?://.*) +(HTTP/[0-9][.][0-9])";

Note the dot is "escaped" by putting it into a bracket expression.
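The bracket-expression trick can be checked in isolation. The sketch below uses a hypothetical helper, is_http_version, and adds ^ and $ anchors (an assumption made only so that the whole string is tested). It accepts a version token only when the separator is a real dot.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s is exactly "HTTP/<digit>.<digit>",
   and 0 otherwise (including when the pattern fails to compile). */
int is_http_version(const char *s)
{
    regex_t re;
    int rc;

    /* [.] matches only a literal dot; "\\." (a doubled backslash in the
       C string literal) would be equivalent. The slash needs no escaping. */
    if (regcomp(&re, "^HTTP/[0-9][.][0-9]$", REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Here is_http_version("HTTP/1.1") should return 1 and is_http_version("HTTP/1x1") should return 0. An unbracketed dot would accept both.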

Full C demo extracting GET, the URL, and HTTP/1.1:

#include <stdio.h>
#include <stdlib.h>
#include <regex.h>

int main (void)
{
  int match;
  int err;
  regex_t preg;
  regmatch_t pmatch[4]; // We have 3 capturing groups + the whole match group
  size_t nmatch = 4; // Same as above
  // example.com is a placeholder URL used for illustration
  const char *str_request = "GET http://example.com/ HTTP/1.1";

  const char *str_regex = "([A-Za-z]+) +(https?://.*) +(HTTP/[0-9][.][0-9])";
  err = regcomp(&preg, str_regex, REG_EXTENDED);
  if (err == 0)
  {
      match = regexec(&preg, str_request, nmatch, pmatch, 0);
      if (match == 0)
      {
          // %.*s expects an int length, so cast the offset difference
          printf("\"%.*s\"\n", (int)(pmatch[1].rm_eo - pmatch[1].rm_so), &str_request[pmatch[1].rm_so]);
          printf("\"%.*s\"\n", (int)(pmatch[2].rm_eo - pmatch[2].rm_so), &str_request[pmatch[2].rm_so]);
          printf("\"%.*s\"\n", (int)(pmatch[3].rm_eo - pmatch[3].rm_so), &str_request[pmatch[3].rm_so]);
      }
      else if (match == REG_NOMATCH)
      {
          printf("No match\n");
      }
      regfree(&preg); // Release the compiled pattern
  }
  else
  {
      fprintf(stderr, "Could not compile the regex\n");
      return EXIT_FAILURE;
  }
  return 0;
}
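If the captured pieces are needed as separate strings rather than printed in place, one possible approach is to copy each group into a caller-supplied buffer. The following is only a sketch. The helper names copy_group and parse_request_line are hypothetical. The pattern uses https? so that both schemes match, which is an assumption about intent. The regerror call reports why compilation failed.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the capture group described by m out of src
   into dst (capacity n), truncating if needed and NUL-terminating. */
static void copy_group(const char *src, const regmatch_t *m, char *dst, size_t n)
{
    size_t len = (size_t)(m->rm_eo - m->rm_so);

    if (n == 0)
        return;
    if (len >= n)
        len = n - 1;
    memcpy(dst, src + m->rm_so, len);
    dst[len] = '\0';
}

/* Hypothetical helper: splits a request line into method, URL and version.
   Returns 0 on success, 1 on no match, -1 if the pattern fails to compile. */
int parse_request_line(const char *line,
                       char *method, size_t mlen,
                       char *url, size_t ulen,
                       char *version, size_t vlen)
{
    /* Assumption: https? is used so both http and https URLs match. */
    const char *pattern = "([A-Za-z]+) +(https?://.*) +(HTTP/[0-9][.][0-9])";
    regex_t re;
    regmatch_t pm[4]; /* whole match + 3 capture groups */
    char errbuf[128];
    int rc;

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0) {
        /* regerror turns the error code into a readable message */
        regerror(rc, &re, errbuf, sizeof errbuf);
        fprintf(stderr, "regcomp: %s\n", errbuf);
        return -1;
    }
    rc = regexec(&re, line, 4, pm, 0);
    if (rc == 0) {
        copy_group(line, &pm[1], method, mlen);
        copy_group(line, &pm[2], url, ulen);
        copy_group(line, &pm[3], version, vlen);
    }
    regfree(&re);
    return rc == 0 ? 0 : 1;
}
```

A caller would declare three char arrays, pass them with their sizes, and check the return value before reading them. The buffers are left untouched when the line does not match.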