ThisIsMe ThisIsMe - 4 months ago 19
HTTP Question

regex for http header in C

I want to extract strings from an HTTP header like:

GET http://www.example.com HTTP/1.1
using regex. I use this pattern:
^([A-Za-z]+)(\s+)(http?):\/\/(.*)(\s+)(HTTP\/)([0-9].[0-9])
and this works well, splitting the line into GET, http://www.example.com and HTTP/1.1. But when I use this pattern in C, the escaped / is not recognized (i.e., \/\/ does not match). How can I do this? Or is there a better pattern for extracting strings from an HTTP header?

Answer

Note that you do not need to escape a forward slash with the C regex library: regcomp takes the bare pattern and has no notion of regex delimiters, so / is an ordinary character.

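All you need is to properly initialize the regmatch_t and size_t variables, double the backslash before the \s shorthand character class (the C string literal consumes one level of escaping), and pass the REG_EXTENDED flag to the regex compiler. Keep in mind that POSIX itself does not define \s; some implementations, such as glibc, accept it as an extension, while a literal space or [[:space:]] works everywhere.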

I also suggest reducing the pattern to just 3 capture groups:

const char *str_regex = "([A-Za-z]+) +(http?://.*) +(HTTP/[0-9][.][0-9])";

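Note the dot is "escaped" by putting it into a bracket expression. Also note that http? only makes the final p optional (it matches htt or http); if https URLs should be accepted as well, write https? instead.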

Full C demo extracting GET, http://www.example.com and HTTP/1.1:

#include <stdio.h>
#include <regex.h>

int main (void)
{
  regex_t preg;
  regmatch_t pmatch[4]; // We have 3 capturing groups + the whole match group
  size_t nmatch = sizeof pmatch / sizeof pmatch[0]; // Same as above
  const char *str_request = "GET http://www.example.com HTTP/1.1";
  const char *str_regex = "([A-Za-z]+) +(http?://.*) +(HTTP/[0-9][.][0-9])";

  int err = regcomp(&preg, str_regex, REG_EXTENDED);
  if (err != 0)
    {
      char errbuf[128];
      regerror(err, &preg, errbuf, sizeof errbuf); // Turn the error code into text
      fprintf(stderr, "regcomp failed: %s\n", errbuf);
      return 1;
    }

  int match = regexec(&preg, str_request, nmatch, pmatch, 0);
  regfree(&preg); // pmatch only holds offsets, so it stays usable after regfree
  if (match == 0)
    {
      // Print groups 1..3; "%.*s" expects an int precision, hence the cast
      for (size_t i = 1; i < nmatch; i++)
        printf("\"%.*s\"\n", (int)(pmatch[i].rm_eo - pmatch[i].rm_so),
               &str_request[pmatch[i].rm_so]);
    }
  else if (match == REG_NOMATCH)
    {
      printf("no match\n");
    }
  return 0;
}