David Harper - 2 months ago
C Question

POSIX regex - zero or one matches of bracket expression?

I'm trying to use regex to parse source files and search for functions in C programs whose names start with the word "LOG", may or may not be followed by a second character from the class [1248AFM], and are then followed by an opening parenthesis. This is being developed under Windows using mingw but will ultimately be compiled and run under Linux using gcc.

I'm using the Jan Goyvaerts regex tutorial as a guide, and it seems that what I'm after is zero or one matches of the bracket expression shown above. Zero or one sounds a lot like the question mark metacharacter, but in my experiments I have yet to get that to work following a bracket expression.

To illustrate what I'm trying to do, I have the short program shown below. Ideally, I would like a match on str1 and str2 only. If I compile and run it as shown, I don't get a match on anything. If I leave out the question mark following the bracket expression, I get a match on str2 only, which is what I would expect. In addition to the question mark, I've also tried an interval quantifier of the form {0,1}, but had no success with that either. Is there something other than a bracket expression that I should be using?

Dave

#include <stdio.h>
#include <regex.h>

int main(int argc, char **argv) {
  regex_t regex;
  int rtn = regcomp(&regex, "LOG[1248AFM]?(", 0);
  if (rtn) {
    printf("compile failed\n");
    return(1);
  }
  char *str1 = " LOG(";
  char *str2 = " LOGM(";
  char *str3 = " LOG";
  char *str4 = " LOGJ(";

  int rtn1 = regexec(&regex, str1, 0, NULL, 0);
  int rtn2 = regexec(&regex, str2, 0, NULL, 0);
  int rtn3 = regexec(&regex, str3, 0, NULL, 0);
  int rtn4 = regexec(&regex, str4, 0, NULL, 0);
  printf("str1: %d\nstr2: %d\nstr3: %d\nstr4: %d\n",
    rtn1, rtn2, rtn3, rtn4);

  return(0);
}

Answer

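As Casimir et Hippolyte said, you need to escape the ?, which escaped me when I wrote my comment. With the flags argument set to 0, regcomp compiles the pattern as a POSIX basic regular expression (BRE), and in a BRE an unescaped ? is an ordinary character that matches a literal question mark. The zero-or-one quantifier is spelled \? there, which is a GNU extension rather than part of POSIX itself. On top of that, the pattern is a C string literal, so you have to escape the escape and write \\? in the source.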

#include <stdio.h>
#include <regex.h>

int main(int argc, char **argv) {
  regex_t regex;
  int rtn = regcomp(&regex, "LOG[1248AFM]\\?(",0);
  if (rtn) {
    printf("compile failed\n");
    return(1);
  }
  char *str1 = "  LOG(";
  char *str2 = "  LOGM(";
  char *str3 = "  LOG";
  char *str4 = "  LOGJ(";

  int rtn1 = regexec(&regex, str1, 0, NULL, 0);
  int rtn2 = regexec(&regex, str2, 0, NULL, 0);
  int rtn3 = regexec(&regex, str3, 0, NULL, 0);
  int rtn4 = regexec(&regex, str4, 0, NULL, 0);
  printf("str1: %d\nstr2: %d\nstr3: %d\nstr4: %d\n",
    rtn1, rtn2, rtn3, rtn4);

  regfree(&regex);
  return(0);
}

Gives (0 means a match; a nonzero value, here REG_NOMATCH, means no match)

str1: 0
str2: 0
str3: 1
str4: 1