eager2learn - 4 months ago
C++ Question

C++ regex, unknown escape sequence '\.' warning

This is the first time I've tried using regular expressions in C++, and I'm a little confused about escape sequences. I'm simply trying to match a dot at the beginning of a string. For that I'm using the expression "^\\\.", which works, but my compiler (g++) generates a warning:

warning: unknown escape sequence '\.'
regex self_regex("^\\\.");
^~


If I use, e.g., "^\\.", there is no warning, but that regex does not match what I intend.

I also don't understand why I have to use three backslashes. Shouldn't two be sufficient? In "\\." the first backslash escapes the second one, so that I actually search for a literal ".", but it doesn't work. Can someone please clarify this for me?

Code:

#include <iostream>
#include <string>
#include <dirent.h>
#include <regex>

using namespace std;

int main(void){
    string path = "/Users/-----------/Documents/Bibliothek/MachineLearning/DeepLearning/ConvolutionalNeuralNetworks/CS231n 2016/Assignments/assignment3/assignment3/cs231n";
    regex self_regex("^\\\.+");   // g++ warns here: unknown escape sequence '\.'
    DIR *dir;
    struct dirent *ent;
    // Open the directory once and test every entry name against the regex
    if ((dir = opendir(path.c_str())) != NULL){
        while ((ent = readdir(dir)) != NULL){
            if (regex_search(string(ent->d_name), self_regex)){
                cout << "matches regex" << ent->d_name << endl;
            }
            else{
                cout << "does not match regex " << ent->d_name << endl;
            }
        }
        closedir(dir);
    }
    return 0;
}


Output:

matches regex.
matches regex..
matches regex.DS_Store
matches regex.gitignore
does not match regex __init__.py
does not match regex __init__.pyc
does not match regex build
does not match regex captioning_solver.py
does not match regex captioning_solver.pyc
does not match regex classifiers
does not match regex coco_utils.py
does not match regex coco_utils.pyc
does not match regex data_utils.py
does not match regex datasets
does not match regex fast_layers.py
does not match regex fast_layers.pyc
does not match regex gradient_check.py
does not match regex gradient_check.pyc
does not match regex im2col.py
does not match regex im2col.pyc
does not match regex im2col_cython.c
does not match regex im2col_cython.pyx
does not match regex im2col_cython.so
does not match regex image_utils.py
does not match regex image_utils.pyc
does not match regex layer_utils.py
does not match regex layers.py
does not match regex layers.pyc
does not match regex optim.py
does not match regex optim.pyc
does not match regex rnn_layers.py
does not match regex rnn_layers.pyc
does not match regex setup.py

Answer

When you write a string literal like this in your code:

"^\\\."  

your compiler parses it according to the C++ rules to produce the string that ends up in your executable. For example, if it encounters \n, the string in your executable contains a newline character instead. "\\" is turned into a single "\", but your compiler doesn't know how to handle "\." because C++ defines no such escape sequence. The C++ standard says:

Escape sequences in which the character following the backslash is not listed (...) are conditionally-supported, with implementation-defined semantics.

So the literal you're looking for has only two backslashes:

"^\\."

which will be transformed by the compiler into:

"^\."  

And this is the regex you're looking for!

Remark: GCC, for example, transforms the unknown escape sequence "\." into ".", so two or three backslashes actually produce the same string.

