Kaptnik - 14 days ago
C# Question

Non-greedy regex matches just 1 character

I have a list of files, some of whose names end with a .cloud suffix. How do I write a regular expression that captures the filename without the .cloud part?

Here's a sample Perl script I tried.

#! /usr/bin/perl -w

my @log_files = ('infolog.txt', 'errorlog.txt.cloud', 'dailyerrorlog.txt.cloud', 'trace.output.cloud', 'debug.log.cloud');

foreach my $file (@log_files)
{
    print $1."\n" if($file =~ /(.+?)(?:\.cloud)?/);
}


This prints the following:

$ perl test.pl
i
e
d
t
d


If I remove the '?' after .+, which makes it greedy, it matches everything, including .cloud.

#! /usr/bin/perl -w

my @log_files = ('infolog.txt', 'errorlog.txt.cloud', 'dailyerrorlog.txt.cloud', 'trace.output.cloud', 'debug.log.cloud');

foreach my $file (@log_files)
{
    print $1."\n" if($file =~ /(.+)(?:\.cloud)?/);
}


This prints the following:

$ perl test.pl
infolog.txt
errorlog.txt.cloud
dailyerrorlog.txt.cloud
trace.output.cloud
debug.log.cloud


What I really want is a regular expression that'll print:

$ perl test.pl
infolog.txt
errorlog.txt
dailyerrorlog.txt
trace.output
debug.log


I've reduced my real use case to a very simple example here. I need to use a regular expression match to capture the filename, so answers like

$file =~ s/\.cloud$//;
print $file."\n";


will not work for me.

I've tried a similar thing in C# too, with similar results.

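using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Same pattern as the Perl script: lazy .+? followed by an optional .cloud
        Regex regex = new Regex(@"(?<filename>.+?)(?:\.cloud)?");
        string text = "abcdef.txt.cloud";
        Match match = regex.Match(text);
        if (match.Success)
        {
            Console.WriteLine("Found filename: {0}", match.Groups["filename"].Value);
        }
    }
}
// Output
// Found filename: a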


Thanks for any help.

Answer

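It matches only one character because you told it to match the fewest characters possible, and .+ isn't allowed to match zero characters. The optional (?:\.cloud)? is free to match nothing, and the pattern isn't anchored at the end of the string, so the match succeeds after a single character and nothing forces .+? to take more.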
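To see this concretely, here is a minimal sketch (not part of the original answer) that compares the question's unanchored pattern with a variant anchored at both ends. The sub names are hypothetical, and the anchored variant assumes the captured part really is just .+ rather than a more complicated pattern. With \z after the optional group, the lazy .+? has to keep expanding until the rest of the pattern can reach the end of the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unanchored pattern from the question: .+? is satisfied after one
# character because the optional group may match nothing.
sub lazy_unanchored {
    my ($file) = @_;
    return $file =~ /(.+?)(?:\.cloud)?/ ? $1 : undef;
}

# Anchored variant (illustrative): \z forces .+? to expand until an
# optional trailing ".cloud" ends exactly at the end of the string.
sub lazy_anchored {
    my ($file) = @_;
    return $file =~ /^(.+?)(?:\.cloud)?\z/s ? $1 : undef;
}

for my $file ('infolog.txt', 'errorlog.txt.cloud') {
    printf "%-20s unanchored=%s anchored=%s\n",
        $file, lazy_unanchored($file), lazy_anchored($file);
}
```

The only difference between the two subs is the pair of anchors, which suggests the lazy quantifier itself is not the problem; the missing end-of-string constraint is.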


I'm going to use $PAT instead of .+ since you said it's a stand-in for something more complicated.

Despite your claims that s/// can't be used, it still seems to be the simplest solution to me.

my ($match) = map { s/\.cloud\z//r } $file =~ /^($PAT)\z/;  # 5.14+

or

my ($match) = map { ( my $s = $_ ) =~ s/\.cloud\z//; $s } $file =~ /^($PAT)\z/;

That said, it can also be achieved using a match:

my $match = $file =~ /^(?:($PAT)(?=\.cloud\z)|($PAT))/ ? ($1 // $2) : undef;
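As a rough illustration of the alternation above, the following sketch wraps it in a sub and runs it over a few of the question's filenames. The real $PAT is not given, so the qr/[\w.]+/ used here is purely a hypothetical stand-in, and strip_cloud is an invented name.

```perl
use strict;
use warnings;
use 5.010;   # for say and the // (defined-or) operator

# Hypothetical stand-in; the real $PAT is not shown in the question.
my $PAT = qr/[\w.]+/;

sub strip_cloud {
    my ($file) = @_;
    # First alternative: $PAT immediately followed by a trailing ".cloud".
    # Second alternative: plain $PAT, used when there is no such suffix.
    return $file =~ /^(?:($PAT)(?=\.cloud\z)|($PAT))/ ? ($1 // $2) : undef;
}

say strip_cloud($_) for ('infolog.txt', 'errorlog.txt.cloud', 'trace.output.cloud');
```

Only one of the two capture groups is set for any given match, which is why the result is taken from $1 // $2.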

By the way, if $PAT were simply .+ and I wanted to use a match, I'd use the following:

my ($match) = $file =~ /^((?:(?!\.cloud\z).)+)/s;

But it would be far simpler to use

my $match = $file =~ s/\.cloud\z//r;   # 5.14+

or

(my $match = $file) =~ s/\.cloud\z//;
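For comparison, here is a small sketch that runs the match-only pattern and the substitution side by side on the file list from the question. The sub names by_match and by_subst are hypothetical, and the substitution uses the copy-then-modify form so it does not depend on the /r flag from 5.14.

```perl
use strict;
use warnings;

my @log_files = ('infolog.txt', 'errorlog.txt.cloud', 'dailyerrorlog.txt.cloud',
                 'trace.output.cloud', 'debug.log.cloud');

# Match-only approach: consume one character at a time, stopping
# right before a ".cloud" that ends the string.
sub by_match {
    my ($file) = @_;
    my ($match) = $file =~ /^((?:(?!\.cloud\z).)+)/s;
    return $match;
}

# Substitution approach: copy the name, then strip a trailing ".cloud".
sub by_subst {
    my ($file) = @_;
    (my $match = $file) =~ s/\.cloud\z//;
    return $match;
}

for my $file (@log_files) {
    printf "%-26s %-20s %s\n", $file, by_match($file), by_subst($file);
}
```

The two agree on these names. One difference to be aware of: for a file named exactly .cloud, by_match returns undef (the + needs at least one character), while by_subst returns an empty string.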