Tcl_newbie - 4 months ago
Perl Question

Extracting part of a URL

I have used a regex to extract a URI into a Perl variable, and now I need to extract part of it.

For example, say $2 contains the URI part, which may or may not have query parameters. That is, it may be in the format

/aaa/bbb/ccc/ddd/eee

or

/aaa/bbb/ccc?eee=true&fff=false


I want to extract the first N path segments, where N is an argument received by the Perl program. For instance, for N = 2 the result should be /aaa/bbb, and for N = 3 it should be /aaa/bbb/ccc.

The problem I am facing is that the part after the third slash may or may not have query parameters.

How do I ignore the query parameters, if they exist?

Answer

This will do as you ask. It uses the URI module and builds an object from each URL string, so that the module's convenient methods can be used to manipulate its contents.

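First the query is removed with $url->query(undef). Then the path is split into a list of segments in @path, and that list is truncated to the required length. Because the path begins with a slash, the first segment is an empty string, so the slice @path[0..$n] keeps that empty segment plus N real segments.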

Finally, the result is turned back into a string and returned.
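The leading empty segment is what makes the slice indices line up with N. A minimal sketch of the idea, using core Perl's split as a stand-in for path_segments (an assumption: the two behave alike only for simple paths like these, with no escaped characters or ";" parameters):

```perl
use strict;
use warnings;

# split keeps leading empty fields, so '/aaa/bbb/ccc' gives ('', 'aaa', 'bbb', 'ccc'),
# which matches what path_segments returns for this simple path
my @path = split m{/}, '/aaa/bbb/ccc';

print scalar(@path), " segments, last index ", $#path, "\n";    # 4 segments, last index 3

# Indices 0..2 are the empty leading segment plus two real segments
my $n = 2;
print join('/', @path[0 .. $n]), "\n";    # /aaa/bbb
```

This is also why the answer only slices when $n < $#path: $#path equals the number of real segments, so a request for that many or more leaves the path unchanged.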

The program extracts one- to five-segment paths from each of the URLs that you gave as examples. A path with fewer segments than requested is returned whole, with the query string removed.

use strict;
use warnings 'all';
use feature 'say';

use URI;

my $url1 = '/aaa/bbb/ccc/ddd/eee';
my $url2 = '/aaa/bbb/ccc?eee=true&fff=false';

for my $url ( $url1, $url2 ) {
    say trim_path($url, $_) for 1 .. 5;
    say '';
}

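sub trim_path {
    my ($url, $n) = @_;

    # Build a URI object so the path and query can be handled separately
    $url = URI->new($url);

    # Drop the query string (everything after '?'), if there is one
    $url->query(undef);

    # A leading '/' gives an empty first segment, so @path is ('', 'aaa', 'bbb', ...)
    # and the slice 0..$n keeps that empty segment plus $n real segments
    my @path = $url->path_segments;
    $url->path_segments( @path[0..$n] ) if $n < $#path;

    # Stringify the modified URI
    return "$url";
}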

Output

/aaa
/aaa/bbb
/aaa/bbb/ccc
/aaa/bbb/ccc/ddd
/aaa/bbb/ccc/ddd/eee

/aaa
/aaa/bbb
/aaa/bbb/ccc
/aaa/bbb/ccc
/aaa/bbb/ccc
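If installing URI is not an option, a similar result can be had with core Perl alone. The sketch below swaps in a different technique (a substitution plus split, not the URI module): it removes the query string, splits on slashes, and keeps the first N segments. The helper name trim_path_core and the default of 2 when no argument is given are hypothetical. It assumes path-only input that starts with a slash, like the examples in the question; unlike URI, it does not handle full URLs with a scheme and host, or percent-encoded characters.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first $n path segments of a path-only string
# such as '/aaa/bbb/ccc?eee=true', dropping any query string first.
# Uses core Perl only; assumes the path starts with '/'.
sub trim_path_core {
    my ($url, $n) = @_;

    # Copy the string and remove everything from the first '?' onward
    (my $path = $url) =~ s/\?.*//s;

    # The leading '/' gives an empty first element: ('', 'aaa', 'bbb', ...)
    my @segs = split m{/}, $path;

    # Keep the empty first element plus at most $n real segments
    my $last = $n < $#segs ? $n : $#segs;
    return join '/', @segs[0 .. $last];
}

# Take N from the command line as in the question; the default of 2 is an assumption
my $n = @ARGV ? $ARGV[0] : 2;

for my $url ('/aaa/bbb/ccc/ddd/eee', '/aaa/bbb/ccc?eee=true&fff=false') {
    print trim_path_core($url, $n), "\n";
}
```

Saved as, say, trim.pl (a hypothetical name), running perl trim.pl 3 prints /aaa/bbb/ccc for both sample URLs.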