Rohit Rohit - 1 year ago 92
HTML Question

Getting Absolute Path of External Web Page Images

I am working on a bookmarklet that fetches all the photos from an external page using an HTML DOM parser (as suggested in an earlier SO answer). The photos are fetched correctly and displayed in my bookmarklet pop-up, but I am having a problem with photos that use relative paths.

For example, the photo sources on the external page might be:

  1. Photo source 1: img source='hostname/photos/photo.jpg' - fetched correctly, because it is absolute.

  2. Photo source 2: img source='/photos/photo.jpg' - not fetched, because it is not absolute.

I tried working from the current URL, using dirname or pathinfo to get its directory. But the results are inconsistent: for host/dir/ I get host as the parent directory, whereas for host/dir/index.php I get host/dir, which is correct.

Please help. How can I resolve these relative photo paths?

Answer Source

FIXED (added support for query-string only image paths)

function make_absolute_path ($baseUrl, $relativePath) {

    // Parse URLs, return FALSE on failure
    if ((!$baseParts = parse_url($baseUrl)) || (!$pathParts = parse_url($relativePath))) {
        return FALSE;
    }

    // Work-around for pre-5.4.7 bug in parse_url() for relative protocols
    if (empty($baseParts['host']) && !empty($baseParts['path']) && substr($baseParts['path'], 0, 2) === '//') {
        $parts = explode('/', ltrim($baseParts['path'], '/'));
        $baseParts['host'] = array_shift($parts);
        $baseParts['path'] = '/'.implode('/', $parts);
    }
    if (empty($pathParts['host']) && !empty($pathParts['path']) && substr($pathParts['path'], 0, 2) === '//') {
        $parts = explode('/', ltrim($pathParts['path'], '/'));
        $pathParts['host'] = array_shift($parts);
        $pathParts['path'] = '/'.implode('/', $parts);
    }

    // Relative path has a host component, just return it
    if (!empty($pathParts['host'])) {
        return $relativePath;
    }

    // Normalise base URL (fill in missing info)
    // If base URL doesn't have a host component return error
    if (empty($baseParts['host'])) {
        return FALSE;
    }
    if (empty($baseParts['path'])) {
        $baseParts['path'] = '/';
    }
    if (empty($baseParts['scheme'])) {
        $baseParts['scheme'] = 'http';
    }

    // Start constructing return value
    $result = $baseParts['scheme'].'://';

    // Add username/password if any
    if (!empty($baseParts['user'])) {
        $result .= $baseParts['user'];
        if (!empty($baseParts['pass'])) {
            $result .= ":{$baseParts['pass']}";
        }
        $result .= '@';
    }

    // Add host/port
    $result .= !empty($baseParts['port']) ? "{$baseParts['host']}:{$baseParts['port']}" : $baseParts['host'];

    // Inspect the first character of the relative path
    // (substr() avoids an offset warning when the path is empty)
    $firstChar = substr($relativePath, 0, 1);
    if ($firstChar === '/') {

        // Leading / means from root
        $result .= $relativePath;

    } else if ($firstChar === '?') {

        // Leading ? means query the existing URL
        $result .= $baseParts['path'].$relativePath;

    } else {

        // Get the current working directory
        $resultPath = rtrim(substr($baseParts['path'], -1) === '/' ? trim($baseParts['path']) : str_replace('\\', '/', dirname(trim($baseParts['path']))), '/');

        // Split the image path into components and loop them
        foreach (explode('/', $relativePath) as $pathComponent) {
            switch ($pathComponent) {
                case '': case '.':
                    // a single dot means "this directory" and can be skipped
                    // an empty component is a mistake on somebody's part, and can also be skipped
                    break;
                case '..':
                    // a double dot means "up a directory"
                    $resultPath = rtrim(str_replace('\\', '/', dirname($resultPath)), '/');
                    break;
                default:
                    // anything else can be added to the path
                    $resultPath .= "/$pathComponent";
                    break;
            }
        }

        // Add path to result
        $result .= $resultPath;

    }

    return $result;

}



echo make_absolute_path('','/photos/photo.jpg')."\n";
// Outputs:
echo make_absolute_path('','photos/photo.jpg')."\n";
// Outputs:
echo make_absolute_path('','./photos/photo.jpg')."\n";
// Outputs:
echo make_absolute_path('','../photos/photo.jpg')."\n";
// Outputs:
echo make_absolute_path('','')."\n";
// Outputs:
echo make_absolute_path('','?query=something')."\n";
// Outputs:

I think that should correctly deal with just about everything you're likely to encounter, and should equate to roughly the logic used by a browser. It should also correct any oddities you might get on Windows with stray backslashes from using dirname().
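
The host/dir/ versus host/dir/index.php problem from the question comes down to how the base directory is chosen before dirname() is applied. As a minimal sketch of just that step (the helper name url_base_dir is hypothetical and is not part of the function above):

```php
<?php
// Hypothetical helper: returns the "directory" part of a URL path,
// without a trailing slash.
function url_base_dir(string $path): string
{
    // A trailing slash means the path already names a directory,
    // so dirname() must NOT be applied (it would go one level too high).
    if (substr($path, -1) === '/') {
        return rtrim($path, '/');
    }

    // Otherwise drop the last segment. On Windows dirname() can return
    // backslashes, so normalise them to forward slashes.
    return rtrim(str_replace('\\', '/', dirname($path)), '/');
}

echo url_base_dir('/dir/'), "\n";          // prints /dir
echo url_base_dir('/dir/index.php'), "\n"; // prints /dir
echo url_base_dir('/dir/sub/a.html'), "\n"; // prints /dir/sub
```

Both /dir/ and /dir/index.php give /dir here, which is the behaviour the question was looking for.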

The first argument is the full URL of the page where you found the <img> (or <a> or whatever), and the second argument is the contents of the src/href etc. attribute.
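
To show how those two arguments fit together, here is a rough sketch that pulls every src out of a page and hands it to a resolver. It uses PHP's built-in DOMDocument rather than the HTML DOM parser mentioned in the question, and the helper name collect_image_urls, the $resolve parameter, and the example.com URL are all assumptions made for illustration:

```php
<?php
// Hypothetical helper: collects <img src> values from an HTML string and
// passes each one, together with the page URL, to the $resolve callable.
function collect_image_urls(string $html, string $pageUrl, callable $resolve): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely well-formed; suppress parser warnings.
    @$doc->loadHTML($html);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src === '') {
            continue; // no src attribute, nothing to resolve
        }
        $absolute = $resolve($pageUrl, $src);
        if ($absolute !== false) {
            $urls[] = $absolute; // keep only URLs the resolver accepted
        }
    }
    return $urls;
}

// Example call, assuming make_absolute_path() from this answer is loaded:
// $urls = collect_image_urls($html, 'http://example.com/dir/index.php', 'make_absolute_path');
```

Passing the resolver as a callable keeps the sketch independent of any one implementation, so a different resolution function can be swapped in without touching the loop.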

If anyone finds something that doesn't work (cos I know you'll all be trying to break it :-D), let me know and I'll try and fix it.
