limeygent - 4 months ago
PHP Question

How to get <a> tags in <body> but exclude header and footer sections

If I have a webpage like this:

<body>
<header>
<a href='http://domain1.com'>link 1 text</a>
</header>

<a href='http://domain2.com'>link 2 text</a>

<footer>
<a href='http://domain3.com'>link 3 text</a>
</footer>
</body>


How do I pull the <a> tags out of the <body> but exclude the links from <header> and <footer>?

In the real web page, there will be a lot of <a> tags in the <header>, so I'd rather not have to cycle through ALL of them.

I want to pull out the URLs and anchor text from each of the <a> tags that are NOT inside the <header> or <footer> tags.

EDIT: this is how I find links in the header:

$header = $html->find('header',0);
foreach ($header->find('a') as $a){
    // do something with $a
}


I would like to do something like this (note the use of "!" to negate):

$foo = $html->find('!header,!footer');
foreach ($foo->find('a') as $a){
    // do something with $a
}

Answer

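Remove the header and footer from the DOM you are working with before looking for the links. Blanking an element's outertext only changes the saved output, so reload the document from that output before searching.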

<?php
    include("simple_html_dom.php");
    $source = <<<EOD
    <body>
        <header>
            <a href='http://domain1.com'>link 1 text</a>
        </header>

        <a href='http://domain2.com'>link 2 text</a>

        <a href='http://domain4.com'>link 4 text</a>

        <footer>
            <a href='http://domain3.com'>link 3 text</a>
        </footer>
    </body>
EOD;

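    $html = str_get_html($source);
    // Blank out every <header> and <footer> element, links included.
    foreach ($html->find('header, footer') as $unwanted) {
        $unwanted->outertext = "";
    }
    // Re-parse the saved markup so the blanked elements are gone from the tree.
    $html->load($html->save());
    $links = $html->find("a");
    foreach ($links as $link) {
        // Prints the whole tag; $link->href and $link->plaintext hold the URL and anchor text.
        print $link . "\n";
    }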

?>
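
If you would rather not delete anything or reload the document, one alternative is to let an XPath query do the exclusion. The sketch below does not use simple_html_dom: it assumes PHP's bundled DOM extension (DOMDocument and DOMXPath) is available, and the $links array of [URL, anchor text] pairs is only an illustrative name.

```php
<?php
// Sketch using PHP's bundled DOM extension (DOMDocument/DOMXPath),
// not simple_html_dom. The markup is the sample page from the answer above.
$source = <<<EOD
<body>
    <header>
        <a href='http://domain1.com'>link 1 text</a>
    </header>

    <a href='http://domain2.com'>link 2 text</a>

    <a href='http://domain4.com'>link 4 text</a>

    <footer>
        <a href='http://domain3.com'>link 3 text</a>
    </footer>
</body>
EOD;

// libxml's HTML parser predates HTML5 and warns about <header>/<footer>;
// collect those warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select only <a> elements that have no <header> or <footer> ancestor.
$nodes = $xpath->query('//body//a[not(ancestor::header) and not(ancestor::footer)]');

// $links is a hypothetical name: a list of [url, anchor text] pairs.
$links = [];
foreach ($nodes as $a) {
    $links[] = [$a->getAttribute('href'), trim($a->textContent)];
}

foreach ($links as [$url, $text]) {
    echo $url, ' => ', $text, "\n";
}
```

With the sample markup this should list only the domain2 and domain4 links. Because the filter lives in the query, the PHP loop never visits the header and footer links.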