TheEditor - 2 months ago
HTML Question

Simple HTML DOM getting all attributes from a tag

This is sort of a two-part question, but maybe one part answers the other. I'm trying to get some information out of an <a> tag in markup like this:

<div id="foo">
  <div class="bar"><a data1="xxxx" data2="xxxx" href="http://foo.bar">Inner text</a></div>
  <div class="bar2"><a data3="xxxx" data4="xxxx" href="http://foo.bar">more text</a></div>
</div>


Here is what I'm using now.

$articles = array();
$html = file_get_html('http://foo.bar');
foreach($html->find('div[class=bar] a') as $a){
  $articles[] = array($a->href, $a->innertext);
}


This works perfectly to grab the href and the inner text from the first div class. I tried adding a $a->data1 to the foreach but that didn't work.

How do I grab those data attributes at the same time as I grab the href and innertext?

Also, is there a good way to get both classes with one statement? I assume I could build the find off of the id and grab all the div information.

Thanks

Answer

To grab all those attributes, you should first inspect the parsed element, like this:

foreach($html->find('div[class=bar] a') as $a){
  var_dump($a->attr);
}

...and see whether those attributes are present. Names such as data1 are not standard HTML attributes (custom data attributes are normally written with a data- prefix), so the parser may discard them.

If they exist, you can read them like this:

foreach($html->find('div[class=bar] a') as $a){
  $article = array($a->href, $a->innertext);
  if (isset($a->attr['data1'])) {
    $article['data1'] = $a->attr['data1'];
  }
  if (isset($a->attr['data2'])) {
    $article['data2'] = $a->attr['data2'];
  }
  //...
  $articles[] = $article;
}
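
If there are many such attributes, the repeated isset checks above can be folded into a loop over $a->attr. The following is only a sketch: build_article and its $prefix parameter are hypothetical names, not part of Simple HTML DOM, and it assumes each element exposes href, innertext and an attr array as in the code above. It also uses named keys ('href', 'text') instead of the numeric ones in the original array.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): builds one article row
// from an element that exposes href, innertext and an attr array.
function build_article($a, $prefix = 'data')
{
    $article = array('href' => $a->href, 'text' => $a->innertext);
    foreach ($a->attr as $name => $value) {
        // Keep only attributes whose name starts with the given prefix.
        if (strpos((string) $name, $prefix) === 0) {
            $article[$name] = $value;
        }
    }
    return $article;
}

// Assumed usage with the $html object from the question:
// foreach ($html->find('div[class=bar] a') as $a) {
//     $articles[] = build_article($a);
// }
```

Because the helper copies whatever matching attributes the parser kept, it also copes with data3 and data4 on the second link without extra checks.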

To match both classes, you can pass multiple selectors separated by commas:

foreach($html->find('div[class=bar] a, div[class=bar2] a') as $a){
...