TheEditor TheEditor - 10 months ago 44
HTML Question

Simple HTML DOM getting all attributes from a tag

Sort of a two part question but maybe one answers the other. I'm trying to get a piece of information out of an

<div id="foo">
<div class="bar"><a data1="xxxx" data2="xxxx" href="http://foo.bar">Inner text"</a>
<div class="bar2"><a data3="xxxx" data4="xxxx" href="http://foo.bar">more text"</a>


Here is what I'm using now.

$articles = array();
$html=file_get_html('http://foo.bar');
foreach($html->find('div[class=bar] a') as $a){
$articles[] = array($a->href,$a->innertext);
}


This works perfectly to grab the href and the inner text from the first div class. I tried adding a $a->data1 to the foreach but that didn't work.

How do I grab those inner data tags at the same time I grab the href and innertext.

Also is there a good way to get both classes with one statement? I assume I could build the find off of the id and grab all the div information.

Thanks

Answer Source

To grab all those attributes, you should before investigate the parsed element, like this:

foreach($html->find('div[class=bar] a') as $a){
  var_dump($a->attr);
}

...and see if those attributes exist. They don't seem to be valid HTML, so maybe the parser discards them.

If they exist, you can read them like this:

foreach($html->find('div[class=bar] a') as $a){
  $article = array($a->href, $a->innertext);
  if (isset($a->attr['data1'])) {
    $article['data1'] = $a->attr['data1'];
  }
  if (isset($a->attr['data2'])) {
    $article['data2'] = $a->attr['data2'];
  }
  //...
  $articles[] = $article;
}

To get both classes you can use a multiple selector, separated by a comma:

foreach($html->find('div[class=bar] a, div[class=bar2] a') as $a){
...