vloryan, 4 years ago
PHP Question

External content via class

I am successfully using the following code to retrieve external content from a table with a given class.

$url = 'https://www.anything.com';
$content = file_get_contents($url);
$first_step = explode( '<table class="main">' , $content );
$second_step = explode("</table>" , $first_step[1] );

echo $second_step[0];


Now I need the content from an <a class="link">content</a> element, but

$url = 'https://www.anything.com';
$content = file_get_contents($url);
$first_step = explode( '<a class="link">' , $content );
$second_step = explode("</a>" , $first_step[1] );


does not work.

In the meantime, I use this code

// Create DOM from URL or file

$sFilex = file_get_html("https://www.anything.com", False, $cxContext);

// Find all links
foreach($sFilex->find('a[class=link]') as $element)
echo $element->href . '<br>';


to get all <a class="link">content</a> links successfully. But how can I limit this to the first result found?

Thanks for your help!

Answer Source

Since I recommended using a proper HTML parser, which can be a bit intimidating for the uninitiated, I figured I could give you an example to start off with:

$url = 'https://www.anything.com';

// create a new DOMDocument (an XML/HTML parser)
$doc = new DOMDocument;
// this is used to repair possibly malformed HTML
$doc->recover = true;

// libxml is the parse library that DOMDocument uses internally
// collect errors in a memory buffer instead of outputting them immediately (basically ignore them until you need them, if ever)
// the previous setting is stored so that it can be restored at the end
$useInternalErrors = libxml_use_internal_errors( true );

// load the external URL; this might not work if retrieving external files is disabled.
// I will come back to that if it doesn't work for you.
$doc->loadHTMLFile( $url );

// XPath is a query language that allows you to query XML/HTML data structures.
// we create a DOMXPath instance that operates on the DOMDocument created earlier
$xpath = new DOMXPath( $doc );

// this is a query to get all <table class="main">
// note though, that it will also match <table class="test maintain">, etc.
// which might not be what you need
$tableMainQuery = '//table[contains(@class,"main")]';
/* explanation:
   //         match any descendant of the current context, in this case root
   table      match <table> elements
   []         with the predicate(s)
   contains() match a string, that contains some string, in this case:
   @class     the attribute 'class'
   'main'     containing the string main
*/   

// execute the query
// $nodes will be an instance of DOMNodeList (containing DOMNode instances)
$nodes = $xpath->query( $tableMainQuery );

// loop through all nodes
foreach( $nodes as $node ) {
  // echo the inner HTML content of the found node (or do something else with it)
  // (the getInnerHTML() helper function is defined below)
  echo htmlentities( getInnerHTML( $node ) );
}

// this is a query to get all <a class="link">
// similar comments and explanation apply as with previous query
$aLinkQuery = '//a[contains(@class,"link")]';

// execute the query
// $nodes will be an instance of DOMNodeList (containing DOMNode instances)
$nodes = $xpath->query( $aLinkQuery );

// loop through all nodes
foreach( $nodes as $node ) {
  // do something with the found nodes again
}

// clear any errors still left in memory
libxml_clear_errors();
// set previous state
libxml_use_internal_errors( $useInternalErrors );

// the helper function to get the inner HTML of a found node
function getInnerHTML( DOMNode $node ) {
  $html = '';
  foreach( $node->childNodes as $childNode ) {
    $html .= $childNode->ownerDocument->saveHTML( $childNode );
  }

  return $html;
}
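
A side note on the class matching: as the comments above mention, contains(@class,"main") also matches class="test maintain". A common XPath idiom to match only the whole class name is to pad the attribute with spaces. The snippet below is just a sketch on some made-up markup (the $html string is an assumption standing in for the fetched page):

```php
<?php
// Made-up markup for illustration; in practice this would be the fetched page.
$html = '<html><body>'
      . '<table class="main"><tr><td>wanted</td></tr></table>'
      . '<table class="test maintain"><tr><td>unwanted</td></tr></table>'
      . '</body></html>';

// Silence parser warnings while loading, then restore the previous setting.
$previous = libxml_use_internal_errors( true );
$doc = new DOMDocument;
$doc->loadHTML( $html );
libxml_clear_errors();
libxml_use_internal_errors( $previous );

$xpath = new DOMXPath( $doc );

// Substring match: also picks up "test maintain".
$looseQuery = '//table[contains(@class,"main")]';

// Whole-token match: pad the normalized class list with spaces and
// look for " main " so that only the exact class name matches.
$exactQuery = '//table[contains(concat(" ", normalize-space(@class), " "), " main ")]';

$looseCount = $xpath->query( $looseQuery )->length;
$exactCount = $xpath->query( $exactQuery )->length;

echo "loose: $looseCount, exact: $exactCount\n"; // prints "loose: 2, exact: 1"
```

The same padding trick should work for the <a class="link"> query if other class names on the page happen to contain "link".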

Now, to get only the first found node of an xpath query (a DOMNodeList instance), I think the simplest would be:

if( $nodes->length > 0 ) {
  $node = $nodes->item( 0 );
}

// or, perhaps
if( null !== ( $node = $nodes->item( 0 ) ) ) {
  // do something with the $node
}
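
Applied to the links from the question, that could look roughly like the following. This is only a sketch: the $html string and the href values are made up, and it reads the href attribute because that is what the question's loop printed.

```php
<?php
// Made-up markup for illustration; in practice this would be the fetched page.
$html = '<html><body>'
      . '<a class="link" href="/first">First</a>'
      . '<a class="link" href="/second">Second</a>'
      . '</body></html>';

// Silence parser warnings while loading, then restore the previous setting.
$previous = libxml_use_internal_errors( true );
$doc = new DOMDocument;
$doc->loadHTML( $html );
libxml_clear_errors();
libxml_use_internal_errors( $previous );

$xpath = new DOMXPath( $doc );
$nodes = $xpath->query( '//a[contains(@class,"link")]' );

$href = null;
$text = null;

// item( 0 ) returns null when the list is empty, so check before using it.
$first = $nodes->item( 0 );
if ( $first instanceof DOMElement ) {
  $href = $first->getAttribute( 'href' ); // the link target
  $text = $first->textContent;            // the text between <a> and </a>
  echo $href . ' (' . $text . ')' . "\n"; // prints "/first (First)"
}
```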

You could also adjust the XPath query to only find the first matching node, but it would then still return a DOMNodeList (containing at most one node), so you would still need item( 0 ) to get at the node itself.
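
A sketch of such a query, again on made-up markup. Note the parentheses around the path: without them, [1] is evaluated per parent element, so it can match more than one node.

```php
<?php
// Made-up markup for illustration: two links, each inside its own <p>.
$html = '<html><body>'
      . '<p><a class="link" href="/one">One</a></p>'
      . '<p><a class="link" href="/two">Two</a></p>'
      . '</body></html>';

// Silence parser warnings while loading, then restore the previous setting.
$previous = libxml_use_internal_errors( true );
$doc = new DOMDocument;
$doc->loadHTML( $html );
libxml_clear_errors();
libxml_use_internal_errors( $previous );

$xpath = new DOMXPath( $doc );

// ( ... )[1] takes the first node of the whole result set, in document order.
$firstOnly = $xpath->query( '(//a[contains(@class,"link")])[1]' );

// Without the parentheses, [1] means "first matching <a> within its parent",
// which is true for both links here because each sits in its own <p>.
$perParent = $xpath->query( '//a[contains(@class,"link")][1]' );

$firstHref = $firstOnly->item( 0 )->getAttribute( 'href' );

echo $firstOnly->length . ' ' . $perParent->length . ' ' . $firstHref . "\n"; // prints "1 2 /one"
```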
