yucel yucel - 14 days ago
PHP Question

Writing multiple regex patterns to parse HTML

I'm fetching an HTML webpage with file_get_contents() and get a table like the one below, with more than 150 rows:

<tr class="tabrow ">
<td class="tabcol tdmin_2l">FIRST+DATA</td>
<td class="tabcol">
<a class="modal-button" title="SECOND+DATA" href="THIRD+DATA" rel="{handler: 'iframe', size: {x: 800, y: 640}, overlayOpacity: 0.9, classWindow: 'phocamaps-plugin-window', classOverlay: 'phocamaps-plugin-overlay'}">
asdxxx
</a>
</td>
<td class="tabcol"></td>
<td class="tabcol">FOURTH+DATA</td>
</tr>


I want to get the FIRST DATA, SECOND DATA, THIRD DATA and FOURTH DATA with a preg_match_all() call. I tried to write multiple patterns, but I couldn't get it to work. Here's what I tried:

preg_match_all('/(<td class="tabcol tdmin_2l">|title=")(.*?)(<\/td>|")/s', $raw, $matches, PREG_SET_ORDER);


What are the correct patterns?

Answer

Try this:

$str = <<<HTML
<tr class="tabrow ">
<td class="tabcol  tdmin_2l">FIRST+DATA</td>
<td class="tabcol"><a class="modal-button" title="SECOND+DATA"  href="THIRD+DATA" rel="{handler: 'iframe', size: {x: 800, y: 640}, overlayOpacity: 0.9, classWindow: 'phocamaps-plugin-window', classOverlay: 'phocamaps-plugin-overlay'}">asdxxx</a></td>
<td class="tabcol"></td>
<td class="tabcol">FOURTH+DATA</td>
</tr>
HTML;

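// The s flag lets . match newlines, so cells that span several lines
// (as in the fetched page) are captured too.
preg_match_all('/<td[^>]*>(.*?)<\/td>/is', $str, $td_matches);
// The title and href attributes sit on the <a> inside the second cell.
preg_match('/ title="([^"]*)"/i', $td_matches[1][1], $title);
preg_match('/ href="([^"]*)"/i', $td_matches[1][1], $href);

echo $td_matches[1][0] . "\n"; // FIRST+DATA
echo $title[1] . "\n";         // SECOND+DATA
echo $href[1] . "\n";          // THIRD+DATA
echo $td_matches[1][3];        // FOURTH+DATA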
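
The snippet above reads the cells of a single row. Since the fetched page has more than 150 rows, one possible extension is to split the table into <tr> blocks first and then apply the same cell and attribute patterns inside each block. The following is only a sketch under assumptions: extractRows() is a hypothetical helper name, the two-row sample input is made up for illustration, and every data row is assumed to have the same four-cell layout as in the question.

```php
<?php
// Made-up sample input with two rows; in the question this would be the
// string returned by file_get_contents().
$raw = <<<HTML
<table>
<tr class="tabrow ">
<td class="tabcol tdmin_2l">FIRST+1</td>
<td class="tabcol">
<a class="modal-button" title="SECOND+1" href="THIRD+1" rel="{handler: 'iframe'}">
asdxxx
</a>
</td>
<td class="tabcol"></td>
<td class="tabcol">FOURTH+1</td>
</tr>
<tr class="tabrow ">
<td class="tabcol tdmin_2l">FIRST+2</td>
<td class="tabcol">
<a class="modal-button" title="SECOND+2" href="THIRD+2" rel="{handler: 'iframe'}">
asdxxx
</a>
</td>
<td class="tabcol"></td>
<td class="tabcol">FOURTH+2</td>
</tr>
</table>
HTML;

// Hypothetical helper: returns one associative array per table row.
function extractRows(string $html): array
{
    $rows = [];
    // Capture the inner HTML of every <tr>; the s flag lets . match newlines.
    preg_match_all('/<tr[^>]*>(.*?)<\/tr>/is', $html, $trMatches);
    foreach ($trMatches[1] as $rowHtml) {
        // Capture the inner HTML of every <td> in this row.
        preg_match_all('/<td[^>]*>(.*?)<\/td>/is', $rowHtml, $tdMatches);
        $cells = $tdMatches[1];
        if (count($cells) < 4) {
            continue; // skip rows that do not have the assumed four cells
        }
        // title and href are attributes of the <a> inside the second cell.
        $title = preg_match('/\stitle="([^"]*)"/i', $cells[1], $m) ? $m[1] : '';
        $href  = preg_match('/\shref="([^"]*)"/i', $cells[1], $m) ? $m[1] : '';
        $rows[] = [
            'first'  => trim($cells[0]),
            'title'  => $title,
            'href'   => $href,
            'fourth' => trim($cells[3]),
        ];
    }
    return $rows;
}

$rows = extractRows($raw);
foreach ($rows as $row) {
    echo implode(' | ', $row) . "\n";
}
```

Regex-based extraction like this depends on the markup staying exactly as shown; if the page layout varies, PHP's DOMDocument and DOMXPath classes are usually a sturdier choice.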