Max Max - 17 days ago 5
PHP Question

extract stylesheets via regex

Yes, I know, I know, parsing HTML with regular expressions is very bad. But I am working with legacy code that is supposed to extract all link and style elements from an HTML page. I would change it and use the DOM extension instead, but after the regex there is a huge code block which relies on the way preg_match_all returns the matched results.

The script is using this regex:

$pattern = '/<(link|style)(?=.+?(?:type="(text\/css)"|>))(?=.+?(?:media="(.*?)"|>))(?=.+?(?:href="(.*?)"|>))(?=.+?(?:rel="(.*?)"|>))[^>]+?\2[^>]+?(?:\/>|<\/style>)\s*/is';

preg_match_all($pattern, $htmlContent, $cssTags);


But it doesn't work. No elements are matched. Unfortunately I really suck at regex, so if someone could help me out, that would be great.

Max Max
Answer

Thanks to all for your answers, but I finally rewrote that bit using the DOM extension. That should make it way more robust.
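
For anyone landing here with the same problem, this is roughly what a DOM-based version could look like. It is only a sketch, not the exact code from my rewrite: the function name extractCssTags and the array keys are made up for illustration, and it assumes $htmlContent is a non-empty HTML string. Only the attributes the old regex tried to capture (type, media, href, rel) are collected, plus the inline CSS of style elements.

```php
<?php
// Hypothetical helper: collects all <link> and <style> elements from an HTML string.
function extractCssTags(string $htmlContent): array
{
    $doc = new DOMDocument();

    // Legacy markup is often invalid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($htmlContent);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $cssTags = [];

    // The XPath union returns both element types in document order.
    foreach ($xpath->query('//link | //style') as $node) {
        $cssTags[] = [
            'tag'   => $node->nodeName,
            // getAttribute() returns an empty string when the attribute is missing.
            'type'  => $node->getAttribute('type'),
            'media' => $node->getAttribute('media'),
            'href'  => $node->getAttribute('href'),
            'rel'   => $node->getAttribute('rel'),
            // Inline CSS only exists for <style> elements.
            'css'   => $node->nodeName === 'style' ? $node->textContent : '',
        ];
    }

    return $cssTags;
}
```

Note that the result is one array per element, which is not the shape preg_match_all returns by default (one array per capture group), so any code that consumed $cssTags from the regex would still need to be adapted.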
