jkushner jkushner - 1 year ago 97
HTML Question

Remove everything within script and style tags

I have a variable that contains HTML code. There is code inside <script> and <style> elements. I want to scan the variable and remove these pieces of code. If I can also remove the <script> and <style> elements themselves, I would do that too.

I imagine I need to use regex, but I am not skilled in it.

Can anyone assist?

I wish I could provide some code, but as I said, I am not skilled in regex, so I don't have anything to show.

I cannot use DOM. I specifically need to use regex against these tags.

Answer Source

Do not use regex on HTML. PHP provides a tool for parsing DOM structures, appropriately called DOMDocument.

// some HTML for example
$myHtml = '<html><head><script>alert("hi mom!");</script></head><body><style>body { color: red;} </style><h1>This is some content</h1><p>content is awesome</p></body><script src="someFile.js"></script></html>';

// create a new DomDocument object
$doc = new DOMDocument();

// load the HTML into the DOMDocument object (this would be your source HTML)
$doc->loadHTML($myHtml);

removeElementsByTagName('script', $doc);
removeElementsByTagName('style', $doc);
removeElementsByTagName('link', $doc);

// output cleaned html
echo $doc->saveHTML();

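function removeElementsByTagName($tagName, $document) {
  $nodeList = $document->getElementsByTagName($tagName);
  // iterate backwards: the node list is live and shrinks as nodes are removed
  for ($nodeIdx = $nodeList->length; --$nodeIdx >= 0; ) {
    $node = $nodeList->item($nodeIdx);
    $node->parentNode->removeChild($node);
  }
}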

You can try it here: https://eval.in/private/4f225fa0dcb4eb
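
Since the question rules out DOM, here is a rough regex fallback for comparison. It is only a sketch: the helper name stripScriptAndStyle is hypothetical and not part of the answer above, and the pattern assumes reasonably well-formed markup. It will misfire on things like a literal "</script>" inside a JavaScript string or an unclosed tag, which is why the DOMDocument approach is the more robust one.

```php
<?php
// Hypothetical helper (not from the original answer): strips <script> and
// <style> elements, including their contents, using a single regex.
function stripScriptAndStyle(string $html): string
{
    // <(script|style)\b[^>]*>  opening tag with optional attributes
    // .*?                      lazy match of the element's contents
    // </\1\s*>                 closing tag with the same name as the opening one
    // Flags: i = case-insensitive, s = dot also matches newlines
    $pattern = '#<(script|style)\b[^>]*>.*?</\1\s*>#is';

    $result = preg_replace($pattern, '', $html);

    // preg_replace() returns null on failure (e.g. backtrack limit reached);
    // in that case this sketch falls back to the unmodified input.
    return $result ?? $html;
}

// Same sample HTML as in the answer above
$myHtml = '<html><head><script>alert("hi mom!");</script></head><body><style>body { color: red;} </style><h1>This is some content</h1><p>content is awesome</p></body><script src="someFile.js"></script></html>';

$cleaned = stripScriptAndStyle($myHtml);
echo $cleaned, PHP_EOL;
```

To also drop other tags, add their names to the alternation, for example (script|style|noscript). Void tags such as link have no closing tag and would need a separate pattern.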

