Zach Panzarino - 4 months ago
HTML Question

Remove certain tags from HTML input with PHP

I have a form where users can style their own input with HTML. I want to clean that input on the server side with PHP so that it is safe and contains only the markup I allow. I already have XSS protection, so this is not about removing scripts.

When the user provides input, I want to remove all tags other than p, img, a, hr, br, tbody, tr, td, pre, ul, ol, li and span (basically all text formatting other than divs). I want to remove any attributes other than href for <a>, src for <img>, and style for <p>. For the <p> style attribute, I would only like to preserve the following CSS properties:


  • color
  • background-color
  • line-height
  • Anything that starts with text-



In addition, I want to be able to crop the text to a certain length while preserving ending tags and making sure that every opening tag also has a closing tag.

For example, how does the Stack Overflow editor parse and clean input before saving it and displaying it to the user?

Thanks.

Answer

I use HTML Purifier (http://htmlpurifier.org/) to clean HTML input. It lets you define which tags, attributes and styles are allowed. Here is the code from my project as an example.

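    // Start from HTML Purifier's default configuration.
    $configuration = HTMLPurifier_Config::createDefault();
    // Keep id attributes, which are stripped by default.
    $configuration->set('Attr.EnableID', true);
    // Drop empty elements, including ones that only contain &nbsp;.
    $configuration->set('AutoFormat.RemoveEmpty', true);
    $configuration->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
    // Attribute whitelist in "element.attribute" form; "*" matches any element.
    $configuration->set('HTML.AllowedAttributes', array('span.style', '*.id', '*.src', 'a.href', 'table.style', 'img.style', 'td.colspan', 'td.rowspan', 'td.style'));
    // CSS properties that may appear inside a style attribute.
    $styles = array('margin-left', 'color', 'background-color', 'text-decoration', 'font-weight', 'font-style', 'border', 'border-collapse', 'height');
    $configuration->set('CSS.AllowedProperties', $styles);
    // $html holds the untrusted user input; this excerpt sits inside a function.
    $htmlPurifier = new HTMLPurifier($configuration);
    return $htmlPurifier->purify($html);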
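
The same approach can be adapted to the whitelist in the question. The following is only a sketch under a few assumptions: HTML Purifier is installed through Composer, `cleanUserHtml()` is a hypothetical helper name, `table` is added because `tbody`, `tr` and `td` would otherwise have no valid parent, and `alt` is allowed on `img` because HTML Purifier appears to treat it as a required attribute. As far as I know there is no prefix wildcard for CSS properties, so the `text-` properties are listed one by one.

```php
<?php
// Assumption: HTML Purifier was installed with Composer (ezyang/htmlpurifier).
require_once __DIR__ . '/vendor/autoload.php';

// cleanUserHtml() is a hypothetical helper name, not part of the library.
function cleanUserHtml(string $html): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Element whitelist; the bracketed names are the attributes kept per element.
    // "table" and "alt" are added assumptions (see the note above the code).
    $config->set(
        'HTML.Allowed',
        'p[style],img[src|alt],a[href],hr,br,table,tbody,tr,td,pre,ul,ol,li,span'
    );

    // CSS properties kept inside style="..."; text-* is spelled out explicitly.
    $config->set('CSS.AllowedProperties', array(
        'color', 'background-color', 'line-height',
        'text-align', 'text-decoration', 'text-indent', 'text-transform',
    ));

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($html);
}

// Usage: the div wrapper, the onclick attribute and font-size should be removed.
echo cleanUserHtml('<div><p style="color:red;font-size:40px" onclick="x()">Hi</p></div>');
```

Because `style` is only listed for `p`, it should be stripped from every other element. Note that this does not cover cropping the text to a given length; HTML Purifier balances tags in what it is given, but truncation would need a separate step.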