Bogdan Gudyma - 9 months ago
PHP Question

HTML Purifier converts & to &amp;

I'm using HTML Purifier (Yii2) on my text field.

I need to keep "&" as in the original, but the purifier converts it to "&amp;".

I don't want to use

after the purifier.

Can you help me with the configuration?

My config:

'filter' => function ($value) {
    return HtmlPurifier::process($value, [
        'HTML.SafeObject' => true,
        'HTML.SafeEmbed' => true,
        'Core.EscapeNonASCIICharacters' => true,
        'Core.Encoding' => 'UTF-8',
    ]);
},


Example of the text I want to purify: "Company name & Co"

Answer Source

You mentioned in the comments that you're purifying before entering information into your database.

I recommend you rethink this from an architectural point of view, as it has a few shortcomings: you lose the original user input (which you may later want to analyse for any number of reasons); your database becomes less useful once you want to do something else with the data; and bugs in your current version of HTML Purifier (which may be security-relevant) won't be ironed out of data that has already been stored. You can see more information about the importance of escaping/sanitising for context in this answer.
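To make the "escape for context" idea concrete, here is a minimal sketch. It assumes the raw text is stored unchanged and is only escaped at the moment it is written into an HTML page. The helper name renderAsHtmlText is hypothetical; htmlspecialchars is PHP's built-in function.

```php
<?php
// Hypothetical helper: escape stored raw text at output time,
// for use inside HTML element content or a quoted attribute value.
function renderAsHtmlText(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// The database keeps exactly what the user typed.
$stored = 'Company name & Co';

// Only the HTML view escapes it; a browser renders "&amp;" as a plain "&".
echo '<p>' . renderAsHtmlText($stored) . '</p>', PHP_EOL;
```

With this arrangement the "&amp;" only ever exists in the generated markup, so searches, exports and other non-HTML consumers still see "Company name & Co".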

That said, your problem has been previously discussed on the HTML Purifier fora: Do not escape ampersand. The thread discusses why it's difficult to treat & differently and remain secure and essentially 'recommends' not using HTML Purifier, which of course doesn't solve your problem.

Nonetheless, there are suggestions and thoughts from within that thread which may help you if you're forced to store purified HTML in your database:

Perhaps a more useful response would be: store the raw, user submitted data (without running HTML Purifier on it) in the database, and run search queries on that. However, store in the database as well a cached version of the HTML Purified version.

Or (this uses < as an example):

No such boolean flag exists, and it would be reasonably tricky to implement safely (you'd want to do something silly like convert literal < and friends to some unforgeable piece of text and then convert &lt; to the literal version.)

But the latter is not a robust approach, and the former is unnecessary redundancy.
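For reference, the first suggestion quoted above (keep the raw input and cache a purified copy next to it) could be sketched roughly as follows. Everything here is an assumption made for illustration: the in-memory array stands in for a database table, the column names body_raw and body_html and the functions saveText and rebuildCache are hypothetical, and the demo sanitiser is a stand-in so that the snippet runs without HTML Purifier installed.

```php
<?php
// Hypothetical in-memory "table"; a real application would use its own storage layer.
// $purify stands for any sanitiser, e.g. a closure wrapping HtmlPurifier::process().
function saveText(array &$table, string $raw, callable $purify): int
{
    $table[] = [
        'body_raw'  => $raw,          // original input: search and analyse this
        'body_html' => $purify($raw), // cached purified copy: display this
    ];
    return array_key_last($table);
}

// Re-run the sanitiser over the raw column, e.g. after upgrading the library.
function rebuildCache(array &$table, callable $purify): void
{
    foreach ($table as &$row) {
        $row['body_html'] = $purify($row['body_raw']);
    }
    unset($row); // break the reference left over from the loop
}

// Stand-in sanitiser for the demo only; it is NOT a substitute for HTML Purifier.
$purify = fn (string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');

$table = [];
$id = saveText($table, 'Company name & Co', $purify);
echo $table[$id]['body_raw'], PHP_EOL;
echo $table[$id]['body_html'], PHP_EOL;
```

Because the raw column is never overwritten, the cached column can be regenerated at any time, which is the property that is lost when only purified HTML is stored.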