Justin Stayton Justin Stayton - 15 days ago 5
HTML Question

Converting HTML to plain text in PHP for e-mail

I use TinyMCE to allow minimal formatting of text within my site. From the HTML that's produced, I'd like to convert it to plain text for e-mail. I've been using a class called html2text, but it's really lacking in UTF-8 support, among other things. I do, however, like that it maps certain HTML tags to plain text formatting — like putting underscores around text that previously had <i> tags in the HTML.

Does anyone use a similar approach to converting HTML to plain text in PHP? And if so: Do you recommend any third-party classes that I can use? Or how do you best tackle this issue?

Thanks!

Answer

Use html2text (GitHub; example HTML to text), licensed under the Eclipse Public License. It uses PHP's DOM methods to load from HTML, and then iterates over the resulting DOM to extract plain text. Usage:

$text = convert_html_to_text($html);

Although incomplete, it is open source and contributions are welcome.

Issues with other conversion scripts:

  • Since html2text (GPL) is not EPL-compatible.
  • lkessler's link (attribution) is incompatible with most open source licenses.