TK123 - 1 month ago
PHP Question

How to truncate text by its text-only character count without breaking the HTML?

This string has 78 characters with HTML and 39 characters without HTML:

<p>I really like the <a href="http://google.com">Google</a> search engine.</p>
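
Both counts can be checked with a quick sketch using `strlen()` and `strip_tags()`. The variable names are illustrative only, and the counts are in bytes, which matches characters here because the string is plain ASCII.

```php
<?php
// The example string from the question.
$html = '<p>I really like the <a href="http://google.com">Google</a> search engine.</p>';

// Length including markup, and length of the visible text only.
$withHtml = strlen($html);
$withoutHtml = strlen(strip_tags($html));

echo $withHtml, "\n";    // 78
echo $withoutHtml, "\n"; // 39
```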


I want to truncate this string based on the non-HTML count. For example, if I wanted to truncate it to 24 characters, the output should be:

I really like the <a href="http://google.com">Google</a>


The truncation should ignore the HTML when deciding where to cut and count only the stripped text, but it must not break the HTML either. How can I achieve this?

Answer

Alright, so this is what I put together, and it seems to be working:

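function truncate_html($string, $length, $postfix = '&hellip;', $isHtml = true) {
    $string = trim($string);
    // Append the postfix only if the visible text is actually longer than $length.
    $postfix = (strlen(strip_tags($string)) > $length) ? $postfix : '';
    $i = 0;     // bytes taken up by the tags seen so far
    $tags = []; // stack of open tags; change to array() if php version < 5.4
    // Void elements never get a closing tag, so they are kept off the stack.
    $voidTags = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'source', 'track', 'wbr'];

    if ($isHtml) {
        // Each match is one tag plus the text that follows it, with byte offsets.
        preg_match_all('/<[^>]+>([^<]*)/', $string, $tagMatches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
        foreach ($tagMatches as $tagMatch) {
            // Stop once $length visible characters precede this tag.
            if ($tagMatch[0][1] - $i >= $length) {
                break;
            }

            // Tag name, with a leading "/" for closing tags.
            $tag = substr(strtok($tagMatch[0][0], " \t\n\r\0\x0B>"), 1);
            if ($tag[0] != '/') {
                if (!in_array(strtolower(rtrim($tag, '/')), $voidTags)) {
                    $tags[] = $tag;
                }
            }
            elseif (end($tags) == substr($tag, 1)) {
                array_pop($tags);
            }

            // Add this tag's length: offset of the following text minus offset of the tag.
            $i += $tagMatch[1][1] - $tagMatch[0][1];
        }
    }

    // Cut after $length visible characters plus the tag bytes before that point,
    // then close the tags that are still open, innermost first.
    $length = min(strlen($string), $length + $i);
    $tags = array_reverse($tags);
    $closing = count($tags) ? '</' . implode('></', $tags) . '>' : '';

    return substr($string, 0, $length) . $closing . $postfix;
}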
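
To see why the offset arithmetic works, here is a small sketch that runs only the `preg_match_all` step on the example string and records where each tag and its trailing text start. The names `$tagOffset`, `$textOffset`, `$tagBytes`, and `$rows` are mine, for illustration; `$tagBytes` plays the role of `$i` in the function.

```php
<?php
$html = '<p>I really like the <a href="http://google.com">Google</a> search engine.</p>';

// Same pattern and flags as in truncate_html(): each match is a tag plus the text after it.
preg_match_all('/<[^>]+>([^<]*)/', $html, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

$tagBytes = 0; // running total of bytes occupied by tags
$rows = [];
foreach ($matches as $match) {
    $tagOffset = $match[0][1];  // where the tag starts in the raw string
    $textOffset = $match[1][1]; // where the text after the tag starts
    $visibleBefore = $tagOffset - $tagBytes; // visible characters preceding this tag
    $rows[] = [$tagOffset, $textOffset, $visibleBefore];
    $tagBytes += $textOffset - $tagOffset;
}

foreach ($rows as $row) {
    echo implode("\t", $row), "\n";
}
// Columns: tag offset, text offset, visible characters before the tag
// 0    3    0
// 21   49   18
// 55   59   24
// 74   78   39
```

With a limit of 24, the loop in `truncate_html()` stops at the third row (24 >= 24). At that point `p` and `a` are still on the stack and 31 tag bytes have been counted, so the cut lands at byte 24 + 31 = 55, just after `Google`.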

Usage:

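truncate_html('<p>I really like the <a href="http://google.com">Google</a> search engine.</p>', 24);

This returns:

<p>I really like the <a href="http://google.com">Google</a></p>&hellip;

Pass an empty string as the third argument to leave off the trailing &hellip;.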

The function is adapted, with a small modification, from:

http://www.dzone.com/snippets/truncate-text-preserving-html