TK123 - 1 month ago
PHP Question

Truncate text without truncating HTML

This string has 78 characters with HTML and 39 characters without HTML:

<p>I really like the <a href="http://google.com">Google</a> search engine.</p>
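
Both counts can be checked directly with PHP's `strlen()` and `strip_tags()`. Note that `strlen()` counts bytes, which matches the character count here only because the sample string is plain ASCII.

```php
<?php
$html = '<p>I really like the <a href="http://google.com">Google</a> search engine.</p>';

// Length including markup, and length of the visible text only.
// strlen() counts bytes; for this ASCII-only sample that equals characters.
$withHtml    = strlen($html);
$withoutHtml = strlen(strip_tags($html));

echo $withHtml, "\n";    // 78
echo $withoutHtml, "\n"; // 39
```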


I want to truncate this string based on the non-HTML character count. For example, if I truncated the above string to 24 characters, the output would be:

I really like the <a href="http://google.com">Google</a>


When deciding where to cut, the truncation ignores the HTML and counts only the characters of the stripped text. However, it does not leave any HTML tags open.
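
Neither obvious one-liner satisfies both requirements. A minimal illustration using the sample string: cutting the raw string counts markup as text and stops inside the `<a ...>` tag, while cutting the stripped text gets the count right but discards the link entirely.

```php
<?php
$html = '<p>I really like the <a href="http://google.com">Google</a> search engine.</p>';

// Cutting the raw string counts markup as text and can stop inside a tag:
// the result ends partway through the opening <a ...> tag.
$rawCut = substr($html, 0, 24);

// Cutting the stripped text gets the count right but throws away the link.
$textCut = substr(strip_tags($html), 0, 24);

echo $rawCut, "\n";
echo $textCut, "\n"; // I really like the Google
```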

Answer

Alright, so this is what I put together, and it seems to be working:

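function truncate_html($string, $length, $postfix = '&hellip;', $isHtml = true) {
    // Void elements never take a closing tag, so they must not be pushed onto the stack.
    $voidTags = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
                 'link', 'meta', 'param', 'source', 'track', 'wbr'];

    $string = trim($string);
    // Append the postfix only if something is actually cut off.
    // strlen()/substr() count bytes, so multibyte UTF-8 text may be cut mid-character.
    $postfix = (strlen(strip_tags($string)) > $length) ? $postfix : '';
    $i = 0;     // bytes of markup that lie before the cut point
    $tags = []; // stack of open tags; change to array() if php version < 5.4

    if ($isHtml) {
        // Each match is one tag plus the text that follows it, with byte offsets.
        preg_match_all('/<[^>]+>([^<]*)/', $string, $tagMatches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
        foreach ($tagMatches as $tagMatch) {
            // Stop at the first tag that starts at or after the cut point in the visible text.
            if ($tagMatch[0][1] - $i >= $length) {
                break;
            }

            $tagLength = $tagMatch[1][1] - $tagMatch[0][1];
            $fullTag = substr($tagMatch[0][0], 0, $tagLength);
            // Tag name, or "/name" for a closing tag, lowercased for comparison.
            $tag = strtolower(substr(strtok($fullTag, " \t\n\r\0\x0B>"), 1));

            if ($tag === '' || $tag[0] === '!' || in_array($tag, $voidTags, true) || substr($fullTag, -2) === '/>') {
                // Comments, doctypes, void and self-closing tags: nothing to close later.
            } elseif ($tag[0] !== '/') {
                $tags[] = $tag;
            } elseif (end($tags) === substr($tag, 1)) {
                array_pop($tags);
            }

            // Markup does not count towards $length.
            $i += $tagLength;
        }
    }

    $truncated = substr($string, 0, min(strlen($string), $length + $i));
    // Close any tags that are still open, innermost first.
    $closing = $tags ? '</' . implode('></', array_reverse($tags)) . '>' : '';

    return $truncated . $closing . $postfix;
}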
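
The least obvious part of the function is the bookkeeping in `$i`: markup does not count towards `$length`, so the cut point in the raw string is `$length` plus the length of every tag that starts before it. That mapping can be isolated as a small helper for illustration; `raw_offset_for_visible()` is a hypothetical name, not part of the answer, and like the function above it counts bytes rather than multibyte characters.

```php
<?php
// Hypothetical helper (illustration only): map a visible-text length to the
// byte offset in the raw HTML string, skipping over tag markup.
function raw_offset_for_visible(string $html, int $visible): int {
    preg_match_all('/<[^>]+>/', $html, $m, PREG_OFFSET_CAPTURE);
    $markup = 0; // bytes of markup seen before the cut point
    foreach ($m[0] as [$tagText, $tagStart]) {
        // $tagStart - $markup is where this tag sits in the visible text.
        if ($tagStart - $markup >= $visible) {
            break;
        }
        $markup += strlen($tagText);
    }
    return min(strlen($html), $visible + $markup);
}

$html = '<p>I really like the <a href="http://google.com">Google</a> search engine.</p>';
$cut = raw_offset_for_visible($html, 24);
echo $cut, "\n";                   // 55
echo substr($html, 0, $cut), "\n"; // <p>I really like the <a href="http://google.com">Google
```

For 24 visible characters, the `<p>` (3 bytes) and `<a href="http://google.com">` (28 bytes) tags lie before the cut, so the raw offset is 24 + 31 = 55.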

Usage:

truncate_html('<p>I really like the <a href="http://google.com">Google</a> search engine.</p>', 24);
// returns '<p>I really like the <a href="http://google.com">Google</a></p>&hellip;'
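
In the usage call, the cut lands right after `Google` while `<p>` and `<a>` are still open, so `</a></p>` is appended before the `&hellip;` postfix. The closing step on its own is a plain stack of tag names. The sketch below uses a hypothetical `closing_tags()` helper (not part of the answer) and skips void elements such as `<br>` as well as tags written as `<... />`, since those must never receive a closing tag.

```php
<?php
// Hypothetical helper: given an HTML fragment that may have been cut short,
// return the closing tags needed to balance it, innermost first.
function closing_tags(string $fragment): string {
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'param', 'source', 'track', 'wbr'];
    $stack = [];

    // Group 1: optional "/" of a closing tag; group 2: the tag name.
    preg_match_all('/<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/', $fragment, $m, PREG_SET_ORDER);
    foreach ($m as $tag) {
        [$whole, $slash, $name] = $tag;
        $name = strtolower($name);
        if (in_array($name, $void, true) || substr($whole, -2) === '/>') {
            continue; // void or self-closing: nothing to close later
        }
        if ($slash === '') {
            $stack[] = $name;    // opening tag
        } elseif (end($stack) === $name) {
            array_pop($stack);   // matching closing tag
        }
    }

    $out = '';
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}

$cut = '<p>I really like the <a href="http://google.com">Google';
echo $cut . closing_tags($cut), "\n";
// <p>I really like the <a href="http://google.com">Google</a></p>
```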

The function was adapted, with a small modification, from:

http://www.dzone.com/snippets/truncate-text-preserving-html