Shaggy Frog Shaggy Frog -4 years ago 63
CSS Question

Extracting text from a contentEditable div

I have a div set to

contentEditable
and styled with "
white-space:pre
" so it keeps things like linebreaks. In Safari, FF and IE, the div pretty much looks and works the same. All is well. What I want to do is extract the text from this div, but in such a way that will not lose the formatting -- specifically, the line breaks.

We are using jQuery, whose
text()
function basically does a pre-order DFS and glues together all the content in that branch of the DOM into a single lump. This loses the formatting.

I had a look at the
html()
function, but it seems that all three browsers do different things with the actual HTML that gets generated behind the scenes in my
contentEditable
div. Assuming I type this into my div:

1
2
3


These are the results:

Safari 4:

1
<div>2</div>
<div>3</div>


Firefox 3.6:

1
<br _moz_dirty="">
2
<br _moz_dirty="">
3
<br _moz_dirty="">
<br _moz_dirty="" type="_moz">


IE 8:

<P>1</P><P>2</P><P>3</P>


Ugh. Nothing very consistent here. The surprising thing is that MSIE looks the most sane! (Capitalized P tag and all)

The div will have dynamically set styling (font face, colour, size and alignment) which is done using CSS, so I'm not sure if I can use a
pre
tag (which was alluded to on some pages I found using Google).

Does anyone know of any JavaScript code and/or jQuery plugin or something that will extract text from a contentEditable div in such a way as to preserve linebreaks? I'd prefer not to reinvent a parsing wheel if I don't have to.

Update: I cribbed the
getText
function from jQuery 1.4.2 and modified it to extract it with whitespace mostly intact (I only chnaged one line where I add a newline);

function extractTextWithWhitespace( elems ) {
var ret = "", elem;

for ( var i = 0; elems[i]; i++ ) {
elem = elems[i];

// Get the text from text nodes and CDATA nodes
if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
ret += elem.nodeValue + "\n";

// Traverse everything else, except comment nodes
} else if ( elem.nodeType !== 8 ) {
ret += extractTextWithWhitespace2( elem.childNodes );
}
}

return ret;
}


I call this function and use its output to assign it to an XML node with jQuery, something like:

var extractedText = extractTextWithWhitespace($(this));
var $someXmlNode = $('<someXmlNode/>');
$someXmlNode.text(extractedText);


The resulting XML is eventually sent to a server via an AJAX call.

This works well in Safari and Firefox.

On IE, only the first '\n' seems to get retained somehow. Looking into it more, it looks like jQuery is setting the text like so (line 4004 of jQuery-1.4.2.js):

return this.empty().append( (this[0] && this[0].ownerDocument || document).createTextNode( text ) );


Reading up on
createTextNode
, it appears that IE's implementation may mash up the whitespace. Is this true or am I doing something wrong?

Answer Source

I forgot about this question until now, when Nico slapped a bounty on it.

I solved the problem by writing the function I needed myself, cribbing a function from the existing jQuery codebase and modifying it to work as I needed.

I've tested this function with Safari (WebKit), IE, Firefox and Opera. I didn't bother checking any other browsers since the whole contentEditable thing is non-standard. It is also possible that an update to any browser could break this function if they change how they implement contentEditable. So programmer beware.

function extractTextWithWhitespace(elems)
{
    var lineBreakNodeName = "BR"; // Use <br> as a default
    if ($.browser.webkit)
    {
        lineBreakNodeName = "DIV";
    }
    else if ($.browser.msie)
    {
        lineBreakNodeName = "P";
    }
    else if ($.browser.mozilla)
    {
        lineBreakNodeName = "BR";
    }
    else if ($.browser.opera)
    {
        lineBreakNodeName = "P";
    }
    var extractedText = extractTextWithWhitespaceWorker(elems, lineBreakNodeName);

    return extractedText;
}

// Cribbed from jQuery 1.4.2 (getText) and modified to retain whitespace
function extractTextWithWhitespaceWorker(elems, lineBreakNodeName)
{
    var ret = "";
    var elem;

    for (var i = 0; elems[i]; i++)
    {
        elem = elems[i];

        if (elem.nodeType === 3     // text node
            || elem.nodeType === 4) // CDATA node
        {
            ret += elem.nodeValue;
        }

        if (elem.nodeName === lineBreakNodeName)
        {
            ret += "\n";
        }

        if (elem.nodeType !== 8) // comment node
        {
            ret += extractTextWithWhitespace(elem.childNodes, lineBreakNodeName);
        }
    }

    return ret;
}
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download