user3198882 - 2 months ago
HTML Question

Regex to remove characters that are rendered as a single whitespace in browser? How about \s?

So, here is the function; self.node is a reference to some Node:

function isSingleWhitespace() {
  // Strip every run of space, LF, CR and tab; if nothing is left, the text was whitespace only.
  var spacesCollapsed = self.node.textContent.replace(/[ \n\r\t]+/g, '');
  return spacesCollapsed.length === 0;
}

Here is the regex101:

The question can be split in two parts:

Which characters are collapsed to a single whitespace when HTML is rendered, and is the \s class suitable for finding them, possibly as part of a larger regexp?

What about stuff like &nbsp;? I need to account for everything that is not rendered by the browser. A regexp solution is not the only acceptable one; a link to, say, an Underscore.js implementation would also suit, but the simpler the better, given that there is no need to support IE < 9.

If there is a solution using AngularJS's jQLite, it is also acceptable.

Just a listing of all those characters / HTML special chars is also acceptable, if such a listing exists and is robust enough across browsers.

myf

As for which characters are collapsed

Space characters in HTML5 are defined as follows:

The space characters, for the purposes of this specification, are U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR).

So any run of consecutive characters from this group is collapsed into a single space, and leading/trailing ones are trimmed, in most cases (1), so your regexp seems fine.
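As a rough illustration of the list quoted above, here is a minimal sketch that works on plain strings, with no DOM involved. The names HTML_SPACE_RUN, collapseHtmlSpaces and isOnlyHtmlSpaces are made up for this example, and the collapsing only approximates what a browser does under white-space: normal. It also shows why \s is not an exact match for the list: in JavaScript, \s additionally matches characters such as U+00A0 NO-BREAK SPACE, which the quoted definition does not include.

```javascript
// The five "space characters" from the quoted HTML5 definition:
// U+0020 SPACE, U+0009 TAB, U+000A LF, U+000C FF, U+000D CR.
const HTML_SPACE_RUN = /[ \t\n\f\r]+/g;

// Hypothetical helper: collapse each run to one space and trim both ends,
// roughly what happens to text under white-space: normal.
function collapseHtmlSpaces(text) {
  return text.replace(HTML_SPACE_RUN, ' ').replace(/^ | $/g, '');
}

// Hypothetical helper: true when the text is empty or consists only of
// characters from the list above.
function isOnlyHtmlSpaces(text) {
  return text.replace(HTML_SPACE_RUN, '').length === 0;
}

console.log(JSON.stringify(collapseHtmlSpaces(' 1  2 \n\t 3  '))); // "1 2 3"
console.log(isOnlyHtmlSpaces(' \n\t\f\r')); // true
console.log(isOnlyHtmlSpaces('\u00a0'));    // false: no-break space is not in the list
console.log(/^\s+$/.test('\u00a0'));        // true: JavaScript's \s is broader than the list
```

Whether the broader \s or the explicit character class is the better fit depends on whether characters like the no-break space should count as "collapsible" for your purpose; per the definition quoted above they are not.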

As for whether there is a common API to get the "rendered" content

It seems you are reading textContent, which returns the text with its original "source" formatting.

If you used innerText instead, you'd probably get what you want, provided you are in a DOM context and in a capable environment. See "The poor, misunderstood innerText" by Kangax.

(1) Behaviour depends on CSS and/or node type: for instance, <pre> or anything with white-space: pre keeps white space as written, while <p> or anything with white-space: normal has runs of space characters collapsed and trimmed.

Try the example below:

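<p id="p1"> 1  2   3  </p>
<!-- Default white-space: normal, so innerText comes back collapsed and trimmed -->
<script>document.write(document.getElementById('p1').innerText.split(''))</script>

<p id="p2" style="white-space: pre"> 1  2   3  </p>
<!-- white-space: pre, so every space is kept -->
<script>document.write(document.getElementById('p2').innerText.split(''))</script>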
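
For environments without a DOM, the difference between the two paragraphs can be imitated on plain strings. This is only a sketch under stated assumptions: approximateRenderedText is a hypothetical helper, it models just the two white-space values mentioned in footnote (1), and it treats any other value like normal, which is a simplification rather than what browsers actually do.

```javascript
// Hypothetical helper: approximate how text is displayed for a given value
// of the CSS white-space property. Only 'normal' and 'pre' are modelled;
// any other value is treated like 'normal' here (a simplification).
function approximateRenderedText(text, whiteSpace) {
  if (whiteSpace === 'pre') {
    return text; // white space is kept exactly as written
  }
  // Collapse runs of HTML space characters to one space, then trim both ends.
  return text.replace(/[ \t\n\f\r]+/g, ' ').replace(/^ | $/g, '');
}

const source = ' 1  2   3  ';
console.log(approximateRenderedText(source, 'normal').split('')); // 5 characters: 1, space, 2, space, 3
console.log(approximateRenderedText(source, 'pre').split(''));    // all 11 original characters
```

In a browser, the second argument could come from getComputedStyle(element).whiteSpace, although in that setting reading innerText directly is likely the simpler route.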