user3198882 user3198882 - 3 months ago 11
HTML Question

Regex to remove characters that are rendered as a single whitespace in browser? How about \s?

So, here is the function; self.node is a reference to some #text Node:

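function isSingleWhitespace() {
  // Strip every run of spaces, line breaks and tabs; if nothing is left, the node held only whitespace
  var spacesCollapsed = self.node.textContent.replace(/[ \n\r\t]+/g, '');
  return spacesCollapsed.length === 0;
}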


Here is the regex101: https://regex101.com/r/hN9mJ6/1

The question can be split into two parts:

Which characters are collapsed into a single whitespace when HTML is rendered? Is the \s class suitable for finding them, perhaps as part of a larger regexp?

What about things like &zwsp;? I need to account for everything that is not rendered by the browser. A regexp is not the only acceptable solution; a link to, say, an Underscore.js implementation would also do, but the simpler the better, and there is no need to support IE < 9.

If there is a solution using AngularJS's jQLite, it is also acceptable.

A plain list of all those characters / HTML special characters is also acceptable, if such a list exists and is robust enough across browsers.

myf myf
Answer

As for which characters are collapsed

The space characters in HTML5 are defined as:

The space characters, for the purposes of this specification, are U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR).

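So any run of consecutive characters from this group is collapsed into a single space, and leading/trailing runs are trimmed, in most cases (1). Your regexp therefore seems fine, although it leaves out U+000C FORM FEED (\f) from the list above.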
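To make the comparison with \s concrete, here is a minimal sketch, assuming plain JavaScript strings rather than DOM nodes. The helper name isOnlyHtmlSpace is made up for illustration. It builds a character class from the five space characters quoted above and shows that \s is broader: in JavaScript, \s also matches U+00A0 NO-BREAK SPACE, which browsers do not collapse.

```javascript
// The five HTML5 "space characters" quoted above, as a character class.
// \f (U+000C FORM FEED) is in that list but absent from the question's regex.
const HTML_SPACE_RUN = /[ \t\n\f\r]+/g;

// Hypothetical helper: true when the text consists only of HTML space characters
// (an empty string also counts as "only whitespace" here).
function isOnlyHtmlSpace(text) {
  return text.replace(HTML_SPACE_RUN, '').length === 0;
}

console.log(isOnlyHtmlSpace(' \n\t\f\r')); // true
console.log(isOnlyHtmlSpace('\u00a0'));    // false: a no-break space is not in the list
console.log(/^\s+$/.test('\u00a0'));       // true: \s matches the no-break space as well
```

So \s would also report a node containing only no-break spaces as "whitespace", even though such a node is rendered with visible width.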

As for whether there is a common API to get the "rendered" content

It seems you are reading textContent, which returns the text with its original "source" formatting.

If you used innerText instead, you'd get what you probably want, provided you are in a DOM context and in a capable environment. See The poor, misunderstood innerText by Kangax.


(1) The behaviour depends on CSS and/or node type: for instance, <pre> or anything with white-space: pre keeps white space, while <p> or anything with white-space: normal gets consecutive space characters collapsed and trimmed.

Try the example below:

<p id="p1"> 1  2   3  </p>
<pre><script>
document.write( p1.innerText.split(''))
</script></pre>

<p id="p2" style="white-space: pre"> 1  2   3  </p>
<pre><script>
document.write( p2.innerText.split(''))
</script></pre>
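Outside a browser, the difference described in note (1) can be roughly approximated on plain strings. This is only a sketch under simplifying assumptions: it models just white-space: normal and white-space: pre, ignores neighbouring elements and line breaking, and the function name approximateRenderedText is made up for illustration. It is not what innerText actually does internally.

```javascript
// Runs of the HTML5 space characters: space, tab, LF, FF, CR.
const HTML_SPACES = /[ \t\n\f\r]+/g;

// Hypothetical helper approximating how a text run is shown for a given
// white-space value. Only 'normal' and 'pre' are modelled.
function approximateRenderedText(text, whiteSpace = 'normal') {
  if (whiteSpace === 'pre') {
    return text; // white space is preserved as written
  }
  // Collapse each run to one space, then drop a leading and a trailing space.
  return text.replace(HTML_SPACES, ' ').replace(/^ | $/g, '');
}

console.log(JSON.stringify(approximateRenderedText(' 1  2   3  ')));        // "1 2 3"
console.log(JSON.stringify(approximateRenderedText(' 1  2   3  ', 'pre'))); // " 1  2   3  "
```

With this approximation, a text run is "rendered as nothing" under white-space: normal exactly when approximateRenderedText(text) returns an empty string.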
