nastetajup - 3 months ago
Javascript Question

Is it possible to test for breaking spaces?

I'm trying to parse an HTML file (not the DOM) line by line. I wrote a for loop (below) that iterates over every character embedded in an element, tallies it and logs it. The way the text is embedded is important for what I'm talking about:

Text:

<element id="myE">
This is some text that
represents accurately the way I
have written my html
file.
</element>


Loop:
//Starts with 1 because 0 is always a breaking space


var list = document.getElementById("myE").innerHTML;
var tallie = 0;

for (var i = 1; i < list.length; i++) {
  if (/*list[i] == " "*/ true) {
    tallie += 1; // count the character (the original incremented list by mistake)
    console.log(list[i]);
  }
}

console.log(tallie);


Currently the loop is set to test for all characters. The text in the html renders in the DOM as though it were a continuous, properly formatted string. But in the console, inline spaces appear as non-breaking spaces
" "
, and line breaks appear as breaking spaces:

"
"


Since the console appears to know the difference, it seems there should be a way to test for the difference. If you unlock the commented condition, it will start testing for non-breaking spaces. I know another way to do the same thing is to use the non-breaking space character code, or even its numeric code (160). It seems reasonable then to expect to be able to find a character code for a breaking space. Unfortunately I cannot find one.

Long story short, how can I achieve true line-by-line parsing of an HTML file?

Answer

Newline characters are encoded as \n (line feed, character code 10). Sometimes you will also find a carriage return followed by a line feed, \r\n (see Wikipedia on Newline). The spaces between the words in your text are ordinary spaces (character code 32), not non-breaking spaces. Neither should be confused with a non-breaking space, &nbsp; or &#160; (character code 160), which is used when you want the browser to display a space without wrapping the line there, or to keep multiple spaces from being collapsed together.
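As a rough sketch of how that could look in practice, the snippet below splits the text on \n (or \r\n) and also tests individual characters against "\n". The sample string is a stand-in for what document.getElementById("myE").innerHTML would return in the browser, and the helper name describeChar is made up for illustration.

```javascript
// Hypothetical sample standing in for document.getElementById("myE").innerHTML
const html = "\nThis is some text that\nrepresents accurately the way I\nhave written my html\nfile.\n";

// Illustrative helper: name the kind of whitespace character passed in.
function describeChar(ch) {
  switch (ch) {
    case "\n": return "line feed (code 10)";
    case "\r": return "carriage return (code 13)";
    case " ": return "ordinary space (code 32)";
    case "\u00A0": return "non-breaking space (code 160)";
    default: return "other";
  }
}

// Split on \n or \r\n to walk the text line by line.
const lines = html.split(/\r?\n/);

// Count the line breaks by testing each character directly.
let lineBreaks = 0;
for (let i = 0; i < html.length; i++) {
  if (html[i] === "\n") {
    lineBreaks += 1;
  }
}

// JSON.stringify makes empty lines and stray whitespace visible in the log.
lines.forEach((line, index) => console.log(index, JSON.stringify(line)));
console.log("line breaks:", lineBreaks);
console.log("first character:", describeChar(html[0]));
```

Because the text starts and ends with a line break, the split produces an empty string as its first and last entry. That also explains why index 0 of the innerHTML is always a line break in the question's loop.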
