Bobby Etheredge - 5 months ago
HTML Question

Remove White Space From HTML Created From A Word Doc

I am trying to remove the white space from an HTML file that was created from a Word document (export to HTML), so far without success.

For example:

<p dir="ltr" class="pt-ListParagraph">
<span class="pt-000003"> </span></p>
<p dir="ltr" class="pt-Normal-000001">
<span class="pt-DefaultParagraphFont-000002">Work Instruction</span></p>
<p dir="ltr" class="pt-Normal">
<span class="pt-000000"> </span></p>
<p dir="ltr" class="pt-Normal">
<span class="pt-DefaultParagraphFont">DEFINITIONS AND ACRONYMS</span></p>
<p dir="ltr" class="pt-BodyText"><span class="pt-000004"> </span></p>
<p dir="ltr" class="pt-Normal-000005">
<span class="pt-DefaultParagraphFont-000006">DO Brief </span>
<span class="pt-DefaultParagraphFont-000007"> </span>


I have tried the CSS selectors p span:empty and p span:blank, but neither works: the spans do not count as empty because they contain white space, for example <span class="pt-000000"> </span>. I have also tried the suggestions listed for this post's title, without success (jQuery is not an option), and I am at a loss. I would like to add a .js file in the head of the HTML that runs on page load and removes all of the blank spans (such as <span class="pt-000000"> </span>) that are generated when a Word document is converted to HTML. Can anyone offer some advice?

Removing the spans is an option. However, the span classes differ with every export of the Word document, so I would need rules for many different span classes. I have considered that, but it seems like a band-aid rather than a fix.

UPDATE
Using window.addEventListener did the trick:

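window.addEventListener('load', function() {
    var spans = document.getElementsByTagName('span');
    // Loop backwards: the collection is live, so removing spans[i]
    // shifts later spans down and a forward loop would skip some.
    for (var i = spans.length - 1; i >= 0; i--) {
        if (spans[i].innerHTML.trim() === '') {
            spans[i].remove();
        }
    }
});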
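
A classic (non-deferred) script in the head runs before the body has been parsed, so at that moment getElementsByTagName('span') finds no spans and the loop does nothing; waiting for an event is what lets it see them. The load event also waits for images and stylesheets, while DOMContentLoaded fires as soon as the HTML has been parsed. A minimal sketch, assuming a hypothetical helper named runWhenReady (not a browser API) that runs a callback at DOMContentLoaded, or immediately if parsing has already finished:

```javascript
// Hypothetical helper: run `callback` once the HTML has been parsed.
// `doc` is normally the global `document`; it is a parameter only so the
// helper can also be exercised outside a browser.
function runWhenReady(doc, callback) {
  if (doc.readyState === 'loading') {
    // Still parsing: wait until the whole body (and every span) exists.
    doc.addEventListener('DOMContentLoaded', callback);
  } else {
    // 'interactive' or 'complete': the DOM is already available.
    callback();
  }
}

// Browser usage, e.g. from the .js file referenced in <head>.
if (typeof document !== 'undefined') {
  runWhenReady(document, function () {
    var spans = document.getElementsByTagName('span');
    // Iterate backwards because the collection is live.
    for (var i = spans.length - 1; i >= 0; i--) {
      if (spans[i].innerHTML.trim() === '') {
        spans[i].remove();
      }
    }
  });
}
```

Adding the defer attribute to the script tag (for a hypothetical file name, <script src="remove-blank-spans.js" defer></script>) has a similar effect, because deferred scripts run after the document has been parsed.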

Answer

You could use getElementsByTagName to retrieve a list of all the span elements in your HTML file, then walk through every span and check whether it contains only white space. If it does, remove that span.

Example:

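var spans = document.getElementsByTagName('span');
// Loop backwards: getElementsByTagName returns a live collection, so removing
// spans[i] shifts the later spans down and a forward loop would skip one.
for (var i = spans.length - 1; i >= 0; i--) {
    if (spans[i].innerHTML.trim() === '') {
        spans[i].remove();
    }
}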
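
getElementsByTagName returns a live HTMLCollection: when spans[i] is removed, every later span moves down one index, so a forward loop such as for (var i = 0; i < spans.length; i++) steps over the span that just slid into position i, and two adjacent blank spans leave one behind. The sketch below reproduces this with plain arrays, where splice stands in for remove(); the function names are hypothetical. Another common fix is document.querySelectorAll('span'), which returns a static NodeList that does not shrink as spans are removed.

```javascript
// Plain arrays stand in for the live collection: splice(i, 1) shifts every
// later item down one index, just as spans[i].remove() does to a live HTMLCollection.
const isBlankText = (text) => text.trim() === '';

// Forward loop: skips the item that slides into index i after a removal.
function removeBlankForward(items) {
  for (let i = 0; i < items.length; i++) {
    if (isBlankText(items[i])) {
      items.splice(i, 1);
    }
  }
  return items;
}

// Backward loop: a removal only shifts items that have already been checked.
function removeBlankBackward(items) {
  for (let i = items.length - 1; i >= 0; i--) {
    if (isBlankText(items[i])) {
      items.splice(i, 1);
    }
  }
  return items;
}

console.log(JSON.stringify(removeBlankForward([' ', ' ', 'DO Brief'])));  // [" ","DO Brief"]
console.log(JSON.stringify(removeBlankBackward([' ', ' ', 'DO Brief']))); // ["DO Brief"]
```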

Updated example:

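window.addEventListener('load', function() {
    var spans = document.getElementsByTagName('span');
    // Loop backwards: the collection is live, so removing spans[i]
    // shifts later spans down and a forward loop would skip some.
    for (var i = spans.length - 1; i >= 0; i--) {
        if (spans[i].innerHTML.trim() === '') {
            spans[i].remove();
        }
    }
});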
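
One more case may be worth checking, since the sample does not show whether the blank spans hold a plain space or a non-breaking space (&nbsp;). innerHTML serializes the U+00A0 character back as the text &nbsp;, so innerHTML.trim() returns '&nbsp;' rather than '' and such a span would survive the check. textContent returns the raw character, and String.prototype.trim() does strip U+00A0. A minimal sketch of a stricter test, assuming a hypothetical helper named isBlankSpan; the children check keeps spans that wrap an element such as an image, whose textContent is also empty.

```javascript
// Hypothetical predicate: true when a span has no child elements and its text
// is nothing but whitespace. trim() strips ordinary spaces, tabs, newlines and
// the non-breaking space U+00A0 that &nbsp; decodes to.
function isBlankSpan(span) {
  return span.children.length === 0 && span.textContent.trim() === '';
}

// Browser usage: querySelectorAll returns a static NodeList, so removals do not
// shift the list. Walking it backwards (reverse document order) handles inner
// spans first, so an outer span left empty by their removal is removed too.
if (typeof document !== 'undefined') {
  window.addEventListener('load', function () {
    var spans = document.querySelectorAll('span');
    for (var i = spans.length - 1; i >= 0; i--) {
      if (isBlankSpan(spans[i])) {
        spans[i].remove();
      }
    }
  });
}
```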