Victor Borshchov - 3 months ago
Ruby Question

Remove whitespace from an array of HTML strings

Given the array

["<p>&gt;&gt;&gt;Lorem ipsum dolor</p>",
"<p>Lorem ipsum dolor <strong>sit amet, consectetur adipisicing</strong> elit, sed do eiusmod</p>",
"<p>.....</p>",
"<p> ...</p>",
"<p>tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,</p>",
"<p>quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo</p>",
"<p>… </p>",
"<p>consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse</p>",
"<p>…</p>",
"<p>. . . </p>",
"<p> …</p>",
"<p>cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non</p>",
"<p>…</p>",
"<p>…</p>",
"<p>proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>",
"<p></p>",
"<p></p>",
"<p>proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>"]


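I want to get an array in which every paragraph that consists only of dots or an ellipsis (for example "<p>.....</p>", "<p> ...</p>", "<p>. . . </p>" or "<p>…</p>"), with or without spaces at the beginning or end, is replaced with "<p>…</p>", and each run of consecutive such paragraphs is collapsed into a single "<p>…</p>":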


["<p>&gt;&gt;&gt;Lorem ipsum dolor</p>",
"<p>Lorem ipsum dolor <strong>sit amet, consectetur adipisicing</strong> elit, sed do eiusmod</p>",
"<p>…</p>",
"<p>tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,</p>",
"<p>quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo</p>",
"<p>…</p>",
"<p>consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse</p>",
"<p>…</p>",
"<p>cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non</p>",
"<p>…</p>",
"<p>proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>",
"<p></p>",
"<p></p>",
"<p>proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>"]

Answer

I would loop through each element of the array and convert every dot-only paragraph (such as <p>. . .</p>) into the desired <p>…</p> format.

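ELLIPSIS = '<p>…</p>'
# A paragraph made only of dots ("....." or ". . .") or a single "…",
# optionally padded with whitespace; \A and \z make the whole string match.
DOTS_ONLY = /\A<p>(?:(?:\s?\.)+\s*|\s*…\s*)<\/p>\z/

array.map! { |el| el.match?(DOTS_ONLY) ? ELLIPSIS : el }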
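To see which paragraphs the pattern treats as filler, here is a small sketch that runs it against a few strings taken from the question. It assumes Ruby 2.4 or later (for String#match?) and uses an anchored variant of the regex, so the whole string must be a single dot-only paragraph; the names DOTS_ONLY and filler? are illustrative, not part of any library.

```ruby
# Anchored variant of the answer's pattern: a paragraph made only of dots
# (optionally separated by single whitespace characters), or a lone "…",
# with optional surrounding whitespace. DOTS_ONLY and filler? are illustrative names.
DOTS_ONLY = /\A<p>(?:(?:\s?\.)+\s*|\s*…\s*)<\/p>\z/

# Returns true when the paragraph is nothing but dots or an ellipsis.
def filler?(paragraph)
  paragraph.match?(DOTS_ONLY)
end

samples = [
  "<p>.....</p>",
  "<p> ...</p>",
  "<p>… </p>",
  "<p>. . . </p>",
  "<p></p>",
  "<p>&gt;&gt;&gt;Lorem ipsum dolor</p>"
]

# Print each sample next to whether it would be replaced.
samples.each do |s|
  puts "#{s.inspect} -> #{filler?(s)}"
end
```

The first four samples match and the last two do not: empty paragraphs and ordinary text are left alone, which is why "<p></p>" survives untouched in the expected output.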

This replaces every paragraph that contains only dots or an ellipsis with <p>…</p>, resulting in:

["<p>&gt;&gt;&gt;Lorem ipsum dolor</p>",
"<p>Lorem ipsum dolor <strong>sit amet, consectetur adipisicing</strong> elit, sed do eiusmod</p>",
"<p>…</p>",
"<p>…</p>",
"<p>tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,</p>",
"<p>quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo</p>",
"<p>…</p>",
"<p>consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse</p>",
"<p>…</p>",
"<p>…</p>",
"<p>…</p>",
"<p>cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non</p>",
"<p>…</p>",
"<p>…</p>",
"<p>proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>",
"<p></p>",
"<p></p>",
"<p>proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>"]

Then I check each element against the previous one and delete it if it is <p>…</p> and identical to the previous element. Walking the array from the end means a deletion never shifts the indices that are still to be visited.

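# Walk backwards so delete_at never shifts an index we still have to visit.
idx = array.length - 1
while idx > 0
    # Drop this '<p>…</p>' if the element just before it is also '<p>…</p>'.
    if array[idx] == array[idx - 1] && array[idx] == '<p>…</p>'
        array.delete_at(idx)
    end
    idx -= 1
end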
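Both steps can also be combined into a single non-mutating pass. This is a sketch, not the approach above: it swaps the index loop for Enumerable#chunk_while (Ruby 2.3+) and assumes Ruby 2.4+ for String#match?, plus an anchored version of the answer's regex. The names ELLIPSIS, DOTS_ONLY and collapse_ellipses are hypothetical.

```ruby
# Hypothetical names: ELLIPSIS, DOTS_ONLY and collapse_ellipses are illustrative.
ELLIPSIS  = '<p>…</p>'
# Anchored version of the answer's regex: the whole string must be one
# paragraph of dots or a single "…", with optional surrounding whitespace.
DOTS_ONLY = /\A<p>(?:(?:\s?\.)+\s*|\s*…\s*)<\/p>\z/

# Returns a new array; the input array is not modified.
def collapse_ellipses(paragraphs)
  # Step 1: normalize every dot-only paragraph to ELLIPSIS.
  normalized = paragraphs.map { |para| para.match?(DOTS_ONLY) ? ELLIPSIS : para }
  # Step 2: chunk_while puts each run of consecutive ELLIPSIS elements into
  # one group; every other element ends up alone in its own group.
  groups = normalized.chunk_while { |a, b| a == ELLIPSIS && b == ELLIPSIS }
  # Keeping the first element of each group leaves one ELLIPSIS per run.
  groups.map(&:first)
end

input = [
  "<p>consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse</p>",
  "<p>…</p>",
  "<p>. . . </p>",
  "<p> …</p>",
  "<p>cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non</p>",
  "<p></p>",
  "<p></p>"
]

p collapse_ellipses(input)
```

This should print five elements: the "consequat" paragraph, a single "<p>…</p>" in place of the three filler paragraphs, the "cillum" paragraph, and both empty paragraphs. Because the grouping block only returns true for two adjacent "<p>…</p>" elements, repeated empty paragraphs or repeated text are never merged.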