Asked by dan dan, 1 month ago
Ruby Question

Removing <p> elements with no text with Nokogiri

Given an HTML document in Nokogiri, I want to remove all <p> nodes that contain no actual text. This includes <p> elements that contain only whitespace and/or <br/> tags. What's the most elegant way to do this?

Answer

I would start with a method like this one (feel free to monkeypatch Nokogiri::XML::Node if you prefer):

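# A child counts as blank if it is a whitespace-only text node or a <br> element.
# Note: String#strip does not remove non-breaking spaces (&nbsp;).
def is_blank?(node)
  (node.text? && node.content.strip.empty?) || (node.element? && node.name == 'br')
end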

Then add another method that checks whether all of a node's children are blank:

# Also true for a node with no children at all, since all? on an empty set returns true.
def all_children_are_blank?(node)
  node.children.all? { |child| is_blank?(child) }
  # Here you see the convenience of monkeypatching... sometimes.
end

Finally, select every <p> in the document whose children are all blank and remove it:

# document is the parsed Nokogiri HTML document
document.css('p').select { |p| all_children_are_blank?(p) }.each do |p|
  p.remove
end