dan dan - 11 months ago 78
Ruby Question

Removing <p> elements with no text with Nokogiri

Given an HTML document in Nokogiri, I want to remove all <p> nodes with no actual text. This includes <p> elements with whitespace and/or <br> tags. What's the most elegant way to do this?

Answer Source

I would start with a method like this one (feel free to monkeypatch Nokogiri::XML::Node if you want to):

def is_blank?(node)
  (node.text? && node.content.strip == '') || (node.element? && node.name == 'br')
end

Then continue with another method that checks that all children are blank:

def all_children_are_blank?(node)
  node.children.all?{|child| is_blank?(child) }
  # Here you see the convenience of monkeypatching... sometimes.
end

And finally, get the document and remove every <p> whose children are all blank:

document.css('p').find_all{|p| all_children_are_blank?(p) }.each(&:remove)