nothing-special-here - 6 months ago
Ruby Question

Convert HTML to plain text (with inclusion of <br>s)

Is it possible to convert HTML to plain text with Nokogiri? I also want each <br /> tag to be turned into a line break.

For example, given this HTML:

<p>ala ma kota</p> <br /> <span>i kot to idiota </span>


I want this output:

ala ma kota
i kot to idiota


When I just call Nokogiri::HTML(my_html).text, the <br /> tag is dropped and everything ends up on one line:

ala ma kota i kot to idiota

Answer

Instead of writing a complex regexp, I used Nokogiri.

Working solution (K.I.S.S.!):

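require "nokogiri"

def strip_html(str)
  document = Nokogiri::HTML.parse(str)
  # Swap every <br> for a newline text node so #text keeps the line break.
  document.css("br").each { |node| node.replace(document.create_text_node("\n")) }
  # #text concatenates all text nodes in the document.
  document.text
end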