A.D. - 5 months ago
Ruby Question

I can't remove whitespace from a string parsed by Nokogiri

I can't remove the trailing whitespace from a string.

My HTML is:

<p class='your-price'>
Cena pro Vás: <strong>139&nbsp;<small>Kč</small></strong>
</p>


My code is:

#encoding: utf-8
require 'rubygems'
require 'mechanize'

agent = Mechanize.new
site = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky")
price = site.search("//p[@class='your-price']/strong/text()")

val = price.first.text  # => "139 "
val.strip               # => "139 "
val.gsub(" ", "")       # => "139 "


gsub, strip, etc. don't work. Why, and how do I fix this?

val.class     # => String
val.dump      # => "\"139\\u{a0}\"" !
val.encoding  # => #<Encoding:UTF-8>

__ENCODING__               # => #<Encoding:UTF-8>
Encoding.default_external  # => #<Encoding:UTF-8>


I'm using Ruby 1.9.3, so Unicode shouldn't be a problem.

Answer

strip only removes ASCII whitespace, and the character you've got here is a Unicode no-break space (U+00A0, the \u{a0} in your dump output, which comes from the &nbsp; in the HTML).

Removing the character is easy. You can use gsub by providing a regex with the character code: gsub(/\u00a0/, '')
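
As a minimal sketch of that fix: the literal string below is only a stand-in for the value Nokogiri returned in the question ("139" followed by U+00A0), and the variable names are illustrative rather than part of the original code.

```ruby
# encoding: utf-8

# Stand-in for the parsed text: "139" followed by a no-break space (U+00A0).
val = "139\u00a0"

# strip leaves the value unchanged here, as the question observed.
stripped = val.strip

# Target the no-break space explicitly by its code point.
cleaned = val.gsub(/\u00a0/, '')

# String#delete is a regex-free alternative for removing a single known character.
deleted = val.delete("\u00a0")

puts cleaned.dump
```

Either call leaves just the digits, which can then be passed to to_i if a number is needed.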

You could also call gsub(/[[:space:]]/, '') to remove all Unicode whitespace, not just the no-break space. For details, check the Regexp documentation.
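
A rough illustration of the [[:space:]] approach follows. The sample string is hypothetical (it mixes ASCII spaces with no-break spaces), and the anchored variant is an assumption about what a Unicode-aware replacement for strip might look like, not something from the original answer.

```ruby
# encoding: utf-8

# Hypothetical sample: ASCII spaces and no-break spaces around and inside the text.
raw = "\u00a0 139\u00a0Kč \u00a0"

# Remove every whitespace character, including Unicode ones such as U+00A0.
no_space = raw.gsub(/[[:space:]]/, '')

# Trim only leading and trailing whitespace, keeping interior characters intact.
trimmed = raw.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')

puts no_space.dump
puts trimmed.dump
```

The anchored form is the closer match to what strip was expected to do, since it leaves any whitespace between the number and the currency untouched.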