How to split by HTML tags using a regex

I have a string like this:

"Energia Elétrica kWh<span class=\"_ _3\"> </span> 10.942 <span class=\"_ _4\"> </span> 0,74999294 <span class=\"_ _5\"> </span> 8.206,39"

and I want to split it by its HTML tags, which are always <span> elements like the ones in the string above. I tried something like:


but it didn't work, it only matched the first element correctly.

Does anyone know what is wrong with my regex? In this example, I expected the returned value to be:

["Energia Elétrica kWh", "10.942", "0,74999294", "8.206,39"]

I would like something like
, but instead of returning the string sanitized, get the array split by the tags removed.

Answer

If you really need to use regex to do this, you pretty much had it already.

irb(main):010:0> string.split(/<span.+?span>/)
=> ["Energia Elétrica kWh", "  10.942 ", " 0,74999294 ", "     8.206,39"]

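You just needed the ? after the + to make the quantifier lazy, so that it matches as little as possible. Without it, .+ is greedy and runs from the first <span all the way to the last span>, swallowing everything in between.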
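
Note that the split above still leaves surrounding whitespace on each piece, while the expected result in the question has none. A minimal sketch of one way to get there, using the string from the question; the variable names (greedy_parts, lazy_parts, trimmed, one_pass) and the tighter pattern in the last step are my own illustration, not something from the original post:

```ruby
string = "Energia Elétrica kWh<span class=\"_ _3\"> </span> 10.942 " \
         "<span class=\"_ _4\"> </span> 0,74999294 " \
         "<span class=\"_ _5\"> </span> 8.206,39"

# Greedy: .+ runs from the first "<span" to the last "span>",
# so only the first and last pieces survive.
greedy_parts = string.split(/<span.+span>/)

# Lazy: .+? stops at the nearest "span>", so each tag is its own separator.
lazy_parts = string.split(/<span.+?span>/)

# Trim the leftover whitespace around each piece.
trimmed = lazy_parts.map(&:strip)

# Alternative (assumption: every tag is a simple, non-nested <span>...</span>):
# let the separator itself consume the surrounding whitespace.
one_pass = string.split(%r{\s*<span[^>]*>.*?</span>\s*})

p greedy_parts
p trimmed
p one_pass
```

Both trimmed and one_pass should give the four-element array from the question. This only holds for flat, predictable markup like the sample; for arbitrary HTML an actual parser is the safer choice.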

