Using Nokogiri, I need to parse a block given:
12 AB / 4+ CD
ab = p.css(".some_class").text[....some regex....]
cd = p.css(".some_class").text[....some regex....]
dollars = p.css(".some_class").text[....some regex....]
To get a better answer you would have to clarify exactly what format the AB, CD and Dollar values take, but here is a solution based on the example given. It uses a regexp grouping, (), to capture the information we're interested in (see the bottom of the answer for more details).
    text = p.css(".some_class").text

    # one or more digits followed by a space followed by AB, capture the digits
    ab = text.match(/(\d+) AB/).captures[0] # => "12"

    # one or more digits followed by a literal + followed by CD
    cd = text.match(/(\d+\+) CD/).captures[0] # => "4+"

    # digits or commas followed by " Dollars"
    dollars = text.match(/([\d,]+) Dollars/).captures[0] # => "2,600"
Note that if there is no match then match returns nil, and calling captures on nil raises a NoMethodError, so if the values might not exist you would need a check, e.g.
    if match = text.match(/([\d,]+) Dollars/)
      dollars = match.captures[0]
    end
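Since the question already indexes the string with a regexp, a shorter nil-safe variant may be worth considering: String#[] accepts a regexp plus a capture index and returns nil when nothing matches. This is only a sketch; the sample text below is an assumption (the real string would come from p.css(".some_class").text, and its exact layout is not shown in the question).

```ruby
# Hypothetical sample text, assumed for illustration only.
text = "12 AB / 4+ CD / 2,600 Dollars"

# String#[] with a regexp and a capture index returns that capture,
# or nil if the pattern does not match at all.
ab      = text[/(\d+) AB/, 1]
cd      = text[/(\d+\+) CD/, 1]
dollars = text[/([\d,]+) Dollars/, 1]

# "EF" is a made-up label that is not in the text, so this is nil
# rather than an exception.
missing = text[/(\d+) EF/, 1]

puts ab.inspect, cd.inspect, dollars.inspect, missing.inspect
```

Because each lookup yields either a string or nil, no separate if check is needed before the assignment, though the caller still has to handle nil later.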
Additional explanation of captures
To match the amount of AB we need a pattern /\d+ AB/ to identify the right part of the text. However, we're really only interested in the numeric part, so we surround that with brackets so that we can extract it, e.g.
    irb(main):027:0> match = text.match(/(\d+) AB/)
    => #<MatchData:0x2ca3440>
    # the match method returns MatchData if there is a match, nil if not

    irb(main):028:0> match.to_s
    => "12 AB"
    # match.to_s gives us the entire text that matched the pattern

    irb(main):029:0> match.captures
    => ["12"]
    # match.captures gives us an array of the parts of the pattern that were enclosed in ()
    # in our example there is just 1 but there could be multiple

    irb(main):030:0> match.captures[0]
    => "12"
    # the first capture - the bit we want
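To illustrate the "there could be multiple" case, here is a sketch that pulls all three values out with one pattern containing three groups. The sample text and the " / " separators are assumptions, as are the group names ab, cd and dollars, which are chosen only for this example.

```ruby
# Hypothetical sample text, assumed for illustration only.
text = "12 AB / 4+ CD / 2,600 Dollars"

# Three named groups in one pattern; .*? lazily skips whatever
# separates the values.
pattern = /(?<ab>\d+) AB.*?(?<cd>\d+\+) CD.*?(?<dollars>[\d,]+) Dollars/

if (m = text.match(pattern))
  # captures returns the groups in order, one array element per ()
  ab, cd, dollars = m.captures
  # a named group can also be read individually from the MatchData
  dollars_again = m[:dollars]
  puts ab, cd, dollars, dollars_again
end
```

A single pattern only matches when all three values are present and in this order, so the separate per-value matches shown earlier are the safer choice if any of them can be missing.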