Eric - 1 year ago
Ruby Question

Ruby - Matching Twitter URL from any html page using Regex

I am trying to fetch the Twitter URL from this page, for instance; however, my result is nil. I am pretty sure my regex is not too bad, but my code fails. Here it is:

doc = `(curl --url "")`
twitter_url = ("/^(?i)[http|https]+:\/\/(?i)[twitter]+\.(?i)(com)\/?\S+").match(doc)
puts twitter_url
# => nil

Maybe I misused the regex syntax. My initial idea was simple: I wanted to match a regular Twitter URL structure. I even tested my regex, and it seemed to be fine when I entered a Twitter URL.

Answer Source

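The documentation for String#match tells you that the object you're calling match on should be the string you're parsing, and the parameter should be the regex pattern. So if anything, you should call match on doc and pass the regex as the argument, not the other way around.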


I prefer the doc[/regex/] syntax, because it directly delivers a String and not a MatchData, which needs another step to get the information out.
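To make the difference concrete, here is a minimal sketch. The contents of `doc` and the account name `example` are made up for illustration and stand in for the fetched page; they are not taken from the real document.

```ruby
# Hypothetical stand-in for the fetched page (an assumption, not the real HTML).
doc = '<a href="https://twitter.com/example">Follow us</a>'
pattern = %r{https?://twitter\.com/[^" ]+}

# String#match: the receiver is the text, the argument is the pattern.
# It returns a MatchData (or nil), so one more step is needed to get the String.
match_data = doc.match(pattern)
url_from_match = match_data && match_data[0]

# String#[] with a Regexp returns the matched String directly (or nil).
url_from_index = doc[pattern]

# Reversed call, similar to the question: the pattern text is the receiver,
# and doc is converted to a Regexp, so the URL is never found.
reversed = "https?://twitter\\.com/".match(doc)

puts url_from_match
puts url_from_index
p reversed
```

With a real page the reversed call may even raise a RegexpError, since arbitrary HTML is not guaranteed to be a valid pattern.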

For Regexen, I always try to begin as simply as possible and add one piece at a time:

[3] pry(main)> doc[/twitter/]
=> "twitter"
[4] pry(main)> doc[/twitter\.com/]
=> "twitter.com"
[5] pry(main)> doc[/twitter\.com\//]
=> "twitter.com/"
[6] pry(main)> doc[/twitter\.com\/\//] #OOPS. One \/ too many
=> nil
[7] pry(main)> doc[/twitter\.com\//]
=> "twitter.com/"
[8] pry(main)> doc[/twitter\.com\/\S+/]
=> "\""
[9] pry(main)> doc[/twitter\.com\/[^"]+/]
=> ""
[10] pry(main)> doc[/http:\/\/twitter\.com\/[^"]+/]
=> nil
[11] pry(main)> doc[/https?:\/\/twitter\.com\/[^"]+/]
=> ""
[12] pry(main)> doc[/https?:\/\/twitter\.com\/[^" ]+/] #DONE
=> ""
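The final pattern could be wrapped in small helpers, roughly like this. This is only a sketch: the method names `first_twitter_url` and `all_twitter_urls`, the sample HTML, and the added `/i` flag are assumptions for illustration, not part of the session above.

```ruby
# Pattern from the last step of the session; the /i flag is an added
# assumption so that a host written as "Twitter.com" would also match.
TWITTER_URL = %r{https?://twitter\.com/[^" ]+}i

# Returns the first Twitter URL found in html, or nil when there is none.
def first_twitter_url(html)
  html[TWITTER_URL]
end

# Returns every distinct Twitter URL found in html, in order of appearance.
def all_twitter_urls(html)
  html.scan(TWITTER_URL).uniq
end

# Made-up sample page (hypothetical accounts and links).
sample = <<~HTML
  <a href="https://twitter.com/example">Twitter</a>
  <a href="http://twitter.com/other">Old link</a>
  <a href="https://example.org/">Elsewhere</a>
HTML

puts first_twitter_url(sample)
puts all_twitter_urls(sample).inspect
```

Note that `[^" ]+` only stops at a double quote or a space, so attributes written with single quotes would need a slightly different character class.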