nilanjanaLodh - 11 months ago
Python Question

Double quotes inside single quotes in a regular expression (Python)

I am new to Python. I was going through a repository on GitHub and saw the following line of code, which extracts all URLs from a webpage. I understand regular expressions and capture groups, but I don't understand why there are extra double quotation marks enclosed within the single quotation marks.

links = re.findall('"((http|ftp)s?://.*?)"', html)

That is, how is it different from the following code?

links = re.findall('((http|ftp)s?://.*?)', html)

I tried experimenting and saw that only the first one matches the URLs correctly; the second one doesn't. But I don't understand why.

Any help is appreciated.

Thank you.


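The double quotes are part of the regex. The single quotes are only Python's string delimiters, so the pattern the re module actually receives is "((http|ftp)s?://.*?)", with a literal " at each end. Those quotes ensure that the pattern only matches a URL that is actually surrounded by quotes: a bare URL sitting in running text wouldn't match, but one inside an attribute such as <a href="..."> will.

The closing quote also explains why your second version seems broken. .*? is a lazy quantifier, so it matches as few characters as possible. With a " after it, it has to keep extending until it reaches that quote, which captures the whole URL. With nothing after it, the fewest characters that satisfy the pattern is zero, so every match stops right after the scheme and you only get http:// or https://.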
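A quick way to see both effects side by side is the sketch below. The HTML snippet is made up purely for illustration; the two patterns are the ones from the question. Note that because each pattern has two capture groups, re.findall returns a tuple per match (whole URL, scheme prefix) rather than plain strings.

```python
import re

# Made-up HTML, used only for illustration.
html = '<p>See <a href="https://example.com/docs">the docs</a> or ftp://files.example.org in text.</p>'

# The single quotes delimit the Python string; the double quotes are
# ordinary characters inside it, so the regex engine sees them literally.
quoted = '"((http|ftp)s?://.*?)"'
unquoted = '((http|ftp)s?://.*?)'
print(quoted[0], quoted[-1])  # both are the " character

# Two capture groups -> findall returns (whole URL, scheme) tuples.
print(re.findall(quoted, html))
# [('https://example.com/docs', 'http')]

# With nothing after the lazy .*?, it matches zero characters,
# so only the scheme part of each URL is returned.
print(re.findall(unquoted, html))
# [('https://', 'http'), ('ftp://', 'ftp')]
```

If you only want the URL strings, making the inner group non-capturing, (?:http|ftp), leaves a single group, and findall then returns a plain list of strings.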

Note that this is a fragile way of doing things, though: single-quoted attribute values such as href='...' are also valid HTML but wouldn't match the regex.
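One way to cover both quote styles, sketched here as a hypothetical variant rather than anything from the original repository, is to capture the opening quote and use a backreference (\1) so the value must close with the same character. The function name extract_links and the sample HTML are illustrative assumptions.

```python
import re

# Hypothetical variant: accept ' or " and require the same quote to close.
# Group 1 captures the opening quote; \1 must match that same character.
# (?:...) is non-capturing, so findall yields (quote, url) pairs.
URL_IN_QUOTES = re.compile(r'''(["'])((?:http|ftp)s?://.*?)\1''')


def extract_links(html):
    """Return URLs found inside single- or double-quoted values."""
    return [url for _quote, url in URL_IN_QUOTES.findall(html)]


html = """<a href="https://example.com/a">A</a>
<a href='http://example.com/b'>B</a>
<img src="ftp://example.org/c.png">"""

print(extract_links(html))
# ['https://example.com/a', 'http://example.com/b', 'ftp://example.org/c.png']
```

This is still a regex heuristic (unquoted attribute values, for instance, are missed); for anything beyond quick scraping, an actual HTML parser such as the standard library's html.parser is the sturdier choice.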