NGuyen NGuyen - 1 year ago 98
Python Question

Python 3: How to extract image URLs?

The URLs I want to extract all follow the same pattern:

"begin" : "url_I_want_extract"

They look like:

"begin" : ""
"begin" : ""
"begin" : ""
"begin" : ""
"begin" : ""

I used this code to extract them, but I am getting unexpected results.

r = re.findall('https://k(.?)*?).jpeg', response.text)

The output I got:

[('2', '16576946054146395951'), ('2', '9460365509030976330'), ('2', '9361112829030898475'), ('3', '14705723619301900580')]

The output I want:

How can I use a regex to scrape the URLs that follow the word "begin"? Thank you :)

Answer Source

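Parentheses mark capturing groups, and when a pattern contains capturing groups, findall returns only those groups instead of the whole match. Right now your capturing groups are the (.?) after k and the (.*?) before .jpeg, which is why you get tuples of fragments. Remove those parentheses and instead capture the entire URL.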
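To illustrate the difference, here is a minimal sketch. The host k2.example.com, the paths, and the file names are made up for illustration, since the real URLs are not shown above.

```python
import re

# Hypothetical sample text; the host and file names are assumptions.
text = (
    '"begin" : "https://k2.example.com/0x0/0x0/0/111.jpeg" '
    '"begin" : "https://k3.example.com/0x0/0x0/0/222.jpeg"'
)

# Two capturing groups: findall returns one tuple per match,
# holding only the text captured by each group.
with_groups = re.findall(
    r'https://k(.?)\.example\.com/0x0/0x0/0/(.*?)\.jpeg', text
)
print(with_groups)  # [('2', '111'), ('3', '222')]

# No capturing groups: findall returns the full matched strings.
whole_match = re.findall(
    r'https://k.?\.example\.com/0x0/0x0/0/.*?\.jpeg', text
)
print(whole_match)
```

The first call reproduces the kind of tuple output shown in the question; the second returns complete URLs.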

Also, to match both the URLs containing "/0x0/0x0/0/" and those containing "/8x36/922x950/0/", replace "/0x0/0x0/0/" in the regex with "/.*/.*/.*/":

r = re.findall('(https://k.?*/.*/.*/.*?.jpeg)', response.text)
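As an alternative sketch, you could anchor on the "begin" key itself and stop at the closing quote, which avoids depending on the path layout at all. The sample text, the example.com host, and the helper name extract_begin_urls are all hypothetical; in practice the text would come from response.text.

```python
import re

# Hypothetical page text; the host, paths, and file names are assumptions.
sample = '''
"begin" : "https://k2.example.com/0x0/0x0/0/111.jpeg"
"begin" : "https://k3.example.com/8x36/922x950/0/222.jpeg"
"other" : "https://k2.example.com/0x0/0x0/0/ignored.jpeg"
'''

# Match the "begin" key, then capture the quoted URL that follows it.
# [^"]* cannot cross a double quote, so a single match can never
# swallow two URLs the way a greedy .* might.
BEGIN_URL = re.compile(r'"begin"\s*:\s*"(https://[^"]*\.jpeg)"')

def extract_begin_urls(text):
    """Return every .jpeg URL stored under a "begin" key, in order."""
    # With exactly one capturing group, findall returns a list of strings.
    return BEGIN_URL.findall(text)

urls = extract_begin_urls(sample)
for url in urls:
    print(url)
```

Because the pattern has a single capturing group around the whole URL, findall returns plain strings rather than tuples, and the URL under the "other" key is skipped.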
