Umair Umair - 1 month ago 6

Regex not working to get the string between two strings (Python 2.7)

In the source of this page, view-source:https://www.amazon.com/dp/073532753X?smid=A3P5ROKL5A1OLE, I want to get the string between

var iframeContent =
and
obj.onloadCallback = onloadCallback;


I have this regex
iframeContent(.*?)obj.onloadCallback = onloadCallback;


But it does not work. I am not good at regex, so please pardon my lack of knowledge.

I also tried
iframeContent(.*?)obj.onloadCallback
but that does not work either.

Answer

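It looks like you just want that giant encoded string. I believe yours is failing for two reasons. First, you're not running in DOTALL mode, so your . won't match newline characters and the match can't continue onto the next line where obj.onloadCallback appears. Second, a very long variable-length match like (.*?) that can match the same characters as the text following it forces the engine to do a lot of backtracking, which gets slow on a page this size. (Also note that the . in obj.onloadCallback is unescaped, so it matches any character, not only a literal dot.)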
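To see the DOTALL point in isolation, here is a minimal sketch on a made-up page fragment (the real page is far longer; the only assumption that matters is that a newline sits between the two markers). The helper name between() is hypothetical; it uses re.escape so the dot in obj.onloadCallback is matched literally, and the flags argument is where re.DOTALL is switched on.

```python
import re

# Hypothetical stand-in for the page source: a newline separates the markers.
html_source = (
    'var iframeContent = "ENCODED_PAYLOAD";\n'
    'obj.onloadCallback = onloadCallback;'
)


def between(text, start, end, flags=0):
    """Return the text between two literal markers, or None if not found."""
    # re.escape makes the '.' in 'obj.onloadCallback' a literal dot.
    pattern = re.escape(start) + r'(.*?)' + re.escape(end)
    m = re.search(pattern, text, flags)
    return m.group(1) if m else None


# Without DOTALL, '.' stops at the newline, so the search finds nothing.
without_dotall = between(html_source, 'iframeContent', 'obj.onloadCallback')

# With DOTALL, '.' also matches '\n' and the search succeeds.
with_dotall = between(html_source, 'iframeContent', 'obj.onloadCallback',
                      re.DOTALL)
```

The inline form (?s) at the start of the pattern has the same effect as passing re.DOTALL. Note that the captured text still includes the surrounding ` = "` and `";` plus the newline, which is one reason a pattern anchored on the quotes is tidier.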

This should get what you want, since it captures only the text inside the double quotes:

import re
m = re.search(r'var iframeContent = "([^"]+)"', html_source)  # group 1 = quoted value
if m:  # re.search returns None when there is no match
    print(m.group(1))
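Why the character class helps: [^"]+ means "one or more characters that are not a double quote", so the match cannot run past the closing quote, and because a negated class also matches newlines, no DOTALL flag is needed. The sketch below wraps the same pattern in a hypothetical helper, extract_iframe_content(), and runs it on a made-up fragment standing in for the downloaded page.

```python
import re

# Hypothetical fragment standing in for the downloaded page source.
html_source = (
    'var x = 1;\n'
    'var iframeContent = "ENCODED_PAYLOAD";\n'
    'obj.onloadCallback = onloadCallback;\n'
)

# Same pattern as above, compiled once for reuse.
IFRAME_RE = re.compile(r'var iframeContent = "([^"]+)"')


def extract_iframe_content(text):
    """Return the quoted iframeContent value, or None if it is absent."""
    m = IFRAME_RE.search(text)
    return m.group(1) if m else None


payload = extract_iframe_content(html_source)
missing = extract_iframe_content('no such variable here')

# A negated class matches '\n' even without DOTALL.
spans_newline = re.match(r'"([^"]+)"', '"a\nb"').group(1)
```

Checking for None before calling .group() avoids an AttributeError when the page layout changes and the variable is no longer there.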