Sajad HTLO - 17 days ago
Python Question

Parse a script to find a specific value with Beautiful Soup

I have this script:

var s1 = new SWFObject('/media/player/flvplayer.swf','single','400','300','7');s1.addParam('allowfullscreen','true');s1.addVariable('file','http://cdn.abc.con/video.flv');s1.addParam('menu','false');s1.addVariable('width','400');s1.addVariable('height','300');s1.write('player1474719921904');


I want to extract the video URL value:

http://cdn.abc.con/video.flv


I tried this, but it doesn't find it:

import re

scripts = soup.find_all("script")
if scripts:
    for s in scripts:
        crawler_logger.info('s: %s' % s)
        l = s.find_all(attrs={'': re.compile(r'\.(flv|mp4)$')})


I want to be able to get every video URL like this without needing to know the URL in advance.

Answer

BeautifulSoup doesn't parse JavaScript. From your script tag s, extract the JavaScript code as:

code = s.text
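To see why the `find_all(attrs=...)` call in the question comes back empty, here is a small sketch on a hypothetical, shortened page. The `<script>` element has no child tags for BeautifulSoup to search; its whole body is one string of JavaScript source. The sketch uses `s.string`, which returns that single string here.

```python
from bs4 import BeautifulSoup

# Hypothetical page with one inline script, shortened from the question.
html = """<html><body>
<script>var s1 = new SWFObject('/media/player/flvplayer.swf','single','400','300','7');
s1.addVariable('file','http://cdn.abc.con/video.flv');</script>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")
s = soup.find("script")

# No child elements inside <script>, so any find_all() on it finds nothing.
print(s.find_all(True))  # []

# The JavaScript is kept as plain text; search it with ordinary string tools.
code = s.string
print("http://cdn.abc.con/video.flv" in code)  # True
```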

Then you can extract the URL with a regular expression, like so:

import re

code = """var s1 = new SWFObject('/media/player/flvplayer.swf','single','400','300','7');s1.addParam('allowfullscreen','true');s1.addVariable('file','http://cdn.abc.con/video.flv');s1.addParam('menu','false');s1.addVariable('width','400');s1.addVariable('height','300');s1.write('player1474719921904');"""
url = re.search(r"['\"](https?://.+?\.flv)['\"]", code).group(1)
print(url)   # http://cdn.abc.con/video.flv
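Since the goal is to collect every video without knowing its URL in advance, the same idea can be widened with `re.findall` and the `flv|mp4` alternation from the question's own pattern. This is a sketch that assumes the URLs appear as quoted string literals in the script; `find_video_urls` is a hypothetical helper name, and the second `.MP4` URL in the sample is made up for illustration.

```python
import re

# A quoted http(s) URL ending in .flv or .mp4, case-insensitive.
# [^'\"] stops the match from running past the closing quote of one literal.
# Assumption: video URLs are written as '...' or "..." string literals.
VIDEO_URL_RE = re.compile(r"['\"](https?://[^'\"]+?\.(?:flv|mp4))['\"]", re.IGNORECASE)


def find_video_urls(code):
    """Return every quoted .flv/.mp4 URL found in a chunk of JavaScript, in order."""
    # With one capture group, findall returns just the captured URL text.
    return VIDEO_URL_RE.findall(code)


# The first URL is from the question; the second is a hypothetical extra entry.
code = """var s1 = new SWFObject('/media/player/flvplayer.swf','single','400','300','7');
s1.addVariable('file','http://cdn.abc.con/video.flv');
s2.addVariable('file',"https://cdn.example.com/clips/intro.MP4");"""

print(find_video_urls(code))
# ['http://cdn.abc.con/video.flv', 'https://cdn.example.com/clips/intro.MP4']
```

To cover a whole page, loop over `soup.find_all("script")` and call `find_video_urls(s.text)` on each tag. Note that a URL with a query string after the extension (for example `video.mp4?token=...`) will not match this pattern as written.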