Burak Burak - 11 months ago 94
Python Question

web scraping with beautifulsoup

I'm trying to parse only a particular part of a website. My code is below. Is there a more efficient way to do it?

from bs4 import BeautifulSoup
import requests
import urllib.request
import json

soup = BeautifulSoup(requests.get("http://www.example.com").content, "html.parser")

# Attribute value is quoted, as strict CSS selector parsing requires.
for d in soup.select('script[type="text/javascript"]'):
    print(d.text)

Here is the output I need:

> dataLayer = [{
> 'page':'ProductPage',
> 'OAM':'False',
> 'storeNum':'075',
> 'brand':'Seagate',
> 'productPrice':'69.99',
> 'SKU':'106674',
> 'productID':'467336',
> 'mpn':'ST2000DM006',
> 'ean':'763649110218',
> 'category':'Internal Hard Drives',
> 'isMobile':'False' }];

Answer Source

The indices used here may differ on other pages (I didn't check any others):

for d in soup.select('script[type="text/javascript"]')[27].text.split('\n')[51:62]:
    print(d)


'category':'Tablet Accessories',
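
Since the hard-coded index 27 and the line slice 51:62 depend on this page's exact layout, a less brittle variant is to pick the script by its content instead. This is only a minimal sketch, assuming the target is the first script containing the text `dataLayer = [{`. The `SAMPLE_HTML` string and the `find_datalayer_script` helper are made up for illustration and stand in for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><head>
<script type="text/javascript">var other = 1;</script>
<script type="text/javascript">
dataLayer = [{
'page':'ProductPage',
'brand':'Seagate',
'productPrice':'69.99' }];
</script>
</head><body></body></html>
"""


def find_datalayer_script(html, marker="dataLayer = [{"):
    """Return the text of the first <script> containing marker, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        text = script.string or ""  # .string is None for empty scripts
        if marker in text:
            return text
    return None


script_text = find_datalayer_script(SAMPLE_HTML)
print(script_text)
```

With the real page, `SAMPLE_HTML` would be replaced by `requests.get(...).content` as in the question.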

EDIT: other version:

text = soup.select('head script[type="text/javascript"]')[-1].text

start = text.find('dataLayer = [{') + len('dataLayer = [{') 
end = text.rfind('}];')

rows = text[start:end].strip().split('\n')

for d in rows:
    print(d.strip())
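
If the values are needed as data rather than as printed lines, the extracted block can probably be turned into a dictionary. The sketch below assumes the block is a flat set of single-quoted `'key':'value'` pairs like the output in the question, which makes it a valid Python literal. `parse_datalayer` and `SAMPLE_TEXT` are hypothetical names used only for illustration:

```python
import ast

# Sample script text modeled on the output shown in the question.
SAMPLE_TEXT = """
dataLayer = [{
'page':'ProductPage',
'brand':'Seagate',
'productPrice':'69.99',
'isMobile':'False' }];
"""


def parse_datalayer(text):
    """Parse the dataLayer object out of script text into a dict."""
    start = text.find("dataLayer = [{")
    end = text.rfind("}];")
    if start == -1 or end == -1:
        raise ValueError("dataLayer block not found")
    # Keep both braces so the slice is a complete dict literal.
    literal = text[start + len("dataLayer = ["):end + 1]
    return ast.literal_eval(literal)


data = parse_datalayer(SAMPLE_TEXT)
print(data["brand"], data["productPrice"])  # prints: Seagate 69.99
```

This would fail on pages where the object uses JavaScript-only syntax such as unquoted keys or `true`/`false`/`null`; in that case the line-by-line split above is the safer fallback.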