Nandish Nandish - 1 year ago 77
Python Question

Scraping web contents from the first two pages and exporting the scraped data to CSV using Python and BS4

I'm new to Python (using Python 3.6.2) and I'm trying to scrape data from the first two pages of results for a specific keyword. So far I can get the data into the Python IDLE window, but I'm having difficulty exporting it to CSV. I have tried BeautifulSoup 4 and pandas but have not been able to export. Here is what I have done so far. Any help would be much appreciated.

import csv
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = ""
request = requests.get(url)
soup = BeautifulSoup(request.content, "lxml")
#filename = auto.csv
#with open(str(auto.csv,"r+","\n")) as csvfile:
#headers = "Count , Asin \n"
for url in soup.find_all('li'):
    Nand = url.get('data-asin')
    Result = url.get('id')
#d=(str(Nand), str(Result))

#with open("auto.txt", "w",newline='') as dumpfile:
#dumpfilewriter = csv.writer(dumpfile)
#for Nand in soup:
#value = Nand.__gt__
#if value:
csvfile.csv.writer("auto.csv," , ',' ,'|' , "\n")

Answer Source

I added a User-Agent header to the request so the site is less likely to block it as an automated bot. You got a lot of None values because you didn't specify precisely which <li> tags you want. I added that to the code as well.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = ""
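# Send a browser-like User-Agent so the request is less likely to be blocked as a bot.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
request = requests.get(url, headers=headers)
soup = BeautifulSoup(request.content, "lxml")

res = []

# Restrict the search to <li> tags with the s-result-item class;
# a bare find_all('li') also matches other list items, which yield None.
for item in soup.find_all('li', class_='s-result-item'):
    res.append([item.get('data-asin'), item.get('id')])

df = pd.DataFrame(data=res, columns=['Nand', 'Result'])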
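
The code above stops once the DataFrame is built. From there, df.to_csv("auto.csv", index=False) should write the file. For comparison, here is a minimal sketch of the same export using only the standard csv module, which is what the question was attempting. The write_rows helper and the sample rows are made up for illustration; in practice res would be the list filled by the scraping loop.

```python
import csv


def write_rows(path, rows, header=("Nand", "Result")):
    """Write a header line followed by one CSV line per scraped row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(header)
        writer.writerows(rows)


# Placeholder rows standing in for the scraped [data-asin, id] pairs.
res = [["ASIN0001", "result_0"], ["ASIN0002", "result_1"]]
write_rows("auto.csv", res)
```

Opening the file in "w" mode replaces any earlier auto.csv, so collect the rows from every page first and write them in one call.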

EDIT: To process all pages, build a loop that generates the URLs and pass each one to the main processing block (which you already have). Check out this page:,B01MY1ZZDS,B01N0RMJ1H.

The interesting part here is the ref parameter: ref=sr_pg_2. That value corresponds to page 2. I think you know what to do next =)
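
One way to sketch that loop is shown below. The base URL, the keywords parameter, and the page_urls helper are hypothetical placeholders. It also assumes that changing ref alone is enough to select a page; the real site may need another query parameter as well, so check the URL your browser shows for page 2.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(base_url, first_page=1, last_page=2):
    """Yield base_url once per page with ref=sr_pg_<n> in the query string."""
    parts = urlsplit(base_url)
    # Keep the existing query parameters, dropping any previous ref value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "ref"]
    for page in range(first_page, last_page + 1):
        page_query = urlencode(query + [("ref", f"sr_pg_{page}")])
        yield urlunsplit(parts._replace(query=page_query))


# Hypothetical search URL; substitute the real one.
urls = list(page_urls("https://example.com/s?keywords=auto"))
```

Each generated URL can then go through the same requests.get and find_all steps, appending to a single res list so that one DataFrame (and one CSV file) covers both pages.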
