L. K. - 1 month ago
Python Question

Set cookie in BeautifulSoup Python web scraper

I'm trying to create a Python script that goes to a web page and checks whether it contains a div with a specified id. If it doesn't, the script should try again after deleting a given cookie.

So far, this is my code:

import urllib2
from BeautifulSoup import BeautifulSoup
import time

url = 'http://google.com'
cookie = 'hello'

while True:
    page = urllib2.urlopen(url).read()
    soup = BeautifulSoup(page)
    soup.prettify()
    if soup.find(id='hello'):
        print "Found!"
        break
    else:
        #DELETE THE GIVEN COOKIE AND TRY AGAIN
        time.sleep(1)


What I'm asking is: how do I delete the cookie? Or do I not need to delete it, because BeautifulSoup will retry the request using a different instance?

Also, is it possible to set things like headers, the user agent, etc. with this method? If so, how?

Answer

You do not need to delete the cookie here. Each call to urlopen() makes a fresh request, and urllib2's default opener does not store cookies, so nothing is carried over from one attempt to the next. However, if you do need to keep cookies between requests, I recommend the Python requests library or mechanize, both of which let you persist a browser session.
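
If cookies do have to be kept and then selectively deleted, the idea can be sketched with the standard library alone. This is a minimal sketch, not the approach the answer recommends: it assumes Python 3, where `urllib.request` and `http.cookiejar` replace `urllib2` and `cookielib`, and the helper names `make_opener_with_cookies` and `delete_cookie` are hypothetical.

```python
import http.cookiejar
import urllib.request


def make_opener_with_cookies():
    """Return (opener, jar); the opener stores and resends cookies via jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def delete_cookie(jar, name):
    """Remove every cookie called `name` from the jar; return how many were removed."""
    # Collect matches first so the jar is not modified while iterating over it.
    matches = [c for c in jar if c.name == name]
    for c in matches:
        jar.clear(c.domain, c.path, c.name)
    return len(matches)
```

In the retry loop, `opener.open(url)` would take the place of `urlopen(url)`, and `delete_cookie(jar, cookie)` would go in the `else` branch before sleeping.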

You can also add headers, including a user agent, to the code you have by building an opener and setting its addheaders list:

import urllib2
from BeautifulSoup import BeautifulSoup
import time

url = 'http://google.com'

# Build an opener so the custom headers are sent with every request made through it.
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

while True:
    page = opener.open(url).read()
    soup = BeautifulSoup(page)
    soup.prettify()
    if soup.find(id='hello'):
        print "Found!"
        break
    else:
        time.sleep(1)
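
Headers can also be attached to a single request rather than to the whole opener. The following is a rough sketch under the assumption that Python 3's `urllib.request` is used in place of `urllib2`; `build_request` is a hypothetical helper, and the URL and the `Accept-Language` value are placeholders.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0", extra_headers=None):
    """Return a Request carrying a User-Agent plus any extra headers."""
    headers = {"User-Agent": user_agent}
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    req = build_request(
        "http://example.com",
        extra_headers={"Accept-Language": "en-US"},
    )
    # header_items() lists the headers set on the request; no network traffic happens here.
    print(req.header_items())
```

The resulting object can be passed to `urllib.request.urlopen(req)` or to an opener's `open(req)` in place of the bare URL.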