squidvision - 1 year ago
Python Question

Scraping Indeed with Beautiful Soup

I'm unfamiliar with HTML and with web scraping using Beautiful Soup. I'm trying to retrieve job titles, salaries, locations and company names from various Indeed job postings. This is my code so far:

URL = "http://www.indeed.com/jobs?q=data+scientist+%2420%2C000&l=New+York&start=10"
import urllib2
import bs4
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen(URL).read())
resultcol = soup.find_all(id = 'resultsCol')
company = soup.findAll('span', attrs={"class":"company"})
jobs = (soup.find_all({'class': " row result"}))

Although I have the calls to find the jobs and companies, I can't get their contents. I'm aware there's a contents attribute, but none of my variables so far have it. Thanks!

Answer Source

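First I search for the div that holds all the elements of one job, and then I search for the individual elements inside that div. Note that find_all returns a list of tags, so the text has to be read from each tag in the list, not from the list itself.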

import urllib2
from bs4 import BeautifulSoup

URL = "http://www.indeed.com/jobs?q=data+scientist+%2420%2C000&l=New+York&start=10"

# Name the parser explicitly so bs4 does not have to pick one
soup = BeautifulSoup(urllib2.urlopen(URL).read(), 'html.parser')

# Each job posting sits in its own div with this attribute
results = soup.find_all('div', attrs={'data-tn-component': 'organicJob'})

# Loop over the matches and search inside each one
for x in results:
    company = x.find('span', attrs={"itemprop":"name"})
    print 'company:', company.text.strip()

    job = x.find('a', attrs={'data-tn-element': "jobTitle"})
    print 'job:', job.text.strip()

    # Not every posting lists a salary, so find() may return None
    salary = x.find('nobr')
    if salary:
        print 'salary:', salary.text.strip()

    print '----------'
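
If you want to try the container-then-children pattern without hitting the site, here is a minimal offline sketch. It is written for Python 3 (print as a function), unlike the Python 2 code above. The markup in SAMPLE_HTML is made up and only imitates the attributes used above, so the real Indeed page may well look different; parse_jobs and text_or_none are hypothetical helper names, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the attributes used above;
# the real Indeed page may be structured differently.
SAMPLE_HTML = """
<div data-tn-component="organicJob">
  <a data-tn-element="jobTitle">Data Scientist</a>
  <span itemprop="name"> Acme Corp </span>
  <nobr>$20,000 a year</nobr>
</div>
<div data-tn-component="organicJob">
  <a data-tn-element="jobTitle">Junior Analyst</a>
  <span itemprop="name">Example Inc</span>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if find() found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_jobs(html):
    """Collect one dict per job container found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    # find_all() returns a list-like ResultSet, so iterate over it
    for card in soup.find_all("div", attrs={"data-tn-component": "organicJob"}):
        jobs.append({
            "job": text_or_none(card.find("a", attrs={"data-tn-element": "jobTitle"})),
            "company": text_or_none(card.find("span", attrs={"itemprop": "name"})),
            "salary": text_or_none(card.find("nobr")),
        })
    return jobs


for entry in parse_jobs(SAMPLE_HTML):
    print(entry)
```

Routing every find() result through text_or_none means a posting with a missing company, title or salary gives None for that field instead of raising AttributeError on .text.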