Alex R. - 1 month ago
Python Question

Setting up a login with python requests for indeed.com

I'm trying to write a resume searcher for www.indeed.com (there's no API for resumes unfortunately). Specifically, I need to provide login details (to get names from resumes). The login page is here:

https://secure.indeed.com/account/login

I was following a guide here: https://kazuar.github.io/scraping-tutorial/

My code so far is:

import requests
from lxml import html
session_requests = requests.session()

login_url = "https://secure.indeed.com/account/login"
result = session_requests.get(login_url)

tree = html.fromstring(result.text)

payload = {
    '_email': 'my@email.com',
    '_password': 'mypassword'
}

result = session_requests.post(
    login_url,
    data=payload,
    headers=dict(referer=login_url)
)


This doesn't seem to work quite right. First off, I think I'm missing some authentication tokens. After inspecting the login page, I think it might be the "surftok" attribute, but I'm not completely sure. Is this even possible just with the requests module, or will I need Selenium or mechanize to make this work?

Answer

You're missing multiple data fields.

This worked for me:

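import requests

# Form fields sent with the login request; note the double underscore
# in '__email' and '__password'.
data = {
    'action': 'Login',
    '__email': 'Your Email',
    '__password': 'Your password',
    'remember': '1',
    'hl': 'en',
    'continue': '/account/view?hl=en',
}

response = requests.post('https://secure.indeed.com/account/login', data=data)
print(response.status_code)  # gave 200 for me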
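
If the form also needs a token such as the "surftok" value mentioned in the question, one option is to copy every hidden input from the login page into the payload before posting. The following is only a rough sketch of that idea, not something verified against indeed.com: the helper names (HiddenInputParser, build_login_payload, login) are made up for illustration, the `__email` and `__password` field names are taken from the snippet above, and `session` is assumed to be a `requests.Session()` so cookies persist between the GET and the POST.

```python
from html.parser import HTMLParser

LOGIN_URL = "https://secure.indeed.com/account/login"


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Attribute values can be None (e.g. a bare `hidden` attribute).
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(page_html, email, password):
    """Merge the page's hidden fields with the credentials.

    The '__email' / '__password' keys are assumed from the answer above.
    """
    parser = HiddenInputParser()
    parser.feed(page_html)
    payload = dict(parser.fields)
    payload.update({"__email": email, "__password": password})
    return payload


def login(session, email, password):
    """Fetch the login page, then post the form back with the same session.

    `session` is expected to be a requests.Session (hypothetical usage).
    """
    page = session.get(LOGIN_URL)
    payload = build_login_payload(page.text, email, password)
    return session.post(LOGIN_URL, data=payload, headers={"Referer": LOGIN_URL})
```

It would be called as `login(requests.Session(), 'Your Email', 'Your password')`. A 200 status alone does not prove the login succeeded, so it is worth checking the returned page (or a later request with the same session) for something only a logged-in user would see.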