Dance Party2 - 7 months ago
Python Question

Python Requests Go to Link and Download

I want to do the following in an automated fashion:


  1. Go to this link: https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Monthly-Enrollment-by-Contract-Plan-State-County-DL.xml

  2. Follow the link at the very bottom of the page, which ends with the current year and month (e.g. http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Monthly-Enrollment-by-Contract-Plan-State-County-Items/Monthly-Enrollment-by-CPSC-2016-04.html)

  3. At the next page, download the zip file from the top link under "downloads":
    Monthly Enrollment by CPSC - April 2016 [ZIP, 20MB]



So far, I have the following to get the current year and month, but I need help with the rest...

from datetime import datetime
import calendar

now = datetime.now()  # read the clock once so all parts agree
Day = now.day
Month = now.month
Year = now.year
m = calendar.month_name[Month]  # full month name, e.g. "April"

Answer

You need an XML parser to extract the link from the XML feed and an HTML parser to extract the link to the zip file. Here we'll use lxml.etree and lxml.html, respectively. Working implementation:

from datetime import datetime
from urllib.request import urlretrieve
from urllib.parse import urljoin

import requests
from lxml import etree
from lxml import html


date_part = datetime.now().strftime("%Y-%m")
with requests.Session() as session:
    # get the XML feed and extract the link
    response = session.get("https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Monthly-Enrollment-by-Contract-Plan-State-County-DL.xml")
    root = etree.fromstring(response.content)
    # [0] raises IndexError if the feed has no entry for the current month yet
    link = root.xpath("//item/link[contains(., '-%s.html')]/text()" % date_part)[0]

    # follow the link and extract the link to the zip file
    response = session.get(link)
    root = html.fromstring(response.content)
    zip_link = root.xpath("//a[@type='application/zip']/@href")[0]
    link = urljoin(link, zip_link)

    # download zip
    urlretrieve(link, filename="my.zip")
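
The two extraction steps above can also be sketched as pure functions that are easy to check without touching the network. This is only a sketch under assumptions, not the answer's method: it swaps lxml for the standard library (xml.etree.ElementTree and html.parser), the names pick_report_link, ZipLinkFinder and find_zip_url are hypothetical, and SAMPLE_FEED and SAMPLE_PAGE are made-up stand-ins for the real feed and report page. It also assumes you would want to fall back to the newest dated link when the current month has not been published yet.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical sample data standing in for the real feed and report page.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><link>http://example.test/Items/Monthly-Enrollment-by-CPSC-2016-03.html</link></item>
<item><link>http://example.test/Items/Monthly-Enrollment-by-CPSC-2016-04.html</link></item>
</channel></rss>"""

SAMPLE_PAGE = (
    '<html><body>'
    '<a type="application/zip" href="/Downloads/CPSC-2016-04.zip">ZIP</a>'
    '</body></html>'
)


def pick_report_link(feed_xml, date_part):
    """Return the link ending in -<date_part>.html, else the newest dated link."""
    root = ET.fromstring(feed_xml)
    dated = {}
    for node in root.iter("link"):
        text = (node.text or "").strip()
        match = re.search(r"-(\d{4}-\d{2})\.html$", text)
        if match:
            dated[match.group(1)] = text
    if not dated:
        raise LookupError("no dated report links found in the feed")
    # YYYY-MM strings sort chronologically, so max() picks the newest month.
    return dated.get(date_part, dated[max(dated)])


class ZipLinkFinder(HTMLParser):
    """Collect href values of <a> tags whose type is application/zip."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("type") == "application/zip" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def find_zip_url(page_html, page_url):
    """Return the absolute URL of the first zip link on the page."""
    finder = ZipLinkFinder()
    finder.feed(page_html)
    if not finder.hrefs:
        raise LookupError("no zip link found on the page")
    # Resolve a relative href against the page it was found on.
    return urljoin(page_url, finder.hrefs[0])


report_link = pick_report_link(SAMPLE_FEED, "2016-04")
zip_url = find_zip_url(SAMPLE_PAGE, report_link)
```

In the script above, response.content and response.text could be passed to these functions in place of the sample strings. Whether the real page still marks its zip link with type="application/zip" is something to verify against the live site.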