Jaeho Shin - 1 year ago
Python Question

Downloading target link html to text files

I am completely new to Python and am learning web crawling.

I am trying to download the HTML of each target link into its own text file.
So far I have managed to extract all the target URLs I need, but I have no idea how to save each page's HTML to a separate file. The code below just writes the same article to multiple files.

Can someone help me, please?

import io
import re

import requests
from bs4 import BeautifulSoup

url = ""
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "lxml")
link1 = soup.find_all('a', href=re.compile("drupal_lists"))

for t1 in link1:
    link_data = requests.get(t1.attrs['href']).text

for i in link_data:
    with io.open("file_" + str(i) + ".txt", 'w', encoding='utf-8') as f:

Answer

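In the style of your code, starting from the point where things change. Your first loop overwrites link_data on every pass, so only the last page is left when it finishes, and the second loop then iterates over the characters of that one string rather than over the links. Downloading and writing inside a single loop fixes both: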

for i, t1 in enumerate(link1):  # Get indices and data in one go
   link_data = requests.get(t1.attrs['href']).text
   with io.open("file_" + str(i) + ".txt", 'w', encoding='utf-8') as f:
       f.write(link_data)  # no str(i) because that would mess with the HTML
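
If you want to reuse this, the same loop can be wrapped in a small function. This is only a sketch, not the one right way to do it: `save_pages`, its `fetch` parameter and `out_dir` are names I made up for illustration. Passing the download step in as `fetch` is an assumption that lets you try the file-writing part without touching the network.

```python
import io
import os


def save_pages(urls, fetch, out_dir="."):
    """Write the text returned by fetch(url) to file_<i>.txt, one file per URL.

    save_pages, fetch and out_dir are hypothetical names, not part of any library.
    """
    paths = []
    for i, url in enumerate(urls):
        # Download inside the loop so every file gets its own page.
        html = fetch(url)
        path = os.path.join(out_dir, "file_" + str(i) + ".txt")
        with io.open(path, "w", encoding="utf-8") as f:
            f.write(html)
        paths.append(path)
    return paths


# Possible usage with the names from the question (assumes requests is imported):
# urls = [t1.attrs["href"] for t1 in link1]
# save_pages(urls, lambda u: requests.get(u).text)
```

The function returns the list of paths it wrote, which makes it easy to check that you got one file per link.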