Jaeho Shin - 25 days ago
Python Question

Downloading target link HTML to text files

I am completely new to Python and am studying web crawling.

I am trying to download the page behind each target link into its own text file.
So far I have managed to extract all the target URLs I need, but I have no idea how to save the HTML of each target page to a separate file. The code below just writes the same article to multiple files.

Can someone help me, please?

import io
import re

import requests
from bs4 import BeautifulSoup

url = ""
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "lxml")
link1 = soup.find_all('a', href=re.compile("drupal_lists"))

for t1 in link1:
    print(t1.attrs['href'])
    link_data = requests.get(t1.attrs['href']).text  # replaced on every pass

for i in link_data:  # loops over the characters of the last page fetched
    with io.open("file_" + str(i) + ".txt", 'w', encoding='utf-8') as f:
        f.write(str(i) + link_data)

Answer

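In the style of your code, starting from the point where things change. Your first loop overwrites link_data on every pass, so only the last page is left by the time the files are written. Fetch and write inside the same loop instead: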

for i, t1 in enumerate(link1):  # Get indices and data in one go
   link_data = requests.get(t1.attrs['href']).text
   with io.open("file_" + str(i) + ".txt", 'w', encoding='utf-8') as f:
       f.write(link_data)  # no str(i) because that would mess with the HTML
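
If it helps, the same loop can be wrapped in a small function. This is only a sketch under a few assumptions: the names fetch_text and save_pages, the base_url, out_dir and fetch parameters, and the 10 second timeout are made up for illustration and are not from the question. It also resolves relative hrefs with urljoin, in case the links on the page are not absolute.

```python
import io
import os
from urllib.parse import urljoin


def fetch_text(url):
    """Download one page and return its HTML as text (hypothetical helper)."""
    import requests  # imported here so the rest of the sketch runs without it
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    return response.text


def save_pages(hrefs, base_url="", out_dir=".", fetch=fetch_text):
    """Write each linked page to its own file_<i>.txt; return the paths written."""
    paths = []
    for i, href in enumerate(hrefs):
        full_url = urljoin(base_url, href)  # turns a relative href into a full URL
        path = os.path.join(out_dir, "file_" + str(i) + ".txt")
        with io.open(path, "w", encoding="utf-8") as f:
            f.write(fetch(full_url))  # one download per file, nothing is reused
        paths.append(path)
    return paths
```

With the variables from the question it could be called as save_pages([t1.attrs['href'] for t1 in link1], base_url=url). Passing a different fetch function lets you try the file-writing part without touching the network.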