I am completely new to Python and studying web crawling.
I am trying to download each individual target link found on a text page.
So far I have succeeded in extracting all the target URLs I need, but I have no idea how to download the HTML text of each target into its own file. The code below just writes the same article to multiple files.
Can someone help me, please?
import io
import re

import requests
from bs4 import BeautifulSoup

url = ""
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "lxml")
link1 = soup.find_all('a', href=re.compile("drupal_lists"))
for t1 in link1:
    link_data = requests.get(t1.attrs['href']).text
    for i in link_data:
        with io.open("file_" + str(i) + ".txt", 'w', encoding='utf-8') as f:
            f.write(link_data)
In the style of your code, starting from the point where things change:
for i, t1 in enumerate(link1):  # get indices and data in one go
    link_data = requests.get(t1.attrs['href']).text
    with io.open("file_" + str(i) + ".txt", 'w', encoding='utf-8') as f:
        f.write(link_data)  # no str(i) here, because that would mess with the HTML
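The key change is iterating with enumerate over the links themselves. In your original version, the inner for i in link_data loop iterates over the characters of the downloaded string (iterating over a string in Python yields one character at a time), so the file names come from single characters and the files keep overwriting each other, which is why you see the same article over and over. For reference, here is a minimal self-contained sketch of the whole script under the same assumptions as your code (the blank url placeholder and the "drupal_lists" href pattern are kept as-is); urljoin is added on the assumption that some hrefs on the page may be relative:

import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = ""  # fill in the listing page (left blank in the question)
soup = BeautifulSoup(requests.get(url).text, "lxml")

# every <a> tag whose href mentions "drupal_lists"
links = soup.find_all('a', href=re.compile("drupal_lists"))

for i, a in enumerate(links):
    # urljoin resolves relative hrefs against the listing page;
    # absolute hrefs pass through unchanged
    target = urljoin(url, a.attrs['href'])
    with open("file_" + str(i) + ".txt", 'w', encoding='utf-8') as f:
        f.write(requests.get(target).text)

One more note: in Python 3, io.open is just an alias for the built-in open, so plain open works equally well there.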