Lewis Lewis - 1 month ago 12
Python Question

Dictionary saving last result to every value using BeautifulSoup

I am currently making a web crawler using requests and BeautifulSoup. I am using a for loop to create a list of dictionaries whose values are the href of each a tag. However, every entry in the result ends up being the last href on the page. Here is the output when I print the final result:

[{'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}]


I am unsure why only the last value is stored. I assume it's because, on the last pass through the loop, it assigns that value to all keys with the same name. How can I go about fixing this? Here is the code:

import json
import requests
from bs4 import BeautifulSoup

tags_dict = {}
tags_list = []

r = requests.get("http://chicosadventures.com/")

soup = BeautifulSoup(r.content, "lxml")


for link in soup.find_all('a'):
    tags_dict['link'] = link.get('href')
    tags_list.append(tags_dict)

dump = json.dumps(tags_list)
print(dump)

Answer

Your issue is with tags_dict. You create that dictionary once, outside the loop, and then append a reference to the same object to your list again and again. Since every entry in the list refers to that one dictionary, the last value assigned to 'link' shows up in all entries. I changed the code to create a new dict object on each iteration, and now it works fine:

import json
import requests
from bs4 import BeautifulSoup

tags_list = []
r = requests.get("http://chicosadventures.com/")
soup = BeautifulSoup(r.content, "lxml")

for link in soup.find_all('a'):
    tags_list.append({"link": link.get('href')})

dump = json.dumps(tags_list)
print(dump)

Output:

[{"link": "/"}, {"link": "/about_chico"}, {"link": "/about_the_author"}, {"link": "/about_the_illustrator"}, {"link": "/chico_in_the_news_"}, {"link": "/order_your_copy"}, {"link": "/contact_us"}, {"link": "/about_chico"}, {"link": "/about_the_author"}, {"link": "/about_the_illustrator"}, {"link": "/chico_in_the_news_"}, {"link": "/order_your_copy"}, {"link": "/contact_us"}, {"link": "/privacy"}, {"link": "javascript:print()"}, {"link": "http://www.ebtech.net/"}, {"link": "/terms"}]
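
To see the shared-reference behaviour without hitting the network, here is a minimal sketch. The hrefs list is a hypothetical stand-in for the values link.get('href') would return, and the names shared, buggy and fixed are made up for illustration; they are not part of the original code.

```python
# Hypothetical hrefs standing in for link.get('href') results (no network needed).
hrefs = ["/", "/about_chico", "/terms"]

# Buggy pattern: one dict is created once, then mutated and appended on every pass.
shared = {}
buggy = []
for href in hrefs:
    shared["link"] = href   # overwrites the value in the single shared dict
    buggy.append(shared)    # appends another reference to that same dict

# Fixed pattern: a fresh dict is built for each href.
fixed = [{"link": href} for href in hrefs]

print(buggy)  # every entry shows the last href
print(fixed)  # each entry keeps its own href
print(all(entry is shared for entry in buggy))  # True: all entries are the same object
```

If you would rather keep a dictionary variable around, appending a copy (for example tags_dict.copy()) on each iteration should also avoid the problem, since each list entry is then a separate object.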