James Dean - 27 days ago
Python Question

Python: script is not writing the links from variable

My script below...

I feel like I'm missing one line of code to make this work properly. I'm using Reddit as a test source to scrape sport links.

# import libraries
import bs4
from urllib2 import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.reddit.com/r/BoxingStreams/comments/6w2vdu/mayweather_vs_mcgregor_archive_footage/'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()

# html parsing
page_soup = soup(page_html, "html.parser")

hyperli = page_soup.findAll("form")

filename = "sportstreams.csv"
f = open(filename, "w")

headers = "Sport Links"


for containli in hyperli:
    link = containli.a["href"]




Everything works except that it only grabs the link from the first row [0]. If I don't use the code
then it adds all the (a href links) except that it also adds the word NONE to the CSV file. Using the
would (I hope) just add the http links and avoid adding the word NONE.

What am I missing here?

Answer Source

As explained in the Beautiful Soup documentation, under "Navigating using tag names":

"Using a tag name as an attribute will give you only the first tag by that name.
If you need to get all the <a> tags, or anything more complicated than the first tag with a certain name, you’ll need to use one of the methods described in Searching the tree, such as find_all()."
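
To make the quoted behaviour concrete, here is a minimal sketch run against a made-up HTML snippet (the markup and the example.com URLs are illustrative assumptions, not the Reddit page). It contrasts attribute access, which yields only the first <a>, with find_all(), which yields every match:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a single form on the page.
html = """
<form>
  <a href="http://example.com/one">one</a>
  <a>no href here</a>
  <a href="http://example.com/two">two</a>
</form>
"""

form = BeautifulSoup(html, "html.parser").form

# Attribute access: only the first <a> inside the form.
first_href = form.a["href"]

# find_all with href=True: every <a> that actually has an href attribute.
all_hrefs = [a["href"] for a in form.find_all("a", href=True)]

# .get() returns None for a missing attribute instead of raising KeyError,
# which is one way a stray None can end up in the output.
missing = form.find_all("a")[1].get("href")

print(first_href)
print(all_hrefs)
print(missing)
```

Filtering on href=True (or using an a[href] CSS selector) is what keeps tags without a link out of the results.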

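In your case, containli.a returns only the first <a> in each form (or None if the form contains no <a> at all). To collect every link, you could use page_soup.select("form a[href]"), which finds all the links inside forms that have an href attribute.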

links = page_soup.select("form a[href]")
for link in links:
    href = link["href"]
    f.write(href + "\n")
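
For reference, the whole flow could be put together roughly as follows. This is only a sketch under a few assumptions: it targets Python 3, it parses a hypothetical inline HTML string instead of downloading the Reddit page, and the helper names extract_form_links and write_links are invented for illustration. It writes to an in-memory buffer so it can be run without touching the disk.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical page content; the real script would use the HTML it downloaded.
page_html = """
<form><a href="http://example.com/stream1">Stream 1</a></form>
<form><p>no link in this form</p></form>
<form>
  <a href="http://example.com/stream2">Stream 2</a>
  <a name="anchor-without-href">skip me</a>
</form>
"""


def extract_form_links(html):
    """Return the href of every <a href=...> nested inside a <form>."""
    page_soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in page_soup.select("form a[href]")]


def write_links(links, out):
    """Write a header row followed by one link per row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Sport Links"])
    for href in links:
        writer.writerow([href])


links = extract_form_links(page_html)

# In-memory stand-in for the CSV file.
buffer = io.StringIO()
write_links(links, buffer)
print(buffer.getvalue())
```

To write to disk instead, the buffer can be swapped for a real file, for example with open("sportstreams.csv", "w", newline="") as f: write_links(links, f), which also closes the file once the block ends.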