DarKsi - 1 month ago

Python - converting to list

import requests
from bs4 import BeautifulSoup

webpage = requests.get("http://www.nytimes.com/")
soup = BeautifulSoup(requests.get("http://www.nytimes.com/").text, "html.parser")
for story_heading in soup.find_all(class_="story-heading"):
    articles = story_heading.text.replace('\n', '').replace('  ', '')
    print (articles)


Here is my code. It prints all the article titles on the website, but as separate strings:


Looking Back: 1980 | Funny, but Not Fit to Print

Brooklyn Studio With Room for Family and a Dog

Search for Homes for Sale or Rent

Sell Your Home


So I want to convert this into a list, e.g. ['Search for Homes for Sale or Rent', 'Sell Your Home', ...], which will allow me to do other manipulations such as random.choice.

I tried:

alist = articles.split("\n")
print (alist)

But that prints a separate one-element list for each title:

['Looking Back: 1980 | Funny, but Not Fit to Print']

['Brooklyn Studio With Room for Family and a Dog']

['Search for Homes for Sale or Rent']

['Sell Your Home']


That is not the list I need. I'm stuck; can you please help me with this part of the code?

Answer

You keep overwriting articles with the next heading on every pass through the loop, so it only ever holds one string. Instead, make articles a list and append to it in each iteration:

import requests
from bs4 import BeautifulSoup

webpage = requests.get("http://www.nytimes.com/")
soup = BeautifulSoup(webpage.text, "html.parser")  # reuse the response instead of fetching the page twice
articles = []  # collect every heading here
for story_heading in soup.find_all(class_="story-heading"):
    # append instead of reassigning, so earlier headings are kept
    articles.append(story_heading.text.replace('\n', '').replace('  ', ''))
print(articles)
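To see why the original loop lost the earlier titles without hitting the network, here is a minimal sketch that uses a hardcoded list of headings (copied from the question's output) in place of the scraped story_heading.text values. The function names last_heading_only and collect_headings are hypothetical, chosen only for illustration.

```python
def last_heading_only(headings):
    # Mimics the question's loop: the name is rebound on every iteration,
    # so only the final heading survives.
    articles = None
    for heading in headings:
        articles = heading
    return articles


def collect_headings(headings):
    # Mimics the answer's loop: every heading is appended to one list.
    articles = []
    for heading in headings:
        articles.append(heading)
    return articles


headings = [
    "Looking Back: 1980 | Funny, but Not Fit to Print",
    "Brooklyn Studio With Room for Family and a Dog",
    "Search for Homes for Sale or Rent",
    "Sell Your Home",
]

print(last_heading_only(headings))  # Sell Your Home
print(collect_headings(headings))

# Splitting a single heading string just wraps it in a one-element list,
# which is what articles.split("\n") produced on each pass in the question.
print(headings[0].split("\n"))
```

This is also why the question's split("\n") attempt printed one small list per title: each call only ever saw one heading.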

The output is huge, so this is a small sample of what it looks like:

['Global Deal Reached to Curb Chemical That Warms Planet', 'Accord Could Push A/C Out of Sweltering India’s Reach ',....] 

Furthermore, you don't need those replace calls at all: calling strip() on each heading removes the surrounding whitespace, including newlines. So you can do this with your story_heading.text instead:

articles.append(story_heading.text.strip())
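A small comparison of the two cleanups on a made-up heading string (the sample text is hypothetical, padded with an odd number of spaces the way indented HTML source can be). Removing pairs of spaces can leave a stray space behind, while strip() only trims the ends and leaves the single spaces inside the title alone.

```python
# Hypothetical raw heading text: a newline, five spaces, the title, a newline.
raw = "\n     Sell Your Home\n"

# Question's approach: delete newlines, then delete pairs of spaces.
replaced = raw.replace("\n", "").replace("  ", "")
# Answer's approach: remove leading and trailing whitespace only.
stripped = raw.strip()

print(repr(replaced))  # ' Sell Your Home'
print(repr(stripped))  # 'Sell Your Home'
```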

Combined with a list comprehension, this gives a final solution like this:

import requests
from bs4 import BeautifulSoup

webpage = requests.get("http://www.nytimes.com/")
soup = BeautifulSoup(webpage.text, "html.parser")  # parse the response already fetched
# build the list in one step; strip() trims surrounding whitespace from each heading
articles = [story_heading.text.strip() for story_heading in soup.find_all(class_="story-heading")]
print(articles)
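Once articles is a real list, the manipulations mentioned in the question work directly. A minimal sketch using the standard random module; the hardcoded list stands in for the scraped headings, and the fixed seed is only there to make runs repeatable.

```python
import random

# Stand-in for the scraped headings (taken from the question's output).
articles = [
    "Looking Back: 1980 | Funny, but Not Fit to Print",
    "Brooklyn Studio With Room for Family and a Dog",
    "Search for Homes for Sale or Rent",
    "Sell Your Home",
]

rng = random.Random(42)  # seeded only so repeated runs give the same picks

one = rng.choice(articles)       # a single random heading
three = rng.sample(articles, 3)  # three distinct headings, no repeats

print(one)
print(three)
```

Note that random.choice raises IndexError on an empty list, so if find_all matches no story-heading elements (for example, after a site redesign), check `if articles:` before picking.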