Kishan Jangam - 7 months ago
Python Question

Python lists and web mining

from bs4 import BeautifulSoup
import urllib2


# Open the NYTimes homepage and read the page

response = urllib2.urlopen('http://www.nytimes.com').read()
soup = BeautifulSoup(response, 'html.parser')

data = []


# Collect every heading on the homepage into a list

for story_heading in soup.find_all(class_="story-heading"):
    story_title = story_heading.text.replace("\n", "").strip()
    new_story_title = story_title.encode('utf-8')

    # Convert the words of each title into a list

    words = new_story_title.split()
    data.append(words)

print data


Now I want to remove the numbers from this text. How can I do it?

Answer

Try this code, which removes every digit character from a string:

clean_text = ''.join([i for i in new_story_title if not i.isdigit()])

Source: HERE

words = ''.join([i for i in new_story_title if not i.isdigit()]).split()
data.append(words)
print data

Try the code above in place of the corresponding lines of your script.
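One thing to watch for: the character-level filter also strips digits that sit inside a word, so a token such as "G20" would become "G". If that is not what you want, an alternative is to split first and drop only the words made up entirely of digits. Below is a minimal sketch comparing the two approaches. The function names and the sample headlines are made up for illustration and stand in for the scraped titles.

```python
def strip_digit_chars(title):
    """Remove every digit character, then split the title into words."""
    return ''.join(ch for ch in title if not ch.isdigit()).split()


def drop_numeric_words(title):
    """Split the title into words, then drop words that are only digits."""
    return [word for word in title.split() if not word.isdigit()]


# Hypothetical headlines standing in for the scraped story titles.
titles = ["5 Takeaways From the Debate", "Top 10 Films of 2016", "G20 summit"]

# One word list per title, mirroring the structure of `data` in the question.
data = [strip_digit_chars(title) for title in titles]
print(data)

data_whole_words = [drop_numeric_words(title) for title in titles]
print(data_whole_words)
```

Which one fits depends on whether numbers embedded in words should survive; the answer above corresponds to the first function.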
