ike ike - 2 months ago 14
Python Question

Beautiful Soup vs ESPN

I'm scraping the ESPN NHL stats using BeautifulSoup, trying to create something like this:


PLAYER, TEAM, GP, G, A, PTS, +/-, PIM, PTS/G, SOG, PCT, GWG, G, A, G, A

Patrick Kane, RW, CHI, 82, 46, 60, 106, 17, 30, 1.29, 287, 16.0, 9, 17, 20, 0, 0

Jamie Benn, LW, DAL, 82, 41, 48, 89, 7, 64, 1.09, 247, 16.6, 5, 17, 13, 2, 3

Sidney Crosby, C, PIT, 80, 36, 49, 85, 19, 42, 1.06, 248, 14.5, 9, 10, 14, 0, 0


Thus far I've got something that loops through and pulls in all the data, but it all ends up in one column, without the commas or the headers:

import urllib2
from bs4 import BeautifulSoup
url = "http://www.espn.com/nhl/statistics/player/_/stat/points"
page = urllib2.urlopen(url)

f = open('nhlstarter.txt', 'w')

soup=BeautifulSoup(page, "html.parser")

for tr in soup.select("#my-players-table tr[class*=player]"):
    for ob in range(1,15):
        player_info = tr('td')[ob].get_text(strip=True)
        print(player_info)
        f.write(player_info + '\n')

f.close()


This gets

Patrick Kane, RW
CHI
82
46
60
106
17
30
1.29
287
16.0
9
17
20


etc

How do I convert the columnar data into usable rows? I thought I might be able to do something like the following:

for tr in soup.select("#my-players-table tr[class*=player]"):
    for ob in range(1,15):
        player_info + str(ob) = tr('td')[ob].get_text(strip=True)
        print(player_info + str(ob))
        f.write(player_info + str(ob) "," + player_info + str(ob) '\n')


But that failed miserably, since it didn't create a new variable on each pass through the loop.

Any advice on how to either grab all columns of the table at once, or loop through them to get a usable CSV, would be greatly appreciated.

Thanks for any help.

Answer

You could append the player information into a list initially to represent the row and then join the list into a string as you write it to the file:

for tr in soup.select("#my-players-table tr[class*=player]"):
    row = []
    for ob in range(1,15):
        # player_info holds the text of one column
        player_info = tr('td')[ob].get_text(strip=True)
        row.append(player_info)
    f.write(",".join(row) + "\n")
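If you would rather not build the line by hand, the standard library's csv module can do the joining for you and also write the header row. One thing to watch for: the first cell comes back as "Patrick Kane, RW", which already contains a comma, so a plain join splits it into two columns while csv.writer quotes it and keeps it as one. Below is a minimal sketch, assuming Python 3; rows_to_csv is a hypothetical helper name, the header is the column list from the question trimmed to the 14 cells the loop pulls, and the sample row is the output shown in the question rather than a live scrape.

```python
import csv
import io

# Column names taken from the question, trimmed to the 14 cells that
# range(1, 15) pulls from each table row (an assumption about the layout).
HEADER = ["PLAYER", "TEAM", "GP", "G", "A", "PTS", "+/-", "PIM",
          "PTS/G", "SOG", "PCT", "GWG", "G", "A"]


def rows_to_csv(header, rows):
    """Return CSV text: one header line, then one line per row.

    rows_to_csv is a hypothetical helper for illustration. csv.writer
    quotes any cell that contains a comma, so it stays a single column.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Stand-in for the lists built inside the scraping loop; the values are
# the ones printed in the question.
sample_rows = [
    ["Patrick Kane, RW", "CHI", "82", "46", "60", "106", "17", "30",
     "1.29", "287", "16.0", "9", "17", "20"],
]

csv_text = rows_to_csv(HEADER, sample_rows)
print(csv_text)
```

In the scraping loop you would append each row list to a list of rows instead of writing it straight away, then pass the whole thing to the helper, or call writer.writerow(row) directly on a file opened with newline=''. If you want the position as its own column, split the first cell on ", " before appending it.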