hr malik - 1 year ago
Python Question

I want to scrape data using a Python script

I have written a Python script to scrape data from
The page is a list of 100 players, and the script scrapes it successfully. The problem is that when I run the script, it scrapes the same data three times instead of once.

<div class="cb-col cb-col-100 cb-font-14 cb-lst-itm text-center">
<div class="cb-col cb-col-16 cb-rank-tbl cb-font-16">1</div>
<div class="cb-col cb-col-50 cb-lst-itm-sm text-left">
<div class="cb-col cb-col-33">
<div class="cb-col cb-col-50">
<span class=" cb-ico" style="position:absolute;"></span>&nbsp;&nbsp;&nbsp;&nbsp;–
<div class="cb-col cb-col-50">
<img src="" class="img-responsive cb-rank-plyr-img">
<div class="cb-col cb-col-67 cb-rank-plyr">
<a class="text-hvr-underline text-bold cb-font-16" href="/profiles/2250/steven-smith" title="Steven Smith's Profile">Steven Smith</a>
<div class="cb-font-12 text-gray">AUSTRALIA</div>
<div class="cb-col cb-col-17 cb-rank-tbl">906</div>
<div class="cb-col cb-col-17 cb-rank-tbl">1</div>

Here is the Python script I wrote to scrape each player's data:

import sys,requests,csv,io
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = ""
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

maindiv = soup.find_all("div", {"class": "text-center"})
for div in maindiv:
    print(div.text)

But instead of scraping the data once, it scrapes the same data three times.

What should I change so that the data is scraped only once?

Answer

Select the table and look for the divs in that:

maindiv = soup.select("#batsmen-tests div.text-center")
for div in maindiv:
    print(div.text)
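
One plausible reading of the repetition, assuming the page holds several ranking containers with identical row markup, is that `find_all` searches the whole document while the CSS selector only looks inside `#batsmen-tests`. The sketch below uses made-up HTML to show the difference; the ids `batsmen-odis` and `batsmen-t20s` are hypothetical, and only `batsmen-tests` comes from the selector above.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real page: three containers with identical rows.
# Only the id "batsmen-tests" comes from the answer; the other ids are hypothetical.
html = (
    '<div id="batsmen-tests"><div class="cb-col text-center">1 Steven Smith</div></div>'
    '<div id="batsmen-odis"><div class="cb-col text-center">1 Steven Smith</div></div>'
    '<div id="batsmen-t20s"><div class="cb-col text-center">1 Steven Smith</div></div>'
)
soup = BeautifulSoup(html, "html.parser")

# Document-wide search: matches the row in every container.
all_rows = soup.find_all("div", {"class": "text-center"})

# Scoped search: only rows that are descendants of #batsmen-tests.
test_rows = soup.select("#batsmen-tests div.text-center")

print(len(all_rows), len(test_rows))
```

With the search scoped this way, a loop over the result sees each row once.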

Both your original loop and the scoped version print all the text from each div as one run-together line, which is not really useful. If you just want the player names:

anchors = soup.select("#batsmen-tests div.cb-rank-plyr a")
for a in anchors:
    print(a.text)

A quick and easy way to get the data in CSV format is to take the text of each child element:

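maindiv = soup.select("#batsmen-tests div.text-center")
# Skip the first match, which is presumably the header row.
for d in maindiv[1:]:
    # For every tag inside the row, take only its own direct text (recursive=False),
    # drop tags that have none, and join what is left with commas.
    row_data = u",".join(s.strip() for s in filter(None, (t.find(text=True, recursive=False) for t in d.find_all())))
    if row_data:
        print(row_data)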

Now you get output like:

# rank, up/down, name, country, rating, best rank
1,–,Steven Smith,AUSTRALIA,906,1
2,–,Joe Root,ENGLAND,878,1
3,–,Kane Williamson,NEW ZEALAND,876,1
4,–,Hashim Amla,SOUTH AFRICA,847,1
5,–,Younis Khan,PAKISTAN,845,1
6,–,Adam Voges,AUSTRALIA,802,5
7,–,AB de Villiers,SOUTH AFRICA,802,1
8,–,Ajinkya Rahane,INDIA,785,8
9,2,David Warner,AUSTRALIA,772,3
10,–,Alastair Cook,ENGLAND,770,2

As opposed to:

PositionPlayerRatingBest Rank
1    –Steven SmithAUSTRALIA9061
2    –Joe RootENGLAND8781
3    –Kane WilliamsonNEW ZEALAND8761
4    –Hashim AmlaSOUTH AFRICA8471
5    –Younis KhanPAKISTAN8451
6    –Adam VogesAUSTRALIA8025
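
If the rows are going to a file, Python's `csv` module can do the joining and will quote any field that happens to contain a comma. This is a minimal sketch, assuming each row has already been split into a list of fields. The two sample rows and the header names are copied from the comma-separated output above, and `buffer` stands in for a real file.

```python
import csv
import io

# Sample rows copied from the comma-separated output above.
header = ["rank", "up/down", "name", "country", "rating", "best rank"]
rows = [
    ["1", "–", "Steven Smith", "AUSTRALIA", "906", "1"],
    ["2", "–", "Joe Root", "ENGLAND", "878", "1"],
]

# An in-memory buffer stands in for a file opened with open(..., "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file object opened with `newline=""` to `csv.writer` in place of `buffer`.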