Simon Breton - 3 days ago
Python Question

Why is only part of my list written to a CSV file?

I'm new to Python and trying to build my first script. I want to scrape a list of URLs and export the results to a CSV file.

My script runs without errors, but when I open the CSV file only a few lines of data are written. When I print the lists I'm trying to write (sharelist and sharelist1), the output is complete, whereas the CSV file is not.

Here is part of my code:

for url in urllist[10:1000]:
    # query the website and return the html to the variable 'page'
    try:
        page = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.getcode() == 404:  # check the return code
            continue
    soup = BeautifulSoup(page, 'html.parser')

    # Take out the <div> of name and get its value
    name_box = soup.find(attrs={'class': 'nb-shares'})
    if name_box is None:
        continue
    share = name_box.text.strip()  # strip() removes leading and trailing whitespace

    # save the data in the lists
    sharelist.append(url)
    sharelist1.append(share)

    # open a file for writing.
    csv_out = open('mycsv.csv', 'wb')

    # create the csv writer object.
    mywriter = csv.writer(csv_out)

    # writerow - one row of data at a time.
    for row in zip(sharelist, sharelist1):
        mywriter.writerow(row)

    # always make sure that you close the file.
    # otherwise you might find that it is empty.
    csv_out.close()


I'm not sure which part of my code I should share here. Please tell me if this isn't enough!

Answer

The problem is that you are opening the file on every pass through the loop. Opening it in 'wb' (write) mode truncates it, so each iteration overwrites the file written by the previous one.
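If it helps, here is a minimal sketch, separate from your code, showing the truncation behaviour (demo.csv is just a throwaway file name):

for i in range(3):
    f = open('demo.csv', 'wb')   # 'wb' truncates the file each time
    f.write('row %d\r\n' % i)
    f.close()

print(open('demo.csv').read())   # only "row 2" survives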

    # open a file for writing.
    csv_out = open('mycsv.csv', 'wb')

    # create the csv writer object.
    mywriter = csv.writer(csv_out)

    # writerow - one row of data at a time.
    for row in zip(sharelist, sharelist1):
        mywriter.writerow(row)

    # always make sure that you close the file.
    # otherwise you might find that it is empty.
    csv_out.close()

Either open the file once, before the loop, or open it inside the loop in append mode ('ab'), which adds to the file instead of truncating it.

This is option one (note the indentation):

import csv
import urllib2
from bs4 import BeautifulSoup

sharelist = []
sharelist1 = []
# urllist is assumed to be defined earlier, as in the question.

# open the file for writing, once, before the loop.
csv_out = open('mycsv.csv', 'wb')

# create the csv writer object.
mywriter = csv.writer(csv_out)

for url in urllist[10:1000]:
    try:
        page = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.getcode() == 404:  # check the return code
            continue
    soup = BeautifulSoup(page, 'html.parser')

    name_box = soup.find(attrs={'class': 'nb-shares'})
    if name_box is None:
        continue
    share = name_box.text.strip()

    # save the data in the lists
    sharelist.append(url)
    sharelist1.append(share)

# writerow - one row of data at a time, after the loop has finished.
for row in zip(sharelist, sharelist1):
    mywriter.writerow(row)

# always make sure that you close the file.
# otherwise you might find that it is empty.
csv_out.close()
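As a side note, a with statement (a sketch of the same idea, not part of the code above) closes the file for you, even if the loop raises an exception part-way through:

with open('mycsv.csv', 'wb') as csv_out:
    mywriter = csv.writer(csv_out)
    for row in zip(sharelist, sharelist1):
        mywriter.writerow(row)
# no explicit csv_out.close() needed; the with block handles it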

This is option two (append mode):

for url in urllist[10:1000]:
    # query the website and return the html to the variable 'page'
    try:
        page = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.getcode() == 404:  # check the return code
            continue
    soup = BeautifulSoup(page, 'html.parser')

    # Take out the <div> of name and get its value
    name_box = soup.find(attrs={'class': 'nb-shares'})
    if name_box is None:
        continue
    share = name_box.text.strip()  # strip() removes leading and trailing whitespace

    # save the data in the lists
    sharelist.append(url)
    sharelist1.append(share)

    # open the file in append mode, so each iteration adds to it
    # instead of overwriting it.
    csv_out = open('mycsv.csv', 'ab')

    # create the csv writer object.
    mywriter = csv.writer(csv_out)

    # write only the row scraped on this iteration; writing
    # zip(sharelist, sharelist1) here would duplicate all earlier rows.
    mywriter.writerow((url, share))

    # always make sure that you close the file.
    # otherwise you might find that it is empty.
    csv_out.close()
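For completeness, here is a third sketch (my own variant, under the same assumptions as above): open the file once with a with block and write each row as soon as it is scraped, which avoids keeping the intermediate lists at all:

with open('mycsv.csv', 'wb') as csv_out:
    mywriter = csv.writer(csv_out)
    for url in urllist[10:1000]:
        try:
            page = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            if e.getcode() == 404:
                continue
        soup = BeautifulSoup(page, 'html.parser')
        name_box = soup.find(attrs={'class': 'nb-shares'})
        if name_box is None:
            continue
        # write the row immediately; no sharelist/sharelist1 needed
        mywriter.writerow((url, name_box.text.strip()))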