GeekPro101 - 3 months ago 34
Python Question

Python counts duplicates as uniques in csv file

I've written a script that takes the HTML table of executed offenders in Texas (I can't post the link due to restrictions, but it can be found in the code for getcsv.py) and converts it into a CSV file. Another script then counts the race of each person. However, the counts come out wrong: all but one of the White entries and all but one of the Hispanic entries are counted together, and the remaining entry of each is counted as a separate value. This:

[('White', 237), ('Black', 196), ('Hispanic', 100), ('Other', 2), ('White ', 1), ('Hispanic ', 1)]
is the result.

This is the script that downloads the CSV file (getcsv.py):

import csv
from bs4 import BeautifulSoup
from urllib.request import urlopen

soup = BeautifulSoup(urlopen('http://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html'), "html.parser")
table = soup.find('table')
headers = [header.text for header in table.find_all('th')]
rows = []
for row in table.find_all('tr'):
    rows.append([val.text for val in row.find_all('td')])

with open('new.csv', 'w', encoding="utf8", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(row for row in rows if row)


This is the script that counts the races (analyse.py):

import csv
import collections


race = collections.Counter()



with open('new.csv') as input_file:
    next(input_file)
    for row in csv.reader(input_file, delimiter=','):
        race[row[8]] += 1

list(race)
racecom = race.most_common()


print('Number of white people executed: %s' % race['White'])
print('Number of black people executed: %s' % race['Black'])
print('Number of Hispanic people executed: %s' % race['Hispanic'])
print('Number of Other people executed: %s' % race['Other'])
print (racecom)


However when I use a csv file generated by convertcsv.org the problem disappears, so I am fairly sure it's getcsv.py that has the fault.

The generated file can be downloaded at https://www.dropbox.com/s/gz0kob2miejqucq/actual.csv?dl=0 as actual.csv and the auto downloaded one can be found at https://www.dropbox.com/s/chkycm21konvcw0/new.csv?dl=0 as new.csv.

Thanks in advance.

Answer

Whitespace is significant: 'White ' (with a trailing space) and 'White' are different keys. Strip the whitespace if they should count as the same key:

with open('new.csv') as input_file:
    next(input_file)  # skip the header row
    # Strip each value so 'White ' and 'White' end up under the same key
    race = collections.Counter(
        row[8].strip()
        for row in csv.reader(input_file, delimiter=','))
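
To see the effect without downloading anything, here is a minimal sketch that runs the same count over a small in-memory CSV. The sample rows, the placeholder column names, and the helper count_column are made up for illustration; only the position of the race value (index 8) is taken from the question.

```python
import collections
import csv
import io

# Made-up sample data (an assumption, not the real file).
# Column index 8 stands in for the race column; two values carry a trailing space.
SAMPLE = (
    "c0,c1,c2,c3,c4,c5,c6,c7,Race,c9\n"
    "1,a,b,c,d,e,f,g,White,x\n"
    "2,a,b,c,d,e,f,g,White ,x\n"
    "3,a,b,c,d,e,f,g,Hispanic,x\n"
    "4,a,b,c,d,e,f,g,Hispanic ,x\n"
    "5,a,b,c,d,e,f,g,Black,x\n"
)


def count_column(text, index, strip):
    """Count the values in one CSV column, optionally stripping whitespace."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return collections.Counter(
        row[index].strip() if strip else row[index] for row in reader
    )


raw = count_column(SAMPLE, 8, strip=False)    # 'White' and 'White ' stay separate
clean = count_column(SAMPLE, 8, strip=True)   # both are counted under 'White'

# repr() makes the trailing spaces visible in the keys
print([(repr(key), n) for key, n in sorted(raw.items())])
print([(repr(key), n) for key, n in sorted(clean.items())])
```

Alternatively, the values could be cleaned when the file is written, for example by using val.text.strip() instead of val.text when building the rows in getcsv.py, so that every later script sees clean values.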