AndrewF - 14 days ago
Python Question

Unknown bug in csv to dict and list conversion parser

I've loaded a CSV file into my program using csv.DictReader. My task is to append each column of the CSV file to a separate list labelled "filename" plus a count. Once the file is loaded I use the code below, but only the first temporary list ends up populated in my final dict; the rest are empty. Could someone please point out the issue here?

import csv
with open(cities_new, 'r') as g:
    files = csv.DictReader(g)
    filenames = ['name', "timeZone_label", "utcOffset", "homepage",
                 "governmentType_label",
                 "isPartOf_label", "areaCode", "populationTotal", "elevation",
                 "maximumElevation", "minimumElevation", "populationDensity",
                 "wgs84_pos#lat", "wgs84_pos#long", "areaLand", "areaMetro",
                 "areaUrban"]

    dict_1 = {}
    count_2 = 0
    for name in filenames:
        lst = []
        for row in files:
            lst.append(row[name])
        count_2 += 1
        dict_1['filename'+str(count_2)] = lst


Here's my output:

{'filename1': ['Indian Standard Time',
'Indian Standard Time',
'Indian Standard Time',
'Indian Standard Time',
'Indian Standard Time',
'Indian Standard Time',
'Central Time Zone (North America)',
'Central Time Zone (North America)',
'Central Time Zone (North America)',
'Central Time Zone (North America)',
'Central Time Zone (North America)',
'Alaska Time Zone',
'Alaska Time Zone',
'Pacific Time Zone',
'Alaska Time Zone',
'Alaska Time Zone',
'Alaska Time Zone',
'Alaska Time Zone',
'Alaska Time Zone',
'Pacific Time Zone',
'Pacific Time Zone',
'Pacific Time Zone',
'Pacific Time Zone',
'Pacific Time Zone',
'Eastern Time Zone',
'Central Time Zone (North America)',
'Central Time Zone (North America)',
'Central Time Zone (North America)',
'Central Time Zone (North America)',
'Central Time Zone (North America)',
'Central Time Zone (North America)',
'Central Time Zone (North America)',
'Central Time Zone (North America)',
'Central Time Zone (North America)',
'Pacific Time Zone',
'Time in China',
'NULL',
'Central Time Zone (North America)',
'Central Time Zone (North America)'],
'filename10': [],
'filename11': [],
'filename12': [],
'filename13': [],
'filename14': [],
'filename15': [],
'filename16': [],
'filename2': [],
'filename3': [],
'filename4': [],
'filename5': [],
'filename6': [],
'filename7': [],
'filename8': [],
'filename9': []}


Any help would be much appreciated, thanks.
Here's a line-by-line plain-text copy of the CSV file. I've cut it down to five lines, but it's still lengthy.

"URI","rdf-schema#label","rdf-schema#comment","administrativeDistrict_label","administrativeDistrict","anthem_label","anthem","area","areaCode","areaLand","areaMetro","areaRural","areaTotal","areaUrban","areaWater","city_label","city","code","country_label","country","daylightSavingTimeZone_label","daylightSavingTimeZone","district_label","district","division_label","division","elevation","federalState_label","federalState","foundingDate","foundingPerson_label","foundingPerson","foundingYear","governingBody_label","governingBody","government_label","government","governmentType_label","governmentType","isPartOf_label","isPartOf","isoCodeRegion_label","isoCodeRegion","leader_label","leader","leaderName_label","leaderName","leaderParty_label","leaderParty","leaderTitle","location_label","location","maximumElevation","mayor_label","mayor","minimumElevation","motto","municipality_label","municipality","part_label","part","percentageOfAreaWater","populationAsOf","populationDensity","populationMetro","populationMetroDensity","populationRural","populationTotal","populationTotalRanking","populationUrban","populationUrbanDensity","postalCode","region_label","region","state_label","state","synonym","thumbnail_label","thumbnail","timeZone_label","timeZone","twinCity_label","twinCity","twinCountry_label","twinCountry","type_label","type","utcOffset","point","22-rdf-syntax-ns#type_label","22-rdf-syntax-ns#type","wgs84_pos#lat","wgs84_pos#long","depiction_label","depiction","homepage_label","homepage","name","nick"
"http://dbpedia.org/resource/Kud","Kud","Kud is a town and a notified area committee in Udhampur District in the Indian state of Jammu and Kashmir.","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","India","http://dbpedia.org/resource/India","NULL","NULL","NULL","NULL","NULL","NULL","1855.0","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","{Jammu and Kashmir|Udhampur district}","{http://dbpedia.org/resource/Jammu_and_Kashmir|http://dbpedia.org/resource/Udhampur_district}","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","1140","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","Indian Standard Time","http://dbpedia.org/resource/Indian_Standard_Time","NULL","NULL","NULL","NULL","NULL","NULL","+5:30","33.08 75.28","{city|place|populated place|municipality|City|Place|_Feature|owl#Thing}","{http://dbpedia.org/ontology/City|http://dbpedia.org/ontology/Place|http://dbpedia.org/ontology/PopulatedPlace|http://dbpedia.org/ontology/Settlement|http://schema.org/City|http://schema.org/Place|http://www.opengis.net/gml/_Feature|http://www.w3.org/2002/07/owl#Thing}","33.08","75.28","NULL","NULL","NULL","NULL","Kud","NULL"

"http://dbpedia.org/resource/Kuju,_Hazaribag","Kuju Hazaribag","Kuju is a census town in Ramgarh district in the Indian state of Jharkhand.","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","India","http://dbpedia.org/resource/India","NULL","NULL","NULL","NULL","NULL","NULL","426.0","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","{Jharkhand|Ramgarh district}","{http://dbpedia.org/resource/Jharkhand|http://dbpedia.org/resource/Ramgarh_district}","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","18049","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","Indian Standard Time","http://dbpedia.org/resource/Indian_Standard_Time","NULL","NULL","NULL","NULL","NULL","NULL","+5:30","23.72 85.5","{city|place|populated place|municipality|City|Place|_Feature|owl#Thing}","{http://dbpedia.org/ontology/City|http://dbpedia.org/ontology/Place|http://dbpedia.org/ontology/PopulatedPlace|http://dbpedia.org/ontology/Settlement|http://schema.org/City|http://schema.org/Place|http://www.opengis.net/gml/_Feature|http://www.w3.org/2002/07/owl#Thing}","23.72","85.5","NULL","NULL","NULL","NULL","Kuju","NULL"

Answer

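You are repeatedly trying to read from the same file, but a file can only be read through once. After the first pass of your inner loop the file has reached its end, so every later pass gets no rows and you store an empty list. To read the same data again you need to explicitly 'rewind' the file to the start.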
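
To see the exhaustion in isolation, here is a minimal sketch. It assumes a made-up two-row sample held in an io.StringIO (standing in for your real cities file), and the names sample, first_pass and second_pass are just for illustration:

```python
import csv
import io

# Hypothetical two-row sample standing in for the real cities file.
sample = io.StringIO("name,utcOffset\nKud,+5:30\nKuju,+5:30\n")
reader = csv.DictReader(sample)

# The first loop consumes every row of the underlying file.
first_pass = [row["name"] for row in reader]

# The second loop starts at end-of-file, so it receives nothing.
second_pass = [row["name"] for row in reader]

print(first_pass)   # ['Kud', 'Kuju']
print(second_pass)  # []
```

This is the same pattern as your output: one full list, followed by empty ones.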

You could do this with g.seek(0), but re-reading the whole file once per column is rather inefficient.
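
For completeness, a rough sketch of what rewinding looks like, again on a made-up io.StringIO sample rather than your file. One thing to watch for, as far as I can tell from how DictReader works: it has already consumed the header once, so after the rewind the header line comes back as an ordinary data row and would have to be skipped by hand.

```python
import csv
import io

# Hypothetical sample data; `sample` plays the role of `g` in the question.
sample = io.StringIO("name,utcOffset\nKud,+5:30\nKuju,+5:30\n")
reader = csv.DictReader(sample)

first_pass = [row["name"] for row in reader]

sample.seek(0)  # rewind the underlying file to the start

# The reader already knows its field names, so the header line
# is now treated as data and shows up as the first "row".
second_pass = [row["name"] for row in reader]

print(first_pass)   # ['Kud', 'Kuju']
print(second_pass)  # ['name', 'Kud', 'Kuju']
```

That extra bookkeeping is another reason to prefer a single pass over the file.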

Invert your loops instead, so the file is read only once:

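dict_1 = {}
for row in files:  # single pass over the file
    # start=1 keeps the original numbering (filename1, filename2, ...)
    for count, name in enumerate(filenames, start=1):
        key = 'filename{}'.format(count)
        value = row[name]
        # create the list on first use, then append this row's value
        dict_1.setdefault(key, []).append(value)

I replaced your manual count_2 incrementing with the enumerate() function; passing start=1 keeps your keys numbered from filename1, since enumerate() otherwise counts from 0.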
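
If you want to try the single-pass approach without your full file, here is a self-contained sketch. The helper name columns_by_index and the three-column sample are my own inventions for illustration, and I'm assuming you want the keys to start at filename1 as in your original code:

```python
import csv
import io


def columns_by_index(csv_file, filenames):
    """Collect the named columns into lists keyed 'filename1', 'filename2', ...

    `columns_by_index` is a hypothetical helper, not part of the csv module.
    """
    result = {}
    for row in csv.DictReader(csv_file):  # the file is read exactly once
        for count, name in enumerate(filenames, start=1):
            result.setdefault("filename{}".format(count), []).append(row[name])
    return result


# Hypothetical sample with a few of the columns from the question.
sample = io.StringIO(
    "name,timeZone_label,utcOffset\n"
    "Kud,Indian Standard Time,+5:30\n"
    "Kuju,Indian Standard Time,+5:30\n"
)

dict_1 = columns_by_index(sample, ["name", "timeZone_label", "utcOffset"])
print(dict_1["filename1"])  # ['Kud', 'Kuju']
print(dict_1["filename3"])  # ['+5:30', '+5:30']
```

With a real file you would pass the object returned by open() in place of the io.StringIO sample.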