E M E M -4 years ago 100
Python Question

Python: reading the same rows from a csv file - logic

I have a problem with appending data for missing rows in the csv file: I am reading rows from a csv file for each customer and appending lists with the data the rows have. Each customer needs to have the same id's that are highlighted in green in the example image. If the next customer doesn't have the rows with all needed id's, I still need to append 0 values to the lists for these missing rows. So the customer highlighted in yellow needs to have same number of values appended to the data lists as the one in green.

I am trying to read each row and compare its id with the list of all possible id's that I created, but I am always stuck on the first id and not sure if this is the right way to go and read the previous row again until it's id is equal to the id from the list for possible id's (I do this to add the missing row's data to the list). Please let me know if you have any suggestions?

Note: if take into consideration only the column with id's, for these two customers I would like the list to look like this:

list_with_ids = [410, 409, 408, 407, 406, 405, 403, 402, **410, 409, 408, 407, 406, 405, 403, 402**]
. So I am looking for a way - once I am on row 409 in yellow - to first append the first needed id 410, and only then 409 and so forth. And same - append the two missing ids at the end: 403, 402.

Code:
def write_data(workbook):
[...]

# Lists.
list_cust = []
list_quantity = [] # from Some_data columns

# Get the start row in the csv file.
for row in range(worksheet.nrows):
base_id = str(410)
value = worksheet.cell(row, 1).value
start = str(value)
if base_id [0] == start[0]:
num_of_row_for_id = row

# Append the first id.
first_cust = str(worksheet.cell(num_of_row_for_id, 0).value)
list_cust.append(first_cust)

# Needed to count id's.
count = 0

# List with all needed id's for each customer.
# instead of ... - all ids' in green from the image.
all_ids = [....]

# Get data.
for row in range(worksheet.nrows):
next_id = str(worksheet.cell(num_of_row_for_id, 1).value)
cust = str(worksheet.cell(num_of_row_for_id, 0).value)

# Append id to the list.
list_cust.append(cust)


# Needed to separate rows for each customer.
if list_cust[len(list_cust)-1] == list_cust[len(list_cust)-2]:

# Get data: I read columns to get data.
# Let's say I read col 4 to 21.
for col_num in range(3, 20):

# Here is the prolem: ############################
if next_id != all_ids[count]:
list_quantity.append(0)

if next_id == all_ids[count]:
qty = worksheet.cell(num_of_row_for_id, col_num).value
list_quantity.append(qty)

# Get the next row in reverse order.
num_of_row_for_id -= 1

# Increment count for id's index.
if list_cust[len(list_cust)-1] == list_cust[len(list_cust)-2]:
# 8 possible id's.
if count < 7:
count += 1
else:
count = 0

Answer Source

Consider the following data wrangling with list comprehensions and loops using following data input containing random data columns:

Input Data

# Cust  ID      Data1        Data2  Data3        Data4  Data5
# 2011  62,404  0.269101238  KPT    0.438881697  UAX    0.963170513
# 2011  62,405  0.142397746  XYD    0.51668728   PTQ    0.761695425
# 2011  62,406  0.782342616  QCN    0.259141256  FNX    0.870971924
# 2011  62,407  0.221750017  EIU    0.358439487  MAN    0.13633062
# 2011  62,408  0.097509568  CRU    0.410058705  BFK    0.680228327
# 2011  62,409  0.322871333  LAC    0.489425167  GUX    0.449476844
# 919   62,403  0.371461633  PUR    0.626146074  KWX    0.525711736
# 919   62,404  0.384859932  AJZ    0.223408599  JSU    0.914916663
# 919   62,405  0.020630503  SFY    0.260778598  VUU    0.213559498
# 919   62,406  0.952425138  EBI    0.59595738   ZYU    0.283794413
# 919   62,407  0.410368534  BTT    0.252698401  FFY    0.41080646
# 919   62,408  0.553390336  GMA    0.846309022  BIN    0.049852419
# 919   62,409  0.193437955  NBB    0.877311494  XQX    0.080656637

Python code

import csv

i = 0
data = []
# READ CSV AND CAPTURE HEADERS AND DATA
with open('Input.csv', 'r') as f:       
    rdr = csv.reader(f)    
    for line in rdr:
        if i == 0:
            headers = line
        else:
            line[1] = int(line[1].replace(',',''))
            data.append(line)
        i += 1

# CREATE NEEDED LISTS
cust_list = list(set([i[0] for i in data]))
id_list = [62402,62403,62404,62405,62406,62407,62408,62409,62410]

# CAPTURE MISSING IDS BY CUSTOMER
for c in cust_list:
    currlist = [d[1] for d in data if d[0] == c]
    missingids = [i for i in id_list if i not in currlist]
    for m in missingids:
        data.append([c, m,'','','','',''])

# WRITE DATA TO NEW CSV IN SORTED ORDER
with open('Output.csv', 'w') as f:
    wtr = csv.writer(f, lineterminator='\n')
    wtr.writerow(headers)
    for c in cust_list:
        for i in sorted(id_list, reverse=True):
            for d in data:  
                if d[0] == c and d[1] == i:
                    wtr.writerow(d)

Output Data

Output Data

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download