Perlinn Perlinn - 2 months ago 15
Python Question

Extract various information from CSV file to another CSV file

Overview

would like to extract various information like name, date and address from a 2 column csv file before writing to another csv file

Conditions


  1. Extract Name by first row as it will always be the first
    row.

  2. Extract Date by regex (is there regex in python?) ##/##/####
    format

  3. Extract Address by the constant keyword 'road'






Example CSV dummy Source data reference file format viewed from EXCEL




ID,DATA
88888,DADDY
88888,2/06/2016
88888,new issac road
99999,MUMMY
99999,samsung road
99999,12/02/2016


Desired CSV outcome

ID,Name,Address,DATE
8888,DADDY,new issac road,2/06/2016
9999,MUMMY,samsung road,12/02/2016


What i have so far:

import csv
from collections import defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('dummy_data.csv') as f:
reader = csv.DictReader(f) # read rows into a dictionary format
for row in reader: # read a row as {column1: value1, column2: value2,...}
for (k,v) in row.items(): # go over each column name and value
columns[k].append(v) # append the value into the appropriate list
# based on column name k
uniqueidstatement = columns['receipt_id']

print uniqueidstatement

resultFile = open("wtf.csv",'wb')
wr = csv.writer(resultFile, dialect='excel')
wr.writerow(uniqueidstatement)

Answer

You can group the sections by ID and from each group you can determine which is the date and which is the address with some simple logic .

import csv
from itertools import groupby
from operator import itemgetter

with open("test.csv") as f, open("out.csv", "w") as out:
    reader = csv.reader(f)
    next(reader)
    writer = csv.writer(out)
    writer.writerow(["ID","NAME","ADDRESS", "DATE"])
    groups = groupby(csv.reader(f), key=itemgetter(0))
    for k, v in groups:
        id_, name = next(v)
        add_date_1, add_date_2 = next(v)[1], next(v)[1]
        date, add = (add_date_1, add_date_2) if "road" in add_date_2 else  (add_date_2, add_date_1)
        writer.writerow([id_, name, add, date])
Comments