Perlinn Perlinn - 1 year ago 55
Python Question

Extract various information from CSV file to another CSV file


would like to extract various information like name, date and address from a 2 column csv file before writing to another csv file


  1. Extract Name by first row as it will always be the first

  2. Extract Date by regex (is there regex in python?) ##/##/####

  3. Extract Address by the constant keyword 'road'

Example CSV dummy Source data reference file format viewed from EXCEL

88888,new issac road
99999,samsung road

Desired CSV outcome

8888,DADDY,new issac road,2/06/2016
9999,MUMMY,samsung road,12/02/2016

What i have so far:

import csv
from collections import defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('dummy_data.csv') as f:
reader = csv.DictReader(f) # read rows into a dictionary format
for row in reader: # read a row as {column1: value1, column2: value2,...}
for (k,v) in row.items(): # go over each column name and value
columns[k].append(v) # append the value into the appropriate list
# based on column name k
uniqueidstatement = columns['receipt_id']

print uniqueidstatement

resultFile = open("wtf.csv",'wb')
wr = csv.writer(resultFile, dialect='excel')

Answer Source

You can group the sections by ID and from each group you can determine which is the date and which is the address with some simple logic .

import csv
from itertools import groupby
from operator import itemgetter

with open("test.csv") as f, open("out.csv", "w") as out:
    reader = csv.reader(f)
    writer = csv.writer(out)
    writer.writerow(["ID","NAME","ADDRESS", "DATE"])
    groups = groupby(csv.reader(f), key=itemgetter(0))
    for k, v in groups:
        id_, name = next(v)
        add_date_1, add_date_2 = next(v)[1], next(v)[1]
        date, add = (add_date_1, add_date_2) if "road" in add_date_2 else  (add_date_2, add_date_1)
        writer.writerow([id_, name, add, date])