PythonLearner PythonLearner - 2 days ago 5
Python Question

Python: Match values between two csv files

I am parsing through two different csv files and need to match a column between them. Currently, when I run the snippet, it returns no matching values when, in reality, there are matching Addresses between the two csv files. The problem I am having is abbreviations with the addresses field in the OnlineData csv file. For example:

In the Addresses csv In the OnlineData csv
4587 Newton Road 4587 Newton Rd
7854 Food Court 7854 Food Ct


How can I tell Python to look up only the numbers ('4587') and the first word ('Newton') in both the csv files when looking for matching values.

import csv


Addresses = set()

with open ('Addresses.csv') as f:
for row in csv.reader(f):
Addresses.add(row[1])

OnlineData = set()

with open ('C:/Users/OnlineData.csv') as g:
for row in csv.reader(g):
PermitData.add(row[1])


results = Addresses & OnlineData


print 'There are', len(results), 'matching addresses between the two csv files'

for result in sorted(results):
print result

sal sal
Answer

Since you are only interested in matching portions of the data, you might as well just load that portion into the set and then perform the intersection.

import csv

Addresses = set()
with open ('Addresses.csv') as f:
    for row in csv.reader(f):
        portion = ' '.join(row[1].split()[:-1])  # Loads "4587 Newton" instead of "4587 Newton Road"
        Addresses.add(portion)

OnlineData = set()
with open ('C:/Users/OnlineData.csv') as g:
    for row in csv.reader(g):
        portion = ' '.join(row[1].split()[:-1])
        OnlineData.add(portion)

results = Addresses & OnlineData

print 'There are', len(results), 'matching addresses between the two csv files'

for result in sorted(results):
    print result

The obvious disadvantage is that you lose that bit of information, which you could still retrieve. Another option would be to normalize the input, meaning that you could replace Rd with Road and Ct with Court wherever those appear, so as to have always matching info.

Comments