textnet textnet - 1 month ago 7
Python Question

Transforming a csv to a list of co-occurrence pairs in Python

My data is a csv that looks like:

1 abc
1 def
2 ghi
3 jkl
3 mno
3 pqr


I want to transform it into a list of all pairs that co-occur with the same number in column 1. Like this:

abc; def
jkl; mno
mno; pqr

Answer

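First, your input file is not really a csv: the columns are separated by whitespace, so each line can simply be parsed with str.split.

Next, split each line into tokens and use itertools.groupby, with the first column as the key, to group the rows that share the same first column. Note that groupby only merges adjacent rows, which works here because your file is already ordered by that column.

Once you have the groups, filter out those with only 1 item, and apply itertools.combinations to the rest to get every pair.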
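Before touching any files, the three steps above can be tried on an in-memory list. This is only a minimal sketch: the function name cooccurrence_pairs and the sample list are made up for illustration, and it assumes rows with the same key are adjacent.

```python
import itertools

def cooccurrence_pairs(lines):
    """Return all pairs of second-column values sharing a first-column key.

    Assumption: rows with the same key are adjacent, because
    itertools.groupby only merges consecutive items.
    """
    # Split each non-blank line on whitespace: ["1", "abc"], ["1", "def"], ...
    rows = (line.split() for line in lines if line.strip())
    pairs = []
    for _key, group in itertools.groupby(rows, key=lambda row: row[0]):
        values = [row[1] for row in group]
        # combinations() yields nothing for a single-item group,
        # so no explicit length check is needed here.
        pairs.extend(itertools.combinations(values, 2))
    return pairs

# Hypothetical sample mirroring the data in the question
sample = ["1 abc", "1 def", "2 ghi", "3 jkl", "3 mno", "3 pqr"]
print(cooccurrence_pairs(sample))
```

The group for key 2 contributes no pairs, and the group for key 3 contributes three.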

Write as a proper csv file:

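import csv, itertools

with open("test.csv") as f:
    with open("output.csv", "w", newline="") as f2:
    # with open("output.csv", "wb") as f2:   # use this line instead on python 2 (and comment out the one above)
        cw = csv.writer(f2, delimiter=";")
        # skip blank lines so that x[0] below never fails on an empty row
        rows = (line.split() for line in f if line.strip())
        # groupby yields (key, group) for each run of adjacent rows with the same first column
        for key, group in itertools.groupby(rows, lambda x: x[0]):
            grouped = [x[1] for x in group]
            if len(grouped) > 1:
                # write every unordered pair from this group as one output row
                for c in itertools.combinations(grouped, 2):
                    cw.writerow(c)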

Result (corrected: the expected output in your question is missing the jkl;pqr pair):

abc;def
jkl;mno
jkl;pqr
mno;pqr
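If the file is not ordered by the first column, groupby would split one key into several groups and some pairs would be missed. One possible workaround, sketched here with a made-up function name and a made-up shuffled sample, is to collect the values in a dictionary first and then pair them up:

```python
import itertools
from collections import defaultdict

def cooccurrence_pairs_unsorted(lines):
    """Pair up second-column values by first-column key, in any row order.

    This swaps groupby for a dictionary, so rows with the same key
    do not need to be adjacent.
    """
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # ignore blank or incomplete lines
            groups[parts[0]].append(parts[1])
    pairs = []
    for values in groups.values():
        pairs.extend(itertools.combinations(values, 2))
    return pairs

# Hypothetical sample: same rows as the question, but out of order
shuffled = ["3 jkl", "1 abc", "3 mno", "2 ghi", "1 def", "3 pqr"]
print(cooccurrence_pairs_unsorted(shuffled))
```

Sorting the rows by key before calling groupby would work as well; the dictionary version just avoids the sort.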