kalfasyan kalfasyan - 1 month ago 7
Python Question

Print Feedforward Loops in python

So, I have a huge input file that looks like this: (you can download here)

1. FLO8;PRI2
2. FLO8;EHD3
3. GRI2;BET2
4. HAL4;AAD3
5. PRI2;EHD3
6. QLN3;FZF1
7. QLN3;ABR5
8. FZF1;ABR5
...


See it like a two column table, that the element before ";" shows to the element after ";"

I want to print simple strings iteratively that show the three elements that constitute a feedforward loop.
The example numbered list from above would output:

"FLO8 PRI2 EHD3"
"QLN3 FZF1 ABR5"
...


Explaining the first output line as a feedforward loop:

A -> B (FLO8;PRI2)
B -> C (PRI2;EHD3)
A -> C (FLO8;EHD3)


Only the circled one from this link

So, I have this, but it is terribly slow...Any suggestions to make a faster implementation?

import csv

TF = []
TAR = []

# READING THE FILE
with open("MYFILE.tsv") as tsv:
for line in csv.reader(tsv, delimiter=";"):
TF.append(line[0])
TAR.append(line[1])

# I WANT A BETTER WAY TO RUN THIS.. All these for loops are killing me
for i in range(len(TAR)):
for j in range(len(TAR)):
if ( TAR[j] != TF[j] and TAR[i] != TF[i] and TAR[i] != TAR[j] and TF[j] == TF[i] ):
for k in range(len(TAR )):
if ( not(k == i or k == j) and TF[k] == TAR[j] and TAR[k] == TAR[i]):
print "FFL: "+TF[i]+ " "+TAR[j]+" "+TAR[i]


NOTE: I don't want self-loops...from A -> A, B -> B or C -> C

Answer

I use a dict of sets to allow very fast lookups, like so:

Edit: prevented self-loops:

from collections import defaultdict

INPUT = "RegulationTwoColumnTable_Documented_2013927.tsv"

# load the data as { "ABF1": set(["ABF1", "ACS1", "ADE5,7", ... ]) }
data = defaultdict(set)
with open(INPUT) as inf:
    for line in inf:
        a,b = line.rstrip().split(";")
        if a != b:          # no self-loops
            data[a].add(b)

# find all triplets such that A -> B -> C and  A -> C
found = []
for a,bs in data.items():
    bint = bs.intersection
    for b in bs:
        for c in bint(data[b]):
            found.append("{} {} {}".format(a, b, c))

On my machine, this loads the data in 0.36s and finds 1,933,493 solutions in 2.90s; results look like

['ABF1 ADR1 AAC1',
 'ABF1 ADR1 ACC1',
 'ABF1 ADR1 ACH1',
 'ABF1 ADR1 ACO1',
 'ABF1 ADR1 ACS1',

Edit2: not sure this is what you want, but if you need A -> B and A -> C and B -> C but not B -> A or C -> A or C -> B, you could try

found = []
for a,bs in data.items():
    bint = bs.intersection
    for b in bs:
        if a not in data[b]:
            for c in bint(data[b]):
                if a not in data[c] and b not in data[c]:
                    found.append("{} {} {}".format(a, b, c))

but this still returns 1,380,846 solutions.