kalfasyan - 6 months ago 35

Python Question

So, I have a **huge** input file that looks like this: (you can download here)

`1. FLO8;PRI2`

2. FLO8;EHD3

3. GRI2;BET2

4. HAL4;AAD3

5. PRI2;EHD3

6. QLN3;FZF1

7. QLN3;ABR5

8. FZF1;ABR5

...

See it like a two column table, that the element before ";" shows to the element after ";"

I want to print simple strings iteratively that show the three elements that constitute a feedforward loop.

The example numbered list from above would output:

`"FLO8 PRI2 EHD3"`

"QLN3 FZF1 ABR5"

...

Explaining the first output line as a feedforward loop:

`A -> B (FLO8;PRI2)`

B -> C (PRI2;EHD3)

A -> C (FLO8;EHD3)

Only the circled one from this link

So, I have this, but it is terribly slow...Any suggestions to make a faster implementation?

`import csv`

TF = []

TAR = []

# READING THE FILE

with open("MYFILE.tsv") as tsv:

for line in csv.reader(tsv, delimiter=";"):

TF.append(line[0])

TAR.append(line[1])

# I WANT A BETTER WAY TO RUN THIS.. All these for loops are killing me

for i in range(len(TAR)):

for j in range(len(TAR)):

if ( TAR[j] != TF[j] and TAR[i] != TF[i] and TAR[i] != TAR[j] and TF[j] == TF[i] ):

for k in range(len(TAR )):

if ( not(k == i or k == j) and TF[k] == TAR[j] and TAR[k] == TAR[i]):

print "FFL: "+TF[i]+ " "+TAR[j]+" "+TAR[i]

NOTE: I don't want self-loops...from A -> A, B -> B or C -> C

Answer

I use a dict of sets to allow very fast lookups, like so:

**Edit:** prevented self-loops:

```
from collections import defaultdict
INPUT = "RegulationTwoColumnTable_Documented_2013927.tsv"
# load the data as { "ABF1": set(["ABF1", "ACS1", "ADE5,7", ... ]) }
data = defaultdict(set)
with open(INPUT) as inf:
for line in inf:
a,b = line.rstrip().split(";")
if a != b: # no self-loops
data[a].add(b)
# find all triplets such that A -> B -> C and A -> C
found = []
for a,bs in data.items():
bint = bs.intersection
for b in bs:
for c in bint(data[b]):
found.append("{} {} {}".format(a, b, c))
```

On my machine, this loads the data in 0.36s and finds 1,933,493 solutions in 2.90s; results look like

```
['ABF1 ADR1 AAC1',
'ABF1 ADR1 ACC1',
'ABF1 ADR1 ACH1',
'ABF1 ADR1 ACO1',
'ABF1 ADR1 ACS1',
```

**Edit2:** not sure this is what you want, but if you need A -> B and A -> C and B -> C but not B -> A or C -> A or C -> B, you could try

```
found = []
for a,bs in data.items():
bint = bs.intersection
for b in bs:
if a not in data[b]:
for c in bint(data[b]):
if a not in data[c] and b not in data[c]:
found.append("{} {} {}".format(a, b, c))
```

but this still returns 1,380,846 solutions.