ldevyataykina - 4 months ago
Python Question

Graphviz: write result to file

I have a dataframe:

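ID   domain        search_term
111  vk.com        вконтакте
111  twitter.com   фэйсбук
111  facebook.com  твиттер
222  avito.ru      купить машину
222  vk.com        вконтакте
333  twitter.com   твиттер
333  apple.com     купить айфон
333  rbk.ru        новости

(The search terms are in Russian: вконтакте = "VKontakte", фэйсбук = "Facebook", твиттер = "Twitter", купить машину = "buy a car", купить айфон = "buy an iPhone", новости = "news".)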


I am trying to build a chain of nodes for each ID and write it to a file. I use:

from graphviz import Digraph

domains = df['domain'].values.tolist()
search_terms = df['search_term'].values.tolist()
ids = df['ID'].values.tolist()
f = Digraph('finite_state_machine', filename='fsm.gv', encoding='utf-8')
f.attr(rankdir='LR', size='5,5')
f.attr('node', shape='circle')
# Start at 1 so the first row is not compared with ids[-1] (the last row)
for i in range(1, len(ids)):
    # Add an edge between consecutive rows that belong to the same ID
    if ids[i] == ids[i - 1]:
        f.edge(domains[i - 1], domains[i], label=search_terms[i])
f.view()
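A side note on comparing each row with the previous one by index: when i is 0, ids[i - 1] is ids[-1], because Python wraps negative indices to the end of the list, so the first row is compared with the last one. With this data the IDs differ (111 and 333), so no stray edge appears, but if the first and last rows shared an ID an extra edge would be drawn. A minimal sketch, reusing the ids list from the question, of pairing each element with its successor via zip instead:

```python
ids = [111, 111, 111, 222, 222, 333, 333, 333]

# At i = 0, ids[i - 1] is ids[-1]: Python wraps negative indices to the end.
first_comparison = (ids[0], ids[0 - 1])

# Pairing each element with its successor avoids the wrap-around entirely.
consecutive = list(zip(ids, ids[1:]))

# Only pairs with the same ID become edges: 2 for 111, 1 for 222, 2 for 333.
same_id_pairs = [(a, b) for a, b in consecutive if a == b]

print(first_comparison)    # (111, 333)
print(len(same_id_pairs))  # 5
```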


This produces a single graph that contains every chain. But I want to save a separate file for each ID, named after the ID, so I need the files 111, 222 and 333.
I tried:

for i, (id, domain, search_term) in enumerate(zip(ids, domains, search_terms)):
    if ids[i] == ids[i - 1]:
        f = Digraph('finite_state_machine', filename='fsm.gv', encoding='utf-8')
        f.attr(rankdir='LR', size='5,5')
        f.attr('node', shape='circle')
        f.edge(domains[i - 1], domains[i], label=search_terms[i])
        f.render(filename=str(id))


But it does not work as expected. For 111 and 333 the file should contain a chain of 3 nodes, but instead each of those files contains a chain of only 2 nodes.

What am I doing wrong, and how can I fix it?

Answer

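Do not put f = Digraph(...) and f.render(...) inside the if-statement. The code inside the if-statement runs once for every edge, so you create a new Digraph containing a single edge and render it, once per edge. Because every render for the same ID writes to the same filename, each one overwrites the previous file, and only the last edge of each chain survives: for 111 that is twitter.com -> facebook.com, which is why you see 2 nodes instead of 3. You want to create one Digraph per ID, add all of its edges, and render it once.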
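The overwrite can be reproduced without Graphviz. In this minimal sketch a plain dict stands in for the files on disk (an assumption made only to keep the example self-contained): each matching row builds a fresh one-edge graph and stores it under its ID, replacing the previous value, just as rendering to the same filename replaces the previous file.

```python
ids = [111, 111, 111, 222, 222, 333, 333, 333]
domains = ['vk.com', 'twitter.com', 'facebook.com', 'avito.ru', 'vk.com',
           'twitter.com', 'apple.com', 'rbk.ru']

# Stand-in for the file system: assigning to an existing key replaces its
# value, like rendering to an existing filename replaces the file.
files = {}
for i in range(1, len(ids)):  # start at 1 to skip the ids[-1] comparison
    if ids[i] == ids[i - 1]:
        graph = [(domains[i - 1], domains[i])]  # new graph with a single edge
        files[str(ids[i])] = graph              # "render": overwrite the file

print(files)
# {'111': [('twitter.com', 'facebook.com')], '222': [('avito.ru', 'vk.com')], '333': [('apple.com', 'rbk.ru')]}
```

Each ID ends up with exactly one edge, i.e. a 2-node chain, matching the symptom in the question.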

Instead, you can use df.groupby to have pandas collect the rows with the same ID, and then call f = Digraph(...) and f.render(...) once per group:

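from graphviz import Digraph

# df is the dataframe from the question
for id_key, group in df.groupby('ID'):
    # One graph per ID: create it once, add all of its edges, render it once
    f = Digraph('finite_state_machine', encoding='utf-8')
    f.attr(rankdir='LR', size='5,5')
    f.attr('node', shape='circle')
    # Connect each row to the next row of the same ID
    for i in range(len(group) - 1):
        f.edge(group['domain'].iloc[i], group['domain'].iloc[i + 1],
               label=group['search_term'].iloc[i + 1])
    f.render(filename=str(id_key))  # filename is the ID: 111, 222, 333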
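If the data is already in the three parallel lists from the question, a stdlib-only variant is possible, with itertools.groupby taking the place of df.groupby. Unlike df.groupby, itertools.groupby only merges adjacent items, so this sketch assumes the rows are already ordered by ID (as they are in the question). The helpers chains_by_id and render_chains are hypothetical names; render_chains uses the graphviz Digraph methods attr, edge and render, and imports graphviz lazily so the edge-building part runs without it.

```python
from itertools import groupby


def chains_by_id(ids, domains, search_terms):
    """Return {id: [(from_domain, to_domain, label), ...]}.

    itertools.groupby only merges *adjacent* rows with equal IDs, so the
    input must already be ordered by ID.
    """
    chains = {}
    rows = zip(ids, domains, search_terms)
    for key, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        # Pair each row with the next one; the label is the destination
        # row's search term, as with label=search_terms[i] in the question.
        chains[key] = [(a[1], b[1], b[2]) for a, b in zip(group, group[1:])]
    return chains


def render_chains(chains):
    """Write one Graphviz file per ID (needs graphviz installed)."""
    from graphviz import Digraph  # lazy import: chains_by_id works without it

    for key, edges in chains.items():
        f = Digraph('finite_state_machine', encoding='utf-8')
        f.attr(rankdir='LR', size='5,5')
        f.attr('node', shape='circle')
        for src, dst, label in edges:
            f.edge(src, dst, label=label)
        f.render(filename=str(key))


ids = [111, 111, 111, 222, 222, 333, 333, 333]
domains = ['vk.com', 'twitter.com', 'facebook.com', 'avito.ru', 'vk.com',
           'twitter.com', 'apple.com', 'rbk.ru']
search_terms = ['вконтакте', 'фэйсбук', 'твиттер', 'купить машину', 'вконтакте',
                'твиттер', 'купить айфон', 'новости']

chains = chains_by_id(ids, domains, search_terms)
print({key: len(edges) for key, edges in chains.items()})  # {111: 2, 222: 1, 333: 2}
```

Calling render_chains(chains) would then write one file per ID, each holding the full chain (3 nodes for 111 and 333, 2 for 222).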