sanaz - 1 year ago 59
Python Question

# How to count number of children for every node in an adjacency tree dataframe in R or Python recursively

I have the following dataframe:

``````
network_id  agent_id  parent_id
1           10        6
1           11        7
1           12        7
1           13        8
1           6         5
1           7         5
1           8         5
2           104       101
2           105       101
2           106       101
2           107       102
2           108       103
2           101       100
2           102       100
2           103       100
``````

I need to calculate the number of children for every agent in every network; parent_id gives the directly connected parent of each node. I am looking for a solution in R or Python.

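You can achieve this using Python, in particular the groupby functionality of pandas (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html). The code below collects the children and grandchildren of each parent, which covers every level present in the sample data: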

``````
import pandas as pd

cols = ['network_id', 'agent_id', 'parent_id']
df = pd.DataFrame([[1, 10, 6],
                   [1, 11, 7],
                   [1, 12, 7],
                   [1, 13, 8],
                   [1, 6, 5],
                   [1, 7, 5],
                   [1, 8, 5],
                   [2, 104, 101],
                   [2, 105, 101],
                   [2, 106, 101],
                   [2, 107, 102],
                   [2, 108, 103],
                   [2, 101, 100],
                   [2, 102, 100],
                   [2, 103, 100]], columns=cols)

# Group rows by (network, parent): each group holds one parent's direct children.
# The keys are passed as a list; recent pandas reads a tuple as a single key.
grouped = df.groupby(['network_id', 'parent_id'])

all_children = {}

for (network, parent_id), parent_group in grouped:

    # Direct children of this parent
    children = parent_group['agent_id']
    all_children[(network, parent_id)] = children

    # Add the children of each child (grandchildren); deeper levels are not visited
    for child in children:
        try:
            children_of_child = grouped.get_group((network, child))
            all_children[(network, parent_id)] = pd.concat([all_children[(network, parent_id)], children_of_child['agent_id']], axis=0).reset_index(drop=True)
        except KeyError:
            # This child has no children of its own
            pass

# Control for any duplicate (Python 3: dict.items() replaces iteritems())
children_count = {key: len(el.drop_duplicates()) for key, el in all_children.items()}
``````

Result:

``````
{(1, 5): 7,
 (1, 6): 1,
 (1, 7): 2,
 (1, 8): 1,
 (2, 100): 8,
 (2, 101): 3,
 (2, 102): 1,
 (2, 103): 1}
``````
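
The loop above stops at grandchildren, so a tree deeper than the sample data would be undercounted. If the networks can be arbitrarily deep, a memoised recursive walk is one way to count every descendant. The following is only a sketch, not part of the original answer: the function name `count_descendants` and the three-level `chain` data are made up for illustration, and it assumes each agent has exactly one parent and that there are no cycles (a cycle would recurse forever).

```python
from collections import defaultdict

import pandas as pd


def count_descendants(df):
    """Return {(network_id, node): number of descendants at any depth}."""
    # Map each (network, parent) to the list of its direct children
    children = defaultdict(list)
    rows = df[['network_id', 'agent_id', 'parent_id']].itertuples(index=False)
    for network, agent, parent in rows:
        children[(network, parent)].append(agent)

    counts = {}

    def visit(network, node):
        # Memoised: the subtree below each node is counted only once
        key = (network, node)
        if key not in counts:
            counts[key] = sum(1 + visit(network, child)
                              for child in children.get(key, []))
        return counts[key]

    # Start from every parent; leaves are reached through their parents
    for network, parent in list(children):
        visit(network, parent)
    return counts


# Hypothetical three-level chain: 5 -> 6 -> 10 -> 14
chain = pd.DataFrame([[1, 6, 5], [1, 10, 6], [1, 14, 10]],
                     columns=['network_id', 'agent_id', 'parent_id'])
print(count_descendants(chain))
```

Unlike the dictionary above, this version also returns an entry with a count of 0 for leaf nodes. For very deep trees the recursion could hit Python's recursion limit, in which case an explicit stack would be the safer choice.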