sanaz sanaz - 10 days ago 6
Python Question

How to count number of children for every node in an adjacency tree dataframe in R or Python recursively

I have the following dataframe:

network_id agent_id parent_id
1 10 6
1 11 7
1 12 7
1 13 8
1 6 5
1 7 5
1 8 5
2 104 101
2 105 101
2 106 101
2 107 102
2 108 103
2 101 100
2 102 100
2 103 100


I need to calculate the number of children for every agent in every network. parent_id gives the directly connected parent of each node. I am looking for a solution in R or Python.

Answer

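You can achieve this using Python, in particular the groupby functionality of pandas (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html). The code below counts each node's direct children plus the children of those children, which covers the two-level trees in your data: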

import pandas as pd

cols = ['network_id', 'agent_id', 'parent_id']
df = pd.DataFrame([[1, 10, 6],
                    [1, 11, 7],
                    [1, 12, 7],
                    [1, 13, 8],
                    [1, 6,  5],
                    [1, 7,  5],
                    [1, 8,  5],
                    [2, 104,101],
                    [2, 105,101],
                    [2, 106,101],
                    [2, 107,102],
                    [2, 108,103],
                    [2, 101,100],
                    [2, 102,100],
                    [2, 103,100]], columns = cols)


# Pass the keys as a list: recent pandas versions treat a tuple as a single key
grouped = df.groupby(['network_id', 'parent_id'])

all_children = {}

for (network, parent_id), parent_group in grouped:

    children = parent_group['agent_id']
    # Add direct children
    all_children[(network, parent_id)] = children

    for child in children:
        try:
            children_of_child = grouped.get_group((network, child))  
            # Add children of children
            all_children[(network, parent_id)] = pd.concat([all_children[(network, parent_id)], children_of_child['agent_id']], axis = 0).reset_index(drop = True)
        except KeyError:
            pass

# Drop any duplicates before counting (use .items(); .iteritems() is Python 2 only)
children_count = {key : len(el.drop_duplicates()) for key, el in all_children.items()}

Result:

{(1, 5): 7,
 (1, 6): 1,
 (1, 7): 2,
 (1, 8): 1,
 (2, 100): 8,
 (2, 101): 3,
 (2, 102): 1,
 (2, 103): 1}
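
If your trees can be deeper than two levels, the loop above will miss the lower generations. Here is a minimal sketch of a recursive variant that counts descendants at any depth. The function name count_descendants is my own, not part of pandas, and it assumes the data has no cycles and that every agent has a single parent within its network:

```python
from collections import defaultdict

import pandas as pd


def count_descendants(df):
    """Return {(network_id, agent_id): number of descendants at any depth}."""
    # Map each (network, parent) pair to the list of its direct children.
    direct = defaultdict(list)
    rows = df[['network_id', 'agent_id', 'parent_id']].itertuples(index=False)
    for network, agent, parent in rows:
        direct[(int(network), int(parent))].append(int(agent))

    counts = {}

    def visit(network, node):
        # Memoised depth-first count: each child contributes itself (1)
        # plus its own descendants. Assumes the data contains no cycles.
        key = (network, node)
        if key not in counts:
            counts[key] = sum(1 + visit(network, child)
                              for child in direct.get(key, []))
        return counts[key]

    # Start from every node that appears as a parent; leaves are filled in
    # with 0 as they are reached.
    for network, parent in list(direct):
        visit(network, parent)
    return counts
```

Calling count_descendants(df) on the dataframe above should give the same numbers for the parent nodes, and it additionally reports 0 for leaf agents such as (1, 10). For very deep trees the recursion could hit Python's recursion limit, in which case an explicit stack would be the safer choice.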