A Rob4 A Rob4 - 3 months ago 17
Python Question

How to remove small components from a graph

I'm new to networkx and could do with some help please.

I have a set of data which I've processed to generate the nodes and edges. There are around 5000 groups of nodes that have more than 2 links within them (up to 10 nodes in the group in total). But the problem is that there are also several thousand pairs of nodes that only have 1 edge between them, i.e node a is linked to node b but neither are linked to any other node.

I want to remove these paired nodes from the chart.

Is there a way to filter these out?

Answer

So our goal is to remove all nodes from components with less than 3 nodes (this includes isolated nodes if they exist).

for component in list(nx.connected_components(G)):
    if len(component)<3:
        for node in component:
            G.remove_node(node)

A small warning is in order when using nx.connected_components. It returns a generator of components. If I didn't put list around it, it would generate one at a time, and then perform the steps for the given component. Once all that is done, it would generate the next component. But because G has been modified, python can't be sure that this behaves well. So it would die (complaining that a dictionary changed size --- the number of nodes in G changed). By turning it into a list, the components are all found before it starts the loop. So the graph won't be changing while the components are being found.