pyrookie pyrookie - 9 months ago 53
Python Question

Counting all connected nodes in graph

I have a >10k list of (unordered) pairs of numbers. I'd like to classify them into sets of connected pairs either directly or indirectly. I think this corresponds to undirected graph. I'm using python, and tried something like this to represent this structure.

In order to know all the numbers connected to

i
,
I can examine whether there is a path from
i
to
j
for all
j
in the list except
i
. However, with this implementation, the computation time gets too long for the size of list I'm dealing with. Is there a more efficient way to do this? (Or is there an already established python libraries?)

jme jme
Answer Source

It sounds as though you are interested in computing the connected components of a graph. I would suggest looking into the networkx package and its tools for computing components.

For example, suppose our data is a list of pairs of numbers, each pair representing an edge in the graph:

pairs = [
    (1, 2),
    (2, 4),
    (3, 5),
    (2, 5),
    (7, 9),
    (9, 10),
    (8, 7)
]

In the graph represented by these edges, there is a path between any pair of nodes in the set {1, 2, 3, 4, 5}, and there is also a path between any pair of nodes in {6, 7, 8, 9, 10}. But there is no path, say, from 5 to 7. This is to say that there are two connected components in the graph.

To discover these components, we first import networkx and create a graph:

>>> import networkx as nx
>>> graph = nx.from_edgelist(pairs)

Computing the components is as simple as

>>> list(nx.connected_components(graph))
>>> [{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}]

nx.connected_components is a generator, and so here we converted the result into a list in order to show all of the connected components.

We can also find the connected component containing a given node:

>>> nx.node_connected_component(graph, 3)
{1, 2, 3, 4, 5}

We can also quickly count the number of connected components:

>>> nx.number_connected_components(graph)
2