pyrookie pyrookie - 4 months ago 31
Python Question

Counting all connected nodes in graph

I have a >10k list of (unordered) pairs of numbers. I'd like to classify them into sets of connected pairs either directly or indirectly. I think this corresponds to undirected graph. I'm using python, and tried something like this to represent this structure.

In order to know all the numbers connected to

I can examine whether there is a path from
for all
in the list except
. However, with this implementation, the computation time gets too long for the size of list I'm dealing with. Is there a more efficient way to do this? (Or is there an already established python libraries?)

jme jme

It sounds as though you are interested in computing the connected components of a graph. I would suggest looking into the networkx package and its tools for computing components.

For example, suppose our data is a list of pairs of numbers, each pair representing an edge in the graph:

pairs = [
    (1, 2),
    (2, 4),
    (3, 5),
    (2, 5),
    (7, 9),
    (9, 10),
    (8, 7)

In the graph represented by these edges, there is a path between any pair of nodes in the set {1, 2, 3, 4, 5}, and there is also a path between any pair of nodes in {6, 7, 8, 9, 10}. But there is no path, say, from 5 to 7. This is to say that there are two connected components in the graph.

To discover these components, we first import networkx and create a graph:

>>> import networkx as nx
>>> graph = nx.from_edgelist(pairs)

Computing the components is as simple as

>>> list(nx.connected_components(graph))
>>> [{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}]

nx.connected_components is a generator, and so here we converted the result into a list in order to show all of the connected components.

We can also find the connected component containing a given node:

>>> nx.node_connected_component(graph, 3)
{1, 2, 3, 4, 5}

We can also quickly count the number of connected components:

>>> nx.number_connected_components(graph)