River River - 13 days ago 6
C Question

Optimizing a method to find the most traversed edge given an adjacency graph and several traversals

I am given N vertices of a tree and its corresponding adjacency graph represented as an N by N array,

adjGraph[N][N]
. For example, if (1,3) is an edge, then
adjGraph[0][2] == 1
. Otherwise,
adjGraph[i][j] == 0
for (i,j)s that are not edges.

I'm given a series of inputs in the form of:

1 5


which denote that a path has been traversed starting from vertex 1 to vertex 5. I wish to find the edge that was travesed the most times, along with the number of times it was traversed. To do this, I have another N by N array,
numPass[N][N]
, whose elements I first initialize to 0, then increment by 1 every time I identify a path that includes an edge that matches its index. For example, if path (2,4) included edges (2,3) and (3,4), I would increment
numPass[1][2]
and
numPass[2][3]
by 1 each.

As I understand it, the main issue to tackle is that the inputs only give information of the starting vertex and ending vertex, and it is up to me to figure out which edges connect the two. Since the given graph is a tree, any path between two vertices is unique. Therefore, I assumed that given the index of the ending vertex for any input path, I would be able to recursively backtrack which edges were connected.

The following is the function code that I have tried to implement with that idea in mind:

// find the (unique) path of edges from vertices x to y
// and increment edges crossed during such a path
void findPath(int x, int y, int N, int adjGraph[][N], int numPass[][N]) {
int temp;

// if the path is a single edge, case is trivial
if (adjGraph[x][y] == 1) {
numPass[x][y] += 1;
return;
}

// otherwise, find path by backtracking from y
backtrack: while (1) {
temp = y-1;
if (adjGraph[temp][y] == 1) {
numPass[temp][y] += 1;
break;
}
}
if (adjGraph[x][temp] == 1) {
numPass[x][temp] += 1;
return;
} else {
y = temp;
goto backtrack;
}


However, the problem is that while my code works fine for small inputs, it runs out of memory for large inputs, since I have a required memory limit of 128MB and time limit of 1 second. The ranges for the inputs are up to 222222 vertices, and 222222 input paths.

How could I optimize my method to satisfy such large inputs?

Answer
  1. Get rid of the adjacency matrix (it uses O(N^2) space). Use adjacency lists instead.

  2. Use a more efficient algorithm. Let's make the tree rooted. For a path from a to b we can add 1 to a and b and subtract 1 from their lca (it is easy to see that this way a one is added to edges on this path and only to them).

  3. After processing all paths, the number of paths going through the edge is just a sum in the subtree.

If we use an efficient algorithm to compute lca, this solution works in O(N + Q * log N), where Q is the number of paths. It looks good enough for this constraints (we can actually do even better by using more complex and more efficient algorithms for finding the lca, but I don't think it's necessary here).

Note: lca means lowest common ancestor.