buzhidao buzhidao - 1 month ago 8
C++ Question

Store the all the neighbors of the vertexes, find a vertex's neighbors quickly

Suppose I have

N
vertexes labeled from
0
to
N-1
. Each vertex has at most
3
neighbors and at least
1
neighbor. Suppose the neighbor information is initially stored at a vector called
pairs
, where
pairs[2*i]
and
pairs[2*i+1]
are a pair of neighboring vertexes.

Now I need to look up very fast what the neighbors are for
vertex[i]
, what's the best way to store this information?




The method I came up with is:


  • declare a vector called
    neighbors[3*N]
    , so that
    neighbors[3*i+0]
    neighbors[3*i+1]
    and
    neighbors[3*i+2]
    stores three possible neighbors.

  • why saying possible, because there are at most three neighbors for each vertex.

  • so I initialize all the elements of vector
    neighbors
    to
    N
    , which means this is not a valid neighbor, since the vertexes are labeled from
    0
    to
    N-1
    .



The code implementation is as follows:

void get_neighbors(const std::vector<int>& pairs,
std::vector<int>& neighbors) {
int N = neighbors.size()/3;
int M = pairs.size()/2;
//init all the vertexes' neighbours to nothing
for (int i=0; i<3*N; ++i) {
neighbors[i] = N;
}

//loop through all the vertexes, and store their neighbors
for(int i=0; i<N; ++i) {
int j = 0;
//loop through pairs, and find out what neighbors vertex[i] has
for (int k=0; k < M; ++k) {
if(pairs[2*k]==i) {
neighbors[3*i+j]=pairs[2*k+1];
++j;
}
else if(pairs[2*k+1]==i) {
neighbors[3*i+j]=pairs[2*k];
++j;
}
}
}
}


Things I feel uncomfortable with my method:


  1. declare a vector
    neighbors(3*N)
    is too much, since many of its elements will be useless
    N
    .

  2. If I want to look up
    vertex[i]
    's neighbors, every time I need to test if
    neighbors[3*i]==N
    ,
    neighbors[3*i+1]==N
    and
    neighbors[3*i]==N
    .


PRP PRP
Answer

The data structure that you are working with is a Graph. But Graph is a more general concept in which a node can have any number of neighbours.

Since your problem is a more strict one saying each node has at-most 3 neighbours. The solution I suggested in the comments is a general one that can be used for all the graphs and will work decently well for your case also.

Declare a graph :

vector<vector<int> > graph (N);

If there is an edge from a to b the do this :

graph[a].push_back (b);

To retrieve the neighbours of a node a :

for (int i=0;i<graph[a].size();i++)
    // Do whatever you want with the neighbour. The variable graph[a][i] holds the neighbour number

Now coming to your problem of at-most 3 neighbours. The best solution according to me is :

struct node
{
    int data; // Some data related to the node. This is optional.
    int neigh_1;
    int neigh_2;
    int neigh_3;
}

And make graph as an array of nodes :

node[N] graph;

This will also take 3*N memory. But this is a better approach that what you have done. Let me explain :

When you say give me the neighbours of node x :

Your method will access memory locations x, 3*x, 3*x + 4, 3*x + 8.

My method will access the memory locations x, x+4, x+8, x+16

Wait ! Why are we accessing x when we want only its neighbours ? It is a valid question, but if you need to process some value at node or store some information local to a node then we have to access x.

Note : The approach I am suggesting will not perform worse than your approach. But it surely will perform better for a general case.

Why my approach is better ?

My method will will result in lesser cache misses, because I am having a better spatial locality of reference.

If you don't care about the working of cache, in simple words my method will be accessing the locations that are closer to each other and hence have a very very high chance of already being in the faster memory cache which will lead to faster access.

Also someone might argue that we should declare memory for a neighbour only when we require it. Essentially the vector solution does that. But it will not perform well in this case because even declaring a pointer takes 8 bytes. But with that space you can store information of 2 integers, i.e. information about 2 neighbours.

So dynamically allocating memory will not save any space as the maximum neighbours are 3.

Comments