buzhidao - 1 month ago 8
C++ Question

# Store the all the neighbors of the vertexes, find a vertex's neighbors quickly

Suppose I have

`N`
vertexes labeled from
`0`
to
`N-1`
. Each vertex has at most
`3`
neighbors and at least
`1`
neighbor. Suppose the neighbor information is initially stored at a vector called
`pairs`
, where
`pairs[2*i]`
and
`pairs[2*i+1]`
are a pair of neighboring vertexes.

Now I need to look up very fast what the neighbors are for
`vertex[i]`
, what's the best way to store this information?

The method I came up with is:

• declare a vector called
`neighbors[3*N]`
, so that
`neighbors[3*i+0]`
`neighbors[3*i+1]`
and
`neighbors[3*i+2]`
stores three possible neighbors.

• why saying possible, because there are at most three neighbors for each vertex.

• so I initialize all the elements of vector
`neighbors`
to
`N`
, which means this is not a valid neighbor, since the vertexes are labeled from
`0`
to
`N-1`
.

The code implementation is as follows:

``````void get_neighbors(const std::vector<int>& pairs,
std::vector<int>& neighbors)   {
int N = neighbors.size()/3;
int M = pairs.size()/2;
//init all the vertexes' neighbours to nothing
for (int i=0; i<3*N; ++i)   {
neighbors[i] = N;
}

//loop through all the vertexes, and store their neighbors
for(int i=0; i<N; ++i)  {
int j = 0;
//loop through pairs, and find out what neighbors vertex[i] has
for (int k=0; k < M; ++k) {
if(pairs[2*k]==i)    {
neighbors[3*i+j]=pairs[2*k+1];
++j;
}
else if(pairs[2*k+1]==i)    {
neighbors[3*i+j]=pairs[2*k];
++j;
}
}
}
}
``````

Things I feel uncomfortable with my method:

1. declare a vector
`neighbors(3*N)`
is too much, since many of its elements will be useless
`N`
.

2. If I want to look up
`vertex[i]`
's neighbors, every time I need to test if
`neighbors[3*i]==N`
,
`neighbors[3*i+1]==N`
and
`neighbors[3*i]==N`
.

The data structure that you are working with is a Graph. But Graph is a more general concept in which a node can have any number of neighbours.

Since your problem is a more strict one saying each node has at-most 3 neighbours. The solution I suggested in the comments is a general one that can be used for all the graphs and will work decently well for your case also.

Declare a graph :

``````vector<vector<int> > graph (N);
``````

If there is an edge from `a` to `b` the do this :

``````graph[a].push_back (b);
``````

To retrieve the neighbours of a node `a` :

``````for (int i=0;i<graph[a].size();i++)
// Do whatever you want with the neighbour. The variable graph[a][i] holds the neighbour number
``````

Now coming to your problem of at-most 3 neighbours. The best solution according to me is :

``````struct node
{
int data; // Some data related to the node. This is optional.
int neigh_1;
int neigh_2;
int neigh_3;
}
``````

And make graph as an array of nodes :

``````node[N] graph;
``````

This will also take 3*N memory. But this is a better approach that what you have done. Let me explain :

When you say give me the neighbours of node `x` :

Your method will access memory locations x, 3*x, 3*x + 4, 3*x + 8.

My method will access the memory locations x, x+4, x+8, x+16

Wait ! Why are we accessing `x` when we want only its neighbours ? It is a valid question, but if you need to process some value at node or store some information local to a node then we have to access `x`.

Note : The approach I am suggesting will not perform worse than your approach. But it surely will perform better for a general case.

Why my approach is better ?

My method will will result in lesser cache misses, because I am having a better spatial locality of reference.

If you don't care about the working of cache, in simple words my method will be accessing the locations that are closer to each other and hence have a very very high chance of already being in the faster memory cache which will lead to faster access.

Also someone might argue that we should declare memory for a neighbour only when we require it. Essentially the `vector` solution does that. But it will not perform well in this case because even declaring a pointer takes 8 bytes. But with that space you can store information of 2 integers, i.e. information about 2 neighbours.

So dynamically allocating memory will not save any space as the maximum neighbours are 3.