kemotoe - 11 days ago 7

C++ Question

So I'm starting a project on directed graphs and topological sorting. I'm looking for the most efficient way to parse an input file of courses in the form:

`COURSE_1 COURSE_2`

where COURSE_1 is a prerequisite to COURSE_2

`NONE COURSE_3`

where COURSE_3 has no prerequisites.

All vertex labels will be strings, obviously. I have created a

`Graph`

`std::vector<std::list<std::string>> adjacencyList;`

Also, I thought of creating a helper function to parse the input file. Something along the lines of

`void populateGraph(std::string filename, Graph* graph)`

Am I totally going in the wrong direction here?

Answer

You're on the right track with a lot of things.

If you can assume that all the inputs you will encounter are just `NONE`

and `COURSE_X`

and you can assume that all Xes form a continuous interval of ints starting from 1 (or 0) and span to the number of vertices, then you could just treat the vertices as numbers internally. If that is not the case, you can assign each vertex label a number (using std::unordered_map for example) and have this abstract structure anyway.

Now, if you choose to stick with this model, it is convenient to use, because your whole graph can be represented as `std::vector<std::list<int>>`

. You could substitute `int`

with some struct type if you want to store more info about the edge, like labels, weights, etc. Whenever you want to access a specific node's adjacency list, you just access the vector's cell under the node's id.

Obviously this is an adjacency list based solution. It's good for sparse graphs generally speaking. Think of it this way: if you're using a matrix for a sparse graph, a very big part of that allocated memory is not going to be used. Using adjacency graph eliminates this, **but** it takes away the constant access time to any arbitrary edge. This means that it can take linear time to check if a given edge exists. That being said, I wouldn't expect you to use that check in a topological sort. If you chose to use a matrix however, you would still need to map the vertices to numbers, to that part stays.

Last but not least, you can use pointers instead of integer ids. Basically you wouldn't look up vertices in the vector using an id, but you'd directly access a vertice through a stored pointer. You'd most likely still need to have the string -> pointer mapping, at least for the graph creation.