kemotoe kemotoe - 11 days ago 7
C++ Question

Parsing an input file to create a directed graph C++

So I'm starting a project on directed graphs and topological sorting. I'm looking for the most efficient way to parse an input file of courses in the form:

COURSE_1 COURSE_2


where COURSE_1 is a prerequisite to COURSE_2

NONE COURSE_3


where COURSE_3 has no prerequisites.

All vertex labels will be strings, obviously. I have created a
Graph
class with data members to store vertices, edges, and vertex labels. Later, I will be adding methods to topologically sort and find the shortest path from one vertex to another. Given these future plans, my questions here are would it be better to use an adjacency list? or matrix? Also what would be the most efficient way to populate the graph from the input files? My initial thought was using an adjacency list. Since the size of the graph isn't known at compile time my idea

std::vector<std::list<std::string>> adjacencyList;


Also, I thought of creating a helper function to parse the input file. Something along the lines of

void populateGraph(std::string filename, Graph* graph)


Am I totally going in the wrong direction here?

Answer

You're on the right track with a lot of things.

If you can assume that all the inputs you will encounter are just NONE and COURSE_X and you can assume that all Xes form a continuous interval of ints starting from 1 (or 0) and span to the number of vertices, then you could just treat the vertices as numbers internally. If that is not the case, you can assign each vertex label a number (using std::unordered_map for example) and have this abstract structure anyway.

Now, if you choose to stick with this model, it is convenient to use, because your whole graph can be represented as std::vector<std::list<int>>. You could substitute int with some struct type if you want to store more info about the edge, like labels, weights, etc. Whenever you want to access a specific node's adjacency list, you just access the vector's cell under the node's id.

Obviously this is an adjacency list based solution. It's good for sparse graphs generally speaking. Think of it this way: if you're using a matrix for a sparse graph, a very big part of that allocated memory is not going to be used. Using adjacency graph eliminates this, but it takes away the constant access time to any arbitrary edge. This means that it can take linear time to check if a given edge exists. That being said, I wouldn't expect you to use that check in a topological sort. If you chose to use a matrix however, you would still need to map the vertices to numbers, to that part stays.

Last but not least, you can use pointers instead of integer ids. Basically you wouldn't look up vertices in the vector using an id, but you'd directly access a vertice through a stored pointer. You'd most likely still need to have the string -> pointer mapping, at least for the graph creation.