agenis agenis - 1 year ago 54
R Question

Find shortest path from X,Y coordinates (with start ≠ end)

I have a dataframe with X and Y coordinates of points like this:

structure(list(X = c(666L, 779L, 176L, 272L, 232L, 74L, 928L,
667L, 1126L, 919L), Y = c(807, 518, 724, 221, 182, 807, 604,
384, 142, 728)), .Names = c("X", "Y"), row.names = c(NA, 10L), class = "data.frame")

i just want to find out the shortest path connecting all these points, and also return its total distance. There are no other conditions: every point can be connected to any other, no specific point to start or end, no weights, etc... I found lots of topics about igraph package but i can't figure out how to feed my data into it. Found also this but not in R language.
maybe a "brute force" script would be fine since i've got no more than 20 points..

Answer Source

As the comments suggest, your question is related to the Traveling Salesman Problem. It is in fact an example of a Hamiltonian path (a path which visits each node once, but does not end where it started). As you might expect, in R there's already a package for that: TSP. See this and this.

The Traveling Salesman Problem is extremely difficult to solve exactly (in a reasonable amount of time), so most of the algorithms are approximate and iterative: they explore various paths from a random starting point using algorithms which are "likely" to produce short paths, but not necessarily the absolute shortest.

In your example, with only 10 nodes, it is possible to execute an exhaustive (brute force) search. This is useful because we can compare the approximate TSP solution with the exact solution. Note however that even with only 10 nodes, the brute force approach requires examination of ~3.6 million paths, and takes about 2 minutes. With 20 nodes the number of paths is ~ 2.5 × 1018.

# Hamiltonian Path via traveling salesman problem
tsp <- TSP(dist(df))
tsp <- insert_dummy(tsp, label = "cut")
tour <- solve_TSP(tsp, method="2-opt", control=list(rep=10))
path.tsp <- unname(cut_tour(tour, "cut"))
#  [1]  6  3  5  4  8  2  1 10  7  9

# Hamiltonian Path via brute force
M  <- as.matrix(dist(df))     # distance matrix
paths   <- permn(nrow(M))     # all possible paths (~ 3.6e6 !!!)
lengths <- sapply(paths,function(p)sum(M[cbind(p[-nrow(M)],p[-1])])) <- paths[[which.min(lengths)]]
#  [1]  9  7 10  1  2  8  4  5  3  6

# plot the results

To convert a TSP problem into a Hamiltonian Path problem we have to add a "dummy vertex" which is a distance 0 from every other node (see the vignette referenced above). Then, the Hamiltonian Path is the TSP path starting from the node right after the dummy and ending at the node right before the dummy.

Notice that the brute force and the TSP solution identify the exact same path, but in the reverse order. This is because, of course, both paths have the same length.