Igor Igor - 9 months ago 27
R Question

Generate gravity-based commuter estimation dataset in R

Trying to think of a good function to achieve the following (seems obvious but insufficient experience to think of it)

There are 4 problems I wish to solve:


  1. calculate the distances between all points (cities) in a dataset (so in 5 rows, (n-1)+(n-2)+...+(n-n)) 10 total distances)

  2. calculate the product of the populations of those cities

  3. calculate the gravity

  4. determine direction of movement (simple check of larger population of the two compared cities)



Based on a dataset like this one (the values are a bit un-inspired, but they should represent lon-lat's):

location population
1 10,100 1000
2 20,200 2000
3 30,300 3000
4 40,400 4000
5 50,500 5000


Get to a dataset that contains:


  1. distance: location A-B

  2. pop.prod. = product of the two populations (A
    and B)

  3. gravity = pop.prod. / distance

  4. directedness = if A>B; edge from B to A, else; edge from A to B

    distance pop.prod. gravity directedness
    1-2 x x x x
    1-3 x x x x
    1-4 x x x x
    1-5 x x x x
    2-3 x x x x
    2-4 x x x x
    2-5 x x x x
    3-4 x x x x
    3-5 x x x x
    4-5 x x x x



Small disclaimer: this is NOT an assignment :)
I just wanted to see movement/commuter estimations in the area I live in, and hopefully it benefits everyone!
The idea behind it is called "Gravity Based Model" in order to estimate commutes.

Any help is welcome, also on subsets of the problem.
Many thanks in advance.

Answer Source

Here's a starter:

df <- read.table(header=T, text="     location   population
1    10,10     1000
2    20,20     2000 
3    30,30     3000
4    40,40     4000
5    50,50     5000", stringsAsFactors=F)

locs <- do.call(rbind, lapply(strsplit(df$location,",",T), as.integer))
(idx <- combn(1:nrow(locs), 2))
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    1    1    1    2    2    2    3    3     4
# [2,]    2    3    4    5    3    4    5    4    5     5

(distance <- geosphere::distHaversine(locs[idx[1,],], locs[idx[2,],]) )
# [1] 1546488 3044009 4463588 5770107 1500779 2930665 4260187 1436941 2785801 1360777

(popProd <- df$population[idx[1,]]*df$population[idx[2,]])
 # [1]  2000000  3000000  4000000  5000000  6000000  8000000 10000000 12000000 15000000 20000000

I think this gives you an idea and you can figure out the rest.