Igor - 1 year ago 55
R Question

# Generate gravity-based commuter estimation dataset in R

Trying to think of a good function to achieve the following (seems obvious but insufficient experience to think of it)

There are 4 problems I wish to solve:

1. calculate the distances between all points (cities) in a dataset (so in 5 rows, (n-1)+(n-2)+...+(n-n)) 10 total distances)

2. calculate the product of the populations of those cities

3. calculate the gravity

4. determine direction of movement (simple check of larger population of the two compared cities)

Based on a dataset like this one (the values are a bit un-inspired, but they should represent lon-lat's):

``````     location   population
1    10,100     1000
2    20,200     2000
3    30,300     3000
4    40,400     4000
5    50,500     5000
``````

Get to a dataset that contains:

1. distance: location A-B

2. pop.prod. = product of the two populations (A
and B)

3. gravity = pop.prod. / distance

4. directedness = if A>B; edge from B to A, else; edge from A to B

``````     distance   pop.prod.   gravity    directedness
1-2    x          x           x          x
1-3    x          x           x          x
1-4    x          x           x          x
1-5    x          x           x          x
2-3    x          x           x          x
2-4    x          x           x          x
2-5    x          x           x          x
3-4    x          x           x          x
3-5    x          x           x          x
4-5    x          x           x          x
``````

Small disclaimer: this is NOT an assignment :)
I just wanted to see movement/commuter estimations in the area I live in, and hopefully it benefits everyone!
The idea behind it is called "Gravity Based Model" in order to estimate commutes.

Any help is welcome, also on subsets of the problem.

Here's a starter:

``````df <- read.table(header=T, text="     location   population
1    10,10     1000
2    20,20     2000
3    30,30     3000
4    40,40     4000
5    50,50     5000", stringsAsFactors=F)

locs <- do.call(rbind, lapply(strsplit(df\$location,",",T), as.integer))
(idx <- combn(1:nrow(locs), 2))
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    1    1    1    2    2    2    3    3     4
# [2,]    2    3    4    5    3    4    5    4    5     5

(distance <- geosphere::distHaversine(locs[idx[1,],], locs[idx[2,],]) )
# [1] 1546488 3044009 4463588 5770107 1500779 2930665 4260187 1436941 2785801 1360777

(popProd <- df\$population[idx[1,]]*df\$population[idx[2,]])
# [1]  2000000  3000000  4000000  5000000  6000000  8000000 10000000 12000000 15000000 20000000
``````

I think this gives you an idea and you can figure out the rest.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download