Igor - 1 year ago

R Question

I'm trying to think of a good function to achieve the following (it seems obvious, but I don't have enough experience to come up with it).

There are 4 problems I wish to solve:

- calculate the distances between all points (cities) in a dataset (so for 5 rows, (n-1)+(n-2)+...+(n-n) = 10 distances in total)
- calculate the product of the populations of those cities
- calculate the gravity
- determine the direction of movement (a simple check of which of the two compared cities has the larger population)
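As a quick check of the pair count in the first item: with n cities, (n-1) + (n-2) + ... + (n-n) equals n(n-1)/2, which R's `choose()` computes directly, while `combn()` lists the pairs themselves. A minimal sketch, assuming the cities are simply numbered 1 to n:

```r
# Number of cities (5 in the example dataset)
n <- 5

# Each column of `pairs` is one unordered pair (i, j) with i < j
pairs <- combn(n, 2)

# (n-1) + (n-2) + ... + (n-n), written out directly
sum(n - seq_len(n))   # 10
choose(n, 2)          # 10, i.e. n * (n - 1) / 2
ncol(pairs)           # 10

# Labels in the same style as the target table: "1-2", "1-3", ..., "4-5"
paste(pairs[1, ], pairs[2, ], sep = "-")
```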

Based on a dataset like this one (the values are a bit uninspired, but they should represent lon/lat pairs):

```
  location population
1   10,100       1000
2   20,200       2000
3   30,300       3000
4   40,400       4000
5   50,500       5000
```
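A minimal sketch of reading this table into R and splitting each `location` string into numeric longitude and latitude columns (the column names `lon` and `lat` are assumptions for this sketch). Note that second values such as 100 or 200 lie outside the valid latitude range of -90 to 90, so a great-circle distance computed on these exact numbers would not be meaningful.

```r
# The question's example data; the first column becomes the row names
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
  location population
1   10,100       1000
2   20,200       2000
3   30,300       3000
4   40,400       4000
5   50,500       5000")

# Split each "lon,lat" string on the comma and convert both parts to numbers
parts <- strsplit(df$location, ",", fixed = TRUE)
df$lon <- as.numeric(vapply(parts, `[`, character(1), 1))
df$lat <- as.numeric(vapply(parts, `[`, character(1), 2))
df
```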

I'd like to end up with a dataset that contains:

- distance: distance between locations A and B
- pop.prod.: product of the two populations (A and B)
- gravity = pop.prod. / distance
- directedness: if A's population is larger than B's, edge from B to A; else, edge from A to B
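The gravity and directedness rules above can be sketched as a small vectorised R function. `pair_metrics()` and its argument names are hypothetical, and the populations and distances in the example call are made up purely to exercise both branches of the direction rule. Ties (equal populations) fall into the "else" branch, giving an edge from A to B.

```r
# Hypothetical helper: gravity and edge direction for each pair (A, B).
# gravity = pop.prod / distance; the edge points from the city with the
# smaller population to the one with the larger population.
pair_metrics <- function(a, b, pop_a, pop_b, distance) {
  pop_prod <- pop_a * pop_b
  data.frame(
    pop.prod = pop_prod,
    gravity  = pop_prod / distance,
    # if A > B: edge from B to A; otherwise: edge from A to B
    from = ifelse(pop_a > pop_b, b, a),
    to   = ifelse(pop_a > pop_b, a, b)
  )
}

# Made-up example: pair 1-2 (A smaller) and pair 3-4 (A larger)
res <- pair_metrics(a = c(1, 3), b = c(2, 4),
                    pop_a = c(1000, 5000), pop_b = c(2000, 4000),
                    distance = c(10, 20))
res
```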

```
    distance pop.prod. gravity directedness
1-2        x         x       x            x
1-3        x         x       x            x
1-4        x         x       x            x
1-5        x         x       x            x
2-3        x         x       x            x
2-4        x         x       x            x
2-5        x         x       x            x
3-4        x         x       x            x
3-5        x         x       x            x
4-5        x         x       x            x
```

Small disclaimer: this is NOT an assignment :)

I just wanted to see movement/commuter estimations in the area I live in, and hopefully it benefits everyone!

The idea behind it is called a "gravity-based model", which is used to estimate commutes.
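For reference, the gravity measure defined above can be written as a formula. With $P_A$, $P_B$ the two populations and $d_{AB}$ the distance between them, the question's version is the left-hand expression. Gravity models are often written more generally with a scaling constant $k$ and a distance exponent $\beta$ (both symbols are introduced here, not in the question); setting $k = 1$ and $\beta = 1$ recovers the question's version.

```latex
G_{AB} = \frac{P_A \, P_B}{d_{AB}}
\qquad\text{and, more generally,}\qquad
T_{AB} = k \, \frac{P_A \, P_B}{d_{AB}^{\beta}}
```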

Any help is welcome, also on subsets of the problem.

Many thanks in advance.


Answer

Here's a starter:

```
# install.packages("geosphere")  # needed for distHaversine()
df <- read.table(header = TRUE, text = " location population
1 10,10 1000
2 20,20 2000
3 30,30 3000
4 40,40 4000
5 50,50 5000", stringsAsFactors = FALSE)
# Split "lon,lat" strings into a numeric two-column matrix (lon, lat)
locs <- do.call(rbind, lapply(strsplit(df$location, ",", fixed = TRUE), as.numeric))
# Every unordered pair of rows, one pair per column
(idx <- combn(1:nrow(locs), 2))
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    1    1    1    2    2    2    3    3     4
# [2,]    2    3    4    5    3    4    5    4    5     5
# Great-circle (haversine) distance in metres for each pair
(distance <- geosphere::distHaversine(locs[idx[1,],], locs[idx[2,],]) )
# [1] 1546488 3044009 4463588 5770107 1500779 2930665 4260187 1436941 2785801 1360777
# Product of the two populations for each pair
(popProd <- df$population[idx[1,]]*df$population[idx[2,]])
# [1] 2000000 3000000 4000000 5000000 6000000 8000000 10000000 12000000 15000000 20000000
```
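One way to finish the remaining two steps (gravity and directedness) and assemble the target table. This sketch swaps `geosphere::distHaversine()` for a small hand-written haversine function (`haversine()` is hypothetical, not part of any package) so that it runs in base R. It uses the same Earth radius of 6378137 m as geosphere's default, so the distances should agree closely with those printed above. Distances are in metres, so the absolute gravity values depend on that unit choice.

```r
# Hypothetical base-R haversine: distance in metres between (lon, lat)
# points given in degrees; r = 6378137 m mirrors geosphere's default.
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Same data as the answer, with (lon, lat) already split into columns
df <- data.frame(lon = c(10, 20, 30, 40, 50),
                 lat = c(10, 20, 30, 40, 50),
                 population = c(1000, 2000, 3000, 4000, 5000))

idx <- combn(nrow(df), 2)   # all unordered pairs (i < j)
a <- idx[1, ]
b <- idx[2, ]

distance <- haversine(df$lon[a], df$lat[a], df$lon[b], df$lat[b])
pop.prod <- df$population[a] * df$population[b]

result <- data.frame(
  distance = distance,
  pop.prod = pop.prod,
  gravity  = pop.prod / distance,
  # Edge points from the smaller-population city to the larger one
  from = ifelse(df$population[a] > df$population[b], b, a),
  to   = ifelse(df$population[a] > df$population[b], a, b),
  row.names = paste(a, b, sep = "-")
)
result
```

Rows are labelled `1-2`, `1-3`, ... as in the question's target table, and the `from`/`to` columns encode the directedness as an edge list, which is a convenient shape for building a directed graph afterwards.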

I think this gives you an idea and you can figure out the rest.
