Marre, 4 years ago
R Question

Generating random latitude/longitude data in R

I simulated a dataset for an online retail market. Customers can purchase products in different stores in Germany (e.g. Munich, Berlin, Hamburg) and in online stores. To get the latitude/longitude data for the cities I use geocode from the ggmap package. But customers who purchase online can do so from anywhere in the country. Now I want to generate random latitude/longitude data within Germany for the online purchases, so I can map them later with Shiny and Leaflet. Is there any way to do this?

My df looks like this:

View(df)
ClientId  Store   ...  lat  lon
1         Berlin  ...  52   13
2         Munich  ...  48   11
3         Online  ...  x    x
4         Online  ...  x    x


My goal is a data frame like this, for example:

ClientId  Store   ...  lat  lon
1         Berlin  ...  52   13
2         Munich  ...  48   11
3         Online  ...  50   12
4         Online  ...  46   10


Is there any way to generate this random latitude/longitude data and integrate it into my data frame?

Answer

Your problem has two parts. As a newcomer to R, you may not yet be used to the semantics needed here. Fundamentally, you are asking to:

  • First, identify which orders come from the Online store
  • Second, generate a random lat and lon for those orders

To identify the elements of your data frame that meet a criterion, you can use the which function, which returns the indices at which a logical condition is TRUE. To find the rows whose Store column equals "Online", you do:

df[which(df$Store == "Online"), ]
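A minimal runnable sketch of this step, using a small hypothetical df that mirrors the one in the question (NA stands in for the x placeholders). Note the trailing comma: df[rows, ] selects rows, whereas df[rows] would try to select columns.

```r
# Hypothetical toy data frame mirroring the question; NA marks missing coordinates
df <- data.frame(
  ClientId = 1:4,
  Store = c("Berlin", "Munich", "Online", "Online"),
  lat = c(52, 48, NA, NA),
  lon = c(13, 11, NA, NA),
  stringsAsFactors = FALSE
)

# which() returns the row indices where the condition is TRUE
online <- which(df$Store == "Online")
print(online)  # 3 4

# The comma matters: df[rows, ] selects rows; df[rows] would select columns
online_rows <- df[online, ]
print(online_rows)
```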

To update lat or lon for particular rows, we first need to access the column. To get the values of a particular column, we use $, and then index the resulting vector with the row numbers. For example, to get the lat values for the online orders you use:

df$lat[which(df$Store=="Online")]
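The same indexing expression also works on the left-hand side of an assignment, which is what makes the update possible. A small sketch, again with a hypothetical toy df; the values 50 and 51 are arbitrary placeholders:

```r
# Hypothetical toy data frame; NA marks the coordinates still to be filled in
df <- data.frame(
  Store = c("Berlin", "Munich", "Online", "Online"),
  lat = c(52, 48, NA, NA),
  stringsAsFactors = FALSE
)
online <- which(df$Store == "Online")

# Read: the lat values of the online orders (both NA at this point)
df$lat[online]

# Write: the same expression on the left of <- updates only those rows
df$lat[online] <- c(50, 51)
df$lat  # 52 48 50 51
```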

Great! From here the problem branches and becomes more complex. Do you want to generate simple values for your demo, or do you need logic that generates spatial points within a given region? You indicate you would like the points to lie within Germany itself; however, that is beyond the scope of this question. For now, we will consider the simpler case of generating values within a bounding box and updating your data frame accordingly.

To generate integer values in a given range, we can use the sample function. Assuming you want lat values between 45 and 55 and lon values between 9 and 14, we can do the following:

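online <- which(df$Store == "Online")  # row indices of the online orders
# replace = TRUE lets values repeat; without it sample() fails when there are more online orders than candidate values
df$lat[online] <- sample(45:55, length(online), replace = TRUE)
df$lon[online] <- sample(9:14, length(online), replace = TRUE)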

Reading this code: we update the lat values of the "Online" rows in df with a vector of random integers drawn from 45:55 whose length equals the number of "Online" orders, and likewise for lon with 9:14.
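Note that sample() draws without replacement by default, so it cannot return more values than the range contains (9:14 has only six). A sketch with a hypothetical data frame holding ten online orders, showing the failure and the replace = TRUE fix; set.seed() is added only to make the run reproducible:

```r
set.seed(42)  # reproducible random draws
# Hypothetical data frame: 2 physical stores plus 10 online orders
df <- data.frame(
  Store = c("Berlin", "Munich", rep("Online", 10)),
  lat = c(52, 48, rep(NA, 10)),
  lon = c(13, 11, rep(NA, 10)),
  stringsAsFactors = FALSE
)
online <- which(df$Store == "Online")

# Without replacement this errors: 10 rows to fill but only 6 values in 9:14
res <- tryCatch(sample(9:14, length(online)), error = function(e) "error")
print(res)

# With replacement, values may repeat, so any number of rows can be filled
df$lat[online] <- sample(45:55, length(online), replace = TRUE)
df$lon[online] <- sample(9:14, length(online), replace = TRUE)
print(df)
```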

If you want more decimal precision, you can use similar logic with the runif function, which samples from a continuous uniform distribution, and then round to the number of decimal places you need. Good luck!
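A minimal sketch of the runif approach, reusing the assumed ranges from above (lat 45 to 55, lon 9 to 14) and rounding to four decimal places, which is about 11 m of latitude. The toy df is hypothetical and NA stands in for the x placeholders:

```r
set.seed(1)  # reproducible random draws
# Hypothetical toy data frame mirroring the question
df <- data.frame(
  ClientId = 1:4,
  Store = c("Berlin", "Munich", "Online", "Online"),
  lat = c(52, 48, NA, NA),
  lon = c(13, 11, NA, NA),
  stringsAsFactors = FALSE
)
online <- which(df$Store == "Online")
n <- length(online)

# runif(n, min, max) draws n continuous values; round(..., 4) keeps 4 decimals
df$lat[online] <- round(runif(n, min = 45, max = 55), 4)
df$lon[online] <- round(runif(n, min = 9, max = 14), 4)
print(df)
```

Because the points are drawn from a rectangle, some can fall outside Germany's actual border; restricting them to the border itself is the harder problem set aside above.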
