John Smith - 1 month ago
R Question

Building Sentences from a dataframe in R

I'm trying to generate sentences from a data frame.
Below is the data frame:

# Code
mycode <- c("AAABBB", "AAABBB", "AAACCC", "AAABBD")
mycode <- sample(mycode, 20, replace = TRUE)

# Date
mydate <- c("2016-10-17", "2016-10-18", "2016-10-19", "2016-10-20")
mydate <- sample(mydate, 20, replace = TRUE)

# Resort
myresort <- c("GB", "IE", "GR", "DK")
myresort <- sample(myresort, 20, replace = TRUE)

# Number of holidaymakers
HolidayMakers <- sample(1000, 20, replace = TRUE)

mydf <- data.frame(mycode,
                   mydate,
                   myresort,
                   HolidayMakers)


So if we take mycode as an example, I want to create a sentence like "For the code mycode, the biggest destinations are myresort where the top days of visiting were mydate with a total of HolidayMakers".

Assume that there are multiple rows per code. What I want is a single sentence per code: instead of having one sentence per mydate and myresort, I would like to say something like

"For the code AAABBB, the biggest destinations are GB,GR,DK,IE where the top days of visiting were 2016-10-17,2016-10-18,2016-10-19 with a total of 650"

The 650 would be the sum of all the holidaymakers for all those countries on those days, per mycode.

Can anyone help?

Thank you for your time

Answer

You could try:

library(dplyr)
res <- mydf %>%
  group_by(mycode) %>%
  summarise(d = toString(unique(mydate)), 
            r = toString(unique(myresort)), 
            h = sum(HolidayMakers)) %>%
  mutate(s = paste("For the code", mycode, 
                   "the biggest destinations are", r, 
                   "where the top days of visiting were", d, 
                   "with a total of", h))

Which gives:

> res$s

#[1] "For the code AAABBB the biggest destinations are GB, GR, IE, DK 
#     where the top days of visiting were 2016-10-17, 2016-10-18, 
#     2016-10-20, 2016-10-19 with a total of 6577"
#[2] "For the code AAABBD the biggest destinations are IE 
#     where the top days of visiting were 2016-10-17, 2016-10-18 
#     with a total of 1925"                                    
#[3] "For the code AAACCC the biggest destinations are IE, GR, DK 
#     where the top days of visiting were 2016-10-20, 2016-10-17, 
#     2016-10-19, 2016-10-18 with a total of 2878"    

Note: Since you didn't provide any guidance as to how you intend to calculate the "top visiting days", I simply included all days. You could easily edit the above to fit your actual situation.
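
As one possible way to narrow this down, here is a minimal base R sketch. It assumes that "top days" means the two dates with the most holidaymakers per code and that destinations should be ordered by their totals; neither definition comes from the question. The data frame demo_df, the helper build_sentence and the cutoff top_n_days are hypothetical names with made-up values, used only so the result is reproducible.

```r
# Hypothetical fixed data with the same columns as mydf (values are made up)
demo_df <- data.frame(
  mycode = c("AAABBB", "AAABBB", "AAABBB", "AAABBB", "AAACCC", "AAACCC"),
  mydate = c("2016-10-17", "2016-10-18", "2016-10-17", "2016-10-19",
             "2016-10-20", "2016-10-17"),
  myresort = c("GB", "IE", "GR", "DK", "IE", "GR"),
  HolidayMakers = c(100, 50, 200, 30, 70, 10),
  stringsAsFactors = FALSE
)

# Assumption: "top days" = the 2 dates with the highest total per code
top_n_days <- 2

# Build one sentence from the rows belonging to a single code
build_sentence <- function(d) {
  # Total holidaymakers per date, as a plain named numeric vector
  per_day <- vapply(split(d$HolidayMakers, d$mydate), sum, numeric(1))
  top_days <- names(head(sort(per_day, decreasing = TRUE), top_n_days))

  # Total holidaymakers per resort, largest first
  per_resort <- vapply(split(d$HolidayMakers, d$myresort), sum, numeric(1))
  resorts <- names(sort(per_resort, decreasing = TRUE))

  paste0("For the code ", d$mycode[1],
         ", the biggest destinations are ", toString(resorts),
         " where the top days of visiting were ", toString(top_days),
         " with a total of ", sum(d$HolidayMakers))
}

# One sentence per code, named by code
sentences <- vapply(split(demo_df, demo_df$mycode), build_sentence, character(1))
print(unname(sentences))
```

With this sample data, the sentence for AAABBB lists the destinations as GR, GB, IE, DK and the top days as 2016-10-17, 2016-10-18, with a total of 380. Note that the total here still covers every row for the code, not just the top days; restrict d before summing if you want it to match the listed days.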
