jrzelling
R Question

Generate data frame length (and column data) from function

I want to generate a data frame with a random number of rows.

> head(df)
   id age
1  53  12    # id and age randomly generated by fn1() and fn2()
2 146  31
3 343  22
...          # number of rows randomly chosen with sample(50:5000, 1)


The problem is that my current approach just repeats the same element over and over:

# This just repeats the same value instead of calling the function over and over
a <- fn1()
rep(a, 15)
[1] "S" "S" "S" "S" "S" "S" "S" ...
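The reason rep() repeats one value: its first argument is evaluated once, so fn1() runs a single time and rep() copies that result. replicate() instead re-evaluates the expression on every repetition. A minimal sketch, assuming a hypothetical fn1() that returns one random letter (the question does not show the real fn1()):

```r
# Hypothetical stand-in for the question's fn1(): returns one random capital letter
fn1 <- function() sample(LETTERS, 1)

set.seed(1)
a <- fn1()
repeated <- rep(a, 15)          # fn1() ran once; its single value is copied 15 times

fresh <- replicate(15, fn1())   # fn1() is called 15 separate times
length(unique(repeated))        # always 1
fresh
```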


Ideally, I want to specify the column names myself and fill each column with values from other functions:

# Generate length of data frame
df.length <- sample(50:500, 1)

# Generate the data for each column from a function
df.column.id <- fn1()
df.column.age <- fn2()
...

df <- data.frame("id" = df.column.id, "age" = df.column.age, ...)
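Filling in the template above: if fn1() and fn2() each return a single value per call, replicate() can call them once per row. A sketch; the bodies of fn1() and fn2() here are hypothetical stand-ins, since the question does not define them:

```r
# Hypothetical stand-ins for the question's fn1() and fn2(): one value per call
fn1 <- function() sample(1:1000, 1)   # one random id (ids may repeat across calls)
fn2 <- function() sample(1:99, 1)     # one random age

# Random number of rows
df.length <- sample(50:500, 1)

# Call each generator once per row
df.column.id  <- replicate(df.length, fn1())
df.column.age <- replicate(df.length, fn2())

df <- data.frame(id = df.column.id, age = df.column.age)
head(df)
```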


Unfortunately rep() isn't working, so how can the data frame columns be generated from functions? I also tried
matrix(data = c(df.column.id, df.column.age), nrow = df.length)
but that didn't work as intended either.
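One likely reason the matrix() attempt did not behave as intended: a matrix holds a single type, so combining the columns with c() and passing them to matrix() coerces mixed numeric and character data to character (the rep() output above suggests fn1() returns strings). A data.frame keeps a separate type per column. A sketch with made-up values:

```r
id  <- c("a1", "b2", "c3")   # made-up character ids
age <- c(12, 31, 22)         # made-up numeric ages

m <- matrix(data = c(id, age), nrow = 3)
class(m[, 2])                # "character": the ages were coerced

df <- data.frame(id = id, age = age)
class(df$age)                # "numeric": each column keeps its own type
```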

Answer

Maybe something like this could help:

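# Random number of rows between min_rownum and max_rownum
min_rownum <- 10
max_rownum <- 50
num_of_rows <- sample(seq(min_rownum, max_rownum), 1)
# Ages may repeat, so sample with replacement
min_age <- 1
max_age <- 50
age <- sample(seq(min_age, max_age), num_of_rows, replace = TRUE)
# IDs must be unique, so sample without replacement
min_ID <- 50
max_ID <- 500
id <- sample(seq(min_ID, max_ID), num_of_rows)
df1 <- data.frame(id, age)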

I tried to use variable names that would make the code self-explanatory.

The argument replace = TRUE in sample() means that an element can be selected more than once. For ages this is plausible, whereas IDs should be unique. The second argument of sample() sets how many elements are drawn from the vector passed as the first argument.
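The effect of replace can be checked directly. A small sketch (the pool size is chosen only for illustration): with replacement, values may repeat; without it, every draw is distinct and sample() refuses to draw more elements than the vector holds.

```r
pool <- 1:5

# With replace = TRUE, values may repeat and more draws than elements are allowed
ages <- sample(pool, 10, replace = TRUE)
length(ages)                     # 10

# Without replacement every value is drawn at most once
ids <- sample(pool, 5)
anyDuplicated(ids)               # 0: no duplicates

# Asking for more unique values than exist is an error; capture its message
too_many <- tryCatch(sample(pool, 10), error = function(e) conditionMessage(e))
too_many
```

In the code above, id is drawn without replacement from seq(50, 500), which holds 451 values, so max_rownum must not exceed 451.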


The title of the question suggests that the data.frame should be generated by a function. In that case the above code can be wrapped into a function like this:

make_random_df <- function(min_rownum = 10, max_rownum = 50, min_age = 1, max_age = 50,
                           min_ID = 50, max_ID = 500) {
  num_of_rows <- sample(seq(min_rownum, max_rownum), 1)
  age <- sample(seq(min_age, max_age), num_of_rows, replace = TRUE)
  id <- sample(seq(min_ID, max_ID), num_of_rows)
  data.frame(id, age)  # last expression is returned (and printed if not assigned)
}
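To stay closer to the question, where each column comes from its own function, the same idea can take the column generators as arguments. A sketch; make_df_from_generators(), gen_id() and gen_age() are hypothetical names, and each generator is assumed to take the number of rows n and return a vector of that length:

```r
# Hypothetical helper: one generator function per column, each called with n
make_df_from_generators <- function(generators, min_rownum = 10, max_rownum = 50) {
  n <- sample(seq(min_rownum, max_rownum), 1)          # random number of rows
  columns <- lapply(generators, function(gen) gen(n))  # call every generator once with n
  as.data.frame(columns)                               # list names become column names
}

# Hypothetical generators matching the columns in the question
gen_id  <- function(n) sample(50:500, n)                # unique ids
gen_age <- function(n) sample(1:50, n, replace = TRUE)  # ages may repeat

df2 <- make_df_from_generators(list(id = gen_id, age = gen_age))
head(df2)
```

Because each generator fills its whole column with one sample() call, this avoids calling a function once per row, which is usually faster than replicate() for large n.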

Using this function, the data.frame can be created with

my_random_df <- make_random_df()
#> head(my_random_df)
#   id age
#1 461   7
#2  86  44
#3 319   8
#4 363  45
#5  59   3
#6 258  49