aashanand aashanand - 3 months ago 13
R Question

Load dataset from "R" package using data(), assign it directly to a variable?

How do you load a dataset from an R package using the data() function and assign it directly to a variable, without creating a duplicate copy in your environment?

Put simply, can you do this without creating two identical dfs in your environment:

> data("faithful") # Old Faithful Geyser Data from datasets package

> x <- faithful

> ls() # Now I have 2 identical dfs - x and faithful - in my environment
[1] "faithful" "x"

> remove(faithful) # Now I've removed one of the redundant dfs


Try 1:

My first approach was to just assign data("faithful") to x. But data() returns a character vector with the names of the datasets it loaded, not the data itself. So now I have the df faithful and the character vector x in my environment.

> x <- data("faithful")
> x
[1] "faithful" # String, not the df "faithful" from the datasets package

> ls()
[1] "faithful" "x"


Try 2:
Tried to get a little more sophisticated in my second attempt.

> x <- get(data("faithful")) # This works as far as assignment goes

> ls() # However I still get the duplicate copy
[1] "faithful" "x"


A short note about my motivation for trying to do this. I have an R package with 5 very large data.frames, each having the same columns. I want to efficiently generate the same calculated columns on all 5 data.frames. So I want to use data() within a list() constructor to get the 5 data.frames into a list. Then I want to use llply() and mutate() from the plyr package to iterate over the dfs in the list and create the calculated columns for each df. But I don't want duplicate copies of the 5 large datasets sitting in my environment, as this is within a Shiny App with a RAM limit.




edit:
I was able to use both of @henfiber's methods from his answer to figure out how to lazy-load entire data.frames into a named list.

The first command here works for assigning a data.frame to a new variable name.

# this loads faithful into a variable x.
# Note we don't need to use the data() function to load faithful
> delayedAssign("x",faithful)


But I wanted to create a named list x with elements y = data(faithful), z = data(iris), etc.

I tried the below and it didn't work.

> x <- list(delayedAssign("y",faithful),delayedAssign("z", iris))
> ls()
[1] "x" "y" "z" # x is a list with 2 nulls, y & z are promises to faithful & iris
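
The list ends up holding NULLs because delayedAssign() is called for its side effect and returns NULL invisibly. A list also stores values, not promises, so anything placed in one is evaluated first. One possible workaround, sketched here as an assumption and not taken from the answer, is to keep the promises in an environment; the name `lazy` is made up for illustration.

```r
# A dedicated environment acts as the lazy container (the name `lazy` is hypothetical)
lazy <- new.env()

# Each call creates a promise inside `lazy`; nothing is evaluated yet
delayedAssign("y", faithful, assign.env = lazy)
delayedAssign("z", iris, assign.env = lazy)

ls(lazy)       # lists the names without forcing the promises
dim(lazy$y)    # first access forces the promise for y only
```

Elements are then accessed as lazy$y or lazy[["z"]], much like list elements, and each one is evaluated only when it is first used.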


But I was finally able to build the list without leaving extra objects in my environment. Note that the data.frames are fully loaded once they are in the list (a list holds values, not promises):

# define this function provided by henfiber
getdata <- function(...)
{
    e <- new.env()
    name <- data(..., envir = e)[1]
    e[[name]]
}

# now create your list, this gives you one object "x" of class list
# with elements "y" and "z" which are your data.frames
x <- list(y=getdata(faithful),z=getdata(iris))

Answer

Using a helper function:

# define this function
getdata <- function(...)
{
    e <- new.env()
    name <- data(..., envir = e)[1]
    e[[name]]
}

# now load your data calling getdata()
x <- getdata("faithful")
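
To load several datasets into one named list, as in the question, the names usually live in a character vector. Because data() captures its ... arguments unevaluated, passing a variable through ... would be read as the variable's name and not its value; the `list =` argument of data() takes a character vector instead. The sketch below uses a hypothetical variant, getdata_by_name(), built on that argument. The names `wanted` and `getdata_by_name` are assumptions for illustration.

```r
# Hypothetical variant of the helper that takes the dataset name as a string value
getdata_by_name <- function(name) {
  e <- new.env()
  data(list = name, envir = e)   # load into a throwaway environment
  e[[name]]                      # return the object itself
}

# List element names on the left, dataset names on the right
# (both datasets ship with the datasets package)
wanted <- c(y = "faithful", z = "iris")

# lapply() keeps the names, so x is a named list with elements y and z
x <- lapply(wanted, getdata_by_name)

names(x)
sapply(x, nrow)
```

For datasets in a package that is not attached, data() would also need its `package =` argument.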

Or using an anonymous function:

# look the name up in the same environment that data() loaded it into
x <- (function(...) { e <- new.env(); get(data(..., envir = e)[1], envir = e) })("faithful")

Lazy evaluation

You should also consider lazy loading your data by adding LazyData: true to the DESCRIPTION file of your package.

If you use RStudio, after running data("faithful") you'll see in the Environment panel that the "faithful" data.frame is shown as a "promise" (another, less common name is "thunk") and is greyed out. That means it is lazily evaluated by R and has not yet been loaded into memory. You can even lazy load the "x" variable with the delayedAssign() function:

data("faithful")              # lazy load "faithful"
delayedAssign("x", faithful)  # lazy assign "x" with a reference to "faithful"
rm(faithful)                  # remove "faithful"

At this point nothing has been loaded into memory yet:

summary(x)                    # now x has been loaded and evaluated
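
A small way to see the promise being forced, as a sketch: the counter `evaluations` and the variable `lazy_x` are made-up names, and the counter exists only to show when the promise body runs.

```r
evaluations <- 0   # counts how often the promise body runs

# The braces are the promise body; it is stored, not run
delayedAssign("lazy_x", {
  evaluations <- evaluations + 1
  faithful
})

evaluations                 # 0: creating the promise did not evaluate it
n_first <- nrow(lazy_x)     # first use forces the promise
n_again <- nrow(lazy_x)     # the cached value is reused
evaluations                 # 1: the body ran exactly once
```

Once forced, the variable behaves like any ordinary data.frame.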

Learn more about lazy evaluation here.