Tal Galili
R Question

Where does `ecdf` store its data? (And how can its size be measured?)

I can't seem to understand where R saves the data for `ecdf`. Here is some code to illustrate this:

```r
> set.seed(2016-10-30)
> x <- rnorm(1e4)
> y <- ecdf(x)
> object.size(x)
80040 bytes
> object.size(y)
3896 bytes
> rm(x)
> gc()
          used (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells  602079 32.2    1168576   62.5    750400   40.1
Vcells 1183188  9.1  299644732 2286.2 750532746 5726.2
> object.size(y)
3896 bytes
> plot(y) # still works...
```


Since `y` is small, the data must be stored somewhere else. It is obviously not in `x`, which I removed.


  1. It is probably in some environment, but which one? Where exactly is this data stored, and how can it be accessed?

  2. How does this affect memory.limit()? (i.e., caching, or the memory limits of running R processes)


Answer

There is a fantastic explanation of function closures and of the enclosing, execution and calling environments in @hadley's Advanced R.
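To see the mechanism in isolation, here is a minimal sketch using a hypothetical `make_cdf()` factory. It is not how `ecdf()` is implemented, only the same closure pattern: `object.size()` reports the returned function as small, while the sample it needs lives in the function's enclosing environment.

```r
# Hypothetical closure factory, for illustration only (not the ecdf() source):
# the returned function keeps a reference to the environment of this call,
# which is where the sorted sample is bound.
make_cdf <- function(data) {
  data <- sort(data)
  # findInterval() counts how many sorted values are <= q
  function(q) findInterval(q, data) / length(data)
}

set.seed(1)
f <- make_cdf(rnorm(1e4))

object.size(f)                    # small: the enclosing environment is not counted
ls(environment(f))                # "data"
object.size(environment(f)$data)  # about 80 kB: this is where the sample lives
f(0)                              # close to 0.5 for a standard normal sample
```

The same reasoning applies to `y` in the question: `rm(x)` only removes the global binding, while the closure keeps its own reference to the (processed) data.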

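For your specific example: `ecdf()` returns a function (a closure), and `object.size()` does not count the closure's enclosing environment, which is where the data is kept. As noted in the comments, the size of the object together with its enclosing environment is much larger: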

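```r
# Compare base::object.size() with pryr::object_size(); the latter also
# counts the objects in the closure's enclosing environment
pryr::compare_size(y)
```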
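The enclosing environment can also be inspected directly with `environment()` and `ls()`, which answers the "how can it be accessed" part. A sketch based on the question's example; the exact set of objects in that environment is an implementation detail of `ecdf()`/`approxfun()` and may differ between R versions:

```r
# Rebuild the example from the question; the raw sample is deleted afterwards
set.seed(2016-10-30)
x <- rnorm(1e4)
y <- ecdf(x)
rm(x)

e <- environment(y)       # the closure's enclosing environment
ls(e)                     # includes "x", "y" and "nobs", among others
head(e$x)                 # sorted unique sample values: the data is still here
e$nobs                    # number of observations, stored there by ecdf()
object.size(e$x)          # about 80 kB that object.size(y) did not report
identical(knots(y), e$x)  # knots() reads the same vector from the environment
```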

You can list the objects this includes, and the size of each, with:

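```r
# Size of each global name referenced in y's body, looked up in (and,
# failing that, above) y's enclosing environment
sapply(codetools::findGlobals(y), function(x) object.size(get(x, environment(y))))
```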

Summing that vector gives approximately the total that pryr::object_size reports (164 kB on my machine).
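`findGlobals()` only reports names used in the function body, so objects such as `nobs` (assigned by `ecdf()` for its methods) are skipped, and it can also pick up helper functions that live in the stats namespace rather than in the environment. A base-R variant that sizes everything bound in the environment itself is sketched below. It also shows, for the memory question, that a copy of `y` shares the same environment: the data is stored once and stays in memory, counting towards the R process's usage like any other object, until every reference to it is gone.

```r
set.seed(2016-10-30)
y <- ecdf(rnorm(1e4))
e <- environment(y)

# Size in bytes of every (non-dot) object bound in the enclosing environment
sizes <- vapply(eapply(e, object.size), as.numeric, numeric(1))
sort(sizes, decreasing = TRUE)
sum(sizes)                     # bytes held by the environment, not by y itself

# Copying the closure does not copy its environment: both copies point to the
# same data, which can only be garbage-collected once all of them are removed
y2 <- y
identical(environment(y2), e)  # TRUE
```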