user3747260 - 2 months ago
R Question

environment not behaving as expected after using transformEnvir in RevoScaleR function

I have a function where I'm reading an xdf file using rxXdfToDataFrame and using a variable in my expression for rowSelection. If I don't pass transformEnvir=environment(), the variable is not found. My problem is that after calling the function with transformEnvir, I can't seem to reliably access .GlobalEnv. If I hardcode a number into rowSelection, I don't need to use transformEnvir and everything works correctly. I tried setting the environment, but I'm not sure I was even doing it correctly.

The following code reproduces my problem:

envirtest = function()
{
  require(data.table)
  df = data.frame(x=1:10)
  selectnum = 5
  rxDataFrameToXdf(df, "testxdf.xdf")
  testdf = rxXdfToDataFrame("testxdf.xdf",rowSelection=(x==selectnum),transformEnvir=environment())
  testdt = setDT(testdf)
}


The error that occurs:

Error in envirtest() : could not find function "setDT"


However, if data.table::setDT() is used instead of setDT(), the function executes.

Edit: I forgot to mention that I had tried it without transformEnvir set and everything worked properly. Also, tables() was changed to setDT() to avoid possible confusion.

Answer

Here is a solution to your problem, together with a partial explanation:

  • When the transformation completes, the transformation environment gets cleared.
  • This means it is safer to create a dedicated environment and add any objects you need to it before calling the rx-function, rather than passing the function's own environment().

Concretely:

env <- new.env()
env$selectnum = 5

Set up your function like this:

envirtest = function()
{
  require(data.table)
  df = data.frame(x=1:10)
  env <- new.env()
  env$selectnum = 5

  rxDataFrameToXdf(df, "testxdf.xdf", overwrite=TRUE)
  testdf <- rxXdfToDataFrame("testxdf.xdf",
                             rowSelection=(x==selectnum),
                             transformEnvir=env
  )
  setDT(testdf)
}

Now try it:

x <- envirtest()

Rows Read: 10, Total Rows Processed: 10, Total Chunk Time: 0.006 seconds 
Rows Processed: 1
Time to read data file: 0.00 secs.
Time to convert to data frame: less than .001 secs.

str(x)

Classes ‘data.table’ and 'data.frame':  1 obs. of  1 variable:
 $ x: int 5
 - attr(*, ".internal.selfref")=<externalptr>
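
The idea can also be tried in plain R, without RevoScaleR. This is only a rough sketch: filter_then_clear() is a hypothetical stand-in, not part of RevoScaleR, and it assumes that "cleared" means the objects in the environment are removed once the filter has run. What rxXdfToDataFrame does internally may differ.

```r
# Hypothetical stand-in for an rx-function (NOT a RevoScaleR API).
# It evaluates a row filter with `envir` as the enclosure, then removes
# every object from `envir` (assumed behaviour, see the note above).
filter_then_clear <- function(data, expr, envir) {
  # Columns of `data` are looked up first, then `envir` and its parents
  keep <- eval(expr, data, envir)
  rm(list = ls(envir = envir, all.names = TRUE), envir = envir)
  data[keep, , drop = FALSE]
}

df <- data.frame(x = 1:10)

# Risky: hand over the function's own environment
risky <- function() {
  selectnum <- 5
  res <- filter_then_clear(df, quote(x == selectnum), environment())
  # Is the local variable still there after the call?
  exists("selectnum", inherits = FALSE)
}

# Safer: hand over a dedicated environment that holds only what is needed
safe <- function() {
  env <- new.env()
  env$selectnum <- 5
  res <- filter_then_clear(df, quote(x == selectnum), env)
  # The function's own locals are untouched and the filter found one row
  exists("env", inherits = FALSE) && nrow(res) == 1
}

print(risky())  # FALSE: the local was wiped along with the environment
print(safe())   # TRUE
```

With the dedicated environment, anything the callee does to it stays contained, which is the reason for the new.env() approach shown earlier.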