Florian Oswald - 2 months ago
R Question

How to suppress output when using `:=` in R {data.table}, prior to v1.8.3?

Is there a way to prevent data.table from printing the new data.table after assigning a new column by reference? I gather the standard behaviour is

library(data.table)
example(data.table)
DT
# x y v
# 1: a 1 42
# 2: a 3 42
# 3: a 6 42
# 4: b 1 11
# 5: b 3 11
# 6: b 6 11
# 7: c 1 7
# 8: c 3 8
# 9: c 6 9

DT[,z:=1:nrow(DT)]

# x y v z
# 1: a 1 42 1
# 2: a 3 42 2
# 3: a 6 42 3
# 4: b 1 11 4
# 5: b 3 11 5
# 6: b 6 11 6
# 7: c 1 7 7
# 8: c 3 8 8
# 9: c 6 9 9


i.e. the table is printed to the screen after assignment. Is there a way to stop data.table from showing the new table after assigning the new column z? I know I can stop this behaviour by saying

DT <- copy(DT[,z:=1:nrow(DT)])


but that defeats the purpose of := (which is designed to avoid copies).

Answer

Since assigning a data.table with <- doesn't make a copy, you can assign the result back with <-:

Create a data.table object:

library(data.table)
di <- data.table(iris)

Create a new column:

di <- di[, z:=1:nrow(di)]
di

#       Sepal.Length Sepal.Width Petal.Length Petal.Width Species  z
#  [1,]          5.1         3.5          1.4         0.2  setosa  1
#  [2,]          4.9         3.0          1.4         0.2  setosa  2
#  [3,]          4.7         3.2          1.3         0.2  setosa  3
#  [4,]          4.6         3.1          1.5         0.2  setosa  4
#  [5,]          5.0         3.6          1.4         0.2  setosa  5
#  [6,]          5.4         3.9          1.7         0.4  setosa  6
#  [7,]          4.6         3.4          1.4         0.3  setosa  7
#  [8,]          5.0         3.4          1.5         0.2  setosa  8
#  [9,]          4.4         2.9          1.4         0.2  setosa  9
# [10,]          4.9         3.1          1.5         0.1  setosa 10
# First 10 rows of 150 printed. 
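One way to check the no-copy claim is to compare memory addresses before and after the assignment. The following is a minimal sketch, assuming a data.table version that exports address(); the variable names before, after, same_object and copy_is_same are made up for illustration.

```r
library(data.table)

di <- data.table(iris)
before <- address(di)            # memory address of the data.table object

di <- di[, z := 1:nrow(di)]      # add the column by reference, re-bind the name
after <- address(di)

# TRUE here would indicate that di still refers to the same object (no copy)
same_object <- identical(before, after)

# By contrast, copy() is expected to allocate a new object at a new address
copy_is_same <- identical(address(copy(di)), after)

print(c(same_object = same_object, copy_is_same = copy_is_same))
```

If same_object is TRUE and copy_is_same is FALSE, the assignment with <- reused the existing table while copy() did not.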

It is also worth remembering that R only auto-prints the value of a top-level expression. In a script run with source() (which does not echo or print values by default), nothing is printed.

So, in a sourced script, you can simply use:

di[, z:=1:nrow(di)]

This will not produce any output when the script is run with source().
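For interactive use before v1.8.3, another option that avoids a copy is to wrap the call in base R's invisible(), which should suppress auto-printing at the top level. This is a minimal sketch on a small made-up table, not something taken from the answer above.

```r
library(data.table)

DT <- data.table(x = c("a", "b", "c"), v = c(42, 11, 7))  # made-up data

# invisible() marks the returned value as not to be auto-printed;
# the column z is still added to DT by reference
invisible(DT[, z := 1:nrow(DT)])

names(DT)
```

The table itself is only shown when DT is typed explicitly afterwards.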


Further info from Matthew Dowle:

Also see FAQ 2.21 and 2.22:

2.21 Why does DT[i,col:=value] return the whole of DT? I expected either no visible value (consistent with <-), or a message or return value containing how many rows were updated. It isn't obvious that the data has indeed been updated by reference.

So that compound syntax can work; e.g., DT[i,done:=TRUE][,sum(done)]. The number of rows updated is returned when verbosity is on, either on a per query basis or globally using options(datatable.verbose=TRUE).
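The compound syntax mentioned in FAQ 2.21 can be sketched as follows. The table and the name n_done are made up for illustration; the done column is initialised first so that sum() does not encounter NA values.

```r
library(data.table)

DT <- data.table(x = c("a", "a", "b"), v = c(42, 11, 7))  # made-up data
DT[, done := FALSE]              # initialise the flag for every row

# Update the rows matching i, then query the updated table in one statement.
# This works because the first [ returns the whole of DT.
n_done <- DT[x == "a", done := TRUE][, sum(done)]
n_done
```

If := returned nothing, or only a row count, the second [ would have no table to operate on.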

2.22 Ok, but can't the return value of DT[i,col:=value] be returned invisibly, then?

  • We tried to but R internally forces visibility on for [. The value of FunTab's eval column (see src/main/names.c) for [ is 0 meaning force R_Visible on (see R-Internals section 1.6). Therefore, when we tried invisible() or setting R_Visible to 0 directly ourselves, eval in src/main/eval.c would force it on again.
  • After getting used to this behaviour, you might grow to prefer it (we have). After all, how many times do we subassign using <- and then immediately look at the data to check it's ok?
  • We can mix := into a j which also returns data; a mixed update and select in one query. To detect whether j solely updates (and then behave differently) could be confusing.

Second update from Matthew Dowle:

We have now found a solution and v1.8.3 no longer prints the result when := is used. We will update FAQ 2.21 and 2.22.
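With v1.8.3 or later the opposite need can arise: printing the table straight after an update. A commonly used idiom is to append an empty [] to the query. This is a sketch assuming such a version is installed, using a small made-up table.

```r
library(data.table)

DT <- data.table(x = c("a", "b", "c"), v = c(42, 11, 7))  # made-up data

DT[, z := 1:nrow(DT)]     # v1.8.3 or later: expected to print nothing
DT[, z2 := v * 2][]       # trailing [] asks for the updated table to be printed
```

Both statements update DT by reference; only the second is meant to show the result.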
