daRknight - 1 year ago 64
R Question

# Create new variable with reference from another data.table

I know this may be accomplished with a for-loop, but am certain there is a more elegant solution within the construct of

`data.table`
.

I have two data tables, and will use 'iris' to illustrate my issue:

``````library("data.table")
A <- as.data.table(iris)                      #primary data table
B <- A[Sepal.Width > 3, .N, by = Species]     #count from A meeting condition

#       Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1:          5.1         3.5          1.4         0.2     setosa
#2:          4.9         3.0          1.4         0.2     setosa
#3:          4.7         3.2          1.3         0.2     setosa

B
#      Species  N
#1:     setosa 42
#2: versicolor  8
#3:  virginica 17
``````

I would like to add a new variable to B which is simply the proportion of the data set that B represents, i.e. for the first row the output would be something like:

``````B[, Proportion := N/nrow(A[Species == "setosa"])]
``````

The RHS of that index would obviously need to be dynamic, referencing the value of the first column in B by row..

It is this iteration that eludes me (though I feel it has to do with the data table key(s) perhaps?); greatly appreciate any help!

I would approach this as follows:

``````A <- as.data.table(iris)
B <- A[Sepal.Width > 3, .N, by = .("spec" = Species)]

B[, Proportion := N/nrow(A[Species == spec]), by = spec]
``````

which results in:

``````> B
spec  N Proportion
1:     setosa 42       0.84
2: versicolor  8       0.16
3:  virginica 17       0.34
``````

Explanation:

• By renaming the `Species` column to `spec`, you prevent R & data.table from not knowing which column to take for the calculation of the `Proportion`.
• Using `by = spec` takes care of that the correct `spec` is used in `A[Species == spec]`.
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download