user3017075 - 2 months ago
R Question

R data.table - create a new column where each element is a list of values

I've started working with R's data.table and I'm trying to do the following.
For simplicity, let's say that I have a table of ArticleReadings like this:

UserID  Time   ArticleID  Category    NumOfReading
'aaa'   7:50   'x'        'sports'    1
'bbb'   5:05   'x'        'sports'    1
'aaa'   8:40   'y'        'politics'  2
'aaa'   10:00  'z'        'sports'    3
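To run the answer below, the sample table can be typed in as a `data.table`. This is a minimal sketch that assumes the object is called `DT` (the name the code below uses) and keeps `Time` as plain character strings rather than a time class.

```r
library(data.table)

# Sample ArticleReadings from the question.
# Assumption: Time is stored as character, not parsed into a time class.
DT <- data.table(
  UserID       = c("aaa", "bbb", "aaa", "aaa"),
  Time         = c("7:50", "5:05", "8:40", "10:00"),
  ArticleID    = c("x", "x", "y", "z"),
  Category     = c("sports", "sports", "politics", "sports"),
  NumOfReading = c(1L, 1L, 2L, 3L)
)
```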


Eventually I want a new column that contains, for each row, all the categories read by that row's user. In this example, the value for user 'aaa' would be the vector c('politics', 'sports'), and for user 'bbb' a vector with one element, 'sports'.
I want this kind of column because later I plan to manipulate it (e.g. compute the mode/dominant category, or show the most popular combinations of categories), so I thought I would first get a unique vector for each user and then sort it.
All my attempts to use such a function as the new column's value ended up spreading the vector's elements across separate rows instead of storing the whole vector as one value.
For example, one of my attempts:

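CategoriesList <- function(x){sort(unique(x))}
# CategoriesList() returns a plain character vector (length 2 for user 'aaa'),
# so := tries to assign its elements to rows instead of storing it as one value
DT[,':='(UniqueCats=CategoriesList(Category)),by=UserID]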


As I'm new to data.table and to user-defined functions in R, I suspect I'm missing some critical point about returning the result as a single vector value...
Any help would be appreciated!

Answer

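If we need a list column in the dataset, wrap the result in `list` twice: the outer `list()` holds the value being assigned by `:=`, and the inner one makes the whole sorted vector a single list element, which is then recycled to every row of that user's group.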

DT[, UniqueCats := list(list(sort(unique(Category)))) , by = UserID]
str(DT)
#Classes ‘data.table’ and 'data.frame':  4 obs. of  6 variables:
# $ UserID      : chr  "aaa" "bbb" "aaa" "aaa"
# $ Time        : chr  "7:50" "5:05" "8:40" "10:00"
# $ ArticleID   : chr  "x" "x" "y" "z"
# $ Category    : chr  "sports" "sports" "politics" "sports"
# $ NumOfReading: int  1 1 2 3
# $ UniqueCats  :List of 4
#  ..$ : chr  "politics" "sports"
#  ..$ : chr "sports"
#  ..$ : chr  "politics" "sports"
#  ..$ : chr  "politics" "sports"
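The same double-`list` wrapping also fixes the question's own `CategoriesList()` helper. Once the list column exists, each cell is an ordinary character vector, so element-wise functions such as `lengths()` and `vapply()` can work on it directly. A minimal sketch on a trimmed copy of the sample data (only `UserID` and `Category`); the new columns `nCats`, `combo` and `readPolitics` are hypothetical names chosen for illustration.

```r
library(data.table)

# Trimmed copy of the question's sample data
DT <- data.table(
  UserID   = c("aaa", "bbb", "aaa", "aaa"),
  Category = c("sports", "sports", "politics", "sports")
)

# The question's helper works once its result is wrapped in list(list(...))
CategoriesList <- function(x) sort(unique(x))
DT[, UniqueCats := list(list(CategoriesList(Category))), by = UserID]

# Hypothetical follow-up columns computed from the list column:
# number of distinct categories, a "+"-joined combination label,
# and whether the user ever read a 'politics' article
DT[, nCats := lengths(UniqueCats)]
DT[, combo := vapply(UniqueCats, paste, character(1), collapse = "+")]
DT[, readPolitics := vapply(UniqueCats, function(x) "politics" %in% x, logical(1))]
print(DT)
```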

We can also create a character column instead, by collapsing the elements into one string with `toString` (a shorthand for `paste(x, collapse = ", ")`)

# one comma-separated string per user, e.g. "politics, sports" for 'aaa'
DT[, uniqueCats := toString(sort(unique(Category))), by = UserID]
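The string column makes the follow-up analyses from the question straightforward. The sketch below, on a trimmed copy of the sample data, counts how many users share each category combination; it also finds each user's dominant (most-read) category from the raw `Category` column, because the de-duplicated list no longer carries the counts. The object names `combos` and `dominant` are hypothetical, and ties for the dominant category simply go to the category that appears first.

```r
library(data.table)

# Trimmed copy of the question's sample data
DT <- data.table(
  UserID   = c("aaa", "bbb", "aaa", "aaa"),
  Category = c("sports", "sports", "politics", "sports")
)
DT[, uniqueCats := toString(sort(unique(Category))), by = UserID]

# Popular combinations (hypothetical name `combos`): keep one row per user
# first, otherwise users with many readings would be counted once per reading
combos <- unique(DT[, .(UserID, uniqueCats)])[, .N, by = uniqueCats][order(-N)]
print(combos)

# Dominant category (hypothetical name `dominant`): count readings per
# (user, category), sort by count within user, keep the top row per user
dominant <- DT[, .N, by = .(UserID, Category)][order(UserID, -N)][, .SD[1L], by = UserID]
print(dominant)
```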