I have a table of (x,y) points and would like to create a second table that summarizes those points.
Given a sequence of thresholds, I would like each row in the summary table to show the sum of all the y's where x is greater than that row's threshold. But I'm having trouble figuring out how to pass the row's threshold value into the inner sum.
I've gotten this far:
samples <- data.table(x=seq(1,100,1), y=seq(1,100,1))
thresholds = seq(10,100,10)
thresholdedSums <- data.table(xThreshold=thresholds, ySumWhereXGreaterThanThreshold=sum(samples[x > xThreshold, y]))
but it fails with:
Error in eval(expr, envir, enclos) : object 'xThreshold' not found
The result I'm after is:
(row 1) threshold = 10, ySumWhereXGreaterThanThreshold = sum of all y values in samples where x > 10,
(row 2) threshold = 20, ySumWhereXGreaterThanThreshold = sum of all y values in samples where x > 20,
... etc ...
The following code produces the desired result. It is not a pure data.table solution, since it iterates over the thresholds with sapply, but it works reliably.
thresholdedSums <- data.table(
  thres = thresholds,
  Sum = sapply(thresholds, function(thres) samples[x > thres, sum(y)])
)
#     thres  Sum
#  1:    10 4995
#  2:    20 4840
#  3:    30 4585
#  4:    40 4230
#  5:    50 3775
#  6:    60 3220
#  7:    70 2565
#  8:    80 1810
#  9:    90  955
# 10:   100    0
sapply(thresholds, function(thres) samples[x > thres, sum(y)]) returns a vector of the same length as thresholds. You can read it as: for every element of thresholds, call function(thres) samples[x > thres, sum(y)] and collect the results in a vector. Compared with a for-loop, this is usually more concise and easier to read, and it saves you from allocating and filling the result vector by hand.
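If you would rather stay entirely inside data.table, here are two possible sketches of the same computation. They assume a data.table version that supports non-equi joins (1.9.8 or later); the names thresholdTable, joinedSums and groupedSums are made up for illustration and do not appear in the question.

```r
library(data.table)

samples <- data.table(x = seq(1, 100, 1), y = seq(1, 100, 1))
thresholds <- seq(10, 100, 10)

# Hypothetical helper table holding one row per threshold.
thresholdTable <- data.table(xThreshold = thresholds)

# Variant 1: non-equi join. For each row of thresholdTable, match all rows of
# samples with x > xThreshold and aggregate per threshold row (by = .EACHI).
joinedSums <- samples[thresholdTable,
                      .(Sum = sum(y, na.rm = TRUE)),  # unmatched threshold gives y = NA
                      on = .(x > xThreshold),
                      by = .EACHI]
# The first column is the join column; give it a clearer name.
setnames(joinedSums, 1L, "thres")

# Variant 2: group the threshold table by xThreshold and subset samples inside j.
# Within each group, xThreshold is a single value visible to the inner subset.
groupedSums <- thresholdTable[, .(Sum = samples[x > xThreshold, sum(y)]),
                              by = xThreshold]

print(joinedSums)
print(groupedSums)
```

Both variants should give the same Sum column as the sapply version. The second one is closest to the original attempt: the threshold becomes visible to the inner sum because it is evaluated once per group rather than once for the whole table.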