Phil D. Phil D. - 1 year ago 54
R Question

Calculating growth of individual recaptures in R

I am trying to calculate growth for individuals from a large capture/mark/recapture dataset. I have data for every marked individual, but not all of them were recaptured. Basically, I capture and mark individuals of a certain size. A certain time later, I capture and mark individuals of a larger size. Some of these are already marked recaptures, while others are new individuals.

What I would like to do is take this data set, then calcualte growth for the recaptured individuals. At it's base, this is fairly simple. A recaptured individual will have the same tag number each time. So, I simply have to subtract the total length (TL) when the individual was first marked from the TL at recapture. However, I do not know how to tell R to select a previous row based on a certain column, then apply a function to it. The tricky part to me is that the values I am trying to match are in the same column. I have looked at ?%in% and tried to see if plyr has anything that would work, but could not find anything.

EDIT: longer data frame to handle multiple (more than 2) recaptures of the same individual. In that case, the growth from the original capture should be calculated.

I created a sample data frame using the following:

tlran=c(rnorm(5,mean=50,sd=5),rnorm(5,mean=200,sd=5), rnorm(5,mean=400,sd=10))
TaggingData=as.data.frame(trunc(tlran,0))
names(TaggingData)="TL"
TaggingData$Tag=c(1,2,3,4,5,6,1,7,3,4,8,1,9,10,3)
TaggingData$Date=c("01.01.2015","01.01.2015","01.01.2015","01.01.2015","01.01.2015","01.01.2016","01.01.2016","01.01.2016","01.01.2016","01.01.2016","01.01.2017","01.01.2017","01.01.2017","01.01.2017","01.01.2017")


So my sample data looks like this:

>TaggingData
TL Tag Date
1 36 1 01.01.2015
2 44 2 01.01.2015
3 51 3 01.01.2015
4 49 4 01.01.2015
5 50 5 01.01.2015
6 203 6 01.01.2016
7 198 1 01.01.2016
8 203 7 01.01.2016
9 193 3 01.01.2016
10 210 4 01.01.2016
11 403 8 01.01.2017
12 402 1 01.01.2017
13 393 9 01.01.2017
14 391 10 01.01.2017
15 415 3 01.01.2017


EDIT: Manually, what I would like my data to look like:

>TaggingData
TL Tag Date Growth
1 36 1 01.01.2015 NA
2 44 2 01.01.2015 NA
3 51 3 01.01.2015 NA
4 49 4 01.01.2015 NA
5 50 5 01.01.2015 NA
6 203 6 01.01.2016 NA
7 198 1 01.01.2016 162
8 203 7 01.01.2016 NA
9 193 3 01.01.2016 142
10 210 4 01.01.2016 161
11 403 8 01.01.2017 NA
12 402 1 01.01.2017 366
13 393 9 01.01.2017 NA
14 391 10 01.01.2017 NA
15 415 3 01.01.2017 364


Basically, I want R to realize that because the tag number in row 7 is identical to the one in row 1, TL for row 7 should be subtracted from TL in row 1 and ideally written in a new column, for example TaggingData$Growth.

I am unfortunately completely lost here. I can select duplicate tag numbers using duplicates(TaggingData$Tag), but I feel this does not help me.

Can anyone suggest a package/functions to look at, or even help me out with some code?

EDIT: I have tried the suggested aggregate, and also package data.table with the following code:

Recap=data.table(TaggingData)
setkey(Recap,Tag)
Recap[,diff:=c(NA,diff(TL)),by=Tag]


While this gives me the numbers I am looking for, the formatting is not at all what I was hoping for and would require quite a bit of work to clean it up to what I need to continue working with the data.

Answer Source

Based on your result it seems that you want the cumsum of the diff. To follow your data.table approach, then

library(data.table)
Recap=data.table(TaggingData)
setkey(Recap,Tag)
Recap[,diff:=c(NA,cumsum(diff(TL))),by=Tag] 

P.S. I am not displaying results as you did not set.seed in your example and thus the results will not match yours.