Phil D. Phil D. - 12 days ago 5
R Question

Calculating growth of individual recaptures in R

I am trying to calculate growth for individuals from a large capture/mark/recapture dataset. I have data for every marked individual, but not all of them were recaptured. Basically, I capture and mark individuals of a certain size. A certain time later, I capture and mark individuals of a larger size. Some of these are already marked recaptures, while others are new individuals.

What I would like to do is take this data set, then calcualte growth for the recaptured individuals. At it's base, this is fairly simple. A recaptured individual will have the same tag number each time. So, I simply have to subtract the total length (TL) when the individual was first marked from the TL at recapture. However, I do not know how to tell R to select a previous row based on a certain column, then apply a function to it. The tricky part to me is that the values I am trying to match are in the same column. I have looked at ?%in% and tried to see if plyr has anything that would work, but could not find anything.

EDIT: longer data frame to handle multiple (more than 2) recaptures of the same individual. In that case, the growth from the original capture should be calculated.

I created a sample data frame using the following:

tlran=c(rnorm(5,mean=50,sd=5),rnorm(5,mean=200,sd=5), rnorm(5,mean=400,sd=10))
TaggingData=as.data.frame(trunc(tlran,0))
names(TaggingData)="TL"
TaggingData$Tag=c(1,2,3,4,5,6,1,7,3,4,8,1,9,10,3)
TaggingData$Date=c("01.01.2015","01.01.2015","01.01.2015","01.01.2015","01.01.2015","01.01.2016","01.01.2016","01.01.2016","01.01.2016","01.01.2016","01.01.2017","01.01.2017","01.01.2017","01.01.2017","01.01.2017")


So my sample data looks like this:

>TaggingData
TL Tag Date
1 36 1 01.01.2015
2 44 2 01.01.2015
3 51 3 01.01.2015
4 49 4 01.01.2015
5 50 5 01.01.2015
6 203 6 01.01.2016
7 198 1 01.01.2016
8 203 7 01.01.2016
9 193 3 01.01.2016
10 210 4 01.01.2016
11 403 8 01.01.2017
12 402 1 01.01.2017
13 393 9 01.01.2017
14 391 10 01.01.2017
15 415 3 01.01.2017


EDIT: Manually, what I would like my data to look like:

>TaggingData
TL Tag Date Growth
1 36 1 01.01.2015 NA
2 44 2 01.01.2015 NA
3 51 3 01.01.2015 NA
4 49 4 01.01.2015 NA
5 50 5 01.01.2015 NA
6 203 6 01.01.2016 NA
7 198 1 01.01.2016 162
8 203 7 01.01.2016 NA
9 193 3 01.01.2016 142
10 210 4 01.01.2016 161
11 403 8 01.01.2017 NA
12 402 1 01.01.2017 366
13 393 9 01.01.2017 NA
14 391 10 01.01.2017 NA
15 415 3 01.01.2017 364


Basically, I want R to realize that because the tag number in row 7 is identical to the one in row 1, TL for row 7 should be subtracted from TL in row 1 and ideally written in a new column, for example TaggingData$Growth.

I am unfortunately completely lost here. I can select duplicate tag numbers using duplicates(TaggingData$Tag), but I feel this does not help me.

Can anyone suggest a package/functions to look at, or even help me out with some code?

EDIT: I have tried the suggested aggregate, and also package data.table with the following code:

Recap=data.table(TaggingData)
setkey(Recap,Tag)
Recap[,diff:=c(NA,diff(TL)),by=Tag]


While this gives me the numbers I am looking for, the formatting is not at all what I was hoping for and would require quite a bit of work to clean it up to what I need to continue working with the data.

Answer

Based on your result it seems that you want the cumsum of the diff. To follow your data.table approach, then

library(data.table)
Recap=data.table(TaggingData)
setkey(Recap,Tag)
Recap[,diff:=c(NA,cumsum(diff(TL))),by=Tag] 

P.S. I am not displaying results as you did not set.seed in your example and thus the results will not match yours.

Comments