Laura R. Laura R. - 1 month ago 6
R Question

How to find observations whose dummy variable changes from 1 to 0 (and not viceversa) in a df in r

I have a survey composed of n individuals; each individual is present more than one time in the survey (panel). I have a variable pens, which is a dummy that takes value 1 if the individual invests in a complementary pension form. For example:

df <- data.frame(year=c(2002,2002,2004,2004,2006,2008), id=c(1,2,1,2,3,3), y.b=c(1950,1943,1950,1943,1966,1966), sex=c("F", "M", "F", "M", "M", "M"), income=c(100000,55000,88000,66000,12000,24000), pens=c(0,1,1,0,1,1))

year id y.b sex income pens
2002 1 1950 F 100000 0
2002 2 1943 M 55000 1
2004 1 1950 F 88000 1
2004 2 1943 M 66000 0
2006 3 1966 M 12000 1
2008 3 1966 M 24000 1


where id is the individual, y.b is year of birth, pens is the dummy variable regarding complementary pension.

I want to know if there are individuals that invested in a complementary pension form in year t but didn't hold the complementary pension form in year t+2 (the survey is conducted every two years). In this way I want to know how many person had a complementary pension form but released it before pension or gave up (for example for economic reasons).

I tried with this command:

df$x <- (ave(df$pens, df$id, FUN = function(x)length(unique(x)))==1)*1
which(df$x=="0")


and actually I have the individuals whose pens variable had changed during time (the command check if a variable is constant in time). For this reason I find individuals whose pens variable changed from 0 (didn't have complementary pension) in year t to 1 in year t+2 and viceversa; but I am interested in individuals whose pens variable was 1 (had a complementary pensione) in year t and 0 in year t+2.

If I use this command with the df I get that for id 1 and 2 the variable x is 0 (pens variable isn't constant), but I'd need to find a way to get just id 2 (whose pens variable changed from 1 to 0).

df$x <- (ave(df$pens, df$id, FUN = function(x)length(unique(x)))==1)*1
which(df$x=="0")

year id pens x
1 2002 1 0 0
2 2002 2 1 0
3 2004 1 1 0
4 2004 2 0 0
5 2006 3 1 1
6 2008 3 1 1


(for the sake of semplicity I omitted other variables)

So the desired output is:

year id pens x
1 2002 1 0 1
2 2002 2 1 0
3 2004 1 1 1
4 2004 2 0 0
5 2006 3 1 1
6 2008 3 1 1


only id 2 has x=0 since the pens variable changed from 1 to 0.

Thanks in advance

Answer

This assigns 1 to the id's for which there is a decline in pens and 0 otherwise.

transform(d.d, x = ave(pens, id, FUN = function(x) any(diff(x) < 0)))

giving:

  year id  y.b sex income pens   x
1 2002  1 1950   F 100000    0   0
2 2002  2 1943   M  55000    1   1
3 2004  1 1950   F  88000    1   0
4 2004  2 1943   M  66000    0   1
5 2006  3 1966   M  12000    1   0
6 2008  3 1966   M  24000    1   0

This should work even even if there are more than 2 rows per id but if we knew there were always 2 rows then we could omit the any simplifying it to:

transform(d.d, x = ave(pens, id, FUN = diff) < 0)

Note: The input in reproducible form is:

Lines <- "year  id  y.b   sex   income   pens   
2002  1   1950   F    100000     0     
2002  2   1943   M    55000      1    
2004  1   1950   F    88000      1    
2004  2   1943   M    66000      0    
2006  3   1966   M    12000      1    
2008  3   1966   M    24000      1"

d.d <- read.table(text = Lines, header = TRUE, check.names = FALSE)
Comments