Laura R. - 1 year ago 66
R Question

# How to find observations whose dummy variable changes from 1 to 0 (and not viceversa) in a df in r

I have a survey composed of n individuals; each individual is present more than one time in the survey (panel). I have a variable pens, which is a dummy that takes value 1 if the individual invests in a complementary pension form. For example:

``````df <- data.frame(year=c(2002,2002,2004,2004,2006,2008), id=c(1,2,1,2,3,3), y.b=c(1950,1943,1950,1943,1966,1966), sex=c("F", "M", "F", "M", "M", "M"), income=c(100000,55000,88000,66000,12000,24000), pens=c(0,1,1,0,1,1))

year  id  y.b   sex   income   pens
2002  1   1950   F    100000     0
2002  2   1943   M    55000      1
2004  1   1950   F    88000      1
2004  2   1943   M    66000      0
2006  3   1966   M    12000      1
2008  3   1966   M    24000      1
``````

where id is the individual, y.b is year of birth, pens is the dummy variable regarding complementary pension.

I want to know if there are individuals that invested in a complementary pension form in year t but didn't hold the complementary pension form in year t+2 (the survey is conducted every two years). In this way I want to know how many person had a complementary pension form but released it before pension or gave up (for example for economic reasons).

I tried with this command:

``````df\$x <- (ave(df\$pens, df\$id, FUN = function(x)length(unique(x)))==1)*1
which(df\$x=="0")
``````

and actually I have the individuals whose pens variable had changed during time (the command check if a variable is constant in time). For this reason I find individuals whose pens variable changed from 0 (didn't have complementary pension) in year t to 1 in year t+2 and viceversa; but I am interested in individuals whose pens variable was 1 (had a complementary pensione) in year t and 0 in year t+2.

If I use this command with the df I get that for id 1 and 2 the variable x is 0 (pens variable isn't constant), but I'd need to find a way to get just id 2 (whose pens variable changed from 1 to 0).

``````df\$x <- (ave(df\$pens, df\$id, FUN = function(x)length(unique(x)))==1)*1
which(df\$x=="0")

year id pens x
1 2002  1    0 0
2 2002  2    1 0
3 2004  1    1 0
4 2004  2    0 0
5 2006  3    1 1
6 2008  3    1 1
``````

(for the sake of semplicity I omitted other variables)

So the desired output is:

``````  year id pens x
1 2002  1    0 1
2 2002  2    1 0
3 2004  1    1 1
4 2004  2    0 0
5 2006  3    1 1
6 2008  3    1 1
``````

only id 2 has x=0 since the pens variable changed from 1 to 0.

This assigns 1 to the id's for which there is a decline in `pens` and 0 otherwise.

``````transform(d.d, x = ave(pens, id, FUN = function(x) any(diff(x) < 0)))
``````

giving:

``````  year id  y.b sex income pens   x
1 2002  1 1950   F 100000    0   0
2 2002  2 1943   M  55000    1   1
3 2004  1 1950   F  88000    1   0
4 2004  2 1943   M  66000    0   1
5 2006  3 1966   M  12000    1   0
6 2008  3 1966   M  24000    1   0
``````

This should work even even if there are more than 2 rows per id but if we knew there were always 2 rows then we could omit the `any` simplifying it to:

``````transform(d.d, x = ave(pens, id, FUN = diff) < 0)
``````

Note: The input in reproducible form is:

``````Lines <- "year  id  y.b   sex   income   pens
2002  1   1950   F    100000     0
2002  2   1943   M    55000      1
2004  1   1950   F    88000      1
2004  2   1943   M    66000      0
2006  3   1966   M    12000      1
2008  3   1966   M    24000      1"