Ben K. - 9 days ago
R Question

matching id inside data frame

I made this simple data frame to make my question more clear:

id = c(11, 12, 13, 14, 15)
referenceperson = c("yes", "no", "yes", "no", "yes")
smoke = c(3, 4, 3, NA, 2)
spouseid = c(12, 11, NA, 15, 14)
dataframe = data.frame(id, referenceperson , smoke, spouseid)


I would like to get the amount of smoking of the spouse of a reference person only; in this example, that is the value 4 for the first observation.

I'm lost here and thanks for any help

42-
Answer

Using only the values in your dataframe object, I will step through it and present a compact method of getting the single value you ask for, and then all the values:

> dataframe[ match(dataframe$spouseid[1], dataframe$id) , 'smoke']
[1] 4

That gets the row index of the spouse of the person in the first row and uses it to pull the 'smoke' value from the referenced row. The next line demonstrates that match will get you all such indices at once, returning NA where no matching id exists.

> match(dataframe$spouseid, dataframe$id)
[1]  2  1 NA  5  4

In R, using NA as a row index into a data frame returns NA for that position rather than dropping it or returning a null value. This preserves sequence information, so the result stays aligned with the original rows. Therefore, you can get all the smoking values of spouses with this:

> dataframe[ match(dataframe$spouseid, dataframe$id) , 'smoke']
[1]  4  3 NA  2 NA
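
As a quick illustration of why the NA is useful here, this is a minimal sketch using plain vectors with the same values as the example. The names idx, spouse_smoke, and dropped are made up for illustration and are not from the original post. Indexing with an NA position yields NA at that position, so the result has one entry per row; stripping the NA first shortens the result and breaks the row alignment.

```r
id       <- c(11, 12, 13, 14, 15)
smoke    <- c(3, 4, 3, NA, 2)
spouseid <- c(12, 11, NA, 15, 14)

# Row position of each person's spouse; NA where no spouse is recorded
idx <- match(spouseid, id)

# An NA index gives an NA element, so the result keeps one entry per row
spouse_smoke <- smoke[idx]
length(spouse_smoke) == length(id)   # TRUE

# Dropping the NA first gives a shorter vector that no longer lines up with the rows
dropped <- smoke[idx[!is.na(idx)]]
length(dropped)                      # 4
```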

Then assign those values to a new column in the data frame:

> dataframe$smk_stat_spouse <- 
                    dataframe[ match(dataframe$spouseid, dataframe$id) , 'smoke']
> dataframe
  id referenceperson smoke spouseid smk_stat_spouse
1 11             yes     3       12               4
2 12              no     4       11               3
3 13             yes     3       NA              NA
4 14              no    NA       15               2
5 15             yes     2       14              NA
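
The question asked for the spouse's value for reference persons only, so one possible way to finish is to filter on referenceperson after the lookup. This is only a sketch: it assumes that referenceperson == "yes" is what marks a reference person, and the names is_ref and ref_only are invented for this example.

```r
# Rebuild the example data so the snippet runs on its own
id <- c(11, 12, 13, 14, 15)
referenceperson <- c("yes", "no", "yes", "no", "yes")
smoke <- c(3, 4, 3, NA, 2)
spouseid <- c(12, 11, NA, 15, 14)
dataframe <- data.frame(id, referenceperson, smoke, spouseid)

# Look up each person's spouse by id and take the spouse's smoke value
dataframe$smk_stat_spouse <-
    dataframe[match(dataframe$spouseid, dataframe$id), "smoke"]

# Assumption: "yes" in referenceperson identifies a reference person
is_ref <- dataframe$referenceperson == "yes"

# Keep only the reference persons and the columns of interest
ref_only <- dataframe[is_ref, c("id", "spouseid", "smk_stat_spouse")]
ref_only
```

The first row of ref_only carries the 4 asked about in the question. The other two reference persons get NA: id 13 has no spouse listed, and the spouse of id 15 has a missing smoke value.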