jayinbluecity jayinbluecity - 1 month ago 13
R Question

How to count the variables depending on the appearance of another value in the same column?

I want to count the number of death before first breast cancer, after first breast cancer, after second breast cancer etc...

my data looks like this, when you see the EVENT column you can see some people has death event before BC1(first breast cancer), after BC1 or after BC2. I was wondering how to count the number of people in this each sequence

PERSON_ID EVENT
3 10000000002 BC1
4 10000000002 R_B
5 10000000002 BC2
6 10000000002 DEATH
7 10000000002 EPI
20 10000000007 BC1
21 10000000007 BC2
22 10000000007 DEATH
23 10000000010 DEATH
37 10000000827 DEATH
38 10000000830 BC1
39 10000000830 DEATH


I want some tables like this, but I dont worry about making tables now.
I just want the corresponding count for

Death before BC1: #
Death after BC1: #
Death after BC2: #


sorry for bad formatting in the post, any help will be appreciated!

Answer

Here is a very simple solution. The results are stored into the results variable.

my_data<- data.frame(PERSON_ID = as.character(c(10000000002,10000000002,10000000002,10000000002,10000000002,10000000007,10000000007,10000000007,10000000010,10000000827,10000000830,10000000830)),
    EVENT  = c("BC1","R_B","BC2","DEATH","EPI","BC1","BC2","DEATH","DEATH","DEATH","BC1","DEATH"))

my_function <- function(ID){
    person <- subset(my_data, PERSON_ID == ID)
    a <- which(person $EVENT == "DEATH")
    b <- which(person $EVENT == "BC1")
    c <- which(person $EVENT == "BC2")
    if(length(b) == 0){return("Death_before_BC1")}
    else if(length(c) == 0){return("Death_after_BC1")}
    else{return("Death_after_BC2")}
    }

results_tmp <- sapply(as.character(unique(my_data$PERSON_ID)), my_function)

results <- data.frame(Death_before_BC1 = sum(results_tmp == "Death_before_BC1"), 
    Death_after_BC1 = sum(results_tmp == "Death_after_BC1"),
    Death_after_BC2 = sum(results_tmp == "Death_after_BC2"))