beginner - 1 year ago - 42
R Question

Summarise a data frame by finding the percentage of occurrences of a particular value for each unique value of another column

I want to find the fraction of rows with "sense" in the ore column for each unique transcript_id:

transcript_id ore
A1 sense
A1 sense
A1 antisense
A2 sense
A2 antisense
A3 sense
A4 antisense
A4 antisense
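To make the question reproducible, the sample data above can be entered as a data frame. This is a minimal sketch: the name df is taken from the answer, and the column types are an assumption. In R 4.0 and later, data.frame() keeps strings as character by default, so transcript_id would print as <chr> rather than the <fctr> shown in the answer's output.

```r
# Sample data from the question, entered as a data frame named df
df <- data.frame(
  transcript_id = c("A1", "A1", "A1", "A2", "A2", "A3", "A4", "A4"),
  ore = c("sense", "sense", "antisense", "sense", "antisense",
          "sense", "antisense", "antisense")
)
```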


Expected output:

transcript_id fraction
A1 0.67
A2 0.5
A3 1
A4 0

Answer
library(dplyr)

df %>%
  group_by(transcript_id) %>%
  summarise(fraction = sum(ore == "sense") / n())

# A tibble: 4 x 2
#  transcript_id  fraction
#         <fctr>     <dbl>
#1            A1 0.6666667
#2            A2 0.5000000
#3            A3 1.0000000
#4            A4 0.0000000
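If dplyr is not available, the same per-group fraction can be sketched in base R with tapply(), which applies mean() to the logical vector ore == "sense" within each transcript_id group. This swaps in a base R technique that the original answer does not use; the names frac and result are illustrative only.

```r
# Same sample data as in the question
df <- data.frame(
  transcript_id = c("A1", "A1", "A1", "A2", "A2", "A3", "A4", "A4"),
  ore = c("sense", "sense", "antisense", "sense", "antisense",
          "sense", "antisense", "antisense")
)

# For each transcript_id, the mean of a logical vector is the share of TRUE values
frac <- tapply(df$ore == "sense", df$transcript_id, mean)

# Turn the named result into a two-column data frame
result <- data.frame(transcript_id = names(frac), fraction = as.vector(frac))
result  # fractions: A1 = 2/3, A2 = 1/2, A3 = 1, A4 = 0
```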

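This is equivalent to the following version, which uses mean() as suggested in a comment by @docendo. The mean of a logical vector is the proportion of TRUE values, so the two agree as long as there are no missing values in ore: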

df %>% group_by(transcript_id) %>% summarise(fraction = mean(ore == "sense"))

# A tibble: 4 x 2
#  transcript_id  fraction
#         <fctr>     <dbl>
#1            A1 0.6666667
#2            A2 0.5000000
#3            A3 1.0000000
#4            A4 0.0000000
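The condition about missing values matters once na.rm = TRUE is involved. The sketch below uses a hypothetical three-element vector standing in for one group's ore values (inside summarise(), n() equals the length of that group). Without na.rm both forms return NA; with na.rm = TRUE, the sum()/n() form still counts the NA row in the denominator, while mean() drops it.

```r
# Hypothetical ore values for one group, with one missing entry
x <- c("sense", "antisense", NA)
is_sense <- x == "sense"  # TRUE FALSE NA

ratio_plain <- sum(is_sense) / length(x)                # NA: sum() propagates NA
mean_plain  <- mean(is_sense)                           # NA as well
ratio_rm    <- sum(is_sense, na.rm = TRUE) / length(x)  # 1/3: NA row stays in the denominator
mean_rm     <- mean(is_sense, na.rm = TRUE)             # 1/2: NA row is dropped entirely
```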