user3692048 - 3 months ago

R - Find max frequency and replace value without using a loop

I have a physician claims dataset in which physicians can submit claims under different specialties. For each physician, I want to find the specialty they submit most often and replace all of that physician's specialty values with it.

physician <- c("Mary","Mary","Mary","Mary","Mary","Bob","Bob","Bob")
specialty <- c("GP","PED","DERM","ANES","GP","DERM","GP","DERM")
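# Build the data frame directly; stringsAsFactors = FALSE keeps character columns on R < 4.0.
data <- data.frame(physician, specialty, stringsAsFactors = FALSE)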

> data
  physician specialty
1      Mary        GP
2      Mary       PED
3      Mary      DERM
4      Mary      ANES
5      Mary        GP
6       Bob      DERM
7       Bob        GP
8       Bob      DERM


I am looking for a script that will output the following without using a for loop:

> data
  physician specialty
1      Mary        GP
2      Mary        GP
3      Mary        GP
4      Mary        GP
5      Mary        GP
6       Bob      DERM
7       Bob      DERM
8       Bob      DERM


The actual data frame has many more columns and physicians.

AEF
Answer

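You can make use of tapply, which splits a vector into groups and applies a function to each group. Here the function counts each physician's specialties with table() and returns the name of the most frequent one; that per-physician result is then looked up by physician name for every row.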
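As a minimal illustration of the grouping step on its own, this sketch runs tapply on the question's vectors with the built-in length() as the per-group function, so the result is simply the number of claims per physician. The name claims_per_physician is hypothetical and used only for this example.

```r
physician <- c("Mary","Mary","Mary","Mary","Mary","Bob","Bob","Bob")
specialty <- c("GP","PED","DERM","ANES","GP","DERM","GP","DERM")

# tapply() splits `specialty` by `physician` and calls length() once per group.
# The result is a 1-d array named by the sorted group labels.
claims_per_physician <- tapply(specialty, physician, length)
print(claims_per_physician)
# Bob = 3, Mary = 5
```

Swapping length() for a function that returns the most frequent value gives the per-physician lookup used in the answer's code.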

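# For each physician, tabulate their specialties and keep the most frequent one.
# which.max() returns the first maximum, and table() sorts its names, so a tie
# goes to the specialty that sorts first alphabetically.
physician_max <- tapply(data$specialty, data$physician,
                        function(s) {
                            counts <- table(s)
                            names(counts)[which.max(counts)]
                        })
# Look up every row's physician by name. as.character() forces name-based
# indexing even if physician is a factor; as.vector() drops the array attributes.
data$specialty <- as.vector(physician_max[as.character(data$physician)])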
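An alternative base R sketch, not part of the original answer, uses ave(), which applies a function within each group and writes the result back in the original row order, so no separate lookup step is needed and the other columns are left untouched. The helper most_frequent is a hypothetical name introduced here. Ties resolve to whichever specialty table() sorts first, which may or may not be the rule you want for real claims data.

```r
# Example data from the question, kept as character columns.
physician <- c("Mary","Mary","Mary","Mary","Mary","Bob","Bob","Bob")
specialty <- c("GP","PED","DERM","ANES","GP","DERM","GP","DERM")
data <- data.frame(physician, specialty, stringsAsFactors = FALSE)

# Hypothetical helper: the most frequent value of a vector. which.max() picks
# the first maximum and table() sorts its names, so ties go to the value
# that sorts first.
most_frequent <- function(s) names(which.max(table(s)))

# ave() splits specialty by physician, calls most_frequent() on each group,
# and recycles the length-1 result over that group's rows in place.
data$specialty <- ave(data$specialty, data$physician, FUN = most_frequent)

print(data)
```

Because ave() returns a vector aligned with the input rows, the same line works unchanged on a wider data frame with many more columns.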