Masi - 1 month ago
R Question

How to overload function parameters in R?

Simplified code with two parameters, age and gender. However, I would like to pick cases by gender only or by age only. I am wondering how to overload the function so that getIDs(age) and getIDs(gender) both work without duplicating the same code again and again; assume there are 50 parameters, etc. I tried getIDs(age, ""), but it is not a good idea.

getIDs <- function(age, gender) {
  # http://stackoverflow.com/a/40330110/54964

  ageIDs <- c(1, 2, 3)
  genderIDs <- ageIDs  # dummy value; the goal is to skip genderIDs when gender is ""

  intersect(ageIDs, genderIDs)
}


Main data

ID,Age,Gender
100,69,male
101,75,female
102,84,female
103,,male
104,66,female


Data 2

DF <- structure(list(ID = 100:104, Age = c(69L, 75L, 84L, NA, 66L), Gender =
c("male", "female", "female", "male", "female")), .Names = c("ID", "Age",
"Gender"), row.names = c(NA, -5L), class = "data.frame")


Similarly for age: if age == "", do not include the ageIDs subset in the intersection.

A parameter that selects all males would be great, so that you do not need to pass "male", "male", ... explicitly.

Algorithm based on Roman's answer



I think this strategy becomes very hard to maintain with 50 parameters, so a better way is still needed.

getIDs <- function(age, gender) {
  # http://stackoverflow.com/a/40330110/54964
  # So if you called this as getIDs(c(20, 30), "male")
  # You'd get the ids of all males with age >= 20 and <= 30
  #
  # NULL = ALL
  # getIDs(age = c(1,2), gender = NULL)
  # getIDs(age = NULL, gender = "male")
  data <- read.csv("/home/masi/data.csv", header = TRUE, sep = ",")

  if (is.null(gender)) {
    genderIDs <- data$ID
  } else {
    genderIDs <- data$ID[which(data$Gender == gender)]
  }

  if (is.null(age)) {
    # NULL = ALL, so rows with a missing Age are kept as well
    ageIDs <- data$ID
  } else {
    if (length(age) == 1) {
      ages <- data$Age == age
    } else {
      ages <- data$Age >= age[1] & data$Age <= age[2]
    }
    # which() drops rows where Age is NA
    ageIDs <- data$ID[which(ages)]
  }

  intersect(ageIDs, genderIDs)
}


OS: Debian 8.5

R: 3.1.1

Answer

Using dplyr, you can write a general function that takes whatever condition you like as a string and returns the matching IDs. This scales easily to multiple parameters, as long as the condition string can be evaluated by dplyr. (The outputs below were generated using the data frame provided in the question.)

library(dplyr)
getIDs <- function(condition) {
  data <- read.csv("/home/masi/data.csv", header = TRUE)
  # Return the IDs visibly; ending on an assignment would return them invisibly
  data %>% filter_(condition) %>% .$ID
}

getIDs("Gender == 'male'")
# [1] 100 103

getIDs("Age > 30")
# [1] 100 101 102 104

getIDs("Gender == 'male' & Age > 30")
# [1] 100
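
To make the mechanism concrete, here is a minimal base R sketch of the same idea: the condition string is parsed into an expression and evaluated with the data frame's columns in scope. The name get_ids_base is hypothetical, and DF is the example data frame from the question. This is only an illustration of the technique, not a description of how dplyr implements filter_ internally.

```r
# Example data copied from the question.
DF <- data.frame(
  ID = 100:104,
  Age = c(69L, 75L, 84L, NA, 66L),
  Gender = c("male", "female", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of dplyr): evaluate a condition string
# against the columns of `data` and return the matching IDs.
get_ids_base <- function(condition, data = DF) {
  keep <- eval(parse(text = condition), envir = data, enclos = parent.frame())
  # which() drops rows where the condition is NA (e.g. a missing Age)
  data$ID[which(keep)]
}

get_ids_base("Gender == 'male'")
# [1] 100 103

get_ids_base("Gender == 'male' & Age > 30")
# [1] 100
```

Because the string is evaluated as R code, this pattern (like the dplyr version) should only be used with condition strings you trust.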

If you don't need to read in data within the function, the function can be written like

getIDs <- . %>% filter_(DF, .) %>% .$ID

A pipeline that starts with the placeholder . defines a function instead of computing a value; this is a feature of magrittr chains.
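
As a small illustration of that magrittr feature, independent of the data in the question, the sketch below builds a one-argument function from a chain that begins with the dot. The name count_unique is made up for this example.

```r
library(magrittr)

# A chain that begins with `.` is not evaluated immediately;
# it returns a function of one argument.
count_unique <- . %>% unique %>% length

count_unique(c("male", "female", "female", "male"))
# [1] 2
```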


If you want to pass a sequence of queries as arguments:

getIDs <- function(...){
    DF %>% filter_(...) %>% .$ID
} 

getIDs("Gender == 'male'", "Age > 30")
# [1] 100
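
Note that in dplyr 0.7 and later the underscore verbs such as filter_ are deprecated. A rough equivalent for newer versions, assuming the rlang package is installed, parses each string with rlang::parse_expr and splices the results into filter. The name get_ids and the data argument are hypothetical additions for this sketch.

```r
library(dplyr)

# Example data copied from the question.
DF <- data.frame(
  ID = 100:104,
  Age = c(69L, 75L, 84L, NA, 66L),
  Gender = c("male", "female", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Hypothetical replacement for the filter_() version:
# each argument in `...` is one condition string.
get_ids <- function(..., data = DF) {
  # Turn every string into an unevaluated expression
  conditions <- lapply(c(...), rlang::parse_expr)
  # !!! splices the list so that filter() sees one argument per condition
  data %>% filter(!!!conditions) %>% pull(ID)
}

get_ids("Gender == 'male'", "Age > 30")
# [1] 100
```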

If you want the result sorted by one of the parameters, add an arrange_ step to the dplyr pipeline:

getIDs <- function(..., by = NULL){
    DF %>% filter_(...) %>% { if (!is.null(by))  arrange_(., by) else . } %>% .$ID
} 

getIDs("Gender == 'female'", "Age > 10", by = "Age")
# [1] 104 101 102

# descending order:
getIDs("Gender == 'female'", "Age > 10", by = "desc(Age)")
# [1] 102 101 104
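
If you would rather pass values than condition strings, a base R sketch along the following lines may also scale to many parameters. It assumes each named argument matches a column name, follows the question's convention that a numeric vector of length 2 means an inclusive range, and treats omitted or NULL arguments as "no restriction". The helper name get_ids_by is hypothetical.

```r
# Example data copied from the question.
DF <- data.frame(
  ID = 100:104,
  Age = c(69L, 75L, 84L, NA, 66L),
  Gender = c("male", "female", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: each named argument filters the column of the same name.
#   numeric vector of length 2 -> inclusive range (the question's convention)
#   anything else              -> set membership
#   omitted or NULL            -> no restriction on that column
get_ids_by <- function(data, ...) {
  filters <- list(...)
  stopifnot(all(names(filters) %in% names(data)))
  keep <- rep(TRUE, nrow(data))
  for (column in names(filters)) {
    value <- filters[[column]]
    if (is.null(value)) next
    x <- data[[column]]
    if (is.numeric(value) && length(value) == 2) {
      hit <- x >= value[1] & x <= value[2]
    } else {
      hit <- x %in% value
    }
    # Rows where the comparison is NA (missing data) do not match
    keep <- keep & !is.na(hit) & hit
  }
  data$ID[keep]
}

get_ids_by(DF, Gender = "male")
# [1] 100 103

get_ids_by(DF, Age = c(60, 70), Gender = "female")
# [1] 104
```

One consequence of the range convention is that two discrete ages cannot be passed as a length-2 vector; a different signal (for example a list) would be needed for that case.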