Maria Malidaki Maria Malidaki - 1 month ago 4x
R Question

Running a script in R for several individuals in a study

I'm a Master's student in animal behavior, very beginner in R. I'm doing statistics for my thesis and decided to give R a try before I resort to PSPP, after a small introductory course I had on the language, using R Studio.

I wrote a very basic script that

  1. Imports data from 51 individuals from a csv. file, and also imports a csv. file to store results in.

  2. Creates a subset with data from one individual and then, depending on the data, makes a hypothesis, runs a binomial test and decides whether the hypothesis is rejected or not. The hypothesis, p-value of the binomial test and rejection/acceptance of hypothesis per individual are stored in the results file.

The script is fully functional when I run it for each individual. I apologize in advance for the bad quality of the code, but here's the script for one individual, if you'd like to take a look:

## Import data and results file
odor_behavior <- read.csv("odor_behavior.csv")
percalf_binomial_odor_results <- read.csv("percalf_binomial_odor_results.csv")

# Create binomial dataset per calf (example here: 958)
pref958 <- subset(odor_behavior, Calf_ID == 958, c("Preference"), drop=FALSE)

# Count the number of testing sessions for this calf
n958 <- nrow(pref958)

# Calculate how many times a suckling behavior appears (Control=1 or Modified=3).
mat958 <- as.matrix(pref958)
fmat958 <- factor(mat958,levels=c(1,3),ordered=TRUE)
fpref958 <- table(fmat958)
data958 <-

# Hypothesize a preference according to the observations and save it.
# Also, save variables for the binomial test.
## Find the row in which to store the results
row958<- which(percalf_binomial_odor_results$Calf_ID == 958)
## Decide the hypothesis.
if (data958[1,2] > data958[2,2]) {
x958 <- data958[1,2]
percalf_binomial_odor_results[row958,c("H_preference")] <- "Control"
} else if (data958[1,2] < data958[2,2]) {
x958 <- data958[2,2]
percalf_binomial_odor_results[row958,c("H_preference")] <- "Odor"
} else if (data958[1,2] == data958[2,2]) {
x958 <- data958[1,2]
percalf_binomial_odor_results[row958,c("H_preference")] <- "Either"
} else {print("Check your data, Maria!")

# Run binomial test
binom.test (x=x958, n=n958, p=0.5, conf.level = 0.95)

# Save the p.value and decide whether the hypothesized preference is real or not
pbin958 <- binom.test (x=x958, n=n958, p=0.5, conf.level = 0.95)$p.value
percalf_binomial_odor_results[row958,c("Bin_odor_pvalue")] <- pbin958
if (pbin958 <0.05) {
percalf_binomial_odor_results[row958,c("Real_preference")] <- "Yes"
} else {
percalf_binomial_odor_results[row958,c("Real_preference")] <- "No"

As mentioned, it all runs smoothly when I repeat the script for each individual, by copy-pasting the code and manually changing the "Calf_ID" fifty times.

The question: Is it possible for the script to run by itself for each individual? I have tried doing it by placing all 51 "Calf_ID"s in a matrix, and putting the script in a function that uses the elements from the matrix to go through each individual.

Lapply returns correct results, but only with regard to whether the hypothesis was accepted/rejected (ie last part of the script), which has me thinking that the script indeed runs for all 51 individuals. However, no results get stored in the respective file during the process (highly likely because I am doing something wrong in the first place). When I do the manual copy-paste/edit procedure, all results (hypothesis, p-value and result for hypothesis per individual) are correctly stored in the results file.

I would be thankful for any suggestions and thoughts. I can make the data files available, if necessary. This is not an urgent question, as I'm able to get my results by the copy-pasting method. It's simply a question of efficiency, cause right now I have 51 individuals, but what does one do when they have one thousand? It would be great to know how to make it work!


As requested in the comments, here are the results (after running the script for individual 958) to


structure(list(Calf_ID = c(958L, 958L, 958L, 958L, 958L, 958L), Preference = c(1L, 1L, 3L, 1L, 1L, 3L)), .Names = c("Calf_ID", "Preference"), row.names = c(NA, 6L), class = "data.frame")

and dput(head(percalf_binomial_odor_results)):

structure(list(Calf_ID = c(958L, 7015L, 7017L, 959L, 7018L, 7019L),
H_preference = c("Control", NA, NA, NA, NA, NA), Bin_odor_pvalue = c(0.453125, NA, NA, NA, NA, NA), Real_preference = c("No", NA, NA, NA, NA, NA)), .Names = c("Calf_ID", "H_preference", "Bin_odor_pvalue", "Real_preference"), row.names = c(NA, 6L), class = "data.frame")


It seems that you apply computations to "Preference" grouped by each "Calf_ID". Instead of subsetting odor_behavior$Preference repeteadly for each id, we could utilize R's group-by functions after, first, building a function that performs all work given appropriate input.

I tried to simplify your script in a function -- hopefully I've not missed hard to accomodate details:

ff = function(pref)
    fx = factor(pref, levels = c(1L, 3L), labels = c("Control", "Odor"))
    tab = table(fx)

    p = binom.test(max(tab), length(pref), p = 0.5, conf.level = 0.95)$p.value

    real_pref = if(p < 0.05) "yes" else "no"
    H_pref = if(tab["Control"] == tab["Odor"]) "Either" else names(which.max(tab))

    data.frame(p = p, real_pref = real_pref, H_pref = H_pref, stringsAsFactors = FALSE)

And, then, apply it for all "Calf_ID" individuals:, aggregate(pref ~ ID, dat, ff))
#   ID     pref.p pref.real_pref pref.H_pref
#1   1 0.07295139             no     Control
#2   2  0.1689778             no     Control
#3   3  0.8919232             no        Odor
#4   4  0.6029232             no        Odor
#5   5          1             no     Control
#6   6  0.7659918             no     Control
#7   7          1             no        Odor
#8   8   0.889884             no     Control
#9   9          1             no      Either
#10 10  0.5758493             no     Control

where dat is:

dat = data.frame(ID = sample(10, 500, TRUE), pref = sample(c(1L, 3L), 500, TRUE))