blazej - 6 months ago 162

R Question

*This is a complete re-edit of my original question.*

Let's assume I'm working on RT data gathered in a repeated-measures experiment. As part of my usual routine I always transform RT to natural logarithms and then compute a Z score for each RT **within each participant, adjusting for trial number**. This is typically done with a simple regression in SPSS syntax:

```
split file by subject.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT rtLN
  /METHOD=ENTER trial
  /SAVE ZRESID.
split file off.
```

To reproduce the same procedure in R, first generate the data:

```
# load libraries
library(dplyr); library(magrittr)

# generate data
ob <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3)
ob <- factor(ob)
trial <- c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)
rt <- c(300,305,290,315,320,320,350,355,330,365,370,370,560,565,570,575,560,570)
cond <- c("first","first","first","snd","snd","snd","first","first","first","snd","snd","snd","first","first","first","snd","snd","snd")

# the following variable is what I would get after using the SPSS code
ZreSPSS <- c(0.4207,0.44871,-1.7779,0.47787,0.47958,-0.04897,0.45954,0.45487,-1.7962,0.43034,0.41075,0.0407,-0.6037,0.0113,0.61928,1.22038,-1.32533,0.07806)

sym <- data.frame(ob, trial, rt, cond, ZreSPSS)
```

I could apply a formula (a blend of Mark's and Daniel's solutions) to compute residuals from `lm(log(rt)~trial)` under `group_by`:

```
sym %<>%
  group_by(ob) %>%
  mutate(z = residuals(lm(log(rt) ~ trial)),
         obM = mean(rt), obSd = sd(rt), zRev = z * obSd + obM)
```

The resulting values clearly show that the grouping hasn't kicked in.

Any idea why it didn't work out?

Answer

Using `dplyr` and `magrittr`, you should be able to calculate z-scores within each individual with this code (it splits the data into the groups you specify, then calculates within each group).

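```
library(dplyr); library(magrittr)  # %<>% comes from magrittr

experiment %<>%
  group_by(subject) %>%
  mutate(rtLN = log(rt)
         , ZRE1 = as.numeric(scale(rtLN)))  # scale() returns a matrix; keep a plain vector
```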
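The code above standardizes log RT within each subject but does not adjust for trial number, which the SPSS syntax in the question does via `/SAVE ZRESID`. A minimal sketch of that step, assuming ZRESID is the raw residual divided by the residual standard error of each subject's regression, and using base R only so it runs without extra packages. The helper name `z_resid` is hypothetical; the data are the `ob`, `trial` and `rt` values from the question.

```r
# Data copied from the question (ob = subject, trial, rt)
sym <- data.frame(
  ob    = factor(rep(1:3, each = 6)),
  trial = rep(1:6, times = 3),
  rt    = c(300, 305, 290, 315, 320, 320,
            350, 355, 330, 365, 370, 370,
            560, 565, 570, 575, 560, 570)
)

# Hypothetical helper: fit log(rt) ~ trial for ONE subject and return
# residuals divided by the residual standard error of that fit
z_resid <- function(d) {
  fit <- lm(log(rt) ~ trial, data = d)
  unname(residuals(fit)) / sigma(fit)
}

# Split by subject, fit each regression separately, put results back in row order
sym$zres <- unsplit(lapply(split(sym, sym$ob), z_resid), sym$ob)

round(sym$zres[1:6], 4)
```

For the first subject this gives roughly 0.4207 on trial 1 and -1.7779 on trial 3, in line with the `ZreSPSS` values listed in the question. If every subject ends up with the same regression line, that is a sign the fit ran on the whole data set rather than per group.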

You should then be able to use that in your model. However, one thing that may help your shift to R thinking is that you can likely build your model directly, instead of having to make all of these columns ahead of time. For example, using `lme4` to treat `subject` as a random effect:

```
library(lme4)

# random intercept per subject; lmer() treats the grouping variable as a factor
withRandVar <-
  lmer(log(rt) ~ cond + (1 | subject)
       , data = experiment)
```

Then, the residuals should already be on the correct scale. Further, if you use the z-scores, you probably *should* be plotting on that scale. I am not actually sure what running with the z-scores as the response gains you -- it seems like you would lose information about the degree of difference between the groups.

That is, if the groups are tight, but the difference between them varies by subject, z-scores may always show them as a similar number of standard deviations apart. Imagine, for example, that you have two subjects: one scores (1,1,1) on condition A and (3,3,3) on condition B, and a second scores (1,1,1) and (5,5,5) -- both will give z-scores of about (-.9,-.9,-.9) vs (.9,.9,.9) -- losing the information that the difference between A and B is larger in subject 2.
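The two-subject example can be checked directly. A small sketch using `scale()`, which centers on the mean and divides by the sample standard deviation; the variable names are just for illustration.

```r
# Scores from the example: condition A first, then condition B
subj1 <- c(1, 1, 1, 3, 3, 3)
subj2 <- c(1, 1, 1, 5, 5, 5)

# Within-subject z-scores (scale() returns a matrix, so flatten it)
z1 <- as.numeric(scale(subj1))
z2 <- as.numeric(scale(subj2))

round(z1, 2)        # -0.91 -0.91 -0.91  0.91  0.91  0.91
all.equal(z1, z2)   # TRUE: the z-scores are identical for both subjects

# The raw A-vs-B difference is 2 for subject 1 and 4 for subject 2
diff_raw <- c(mean(subj1[4:6]) - mean(subj1[1:3]),
              mean(subj2[4:6]) - mean(subj2[1:3]))
diff_raw
```

The raw differences are 2 and 4, yet the z-scores are the same, which is the information loss described above.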

If, however, you really want to convert back, you can probably use this to store the subject means and SDs, then multiply the residuals by `subjSD` and add `subjMean`.

```
experiment %<>%
  group_by(subject) %>%
  mutate(rtLN = log(rt)
         , ZRE1 = as.numeric(scale(rtLN))  # plain vector instead of a matrix column
         , subjMean = mean(rtLN)
         , subjSD = sd(rtLN))
```
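To illustrate the back-conversion arithmetic, here is a minimal sketch in base R that applies it to the z-scores themselves and checks that the original values come back. The toy `experiment` data frame and the `rtLN_back` and `rt_back` columns are made up for this example; the same multiply-then-add step would apply to any quantity on the z scale.

```r
# Hypothetical toy data: two subjects, four trials each
experiment <- data.frame(
  subject = rep(c("s1", "s2"), each = 4),
  rt      = c(300, 320, 310, 340, 520, 560, 540, 600)
)

experiment$rtLN <- log(experiment$rt)

# Per-subject mean and SD of log RT, repeated on every row of that subject
experiment$subjMean <- ave(experiment$rtLN, experiment$subject, FUN = mean)
experiment$subjSD   <- ave(experiment$rtLN, experiment$subject, FUN = sd)

# Within-subject z-score
experiment$ZRE1 <- (experiment$rtLN - experiment$subjMean) / experiment$subjSD

# Convert back: multiply by the subject SD, add the subject mean, undo the log
experiment$rtLN_back <- experiment$ZRE1 * experiment$subjSD + experiment$subjMean
experiment$rt_back   <- exp(experiment$rtLN_back)

all.equal(experiment$rt_back, experiment$rt)   # TRUE
```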