Maria Gold - 1 month ago
R Question

Tidy multivariate data in R

I have a dataset with the following structure:

(image: example of dataset)

The rows are participants in an experiment, and the columns are questions they answered. All the columns titled EC belong to one type of task, all those titled ART belong to another, and so on.

After reading the table into R, how do I tidy the data such that all questions belonging to one type of task are saved as a single variable? I basically want each type of task (all answers that all participants gave for that task) to be saved as separate variables which I can later do statistical analysis on.

I understand that gather and separate might be useful commands for this, but I don't completely understand how to use them here and I don't completely understand their syntax.

For example:

gather(data, key, value) - I think that 'key' should refer to the title I gave the variable, and 'value' to the fields where the values related to that variable are located? If so, what does 'data' refer to? I tried putting the name of the table in the 'data' field, but got an error saying 'Error: Invalid column specification'.

Can someone help?

Answer

There has to be a duplicate of this question somewhere, but let's start by simulating some data:

library(tidyr)
library(purrr)
library(dplyr)

This part just re-creates a data set like you seem to have. It's not necessary to understand this for the solution.

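# 16 columns, each holding 10 random answers between 0 and 4
df <- map(1:16, ~sample(0:4, 10, replace=TRUE))
df <- as.data.frame(df)
# name the columns EC1..EC4, ART1..ART4, IC1..IC4, AQ1..AQ4
df <- set_names(df, c(sprintf("EC%d", 1:4), sprintf("ART%d", 1:4), sprintf("IC%d", 1:4), sprintf("AQ%d", 1:4)))
# add an ID column (every row gets the same label, "id10", here)
df <- mutate(df, participant=sprintf("id%d", 10))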

Here's what df ends up looking like:

df
##    EC1 EC2 EC3 EC4 ART1 ART2 ART3 ART4 IC1 IC2 IC3 IC4 AQ1 AQ2 AQ3 AQ4 participant
## 1    4   2   1   4    2    2    3    1   4   2   0   4   3   0   4   2        id10
## 2    3   4   1   0    1    1    1    2   3   4   0   4   2   1   4   3        id10
## 3    4   2   3   2    0    1    3    4   4   1   2   4   0   1   0   4        id10
## 4    1   4   0   3    2    3    1    2   0   2   1   1   1   3   3   1        id10
## 5    2   3   1   1    2    4    1    0   3   0   3   3   0   1   4   2        id10
## 6    4   0   1   1    1    4    2    0   3   0   1   3   3   3   2   0        id10
## 7    3   1   1   1    4    1    1    0   0   2   1   4   3   2   2   3        id10
## 8    0   4   0   1    4    4    2    4   0   1   1   3   1   1   4   0        id10
## 9    0   0   4   4    0    1    0    3   1   0   2   3   4   4   1   0        id10
## 10   2   0   2   1    4    2    3    4   3   4   4   4   3   0   4   4        id10

That seems to match the format of your data.

If so, I think this is what you want:

df <- gather(df, answer, value, -participant)

head(df, 20)
##    participant answer value
## 1         id10    EC1     4
## 2         id10    EC1     3
## 3         id10    EC1     4
## 4         id10    EC1     1
## 5         id10    EC1     2
## 6         id10    EC1     4
## 7         id10    EC1     3
## 8         id10    EC1     0
## 9         id10    EC1     0
## 10        id10    EC1     2
## 11        id10    EC2     2
## 12        id10    EC2     4
## 13        id10    EC2     2
## 14        id10    EC2     4
## 15        id10    EC2     3
## 16        id10    EC2     0
## 17        id10    EC2     1
## 18        id10    EC2     4
## 19        id10    EC2     0
## 20        id10    EC2     0
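
To address the syntax question directly, here is a minimal sketch of the same gather() call with the arguments named. The tiny `wide` table and its values are made up purely for illustration; only the argument roles matter. The first argument is the data frame object itself, `key` and `value` are the names of the two new columns, and anything after that selects which columns get gathered.

```r
library(tidyr)

# Hypothetical two-participant table, for illustration only
wide <- data.frame(
  participant = c("p1", "p2"),
  EC1 = c(4, 3),
  EC2 = c(2, 4),
  stringsAsFactors = FALSE
)

# data         = the data frame object (not a quoted name)
# key          = name of the NEW column that will hold the old column names
# value        = name of the NEW column that will hold the cell values
# -participant = gather every column except participant
long <- gather(wide, key = "answer", value = "value", -participant)
long
```

Each original column name (EC1, EC2) ends up as an entry in `answer`, with the matching cell in `value`, so the result has one row per participant per question.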

You may or may not have an ID variable for each participant; we can't tell, since we don't have your actual data.
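
If the goal is one variable per task type, a possible next step is to split the `answer` column into the task name and the question number. This is only a sketch, assuming every column name is letters followed by digits (as in EC1, ART2); the `long` table below is hypothetical data in the shape that the gather() call above produces, and the names `task`, `item`, and `by_task` are made up for the example.

```r
library(tidyr)

# Hypothetical long-format data, shaped like the output of gather() above
long <- data.frame(
  participant = rep(c("p1", "p2"), times = 4),
  answer = rep(c("EC1", "EC2", "ART1", "ART2"), each = 2),
  value = c(4, 3, 2, 4, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Split e.g. "EC1" into task = "EC" and item = 1
# (assumes names are letters followed by digits)
long <- tidyr::extract(long, answer, into = c("task", "item"),
                       regex = "^([A-Za-z]+)([0-9]+)$", convert = TRUE)

# One vector of answers per task type, ready for further analysis
by_task <- split(long$value, long$task)
by_task$EC
```

Keeping everything in one long table with a `task` column is usually easier to work with than separate objects, but `by_task` gives you the per-task vectors if that is what your analysis needs.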