DeltaIV DeltaIV - 2 months ago 10
R Question

Tidy up a dataframe where each column contains more than one variable

I have 4 sensors,

T1, T2, T3, T4
, acquiring data at regular intervals in time. Each sensor is located at a different radial and angular position. These data are stored like this:

test_data <- data.frame(T1=c(0,0,rnorm(3),NA,rnorm(6)), T2=c(1,0,NA,NA,rnorm(8)), T3=c(0,2*pi,rnorm(9),NA), T4=c(1,2*pi,rnorm(1),NA,rnorm(8)))


i.e., the first two rows of column containing the observations for sensor i contain respectively the radius and the angular position of the sensor. The following rows store the measures acquired from the sensor. Some measurements may be missing,

I would like to have a data frame in tidy format, thus each column should contain only 1 variable. This means that radius and angle must get their own columns. Consequently, the sensor measurements cannot be stored anymore in parallel columns, but must be stored serially. The second step is easy:

library(tidyr)
test_data <- test_data %>% gather(Sensor, Temperature)


However, I'm not sure which is the most idiomatic way to perform the first step. Of course I could use a
for
loop, but I was wondering if there could be a more idiomatic way.

Answer

Maybe something like this:

angle <- test_data %>%
  slice(1:2) %>%
  t %>%
  data.frame %>%
  setNames(., c("Radial", "Angular")) %>%
  tibble::rownames_to_column("Sensor")

test_data %>%
  slice(3:nrow(.)) %>%
  gather(Sensor, Temperature) %>%
  left_join(angle)

Which gives:

#        Sensor Temperature Radial  Angular
#1           T1 -0.10555991      0 0.000000
#2           T1 -0.73050967      0 0.000000
#3           T1  1.90504358      0 0.000000
#4           T1          NA      0 0.000000
#5           T1  0.33262173      0 0.000000
#6           T1  0.23063364      0 0.000000
#7           T1 -1.69186241      0 0.000000
#8           T1  0.65979190      0 0.000000
#9           T1 -1.02362359      0 0.000000
#10          T1 -0.89152157      0 0.000000
#11          T2          NA      1 0.000000
#12          T2          NA      1 0.000000
#13          T2  0.91834117      1 0.000000
#14          T2 -0.45270065      1 0.000000
#15          T2 -1.74837228      1 0.000000
#16          T2  1.76990411      1 0.000000
#17          T2 -2.37740693      1 0.000000
#18          T2  0.57281153      1 0.000000
#19          T2  1.01724925      1 0.000000
#20          T2 -0.63096787      1 0.000000
#21          T3  0.44428705      0 6.283185
#22          T3  0.43913039      0 6.283185
#23          T3  1.04062315      0 6.283185
#24          T3  0.48409939      0 6.283185
#25          T3 -0.24488378      0 6.283185
#26          T3  0.91599206      0 6.283185
#27          T3  0.80062236      0 6.283185
#28          T3 -0.93656903      0 6.283185
#29          T3 -1.40078743      0 6.283185
#30          T3          NA      0 6.283185
#31          T4  0.16027754      1 6.283185
#32          T4          NA      1 6.283185
#33          T4 -0.27396237      1 6.283185
#34          T4 -0.98553911      1 6.283185
#35          T4  0.08393068      1 6.283185
#36          T4 -1.31999653      1 6.283185
#37          T4  0.16122635      1 6.283185
#38          T4 -0.62492839      1 6.283185
#39          T4  0.95716427      1 6.283185
#40          T4  2.42448914      1 6.283185