ALG ALG - 2 months ago 6
R Question

How can I move pairs of columns into rows and create an ID at the same time?

I have a bunch of raw data obtained from a software with not many optimization options.
When I convert it to .csv and read into R, I obtain this:

A tibble from the table obtained reading the .csv file

As you can see, I have 2 types of variables, Force and Time, for 15 different packs of data. I want to group every data of Force and Time in only 2 columns, and create a third column with the ID of every dataset, something like this:

ID | Force| Time
------------------------------------
SR_1.5x1.5x1.5_90s 001 | 52.2 | 0.00
SR_1.5x1.5x1.5_90s 001 | 55.3 | 0.04
... | ... | ...
SR_1.5x1.5x1.5_90s 002 | 64.8 | 0.00
SR_1.5x1.5x1.5_90s 002 | 69.6 | 0.04


I tried using tidyr function 'gather()' but the results were not satisfatory.

It's the first question I make here, so my apologies if any information is missing.

Answer

You want to first gather all your columns and then spread with respect to alternating Force and Time rows.

First, we recreate a small snippet of your data as:

rtabla1 <- structure(list(SR_1.5x1.5x1.5_90s.001.Force = c(52.2, 55.3, 62.6, 66.5, 70.8, 75.9, 77.6, 78.7, 80.2, 83.8), SR_1.5x1.5x1.5_90s.001.Time = c(0, 0.004, 0.008, 0.012, 0.016, 0.02, 0.024, 0.028, 0.032, 0.036), 
SR_1.5x1.5x1.5_90s.002.Force = c(64.8, 69.6, 76, 80.2, 85.1, 
90.5, 94.1, 95.6, 99.7, 103.4), SR_1.5x1.5x1.5_90s.002.Time = c(0, 
0.004, 0.008, 0.012, 0.016, 0.02, 0.024, 0.028, 0.032, 0.036
)), .Names = c("SR_1.5x1.5x1.5_90s.001.Force", "SR_1.5x1.5x1.5_90s.001.Time", "SR_1.5x1.5x1.5_90s.002.Force", "SR_1.5x1.5x1.5_90s.002.Time"), row.names = c(NA, -10L), class = "data.frame")
##   SR_1.5x1.5x1.5_90s.001.Force SR_1.5x1.5x1.5_90s.001.Time SR_1.5x1.5x1.5_90s.002.Force SR_1.5x1.5x1.5_90s.002.Time
##1                          52.2                       0.000                         64.8                       0.000
##2                          55.3                       0.004                         69.6                       0.004
##3                          62.6                       0.008                         76.0                       0.008
##4                          66.5                       0.012                         80.2                       0.012
##5                          70.8                       0.016                         85.1                       0.016
##6                          75.9                       0.020                         90.5                       0.020
##7                          77.6                       0.024                         94.1                       0.024
##8                          78.7                       0.028                         95.6                       0.028
##9                          80.2                       0.032                         99.7                       0.032
##10                         83.8                       0.036                        103.4                       0.036

Using tidyr and dplyr:

library(dplyr)
library(tidyr)
result <- rtabla1 %>% mutate(Row=seq_len(n())) %>%                             ##1.
                      gather("ID","Vals",-Row) %>%                             ##2.
                      mutate(FT=ifelse(grepl(".Force$",ID), "Force", "Time"),  ##3.
                             ID=sub("(.Force$)|(.Time$)","",ID)) %>%
                      spread(FT,Vals) %>%                                      ##4.
                      select(-Row) %>%                                         ##5.
                      arrange(ID)                                              ##6.

Notes:

  1. First add an identifier column that is just the row numbers.
  2. gather all the columns except Row resulting in an ID column that contains the column names that are gathered.
  3. mutate this ID column using string manipulation to form the column FT. This FT column is created to identify which of the gathered rows are a Force and which are Time. It will be used in the subsequent spread. and its values are created to match the column names of your result (i.e., "Force" or "Time"). Here, use grepl to check if ID ends with ".Force". If it does, then set the FT column to "Force"; otherwise, set the FT column to "Time". Then, update the ID column to remove either ".Force" or ".Time" from its end.
  4. Now, spread using the FT as the key and the gathered values as the value.
  5. Remove the Row column from the result.
  6. Sort by ID

The result using the small snippet of data is as expected:

print(result)
##                       ID Force  Time
##1  SR_1.5x1.5x1.5_90s.001  52.2 0.000
##2  SR_1.5x1.5x1.5_90s.001  55.3 0.004
##3  SR_1.5x1.5x1.5_90s.001  62.6 0.008
##4  SR_1.5x1.5x1.5_90s.001  66.5 0.012
##5  SR_1.5x1.5x1.5_90s.001  70.8 0.016
##6  SR_1.5x1.5x1.5_90s.001  75.9 0.020
##7  SR_1.5x1.5x1.5_90s.001  77.6 0.024
##8  SR_1.5x1.5x1.5_90s.001  78.7 0.028
##9  SR_1.5x1.5x1.5_90s.001  80.2 0.032
##10 SR_1.5x1.5x1.5_90s.001  83.8 0.036
##11 SR_1.5x1.5x1.5_90s.002  64.8 0.000
##12 SR_1.5x1.5x1.5_90s.002  69.6 0.004
##13 SR_1.5x1.5x1.5_90s.002  76.0 0.008
##14 SR_1.5x1.5x1.5_90s.002  80.2 0.012
##15 SR_1.5x1.5x1.5_90s.002  85.1 0.016
##16 SR_1.5x1.5x1.5_90s.002  90.5 0.020
##17 SR_1.5x1.5x1.5_90s.002  94.1 0.024
##18 SR_1.5x1.5x1.5_90s.002  95.6 0.028
##19 SR_1.5x1.5x1.5_90s.002  99.7 0.032
##20 SR_1.5x1.5x1.5_90s.002 103.4 0.036