jeffgoblue jeffgoblue - 24 days ago 12
R Question

In R, how do I create a transformed subset of a data frame?

first question here! I'm working in R 3.3.1 (64-bit) on Windows 10.

I have data stored in a data frame called lwd. The data are grouped by a factor called 'wafer', and on each wafer, there are 10 locations (called 'point') where 4 different parameters were measured (v1, v2, v3, v4) (So visualize 5 silicon wafers, 10 locations per wafer, with four different measurements at each location. A total of 50 rows).

A sample of how the data looks in R (first 20 rows)

> lwd
data wafer point v1 v2 v3 v4
1 1 T3 1 0.3450 -1.3423 51.21 15.853
2 2 T3 2 0.3473 -1.5756 45.44 15.667
3 3 T3 3 0.3441 -1.3486 39.57 15.894
4 4 T3 4 0.3478 -1.7150 44.67 15.600
5 5 T3 5 0.3482 -1.4154 42.02 15.683
6 6 T3 6 0.3478 -1.4477 38.66 15.693
7 7 T3 7 0.3430 -1.3210 41.96 15.955
8 8 T3 8 0.3458 -1.6119 43.41 15.721
9 9 T3 9 0.3451 -1.4688 35.19 15.802
10 10 T3 10 0.3446 -1.4078 45.82 15.850
11 11 T1 1 0.3412 -3.2319 37.51 15.381
12 12 T1 2 0.3450 -3.2202 41.69 15.233
13 13 T1 3 0.3415 -3.1850 32.21 15.383
14 14 T1 4 0.3442 -3.2748 40.77 15.248
15 15 T1 5 0.3470 -3.3064 35.06 15.126
16 16 T1 6 0.3453 -3.3552 31.67 15.178
17 17 T1 7 0.3416 -3.4090 35.29 15.310
18 18 T1 8 0.3462 -3.2323 38.30 15.179
19 19 T1 9 0.3428 -3.4104 29.13 15.262
20 20 T1 10 0.3452 -3.5293 40.57 15.129
...
50 50 W2 10 0.3475 -2.8963 42.07 15.231


For each of v1 through v4, I want to create a transformed subset that looks like this (example):

>v1.group
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
T1 0.3412 0.3450 0.3415 0.3442 0.3470 0.3453 0.3416 0.3462 0.3428 0.3452
T3 0.3450 0.3473 0.3441 0.3478 0.3482 0.3478 0.3430 0.3458 0.3451 0.3446
W1 0.3521 0.3540 0.3555 0.3537 0.3550 0.3551 0.3514 0.3536 0.3547 0.3531
W2 0.3483 0.3503 0.3469 0.3477 0.3518 0.3511 0.3447 0.3485 0.3477 0.3475
W3 0.3430 0.3447 0.3462 0.3444 0.3468 0.3460 0.3425 0.3444 0.3430 0.3437


where each row corresponds to wafer, and each column is the measurement location ('point') 1 through 10. I'm happy to work on v1-v4 one-at-a-time, but I imagine there's a way to spit out v1.group, v2.group...etc. with one command. I've seen this done before a long time ago and without added libraries, but I've been unable to track it down.

Hopefully I've done this right: here's some code for you to reproduce the first 20 rows of my data set.

structure(list(data = 1:20, wafer = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("T1", "T3", "W1", "W2", "W3"), class = "factor"),
point = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), v1 = c(0.345, 0.3473,
0.3441, 0.3478, 0.3482, 0.3478, 0.343, 0.3458, 0.3451, 0.3446,
0.3412, 0.345, 0.3415, 0.3442, 0.347, 0.3453, 0.3416, 0.3462,
0.3428, 0.3452), v2 = c(-1.3423, -1.5756, -1.3486, -1.715,
-1.4154, -1.4477, -1.321, -1.6119, -1.4688, -1.4078, -3.2319,
-3.2202, -3.185, -3.2748, -3.3064, -3.3552, -3.409, -3.2323,
-3.4104, -3.5293), v3 = c(51.21, 45.44, 39.57, 44.67, 42.02,
38.66, 41.96, 43.41, 35.19, 45.82, 37.51, 41.69, 32.21, 40.77,
35.06, 31.67, 35.29, 38.3, 29.13, 40.57), v4 = c(15.853,
15.667, 15.894, 15.6, 15.683, 15.693, 15.955, 15.721, 15.802,
15.85, 15.381, 15.233, 15.383, 15.248, 15.126, 15.178, 15.31,
15.179, 15.262, 15.129)), .Names = c("data", "wafer", "point",
"v1", "v2", "v3", "v4"), row.names = c(NA, 20L), class = "data.frame")


Thanks. I look forward to your help and being a part of the community.

Answer

We can do this in a loop and use dcast from data.table (or if we need a matrix, then we can change the dcast to acast (from reshape2)

library(data.table)

lapply(grep('v\\d+', names(lwd)), function(i) dcast(setnames(setDT(lwd[c(1:3, i)]), 
               4, 'v'), wafer~point, value.var = "v"))

Or another option is xtabs from base R

lapply(grep('v\\d+', names(lwd)), function(i) 
        xtabs(v~wafer+point, transform(lwd[c(2:3)], v = lwd[,i])))
Comments