rmflight - 1 month ago
R Question

Multiple non-grouped items in R tabular() output

I have a data.frame that I want to pass to tables::tabular() to set up for nice printing in LaTeX. It has 5 items repeated in two groups (normal and compress). I want three columns to not be grouped, and the rest to be grouped.

test_table <- structure(list(id = structure(c(2L, 3L, 5L, 1L, 4L, 2L, 3L, 5L,
1L, 4L), .Label = c("GO:0005525", "GO:0005634", "GO:0008270",
"GO:0019001", "GO:0046914"), class = "factor"), description = c("nucleus",
"zinc ion binding", "transition metal ion binding", "GTP binding",
"guanyl nucleotide binding", "nucleus", "zinc ion binding", "transition metal ion binding",
"GTP binding", "guanyl nucleotide binding"), IPR.group = c("H",
"W", "W", "AE", "AE", "H", "W", "W", "AE", "AE"), consistent = c(TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE), p = c(4.92245771293119e-05,
1.08157386873641e-21, 2.06049782601929e-14, 0.999999999562468,
0.999999999985399, 1, 1, 0.999999999999996, 6.51428091733489e-09,
2.3200965815753e-10), padjust = c(0.0166308749872604, 8.52640733187206e-19,
1.2182693396339e-11, 1, 1, 1, 1, 1, 9.06251433499824e-07, 3.91930601101827e-08
), metal = c("zn", "zn", "zn", "mg", "mg", "ca", "ca", "ca",
"ca", "ca"), perc = c(0.841726618705036, 0.831807780320366, 0.519281914893617,
0.875598086124402, 0.876651982378855, 0, 0, 0, 0, 0), sig = c("TRUE",
"TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
"TRUE", "TRUE"), which = structure(c(2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L), .Label = c("compress", "normal"), class = "factor")), .Names = c("id",
"description", "IPR.group", "consistent", "p", "padjust", "metal",
"perc", "sig", "which"), row.names = c(NA, -10L), class = "data.frame")

test_table
           id                  description IPR.group consistent            p      padjust metal      perc   sig    which
1  GO:0005634                      nucleus         H       TRUE 4.922458e-05 1.663087e-02    zn 0.8417266  TRUE   normal
2  GO:0008270             zinc ion binding         W       TRUE 1.081574e-21 8.526407e-19    zn 0.8318078  TRUE   normal
3  GO:0046914 transition metal ion binding         W       TRUE 2.060498e-14 1.218269e-11    zn 0.5192819  TRUE   normal
4  GO:0005525                  GTP binding        AE       TRUE 1.000000e+00 1.000000e+00    mg 0.8755981 FALSE   normal
5  GO:0019001    guanyl nucleotide binding        AE       TRUE 1.000000e+00 1.000000e+00    mg 0.8766520 FALSE   normal
6  GO:0005634                      nucleus         H       TRUE 1.000000e+00 1.000000e+00    ca 0.0000000 FALSE compress
7  GO:0008270             zinc ion binding         W       TRUE 1.000000e+00 1.000000e+00    ca 0.0000000 FALSE compress
8  GO:0046914 transition metal ion binding         W       TRUE 1.000000e+00 1.000000e+00    ca 0.0000000 FALSE compress
9  GO:0005525                  GTP binding        AE       TRUE 6.514281e-09 9.062514e-07    ca 0.0000000  TRUE compress
10 GO:0019001    guanyl nucleotide binding        AE       TRUE 2.320097e-10 3.919306e-08    ca 0.0000000  TRUE compress


So, I can start to get close if I do:

library(tables)
tabular(id ~ which*(p + padjust + metal + perc + sig)*Heading()*identity, data = test_table)




            which
            compress                             normal
 id         p         padjust   metal perc sig   p         padjust   metal perc   sig
 GO:0005525 6.514e-09 9.063e-07 ca    0    TRUE  1.000e+00 1.000e+00 mg    0.8756 FALSE
 GO:0005634 1.000e+00 1.000e+00 ca    0    FALSE 4.922e-05 1.663e-02 zn    0.8417 TRUE
 GO:0008270 1.000e+00 1.000e+00 ca    0    FALSE 1.082e-21 8.526e-19 zn    0.8318 TRUE
 GO:0019001 2.320e-10 3.919e-08 ca    0    TRUE  1.000e+00 1.000e+00 mg    0.8767 FALSE
 GO:0046914 1.000e+00 1.000e+00 ca    0    FALSE 2.060e-14 1.218e-11 zn    0.5193 TRUE


But as soon as I try to add the description column anywhere I think it should be, I start to get errors:

tabular((id + description) ~ which*(p + padjust + metal + perc + sig)*Heading()*identity, data = test_table)
# Error in term2table(rows[[i]], cols[[j]], data, n) : Duplicate values: description and p

tabular((id + IPR.group) ~ which*(p + padjust + metal + perc + sig)*Heading()*identity, data = test_table)
# Error in term2table(rows[[i]], cols[[j]], data, n) : Duplicate values: IPR.group and p


Even putting description on the right-hand side of the formula returns something really funny, where the character column gets turned into a number:

tabular(id ~ description + which*(p + padjust + metal + perc + sig)*Heading()*identity, data = test_table)
                        which
                        compress                             normal
 id         description p         padjust   metal perc sig   p         padjust   metal perc   sig
 GO:0005525 2           6.514e-09 9.063e-07 ca    0    TRUE  1.000e+00 1.000e+00 mg    0.8756 FALSE
 GO:0005634 2           1.000e+00 1.000e+00 ca    0    FALSE 4.922e-05 1.663e-02 zn    0.8417 TRUE
 GO:0008270 2           1.000e+00 1.000e+00 ca    0    FALSE 1.082e-21 8.526e-19 zn    0.8318 TRUE
 GO:0019001 2           2.320e-10 3.919e-08 ca    0    TRUE  1.000e+00 1.000e+00 mg    0.8767 FALSE
 GO:0046914 2           1.000e+00 1.000e+00 ca    0    FALSE 2.060e-14 1.218e-11 zn    0.5193 TRUE


I can fudge it if I make a new column that is the concatenation of the
columns I want displayed, but I'd have to write something to make them all look consistent:

test_table$ID <- paste0(test_table$id, " ", test_table$description, " ", test_table$IPR.group)
test_table$ID <- factor(test_table$ID)
tabular(ID ~ which*(p + padjust + metal + perc + sig)*Heading()*identity, data = test_table)


                                           which
                                           compress                             normal
 ID                                        p         padjust   metal perc sig   p         padjust   metal perc   sig
 GO:0005525 GTP binding AE                 6.514e-09 9.063e-07 ca    0    TRUE  1.000e+00 1.000e+00 mg    0.8756 FALSE
 GO:0005634 nucleus H                      1.000e+00 1.000e+00 ca    0    FALSE 4.922e-05 1.663e-02 zn    0.8417 TRUE
 GO:0008270 zinc ion binding W             1.000e+00 1.000e+00 ca    0    FALSE 1.082e-21 8.526e-19 zn    0.8318 TRUE
 GO:0019001 guanyl nucleotide binding AE   2.320e-10 3.919e-08 ca    0    TRUE  1.000e+00 1.000e+00 mg    0.8767 FALSE
 GO:0046914 transition metal ion binding W 1.000e+00 1.000e+00 ca    0    FALSE 2.060e-14 1.218e-11 zn    0.5193 TRUE
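
For the "make them all look consistent" part, one possible sketch (base R only, nothing from tables) is to run each piece through format() before pasting, since format() pads a character vector to a common width. The vectors ids, desc and grp below are hypothetical stand-ins for three columns of test_table, not the real data. This only lines things up in monospaced text output; in LaTeX the extra spaces would normally collapse, so it is at best a partial workaround.

```r
# Hypothetical stand-ins for test_table$id, $description and $IPR.group
ids  <- c("GO:0005525", "GO:0005634", "GO:0046914")
desc <- c("GTP binding", "nucleus", "transition metal ion binding")
grp  <- c("AE", "H", "W")

# format() left-justifies a character vector and pads every element
# to the width of the longest one
label <- paste(format(ids), format(desc), format(grp))

# Every label now has the same width, so the pieces line up in columns
print(label)
```

The padded labels could then be wrapped in factor() and used as the row variable, the same way ID is used above.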


I thought I should be able to do it using one of the approaches above, but not so much. Any help would be appreciated. Also, any solution should remove the which that is shown above compress and normal in the header of the table.

Answer

This seems close, at least:

> tabular(id ~ Heading()*which*(description + p + padjust + metal + perc + sig)*Heading()*identity, data = test_table)

            compress                                                         
 id         description                  p         padjust   metal perc sig  
 GO:0005525 GTP binding                  6.514e-09 9.063e-07 ca    0    TRUE 
 GO:0005634 nucleus                      1.000e+00 1.000e+00 ca    0    FALSE
 GO:0008270 zinc ion binding             1.000e+00 1.000e+00 ca    0    FALSE
 GO:0019001 guanyl nucleotide binding    2.320e-10 3.919e-08 ca    0    TRUE 
 GO:0046914 transition metal ion binding 1.000e+00 1.000e+00 ca    0    FALSE

 normal                                                             
 description                  p         padjust   metal perc   sig  
 GTP binding                  1.000e+00 1.000e+00 mg    0.8756 FALSE
 nucleus                      4.922e-05 1.663e-02 zn    0.8417 TRUE 
 zinc ion binding             1.082e-21 8.526e-19 zn    0.8318 TRUE 
 guanyl nucleotide binding    1.000e+00 1.000e+00 mg    0.8767 FALSE
 transition metal ion binding 2.060e-14 1.218e-11 zn    0.5193 TRUE 

...but you may not be happy with the duplication of the description column in each which group. There might be a way to fix that by pulling the description term outside of the parens, but it looks like that will require some other magical incantation: the naive change fails with an error about duplicate values in combination with p.

Edit: So close to the magic incantation...

tabular(id ~ (description*Heading()*min)+Heading()*which*(p + padjust + metal + perc + sig)*Heading()*identity, data = test_table)

This looks right (maybe?). The issue appears to be that tabular really wants to apply a summary function to description. unique() would probably be a better choice of "dummy" summary function than min() in this case, and it seems to give the same result.
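
To illustrate why either function can serve as the dummy summary, here is a small base R check. It is a sketch on a made-up toy data frame (toy is hypothetical, not the real test_table), and it assumes tabular hands the summary function the description values of the rows that fall in each cell. Since each id carries exactly one distinct description, unique() and min() both collapse the group to that one string.

```r
# Toy stand-in for test_table: each id appears once per `which` level
# and always carries the same description
toy <- data.frame(
  id = factor(c("GO:0005525", "GO:0005634", "GO:0005525", "GO:0005634")),
  description = c("GTP binding", "nucleus", "GTP binding", "nucleus"),
  which = factor(c("normal", "normal", "compress", "compress")),
  stringsAsFactors = FALSE
)

# Number of distinct descriptions per id; unique() is only a safe
# "summary" when every count is 1
n_desc <- tapply(toy$description, toy$id, function(x) length(unique(x)))
stopifnot(all(n_desc == 1))

# Both dummy summaries collapse each id's group to a single string
by_unique <- tapply(toy$description, toy$id, unique)
by_min    <- tapply(toy$description, toy$id, min)
print(by_unique)
```

If some id had two different descriptions, unique() would return two values for that group while min() would silently pick one, which is a reason to run a check like n_desc first.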

Edit: Latest refinement...

> tabular(id ~ (description*Heading()*unique)+Heading()*which*(p + padjust + metal + perc + sig)*Heading()*identity, data = test_table)

                                         compress                            
 id         description                  p         padjust   metal perc sig  
 GO:0005525 GTP binding                  6.514e-09 9.063e-07 ca    0    TRUE 
 GO:0005634 nucleus                      1.000e+00 1.000e+00 ca    0    FALSE
 GO:0008270 zinc ion binding             1.000e+00 1.000e+00 ca    0    FALSE
 GO:0019001 guanyl nucleotide binding    2.320e-10 3.919e-08 ca    0    TRUE 
 GO:0046914 transition metal ion binding 1.000e+00 1.000e+00 ca    0    FALSE

 normal                                
 p         padjust   metal perc   sig  
 1.000e+00 1.000e+00 mg    0.8756 FALSE
 4.922e-05 1.663e-02 zn    0.8417 TRUE 
 1.082e-21 8.526e-19 zn    0.8318 TRUE 
 1.000e+00 1.000e+00 mg    0.8767 FALSE
 2.060e-14 1.218e-11 zn    0.5193 TRUE