Erkin Acar - 3 months ago
R Question

Violin plot in ggplot2 with width taken from a column

I am pretty new to R and only use it for visualization, so I may be missing something simple.

I have two columns that should map to the x and y axes, and a third column that should define the width of the violin. I have tried many things from different answers but have not gotten far. This is what I have so far:

ggplot(disM, aes(x=study, y=value)) +
  geom_violin() +
  labs(title="Distribution", x="Studies", y="Ranges")


which does not really achieve what I want.

I have a table like this:

Col0 study value
1 30-31 breast cancer 357263
2 32-33 breast cancer 352067
3 34-35 breast cancer 340264
4 36-37 breast cancer 309827
5 38-39 breast cancer 298684
6 40-41 breast cancer 322570
7 42-43 breast cancer 338480
8 44-45 breast cancer 354451
9 46-47 breast cancer 429183
10 48-49 breast cancer 396942
11 50-51 breast cancer 415195
12 52-53 breast cancer 368217
13 54-55 breast cancer 445884
14 56-57 breast cancer 395652
15 58-59 breast cancer 386643
16 60-61 breast cancer 461940
17 62-63 breast cancer 473772
18 64-65 breast cancer 464228
19 66-67 breast cancer 485851
20 68-69 breast cancer 513411
21 70-71 breast cancer 576618
22 72-73 breast cancer 588724
23 74-75 breast cancer 634343
24 76-77 breast cancer 584662
25 78-79 breast cancer 608901
26 80-81 breast cancer 617286
27 82-83 breast cancer 659318
28 84-85 breast cancer 757167
29 86-87 breast cancer 1044465
30 88-89 breast cancer 982901
31 90-91 breast cancer 1114269
32 92-93 breast cancer 1110257
33 94-95 breast cancer 1742966
34 96-97 breast cancer 6379974
35 98-99 breast cancer 3437746
36 100-101 breast cancer 118984063
37 30-31 renal cancer 1055566
38 32-33 renal cancer 1089405
39 34-35 renal cancer 1228087
40 36-37 renal cancer 1265606
41 38-39 renal cancer 1264919
42 40-41 renal cancer 1248949
43 42-43 renal cancer 1391738
44 44-45 renal cancer 1453100
45 46-47 renal cancer 1443915
46 48-49 renal cancer 1429785
47 50-51 renal cancer 1372041
48 52-53 renal cancer 1339706
49 54-55 renal cancer 1418135
50 56-57 renal cancer 1484162
51 58-59 renal cancer 1582617
52 60-61 renal cancer 1571977
53 62-63 renal cancer 1652503
54 64-65 renal cancer 1742230
55 66-67 renal cancer 1859936
56 68-69 renal cancer 1928028
57 70-71 renal cancer 2041783
58 72-73 renal cancer 2108994
59 74-75 renal cancer 2154244
60 76-77 renal cancer 2218430
61 78-79 renal cancer 2333206
62 80-81 renal cancer 2377262
63 82-83 renal cancer 2345651
64 84-85 renal cancer 2402114
65 86-87 renal cancer 2519284
66 88-89 renal cancer 2542761
67 90-91 renal cancer 2587606
68 92-93 renal cancer 2308279
69 94-95 renal cancer 2980927
70 96-97 renal cancer 14108950
71 98-99 renal cancer 2762116
72 100-101 renal cancer 211513230


The x axis should be the study column, the y axis should be Col0, and the width of the violin should come from the value column. I cannot split Col0 into individual values because I only have the data as ranges.

Any pointers on what to check or how to do this would be appreciated. Sorry if I missed a similar question.

Thanks in advance

Answer

I'm going to take a guess. (If I'm right, you could also look for information about pyramid plots.)

Reorder labels so that "100-101" really comes at the end:

disM$Col0 <- factor(disM$Col0,levels=unique(disM$Col0))
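
To see why the reordering matters, here is a minimal sketch on a hypothetical three-label vector (col0 is a stand-in I made up, not the asker's full column). By default factor() sorts the labels as strings, so "100-101" would land first; passing levels=unique(...) keeps the order in which the labels appear in the data.

```r
# Hypothetical stand-in for disM$Col0 (three labels only)
col0 <- c("30-31", "98-99", "100-101")

# Default: levels are sorted as strings, so "100-101" sorts before "30-31"
default_f <- factor(col0)
levels(default_f)

# levels = unique(...) keeps the order of first appearance in the data
ordered_f <- factor(col0, levels = unique(col0))
levels(ordered_f)

# Integer positions of the levels; these are what end up on the y axis
as.numeric(ordered_f)
```

The integer codes from as.numeric() are what the polygon code below uses as y coordinates, so the level order directly controls the vertical order of the ranges.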

Rearrange to make it easier to draw polygons (I wish there was an easier way to do this, but can't think of one):

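library(plyr)
# For each study, trace a closed outline: up the left edge (-value/2),
# then back down the right edge (+value/2)
disM2 <- ddply(disM,"study",
   function(dd) with(dd,
             data.frame(y=c(as.numeric(Col0),rev(as.numeric(Col0))),
                        x=c(-value/2,rev(value/2)))))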
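
If you would rather not load plyr, the same rearrangement can probably be done in base R. This is only a sketch on made-up toy data (toy, outline and toy2 are names I introduced for illustration), but it shows what the outline looks like: the y positions run up and then back down, while x goes from the negative half-width to the positive half-width.

```r
# Toy data with hypothetical values, not the asker's table
toy <- data.frame(
  Col0  = factor(c("30-31", "32-33", "34-35"),
                 levels = c("30-31", "32-33", "34-35")),
  study = "demo",
  value = c(2, 6, 4)
)

# Build one closed outline per study:
# left edge going up, then right edge coming back down
outline <- function(dd) {
  y <- as.numeric(dd$Col0)
  data.frame(
    study = dd$study[1],
    y = c(y, rev(y)),
    x = c(-dd$value / 2, rev(dd$value / 2))
  )
}

# Split by study, build each outline, and stack the results
toy2 <- do.call(rbind, lapply(split(toy, toy$study), outline))
toy2
```

Each study ends up with twice as many rows as it started with, which is what geom_polygon needs to draw a closed shape.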


library(ggplot2); theme_set(theme_bw())
ggplot(disM2)+
    geom_polygon(aes(x,y),alpha=0.5)+
    facet_wrap(~study)+
    labs(title="Distribution")+
    # one break per Col0 level, labelled with the original range
    scale_y_continuous(breaks=seq_along(levels(disM$Col0)),
                       labels=levels(disM$Col0))+
    scale_x_continuous(labels=NULL)
