Shaxi Liver Shaxi Liver - 2 years ago 134
R Question

PCA analysis wrong output

That's the example data:

structure(c(368113, 87747.35, 508620.5, 370570.5, 87286.5, 612728,
55029, 358521, 2802880, 2045399.5, 177099, 317974.5, 320687.95,
6971292.55, 78949, 245415.95, 50148.5, 67992.5, 97634, 56139.5,
371719.2, 80182.7, 612078.5, 367822.5, 80691, 665190.65, 28283.5,
309720, 2853241.5, 1584324, 135482.5, 270959, 343879.1, 6748208.5,
71534.9, 258976, 28911.75, 78306, 56358.7, 46783.5, 320882.85,
53098.3, 537383.5, 404505.5, 89759.7, 624120.55, 40406, 258183.5,
3144610.45, 1735583.5, 122013.5, 249741, 362585.35, 5383869.15,
23172.2, 223704.45, 40543.7, 68522.5, 43187.05, 29745, 356058.5,
89287.25, 492242.5, 452135.5, 97253.55, 575661.95, 65739.5, 334703.5,
3136065, 1622936.5, 131381.5, 254362, 311496.3, 5627561, 68210.6,
264610.1, 45851, 65010.5, 32665.5, 39957.5, 362476.75, 59451.65,
548279, 345096.5, 93363.5, 596444.2, 11052.5, 252812, 2934035,
1732707.55, 208409.5, 208076.5, 437764.25, 16195882.45, 77461.25,
205803.85, 30437.5, 75540, 49576.75, 48878, 340380.5, 43785.35,
482713, 340315, 64308.5, 517859.85, 11297, 268993.5, 3069028.5,
1571889, 157561, 217596.5, 400610.65, 5703337.6, 50640.65, 197477.75,
40070, 66619, 81564.55, 41436.5, 367592.3, 64954.9, 530093, 432025,
87212.5, 553901.65, 20803.5, 333940.5, 3027254.5, 1494468, 195221,
222895.5, 494429.45, 7706885.75, 60633.35, 192827.1, 29857.5,
81001.5, 112588.65, 68904.5, 338822.5, 56868.15, 467350, 314526.5,
105568, 749456.1, 19597.5, 298939.5, 2993199.2, 1615231.5, 229185.5,
280433.5, 360156.15, 5254889.1, 79369.5, 175434.05, 40907.05,
70919, 65720.15, 53054.5), .Dim = c(20L, 8L), .Dimnames = list(
c("Anne", "Greg", "thomas", "Chris", "Gerard", "Monk", "Mart",
"Mutr", "Aeqe", "Tor", "Gaer", "Toaq", "Kolr", "Wera", "Home",
"Terlo", "Kulte", "Mercia", "Loki", "Herta"), c("Day_Rep1",
"Day_Rep2", "Day_Rep3", "Day_Rep4", "Day2_Rep1", "Day2_Rep2",
"Day2_Rep3", "Day2_Rep4")))


I would like to perform a nice PCA analysis. I expect that replicates from Day will be nicely correlated with each other and replicates from Day2 together. I was trying to perform some analysis using the code below:

## log transform
data_log <- log(data[, 1:8])
#vec_EOD_EON
dt_PCA <- prcomp(data_log,
center = TRUE,
scale. = TRUE)

library(devtools)
install_github("ggbiplot", "vqv")

library(ggbiplot)
g <- ggbiplot(dt_PCA, obs.scale = 1, var.scale = 1,
groups = colnames(dt_PCA), ellipse = TRUE,
circle = TRUE)
g <- g + scale_color_discrete(name = "")
g <- g + theme(legend.direction = 'horizontal',
legend.position = 'top')
print(g)


However, the output is not what I am looking for:

Output

but I am looking for something more like that:

Desired output

I would like to use dots for each row in the data and different colors for each of the replicate. Would be cool to use the similar colors for Day replicates and as well for Day2.

Obtained data with ggplot:

enter image description here

Answer Source

Let's imagine you save your data into df.

library(ggplot2)
pc_df <- prcomp(t(df), scale.=TRUE)
pc_table <- as.data.frame(pc_df$x[,1:2]) # extracting 1st and 2nd component

experiment_regex <- '(^[^_]+)_Rep(\\d+)' # extracting replicate and condition from your experiment names
pc_table$replicate <- as.factor(sub(experiment_regex,'\\2', rownames(pc_table)))
pc_table$condition <- as.factor(sub(experiment_regex,'\\1', rownames(pc_table)))

ggplot(pc_table, aes(PC1, PC2, color=condition, shape=replicate)) +
  geom_point() +
  xlab(sprintf('PC1 - %.1f%%', # extracting the percentage of each PC and print it on the axes
               summary(pc_df)$importance[2,1] * 100)) +
  ylab(sprintf('PC2 - %.1f%%', 
               summary(pc_df)$importance[2,2] * 100))

The first thing you have to do, to get your data in the correct shape is to transform it using t(). This might be already what you are looking for.

I prefer to do the plots with my own function and I wrote the steps down to get a nice plot with ggplot2.

example figure

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download