Jakub Bochenski - 1 year ago
R Question

R - how to make PCA biplot more readable

I have a set of observations with 23 variables.

When I use prcomp and biplot to plot the results I run into several problems:

  1. the actual plot only occupies half of the frame (x < 0), but the plot is centered on 0, so half of the space is wasted

  2. two variables clearly dominate the results, so all other arrows are clumped together and I can't read a thing

ad 1. I tried setting xlim and/or ylim, but I'm obviously doing something wrong since the plot is all messed up when I do

ad 2. Can I somehow place the arrow labels further apart so that I can read them? Or maybe I could plot the arrows without the two longest ones (a kind of zoom-in)?

[image: my PCA plot]

Addendum: is it possible to have biplot draw the labels in a different color than the arrows?

Also: is it problematic if the x and y axes are not proportional (the graph shows intervals of different length on x and y)?
I think this would skew the angles between arrows, since that kind of resizing is not a similarity transformation.
Is it possible to force biplot to keep a 1:1 aspect ratio, or to draw the plot as a rectangle and not a square?

Answer

I think you can use xlim and ylim. Also, have a look at the expand argument for ?biplot. Unfortunately, you did not provide any data, so let's take some sample data:

a <- princomp(USArrests)

Below is the result of just calling biplot(a):

[image: default biplot of the USArrests PCA]

And now one can "zoom in" to have a closer look at "Murder" and "Rape" using xlim and ylim and also use the scaling argument expand from ?biplot:

biplot(a, expand=10, xlim=c(-0.30, 0.0), ylim=c(-0.1, 0.1))

[image: zoomed-in biplot around "Murder" and "Rape"]

Please note the different scaling on the top and right axis due to the expand factor.
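On the aspect-ratio part of the question: as far as I can tell from the source of stats:::biplot.default, the plot region is forced to be square (par(pty = "s")), so the units on x and y only stay 1:1 if xlim and ylim span the same width. A minimal sketch, assuming that behaviour holds in your R version; the zoom centre and half-width are values picked by eye for USArrests, not anything prescribed by biplot:

```r
a <- princomp(USArrests)

# Centre of the zoom window and a common half-width for both axes
# (illustrative values for this data set; adjust for your own).
centre <- c(x = -0.15, y = 0)
half_width <- 0.15

# Equal spans on both axes keep one unit on x as long as one unit on y
# inside the square plot region.
xlim <- centre[["x"]] + c(-1, 1) * half_width
ylim <- centre[["y"]] + c(-1, 1) * half_width

biplot(a, expand = 10, xlim = xlim, ylim = ylim)
```

If the two spans differ, the square region is stretched along one axis and the angles between the arrows are distorted, which is the concern raised in the question.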

Does this help to make your plot more readable?

You also asked whether it is possible to have different colors for labels and arrows. biplot does not support this: its col argument only distinguishes the observations from the variables, so each arrow shares a color with its label. What you could do is copy the code of stats:::biplot.default and change it according to your needs (change the col argument where plot, axis, text and arrows are called).
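If copying biplot.default feels like too much, the same idea can be sketched directly in base graphics by drawing the arrows and their labels in separate calls. This is only a rough sketch and not a drop-in replacement for biplot: the scaling of the loadings (including the 0.7 factor) and the colors are my own assumptions, and it uses prcomp on the scaled USArrests data.

```r
pca <- prcomp(USArrests, scale. = TRUE)
scores <- pca$x[, 1:2]
loadings <- pca$rotation[, 1:2]

# Stretch the loadings so the arrows are visible on the score scale
# (the 0.7 factor is an arbitrary choice for this sketch).
mult <- 0.7 * min(
  diff(range(scores[, 1])) / diff(range(loadings[, 1])),
  diff(range(scores[, 2])) / diff(range(loadings[, 2]))
)
arrow_ends <- loadings * mult

# asp = 1 keeps the same unit length on both axes.
plot(scores, type = "n", asp = 1, xlab = "PC1", ylab = "PC2")
text(scores, labels = rownames(scores), cex = 0.6, col = "grey40")

# Arrows and labels are separate calls, so each gets its own color.
arrows(0, 0, arrow_ends[, 1], arrow_ends[, 2], length = 0.1, col = "red")
text(arrow_ends * 1.1, labels = rownames(arrow_ends), col = "blue")
```

To leave out the dominating variables, drop their rows from arrow_ends before the last two calls.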

Alternatively, you could use ggplot2 for the biplot. A simple biplot function has been implemented in another post; you could change its code as follows:

library(ggplot2)
library(grid)  # for arrow() and unit()

PCbiplot <- function(PC, x="PC1", y="PC2", colors=c('black', 'black', 'red', 'red')) {
    # PC being a prcomp object
    # colors: observation labels, axis lines, variable labels, arrows
    data <- data.frame(obsnames=row.names(PC$x), PC$x)
    plot <- ggplot(data, aes_string(x=x, y=y)) + geom_text(alpha=.4, size=3, aes(label=obsnames), color=colors[1])
    plot <- plot + geom_hline(yintercept=0, size=.2, color=colors[2]) + geom_vline(xintercept=0, size=.2, color=colors[2])
    datapc <- data.frame(varnames=rownames(PC$rotation), PC$rotation)
    # scale the loadings so the arrows cover a range comparable to the scores
    mult <- min(
        (max(data[,y]) - min(data[,y])) / (max(datapc[,y]) - min(datapc[,y])),
        (max(data[,x]) - min(data[,x])) / (max(datapc[,x]) - min(datapc[,x]))
    )
    datapc <- transform(datapc,
            v1 = .7 * mult * (get(x)),
            v2 = .7 * mult * (get(y))
    )
    plot <- plot + coord_equal() + geom_text(data=datapc, aes(x=v1, y=v2, label=varnames), size = 5, vjust=1, color=colors[3])
    plot <- plot + geom_segment(data=datapc, aes(x=0, y=0, xend=v1, yend=v2), arrow=arrow(length=unit(0.2,"cm")), alpha=0.75, color=colors[4])
    plot
}

Plot as follows:

fit <- prcomp(USArrests, scale. = TRUE)
PCbiplot(fit, colors=c("black", "black", "red", "yellow"))

[image: ggplot2 biplot with black observation labels, red variable labels and yellow arrows]

If you play around a bit with this function, I am sure you can figure out how to set xlim and ylim values, etc.
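As a starting point for the xlim and ylim part, here is a small sketch of how zooming usually works in ggplot2: the limits go into the coordinate system, which also keeps the 1:1 aspect ratio. It builds its own tiny plot rather than relying on any particular biplot function, and the limits of -1 to 1 are arbitrary values chosen for illustration.

```r
library(ggplot2)

pca <- prcomp(USArrests, scale. = TRUE)
scores <- data.frame(obsnames = rownames(pca$x), pca$x)

p <- ggplot(scores, aes(x = PC1, y = PC2, label = obsnames)) +
  geom_text(size = 3, alpha = 0.4) +
  # coord_equal() keeps a 1:1 aspect ratio; its xlim/ylim zoom the view
  # without dropping the observations that fall outside the window.
  coord_equal(xlim = c(-1, 1), ylim = c(-1, 1))
print(p)
```

Setting the limits through the coordinate system only changes the visible window, whereas limits set on the scales would remove the data outside them before plotting.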
