Jakub Bochenski - 1 year ago

# R - how to make PCA biplot more readable

I have a set of observations with 23 variables.

When I use `prcomp` and `biplot` to plot the results, I run into several problems:

1. The actual plot occupies only half of the frame (x < 0), but the plot is centered on 0, so half of the space is wasted.

2. Two variables clearly dominate the results, so all the other arrows are clumped together and I can't read anything.

Ad 1. I tried setting `xlim` and/or `ylim`, but I'm obviously doing something wrong, since the plot gets all messed up when I do.

Ad 2. Can I somehow place the arrow labels further apart so that I can read them? Or could I plot the arrows without the two longest ones (a kind of zoom-in)?

Addendum: is it possible to have `biplot` draw the labels in a different color than the arrows?

Also: is it a problem if the x and y axes are not proportional (the graph shows intervals of different lengths on x and y)?
I think this would skew the angles between arrows, since that kind of resizing is not a similarity transformation.
Is it possible to force `biplot` to keep a 1:1 aspect ratio, or to draw the plot as a rectangle rather than a square?

I think you can use `xlim` and `ylim`. Also, have a look at the `expand` argument in `?biplot`. Unfortunately, you did not provide any data, so let's take some sample data:

```r
a <- princomp(USArrests)
```

Below is the result of just calling `biplot`:

```r
biplot(a)
```
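Part of the wasted space comes from how `biplot` chooses its default limits. As far as I can tell from the source of `stats:::biplot.default`, when neither `xlim` nor `ylim` is given it uses one common range for both axes that always contains 0, and it draws into a square plot region (`par(pty = "s")`). Below is a rough sketch that computes tight limits from the scores instead. `biplot_limits` is a hypothetical helper, not part of `stats`. It assumes `biplot`'s default `scale = 1`, under which the observations are drawn at `scores / (sdev * sqrt(n))`. It gives both axes the same width, so one unit on x is as long as one unit on y in that square region.

```r
# Hypothetical helper: tight, equal-width limits for biplot().
# Assumes biplot()'s default scale = 1, i.e. observations are plotted
# at scores / (sdev * sqrt(n)) for the chosen components.
biplot_limits <- function(pc, choices = 1:2, pad = 0.05) {
  scores <- if (inherits(pc, "prcomp")) pc$x else pc$scores
  lam <- pc$sdev[choices] * sqrt(nrow(scores))
  xy <- sweep(scores[, choices, drop = FALSE], 2, lam, "/")
  rx <- range(xy[, 1])
  ry <- range(xy[, 2])
  # same width on both axes, so the square plot region keeps a 1:1 ratio
  half <- max(diff(rx), diff(ry)) / 2 * (1 + pad)
  list(xlim = mean(rx) + c(-half, half),
       ylim = mean(ry) + c(-half, half),
       points = xy)
}

a <- princomp(USArrests)
lims <- biplot_limits(a)
biplot(a, xlim = lims$xlim, ylim = lims$ylim)
```

Only the observations are used to pick the limits, so arrows that reach beyond them are clipped at the plot border; an `expand` value below 1 shortens the arrows.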

Now one can "zoom in" to take a closer look at "Murder" and "Rape" using `xlim` and `ylim`, together with the scaling argument `expand` from `?biplot`:

```r
biplot(a, expand = 10, xlim = c(-0.30, 0.0), ylim = c(-0.1, 0.1))
```

Please note the different scaling on the top and right axes due to the `expand` factor.
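For the other idea from the question, plotting the arrows without the two longest ones, one option is to call `biplot` on plain matrices, which dispatches to `biplot.default`, and pass it only a subset of the loadings. Since `biplot.default` sizes its top/right axes from the variables it is given, the remaining arrows are stretched to fill the frame. This is a sketch that again assumes the default `scale = 1` scaling; `drop_longest` is just a variable name chosen here.

```r
a <- princomp(USArrests)
choices <- 1:2
drop_longest <- 2  # how many of the longest arrows to leave out

# Same scaling that biplot() applies to a princomp fit with scale = 1
lam  <- a$sdev[choices] * sqrt(nrow(a$scores))
obs  <- sweep(a$scores[, choices], 2, lam, "/")
vars <- sweep(unclass(a$loadings)[, choices], 2, lam, "*")

# Arrow lengths in the plotted plane; keep all but the longest ones
len  <- sqrt(rowSums(vars^2))
keep <- names(sort(len))[seq_len(nrow(vars) - drop_longest)]

# On plain matrices, biplot() dispatches to biplot.default
biplot(obs, vars[keep, , drop = FALSE])
```

For the unscaled `USArrests` fit this drops "Assault" and "UrbanPop", which leaves "Murder" and "Rape".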

EDIT

You also asked whether it is possible to have different colors for labels and arrows. `biplot` does not support this directly: its `col` argument takes two colours, one for the observations and one for the variables, so each arrow and its label share a colour. What you could do is copy the code of `stats:::biplot.default` and change it to your needs (change the `col` argument where `plot`, `axis`, `text` and `arrows` are called).
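A rough sketch of such a hand-rolled version in base graphics is below. `biplot_colored` and its arguments are hypothetical names. It is not a copy of `biplot.default`: instead of a second set of top/right axes, it multiplies all arrows by one common factor, which keeps their angles and relative lengths. It also passes `asp = 1` to `plot()`, so one unit on x is as long as one unit on y, which addresses the aspect-ratio question as well. It assumes the same `scale = 1` scaling as `biplot`'s default.

```r
# Hypothetical helper: minimal biplot with separate colours for arrows
# and arrow labels, drawn with asp = 1 (equal units on both axes).
biplot_colored <- function(pc, choices = 1:2, arrow_col = "red",
                           label_col = "blue", obs_col = "grey40",
                           arrow_scale = 1) {
  scores <- if (inherits(pc, "prcomp")) pc$x else pc$scores
  loads  <- if (inherits(pc, "prcomp")) pc$rotation else unclass(pc$loadings)
  lam  <- pc$sdev[choices] * sqrt(nrow(scores))
  obs  <- sweep(scores[, choices, drop = FALSE], 2, lam, "/")
  vars <- sweep(loads[, choices, drop = FALSE], 2, lam, "*")
  # one common factor for all arrows, so they span the observation cloud
  vars <- vars * arrow_scale * max(abs(obs)) / max(abs(vars))
  plot(rbind(obs, vars), type = "n", asp = 1,
       xlab = colnames(obs)[1], ylab = colnames(obs)[2])
  text(obs, labels = rownames(obs), col = obs_col, cex = 0.7)
  # arrows end at 80% of the label position, as in biplot.default
  arrows(0, 0, vars[, 1] * 0.8, vars[, 2] * 0.8, col = arrow_col, length = 0.1)
  text(vars, labels = rownames(vars), col = label_col)
  invisible(list(obs = obs, vars = vars))
}

res <- biplot_colored(princomp(USArrests))
```

`arrow_scale` plays roughly the role of `expand`: values above 1 lengthen all arrows together.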

Alternatively, you could use `ggplot2` for the biplot. In the post here, a simple biplot function is implemented; you could change its code as follows:

```r
library(ggplot2)  # .data needs ggplot2 >= 3.0.0, linewidth >= 3.4.0

# PC is a prcomp object; x and y name the components to plot.
# colors: observation labels, axis lines, variable labels, arrows
PCbiplot <- function(PC, x = "PC1", y = "PC2",
                     colors = c("black", "black", "red", "red")) {
  data <- data.frame(obsnames = row.names(PC$x), PC$x)
  plot <- ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_text(alpha = .4, size = 3, aes(label = obsnames), color = colors[1])
  plot <- plot + geom_hline(yintercept = 0, linewidth = .2, color = colors[2]) +
    geom_vline(xintercept = 0, linewidth = .2, color = colors[2])
  datapc <- data.frame(varnames = rownames(PC$rotation), PC$rotation)
  # scale the loadings so the arrows span a range similar to the scores
  mult <- min(
    (max(data[, y]) - min(data[, y])) / (max(datapc[, y]) - min(datapc[, y])),
    (max(data[, x]) - min(data[, x])) / (max(datapc[, x]) - min(datapc[, x]))
  )
  datapc$v1 <- .7 * mult * datapc[[x]]
  datapc$v2 <- .7 * mult * datapc[[y]]
  # coord_equal() keeps a 1:1 aspect ratio, so angles are not distorted
  plot <- plot + coord_equal() +
    geom_text(data = datapc, aes(x = v1, y = v2, label = varnames),
              size = 5, vjust = 1, color = colors[3])
  plot <- plot + geom_segment(data = datapc, aes(x = 0, y = 0, xend = v1, yend = v2),
                              arrow = arrow(length = unit(0.2, "cm")),
                              alpha = 0.75, color = colors[4])
  plot
}
```

Plot as follows:

```r
fit <- prcomp(USArrests, scale. = TRUE)
PCbiplot(fit, colors = c("black", "black", "red", "yellow"))
```

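If you play around a bit with this function, I am sure you can figure out how to set `xlim` and `ylim` values, etc. For example, `coord_equal()` itself accepts `xlim` and `ylim`, which zooms in while keeping the 1:1 aspect ratio.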
