Nishant Nishant - 1 month ago 14
R Question

Dynamically selecting principal components from the PCA output

This seems like a trivial problem, but I have not been able to resolve it.

I took the numeric columns of the iris data set and normalized them as below:

newiris<-iris[,1:4]
iris.norm<-data.frame(scale(newiris))
head(iris.norm)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1   -0.8976739  1.01560199    -1.335752   -1.311052
## 2   -1.1392005 -0.13153881    -1.335752   -1.311052
## 3   -1.3807271  0.32731751    -1.392399   -1.311052
## 4   -1.5014904  0.09788935    -1.279104   -1.311052
## 5   -1.0184372  1.24503015    -1.335752   -1.311052
## 6   -0.5353840  1.93331463    -1.165809   -1.048667
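If the standardized data frame is only needed as input to the PCA, prcomp() can do the centering and scaling itself through its center and scale. arguments. A minimal sketch comparing the two routes (pc_manual and pc_builtin are illustrative names only); because principal components are only defined up to sign, the score comparison uses absolute values:

```r
# Route 1: standardize first, then run PCA on the scaled data
iris.norm <- data.frame(scale(iris[, 1:4]))
pc_manual <- prcomp(iris.norm)

# Route 2: let prcomp() center and scale the raw measurements itself
pc_builtin <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Same component standard deviations...
all.equal(pc_manual$sdev, pc_builtin$sdev)
# ...and the same scores, up to the sign of each component
all.equal(abs(pc_manual$x), abs(pc_builtin$x))
```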

# now perform PCA
pccomp <- prcomp(iris.norm)
summary(pccomp)
a <- summary(pccomp)
df <- as.data.frame(a$importance)   # one row per statistic, one column per PC
df <- t(df)                         # transpose so that each PC becomes a row
df
##     Standard deviation Proportion of Variance Cumulative Proportion
## PC1          1.7083611                0.72962               0.72962
## PC2          0.9560494                0.22851               0.95813
## PC3          0.3830886                0.03669               0.99482
## PC4          0.1439265                0.00518               1.00000
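The columns of that table come straight from pccomp$sdev: each component's variance is its standard deviation squared, the proportion of variance is that variance divided by the total, and the cumulative proportion is a running sum. A short sketch recomputing them; summary() rounds the proportions to 5 decimals, hence the round() calls in the checks:

```r
pccomp <- prcomp(data.frame(scale(iris[, 1:4])))

vars    <- pccomp$sdev^2              # variance of each PC
prop    <- vars / sum(vars)           # proportion of variance
cumprop <- cumsum(prop)               # cumulative proportion
names(cumprop) <- colnames(pccomp$x)  # label as PC1..PC4

# compare with the importance table that summary() builds
imp <- summary(pccomp)$importance
all.equal(round(prop, 5), imp["Proportion of Variance", ], check.attributes = FALSE)
all.equal(round(cumprop, 5), imp["Cumulative Proportion", ])
```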


Next, I convert the row names of df into a column, so that the PC labels (previously row names) form the first column for further manipulation:

library(tibble)
library(dplyr)
df<-rownames_to_column(as.data.frame(df), var="PrinComp") %>% head
df
##   PrinComp Standard deviation Proportion of Variance Cumulative Proportion
## 1      PC1          1.7083611                0.72962               0.72962
## 2      PC2          0.9560494                0.22851               0.95813
## 3      PC3          0.3830886                0.03669               0.99482
## 4      PC4          0.1439265                0.00518               1.00000

# Now select only those PCs whose cumulative proportion is below, say, 96%
# by subsetting the rows of df
pcs<-as.vector(as.character(df[which(df$`Cumulative Proportion`<0.96),][,1])) # cumulative prop less than 96%
pcs
## [1] "PC1" "PC2"
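The same selection can also be read directly off the importance matrix that summary() returns, since its columns are already named PC1, PC2, ..., so the transpose and rownames_to_column() steps are not strictly needed. A minimal sketch:

```r
pccomp  <- prcomp(data.frame(scale(iris[, 1:4])))
cumprop <- summary(pccomp)$importance["Cumulative Proportion", ]

# names of the PCs whose cumulative proportion is below 96%
pcs <- names(cumprop)[cumprop < 0.96]
pcs
## [1] "PC1" "PC2"
```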


Now I create a data frame of PC scores statically, from the first two principal components selected by the condition above (cumulative proportion < 0.96):

x1 <- pccomp$x[,1]
x2 <- pccomp$x[,2]
pcdf <- cbind(x1,x2)
head(pcdf)
##             x1         x2
## [1,] -2.257141 -0.4784238
## [2,] -2.074013  0.6718827
## [3,] -2.356335  0.3407664
## [4,] -2.291707  0.5953999
## [5,] -2.381863 -0.6446757
## [6,] -2.068701 -1.4842053


My question is: how can I create the above PC data frame dynamically, once I know the number of PCs from a condition such as the cumulative proportion being less than, say, 0.95?

Answer

You can run a while loop over the Cumulative Proportion column of df, appending the scores of each component (column i of pccomp$x) for as long as the cumulative proportion is below the required threshold.

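threshold = 0.96
pcdf = list()
i    = 1
# keep adding PCs while the cumulative proportion is below the threshold;
# the i <= nrow(df) check stops the loop after the last PC (e.g. threshold > 1)
while(i <= nrow(df) && df$`Cumulative Proportion`[i] < threshold){
    pcdf[[i]] = pccomp$x[,i]   # scores of the i-th principal component
    i = i + 1
}
pcdf = as.data.frame(pcdf)

names(pcdf) = paste("x", seq_len(ncol(pcdf)), sep="")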

The output:

> head(pcdf)
         x1         x2
1 -2.257141 -0.4784238
2 -2.074013  0.6718827
3 -2.356335  0.3407664
4 -2.291707  0.5953999
5 -2.381863 -0.6446757
6 -2.068701 -1.4842053

With threshold = 0.999, the same code gives:

> head(pcdf)
         x1         x2          x3
1 -2.257141 -0.4784238  0.12727962
2 -2.074013  0.6718827  0.23382552
3 -2.356335  0.3407664 -0.04405390
4 -2.291707  0.5953999 -0.09098530
5 -2.381863 -0.6446757 -0.01568565
6 -2.068701 -1.4842053 -0.02687825
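Because the cumulative proportion never decreases, the loop always stops after the first k components, where k is simply the number of cumulative proportions below the threshold. That suggests a loop-free sketch; pc_scores_below is a hypothetical helper name, not part of any package, and it reads the (rounded) cumulative proportions from summary() just as df does:

```r
# Hypothetical helper (not from any package): scores of the leading PCs
# whose cumulative proportion of variance is below `threshold`
pc_scores_below <- function(pca, threshold) {
  cumprop <- summary(pca)$importance["Cumulative Proportion", ]
  # cumprop never decreases, so the PCs below the threshold are PC1..PCk
  k <- sum(cumprop < threshold)
  pcdf <- as.data.frame(pca$x[, seq_len(k), drop = FALSE])
  # sprintf() returns character(0) when k = 0, so naming also works then
  names(pcdf) <- sprintf("x%d", seq_len(k))
  pcdf
}

pccomp <- prcomp(data.frame(scale(iris[, 1:4])))
head(pc_scores_below(pccomp, 0.96))   # columns x1 and x2
ncol(pc_scores_below(pccomp, 0.999))  # 3
```

Counting with sum() instead of looping also avoids indexing past the last component when the threshold is above 1.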

UPDATE

If you already know the number of principal components you want, say i, you can use

a <- sapply(X = 1:i, FUN = function(X) pccomp$x[, X])

in place of the whole while loop. For i = 2 you get:

> head(a)
          [,1]       [,2]
[1,] -2.257141 -0.4784238
[2,] -2.074013  0.6718827
[3,] -2.356335  0.3407664
[4,] -2.291707  0.5953999
[5,] -2.381863 -0.6446757
[6,] -2.068701 -1.4842053

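where a is your result: a numeric matrix with one column per component (wrap it in as.data.frame() if you need a data frame).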
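Since pccomp$x is already a matrix of scores with one column per component, plain column subsetting gives the same values without sapply() at all. A sketch checking that the two agree (b is just an illustrative name); drop = FALSE keeps b a matrix even when i = 1, and the subset keeps the column names PC1, PC2, ...:

```r
pccomp <- prcomp(data.frame(scale(iris[, 1:4])))
i <- 2

# the sapply approach: one column of scores per component
a <- sapply(X = 1:i, FUN = function(X) pccomp$x[, X])

# direct subsetting of the score matrix
b <- pccomp$x[, seq_len(i), drop = FALSE]

# same numbers; only the dimnames differ
all.equal(a, b, check.attributes = FALSE)
```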