Nishant - 1 year ago
R Question

# Dynamically selecting principal components from the PCA output

This seems like a trivial problem, but I am unable to resolve it.

I took the numeric columns of the iris data set and then normalized them as below:

```r
newiris <- iris[, 1:4]
iris.norm <- data.frame(scale(newiris))
head(iris.norm)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1   -0.8976739  1.01560199    -1.335752   -1.311052
## 2   -1.1392005 -0.13153881    -1.335752   -1.311052
## 3   -1.3807271  0.32731751    -1.392399   -1.311052
## 4   -1.5014904  0.09788935    -1.279104   -1.311052
## 5   -1.0184372  1.24503015    -1.335752   -1.311052
## 6   -0.5353840  1.93331463    -1.165809   -1.048667

# now perform the PCA
pccomp <- prcomp(iris.norm)
summary(pccomp)
a <- summary(pccomp)
df <- as.data.frame(a$importance)
df <- t(df)   # one row per PC (note: t() returns a matrix)
df
##     Standard deviation Proportion of Variance Cumulative Proportion
## PC1          1.7083611                0.72962               0.72962
## PC2          0.9560494                0.22851               0.95813
## PC3          0.3830886                0.03669               0.99482
## PC4          0.1439265                0.00518               1.00000
```
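For reference, the numbers in that table can be recomputed from `pccomp$sdev`: each PC's variance is its squared standard deviation, the proportion of variance is that variance divided by the total, and the cumulative proportion is the running sum. A minimal sketch, assuming the same standardized iris columns (here via `prcomp(..., scale. = TRUE)`, which standardizes the columns the same way `scale()` does); the variable names are only for illustration:

```r
# Sketch: rebuild the importance table from the PCA standard deviations.
# scale. = TRUE standardizes the columns, like scale() in the question.
pccomp <- prcomp(iris[, 1:4], scale. = TRUE)

var_pc <- pccomp$sdev^2                       # variance captured by each PC
names(var_pc) <- colnames(pccomp$rotation)    # "PC1", "PC2", ...
prop_var <- var_pc / sum(var_pc)              # "Proportion of Variance"
cum_prop <- cumsum(prop_var)                  # "Cumulative Proportion"

# summary() reports these rounded to 5 decimal places
print(round(rbind(prop_var, cum_prop), 5))
```

Since `summary()` rounds to five decimals, any threshold comparison made against the table uses the rounded values.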

Next, I convert the row names of `df` into a column, so that the PC names (currently the row names) form the first column for further manipulation:

```r
library(tibble)
library(dplyr)

# t() returned a matrix; rownames_to_column() needs a data frame
df <- rownames_to_column(as.data.frame(df), var = "PrinComp")
df
##   PrinComp Standard deviation Proportion of Variance Cumulative Proportion
## 1      PC1          1.7083611                0.72962               0.72962
## 2      PC2          0.9560494                0.22851               0.95813
## 3      PC3          0.3830886                0.03669               0.99482
## 4      PC4          0.1439265                0.00518               1.00000

# Now select only those PCs whose cumulative proportion is below, say, 96%
pcs <- df$PrinComp[df$`Cumulative Proportion` < 0.96]
pcs
## [1] "PC1" "PC2"
```
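The PC names can also be taken straight from `summary(pccomp)$importance`, without `tibble` or `dplyr`: its columns are named `PC1`, `PC2`, ..., so a logical index on the cumulative-proportion row returns the matching names. A sketch, assuming the same 0.96 threshold:

```r
# Sketch: pick the PC names directly from summary(), no row-name conversion needed.
pccomp <- prcomp(iris[, 1:4], scale. = TRUE)
importance <- summary(pccomp)$importance      # 3 x 4 matrix, one column per PC

threshold <- 0.96
cum_prop <- importance["Cumulative Proportion", ]
pcs <- names(cum_prop)[cum_prop < threshold]  # logical index keeps the PC order
pcs
```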

Now I create the PC data frame statically from the score vectors of the first two principal components, i.e. the ones that satisfied the condition above (cumulative proportion < 0.96):

```r
x1 <- pccomp$x[, 1]
x2 <- pccomp$x[, 2]
pcdf <- cbind(x1, x2)   # cbind() of two vectors returns a matrix
head(pcdf)
##             x1         x2
## [1,] -2.257141 -0.4784238
## [2,] -2.074013  0.6718827
## [3,] -2.356335  0.3407664
## [4,] -2.291707  0.5953999
## [5,] -2.381863 -0.6446757
## [6,] -2.068701 -1.4842053
```

My question: how can I create the above PC data frame dynamically, once I know the number of PCs from a condition such as the cumulative proportion being less than 0.95?

You can run a `while` loop over the `Cumulative Proportion` column of `df` and append each component's scores for as long as the cumulative proportion stays below the required threshold.

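```r
threshold <- 0.96
pcdf <- list()
i <- 1
# Also stop after the last PC; otherwise df$`Cumulative Proportion`[i] becomes NA
# (and the loop errors) when the threshold exceeds every cumulative proportion
while (i <= nrow(df) && df$`Cumulative Proportion`[i] < threshold) {
  pcdf[[paste0("x", i)]] <- pccomp$x[, i]   # scores of the i-th PC, named x1, x2, ...
  i <- i + 1
}
pcdf <- as.data.frame(pcdf)
```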

The output

```
> head(pcdf)
         x1         x2
1 -2.257141 -0.4784238
2 -2.074013  0.6718827
3 -2.356335  0.3407664
4 -2.291707  0.5953999
5 -2.381863 -0.6446757
6 -2.068701 -1.4842053
```

With `threshold = 0.999`, the same code gives:

```
> head(pcdf)
         x1         x2          x3
1 -2.257141 -0.4784238  0.12727962
2 -2.074013  0.6718827  0.23382552
3 -2.356335  0.3407664 -0.04405390
4 -2.291707  0.5953999 -0.09098530
5 -2.381863 -0.6446757 -0.01568565
6 -2.068701 -1.4842053 -0.02687825
```
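Because cumulative proportions never decrease, `cum_prop < threshold` is `TRUE` for a leading run of components, so the loop can also be replaced by a single column subset of `pccomp$x`. A sketch, not part of the original answer; `select_scores()` is a hypothetical helper name used only here, and the comparison uses the rounded values from `summary()` just like the loop does:

```r
# Sketch: vectorized alternative to the while loop.
# select_scores() is a hypothetical helper, not part of any package.
select_scores <- function(pca, threshold) {
  cum_prop <- summary(pca)$importance["Cumulative Proportion", ]
  keep <- cum_prop < threshold                          # TRUE for the leading PCs below threshold
  scores <- as.data.frame(pca$x[, keep, drop = FALSE])  # drop = FALSE keeps it 2-D
  names(scores) <- sprintf("x%d", seq_len(ncol(scores)))  # x1, x2, ... (empty if none kept)
  scores
}

pccomp <- prcomp(iris[, 1:4], scale. = TRUE)
pcdf <- select_scores(pccomp, threshold = 0.96)
head(pcdf)
```

`drop = FALSE` keeps the result a two-dimensional object even when only one PC passes the threshold, and `sprintf()` returns an empty vector when no PC qualifies, so the renaming step does not fail.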

UPDATE

If you already know the number of principal components you want, say `i`, you can use

```r
# the function returns the k-th score column; sapply() binds the columns into a matrix
a <- sapply(X = seq_len(i), FUN = function(k) pccomp$x[, k])
```

instead of the whole `while` loop. For `i = 2` you get:

```
> head(a)
          [,1]       [,2]
[1,] -2.257141 -0.4784238
[2,] -2.074013  0.6718827
[3,] -2.356335  0.3407664
[4,] -2.291707  0.5953999
[5,] -2.381863 -0.6446757
[6,] -2.068701 -1.4842053
```
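Since `sapply` here only collects columns of the score matrix, subsetting `pccomp$x` directly gives the same values without a loop, and `i` itself can be derived from the threshold by counting the cumulative proportions below it. A sketch, assuming the 0.96 threshold from the question:

```r
# Sketch: when the number of components i is known, subset the score matrix directly.
pccomp <- prcomp(iris[, 1:4], scale. = TRUE)

# Cumulative proportions are non-decreasing, so counting the values below the
# threshold gives the number of leading PCs to keep.
cum_prop <- summary(pccomp)$importance["Cumulative Proportion", ]
i <- sum(cum_prop < 0.96)

a <- pccomp$x[, seq_len(i), drop = FALSE]   # first i score columns, kept as a matrix
head(a)
```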