Thirst for Knowledge - 2 months ago
R Question

Dealing with Zero Values in Principal Component Analysis

I've really been struggling to get my PCA working, and I think it is because there are zero values in my data set, but I don't know how to resolve the issue.

The first problem is that the zero values are not missing values (they represent areas with no employment in a certain sector), so I should probably keep them. I am uncomfortable with excluding them just because they are zero.

Secondly, even when I try to remove all missing data, I still get the same error message.

Running the following code gives this error:

urban.pca.cov <- princomp(urban.cov, cor = TRUE)
Error in cov.wt(z) : 'x' must contain finite values only


I have also tried removing the missing values:

urban.cut <- na.omit(urban.cut)

> sum(is.na(urban.cut))
[1] 0


But when I run the PCA again, I get the same error:

urban.pca.cov <- princomp(urban.cov, cor = TRUE)
Error in cov.wt(z) : 'x' must contain finite values only


Is this a missing data issue? I've log-transformed all of my variables, following a PCA tutorial. Here is the structure of my data:

> str(urban.cut)
'data.frame': 5490 obs. of 13 variables:
$ median.lt : num 2.45 2.57 2.53 2.6 2.31 ...
$ p.nga.lt : num 0.547 4.587 4.529 4.605 4.564 ...
$ p.mbps2.lt : num 1.66 4.17 4 3.9 4.2 ...
$ density.lt : num 3.24 3.44 3.85 3.21 4.28 ...
$ p_m_s.lt : num 4.54 4.61 4.56 4.61 4.61 ...
$ p_m_l.lt : num 1.87 -Inf 1.44 -Inf -Inf ...
$ p.tert.lt : num 4.59 4.61 4.55 4.61 4.61 ...
$ p.kibs.lt : num 4.25 3.05 3.12 3 3.03 ...
$ p.edu.lt : num 4.14 2.6 2.9 2.67 2.57 ...
$ p.non.white.lt : num 3.06 3.56 3.82 2.94 3.52 ...
$ p.claim.lt : num 0.459 1.287 1.146 1.415 1.237 ...
$ d.connections.lt: num 2.5614 0.6553 5.2573 0.9562 -0.0252 ...
$ SAM.KM.lt2 : num 1.449 1.081 1.071 1.246 0.594 ...
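The -Inf entries in p_m_l.lt above are what princomp() rejects: the error comes from cov.wt(), which requires every entry to be finite. A minimal sketch with a hypothetical toy data frame (toy and its columns a and b are made up, not the poster's data) shows why na.omit() had no effect: log(0) returns -Inf, which is not NA, so na.omit() keeps those rows, while is.finite() flags them.

```r
# Hypothetical toy data: two count-like variables containing genuine zeros
toy <- data.frame(a = c(1, 5, 0, 12), b = c(3, 0, 7, 2))
toy.lt <- log(toy)   # log(0) is -Inf, not NA

# na.omit() only drops rows containing NA/NaN, so all 4 rows survive
nrow(na.omit(toy.lt))
sum(is.na(toy.lt))

# is.finite() is FALSE for NA, NaN, Inf and -Inf alike;
# it needs a matrix rather than a data frame, hence as.matrix()
colSums(!is.finite(as.matrix(toy.lt)))
```

On the real data, the same colSums(!is.finite(as.matrix(urban.cut))) call would list how many non-finite values each column holds.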


Thank you in advance for your help.

Answer

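It sounds to me like R wants finite values. -Inf is not finite; it is minus infinity, which is what log(0) returns. Because -Inf is not NA, na.omit() leaves those rows in place, which is why removing missing data did not help. If you really need to log-transform your data, perhaps you should use log(data + 1) instead, so that you never take the log of 0. That way the zeros stay in the analysis (they simply become 0).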
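A minimal sketch of this fix, assuming a small hypothetical data frame raw (its columns emp, dens and claim are illustrative, not the poster's variables). It uses base R's log1p(), which computes log(1 + x), in place of writing log(data + 1) by hand. The transform has to be applied to the untransformed values, not to urban.cut, whose columns are already logged.

```r
# Hypothetical raw (untransformed) data; zeros are genuine values,
# e.g. no employment in a sector
raw <- data.frame(
  emp   = c(0, 12, 5, 0, 30, 8, 2, 0, 15, 4),
  dens  = c(25, 31, 47, 24, 72, 40, 33, 20, 55, 29),
  claim = c(1.6, 3.6, 3.1, 4.1, 3.4, 2.2, 0, 1.9, 2.8, 3.0)
)

# Plain log() turns each zero into -Inf, so princomp() fails in cov.wt()
bad <- try(princomp(log(raw), cor = TRUE), silent = TRUE)
inherits(bad, "try-error")

# log1p(x) = log(1 + x): zeros map to 0 and every value stays finite
raw.lt <- log1p(raw)
all(is.finite(as.matrix(raw.lt)))

# cor = TRUE runs the PCA on the correlation matrix (standardised variables)
pca <- princomp(raw.lt, cor = TRUE)
summary(pca)
```

Because cor = TRUE standardises each variable, the shift by 1 does not put variables on different scales; whether a log transform is appropriate at all is still a judgement about the data.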