Carolina Karoullas - 4 months ago

R Question

So, I'm using R to try to do a phylogenetic PCA on a dataset of mine, using the `phyl.pca` function from the phytools package. However, I'm having trouble organising my data in a way that the function will accept! And that's not all: after a bit of experimenting, I know there are more issues further down the line, which I will get into...

Getting straight to the issue, here's the data frame (with dummy data) that I'm using:

```
> all
                  Taxa Tibia Feather
1          Microraptor   138     101
2          Microraptor   139     114
3          Microraptor   145     141
4           Anchiornis   160      81
5           Anchiornis    14      NA
6        Archaeopteryx   134      82
7        Archaeopteryx   136      71
8        Archaeopteryx   132      NA
9        Archaeopteryx    14      NA
10 Scansoriopterygidae   120      85
11 Scansoriopterygidae   116      NA
12 Scansoriopterygidae   123      NA
13           Sapeornis   108      NA
14           Sapeornis   112      86
15           Sapeornis   118      NA
16           Sapeornis   103      NA
17      Confuciusornis    96      NA
18      Confuciusornis   107      30
19      Confuciusornis   148      33
20      Confuciusornis   128      61
```

The taxa are arranged into a tree (called "tree") with Microraptor being the most basal and then progressing in order through to Confuciusornis:

```
> summary(tree)

Phylogenetic tree: tree

  Number of tips: 6
  Number of nodes: 5
  Branch lengths:
    mean: 1
    variance: 0
    distribution summary:
   Min. 1st Qu.  Median 3rd Qu.    Max.
      1       1       1       1       1
  No root edge.
  Tip labels: Confuciusornis
              Sapeornis
              Scansoriopterygidae
              Archaeopteryx
              Anchiornis
              Microraptor
  No node labels.
```

And the function:

```
> phyl.pca(tree, all, method="BM", mode="corr")
```

And this is the error that is coming up:

```
Error in phyl.pca(tree, all, method = "BM", mode = "corr") :
  number of rows in Y cannot be greater than number of taxa in your tree
```

Y is the `all` data frame here. My tree has 6 taxa (matching the 6 taxa in the data frame), but the data frame has 20 rows. So I used this function to get one row per taxon:

```
> all_agg <- aggregate(all[,-1],by=list(all$Taxa),mean,na.rm=TRUE)
```

And got this:

```
              Group.1 Tibia Feather
1          Anchiornis   153      81
2       Archaeopteryx   136      77
3      Confuciusornis   120      41
4         Microraptor   141     119
5           Sapeornis   110      86
6 Scansoriopterygidae   120      85
```

It's a bit odd that the order of the taxa has changed... Is this ok?

In any case, I converted it into a matrix:

```
> all_agg_matrix <- as.matrix(all_agg)
> all_agg_matrix
     Group.1               Tibia Feather
[1,] "Anchiornis"          "153" "81"
[2,] "Archaeopteryx"       "136" "77"
[3,] "Confuciusornis"      "120" "41"
[4,] "Microraptor"         "141" "119"
[5,] "Sapeornis"           "110" "86"
[6,] "Scansoriopterygidae" "120" "85"
```

And then used the `phyl.pca` function:

```
> phyl.pca(tree, all_agg_matrix, method = "BM", mode = "corr")
[1] "Y has no names. function will assume that the row order of Y matches tree$tip.label"
Error in invC %*% X : requires numeric/complex matrix/vector arguments
```

So now the function is considering the taxa in the wrong order (but I can fix that relatively easily). The real issue is that `phyl.pca` doesn't seem to believe that my matrix is actually a matrix. Any ideas why?

Answer

I think you may have bigger problems. Most phylogenetic methods, I suspect including `phyl.pca`, assume that traits are fixed at the species level (i.e., they don't account for within-species variation). Thus, if you want to use `phyl.pca`, you probably need to collapse your data to a single value per species, e.g. via

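```
## one row per species: mean of each trait, ignoring NAs
## (dd is the question's data frame, called `all` there)
dd_agg <- aggregate(dd[,-1],by=list(dd$Taxa),mean,na.rm=TRUE)
```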
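To make the aggregation step reproducible, here is a minimal sketch that types the question's dummy data in by hand as `dd`. The values are copied from the post as printed, including the two `Tibia` entries shown as 14, so the Anchiornis and Archaeopteryx means will not match the aggregated table in the question. The point is only to check that you end up with one row per species.

```r
# The question's dummy data, copied as printed (including the two Tibia values of 14).
dd <- data.frame(
  Taxa = c(rep("Microraptor", 3), rep("Anchiornis", 2), rep("Archaeopteryx", 4),
           rep("Scansoriopterygidae", 3), rep("Sapeornis", 4),
           rep("Confuciusornis", 4)),
  Tibia = c(138, 139, 145, 160, 14, 134, 136, 132, 14, 120, 116, 123,
            108, 112, 118, 103, 96, 107, 148, 128),
  Feather = c(101, 114, 141, 81, NA, 82, 71, NA, NA, 85, NA, NA,
              NA, 86, NA, NA, NA, 30, 33, 61),
  stringsAsFactors = FALSE
)

# One row per species: the mean of each trait, ignoring missing values.
# aggregate() returns the groups sorted by taxon name, not in their original order.
dd_agg <- aggregate(dd[, -1], by = list(dd$Taxa), mean, na.rm = TRUE)
print(dd_agg)
```

The sorted output is why the rows came back reshuffled in the question; once the rows carry taxon names, the order should not matter for matching them to the tree.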

Extract the numeric columns and label the rows properly so that `phyl.pca` can match them up with the tips correctly:

```
dd_mat <- dd_agg[,-1]           # numeric trait columns only
rownames(dd_mat) <- dd_agg[,1]  # taxon names become row names
```
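This is also why the `as.matrix()` attempt in the question failed. A short sketch, using the aggregated values printed in the question (the data frame `agg` is just a stand-in for `dd_agg`), shows that `as.matrix()` on a data frame that still contains the taxon-name column gives a *character* matrix without row names. That matches both the "Y has no names" message and the `invC %*% X` error, whereas dropping the name column and using it for row names keeps the traits numeric.

```r
# Stand-in for dd_agg: the aggregated values as printed in the question.
agg <- data.frame(
  Group.1 = c("Anchiornis", "Archaeopteryx", "Confuciusornis",
              "Microraptor", "Sapeornis", "Scansoriopterygidae"),
  Tibia   = c(153, 136, 120, 141, 110, 120),
  Feather = c(81, 77, 41, 119, 86, 85),
  stringsAsFactors = FALSE
)

# What the question did: the name column forces every cell to character,
# and the automatic row names 1..6 are dropped.
bad <- as.matrix(agg)
print(typeof(bad))             # "character": %*% refuses this
print(is.null(rownames(bad)))  # TRUE: hence "Y has no names"

# What the answer does: numeric columns only, taxa as row names.
good <- agg[, -1]
rownames(good) <- agg[, 1]
good <- as.matrix(good)
print(typeof(good))            # "double"
print(rownames(good))          # the six taxon names
```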

Using these aggregated data, I can make up a tree (since you didn't give us one) and run `phyl.pca` ...

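```
library(phytools)
## random coalescent tree with the species names as tips (a stand-in for the real tree)
tt <- rcoal(nrow(dd_agg),tip.label=dd_agg[,1])
phyl.pca(tt,dd_mat)  # defaults: method = "BM", mode = "cov"
```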
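If you want to try it on something closer to the tree in the question, here is a sketch under a loudly stated assumption: the tree is a fully pectinate (ladder) tree with all branch lengths 1, reconstructed from the description (Microraptor most basal, then in order through to Confuciusornis) and from the `summary(tree)` output. The real topology may differ. The sketch uses the species means as printed in the question, checks that row names and tip labels agree, and calls `phyl.pca` with the question's `method = "BM", mode = "corr"`. Since named rows should let `phyl.pca` match species to tips, reordering the rows is optional; it is done here only so the printed scores line up with the tree.

```r
library(phytools)  # also attaches ape (read.tree, Ntip)

# ASSUMPTION: a ladder tree with unit branch lengths, reconstructed from the
# question's description and summary(tree); the real tree may differ.
tree <- read.tree(text = paste0(
  "(((((Confuciusornis:1,Sapeornis:1):1,Scansoriopterygidae:1):1,",
  "Archaeopteryx:1):1,Anchiornis:1):1,Microraptor:1);"))

# Species means as printed in the question, taxa as row names.
dd_mat <- matrix(
  c(153, 136, 120, 141, 110, 120,  # Tibia
     81,  77,  41, 119,  86,  85), # Feather
  ncol = 2,
  dimnames = list(c("Anchiornis", "Archaeopteryx", "Confuciusornis",
                    "Microraptor", "Sapeornis", "Scansoriopterygidae"),
                  c("Tibia", "Feather")))

# Every tip needs a row and every row needs a tip.
stopifnot(setequal(rownames(dd_mat), tree$tip.label))

# Optional: put rows in tip order so the printed scores line up with the tree.
dd_mat <- dd_mat[tree$tip.label, , drop = FALSE]

pca <- phyl.pca(tree, dd_mat, method = "BM", mode = "corr")
print(pca)
```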

If you do need to do an analysis that takes within-species variation into account, you might need to ask somewhere more specialized, e.g. the `r-sig-phylo@r-project.org` mailing list ...