Javier Castillo Arnemann - 3 months ago
R Question

Assigning multiple matrix values to single column of data frame

I have the following data frame with info for 163 monkeys:

> head(vervetdf)
      ucla_id                  country                             species Gender pi
1   A8516_M_2                 Barbados                 Chlorocebus sabaeus      M NA
2   AG23_F_10                 Tanzania Chlorocebus pygerythrus pygerythrus      F NA
3 AG5417_F_10                 Tanzania Chlorocebus pygerythrus pygerythrus      F NA
4  AGM126_F_1 Central African Republic                Chlorocebus tantalus      F NA
5  AGM127_F_1 Central African Republic                Chlorocebus tantalus      F NA
6  AGM129_F_1 Central African Republic                Chlorocebus tantalus      F NA

> str(vervetdf)
'data.frame': 163 obs. of 5 variables:
$ ucla_id: Factor w/ 163 levels "A8516_M_2","AG23_F_10",..: 1 2 3 4 5 6 7 8 9 10 ...
$ country: Factor w/ 12 levels "Barbados","Botswana",..: 1 11 11 3 3 3 3 3 3 3 ...
$ species: Factor w/ 5 levels "Chlorocebus aethiops aethiops",..: 4 3 3 5 5 5 5 5 5 5 ...
$ Gender : Factor w/ 2 levels "F","M": 2 1 1 1 1 1 1 2 1 2 ...
$ pi : logi NA NA NA NA NA NA ...


I need to add the pi values for each monkey for analysis and plotting, so I created a new column, pi. Pi is the same for all monkeys of the same species (I have 5 species), but it is calculated in windows, so there are 1300 pi values for each monkey. I have a matrix with the pi values, one column per species:

> head(corrected_pi)
              pi1         pi2         pi3         pi4         pi5
w1.ce 0.001918322 0.002408772 0.002306475 0.002086117 0.002501300
w2.ce 0.002125624 0.002779025 0.002620691 0.002599817 0.002847614
w3.ce 0.001512895 0.001886345 0.001867847 0.001658217 0.001875594
w4.ce 0.002340536 0.002637327 0.002736944 0.002252872 0.002848985
w5.ce 0.001329015 0.001553925 0.001654385 0.001654023 0.001806535
w6.ce 0.001326739 0.001595000 0.001487649 0.001417510 0.001581388

> dim(corrected_pi)
[1] 1300 5


So, is there a way I can assign all the pi values to the corresponding species in just one column of the data frame?

Answer

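You can store all pi values for a species in a single list column using nest from the tidyr package, then use merge to join the new pi table with vervetdf. This assumes you have not yet created the NA column vervetdf$pi, since the merge adds the pi column for you. If the column already exists, drop it first (vervetdf$pi <- NULL); otherwise merge will return two columns, pi.x and pi.y: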

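library(tidyr)
# Assumes column i of corrected_pi belongs to the i-th level of vervetdf$species.
# Uses the tidyr syntax of the time; newer releases prefer nest(df, pi = -species).
new.pi <- nest(data.frame(species=factor(levels(vervetdf$species), levels=levels(vervetdf$species)), t(corrected_pi)), -species, .key=pi)
# Join on species so that every monkey picks up the pi values of its species.
result <- merge(vervetdf, new.pi, by="species", sort=FALSE)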

With the limited data you posted (only 6 rows of corrected_pi), this gives:

print(result)
##                              species     ucla_id                  country Gender                                                                           pi
##1                 Chlorocebus sabaeus   A8516_M_2                 Barbados      M 0.002306475, 0.002620691, 0.001867847, 0.002736944, 0.001654385, 0.001487649
##2 Chlorocebus pygerythrus pygerythrus   AG23_F_10                 Tanzania      F 0.002408772, 0.002779025, 0.001886345, 0.002637327, 0.001553925, 0.001595000
##3 Chlorocebus pygerythrus pygerythrus AG5417_F_10                 Tanzania      F 0.002408772, 0.002779025, 0.001886345, 0.002637327, 0.001553925, 0.001595000
##4                Chlorocebus tantalus  AGM126_F_1 Central African Republic      F 0.002086117, 0.002599817, 0.001658217, 0.002252872, 0.001654023, 0.001417510
##5                Chlorocebus tantalus  AGM127_F_1 Central African Republic      F 0.002086117, 0.002599817, 0.001658217, 0.002252872, 0.001654023, 0.001417510
##6                Chlorocebus tantalus  AGM129_F_1 Central African Republic      F 0.002086117, 0.002599817, 0.001658217, 0.002252872, 0.001654023, 0.001417510
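
Each cell of the pi column holds all the window values for that monkey's species, so you have to pull them out of the list before computing on them. Here is a minimal sketch of reading such a column back. It uses a hand-built toy data frame (toy, with made-up values and only two windows) rather than your actual result, and it assumes each cell is a one-row data frame of window columns, which is what nesting the transposed matrix should give you:

```r
# Toy stand-in for `result` (made-up values, two windows instead of 1300).
# Assumption: each cell of the list column is a one-row data frame of windows.
toy <- data.frame(ucla_id = c("m1", "m2"))
toy$pi <- list(
  data.frame(w1 = 0.21, w2 = 0.22),
  data.frame(w1 = 0.11, w2 = 0.12)
)

# All window values for the first monkey, as a plain numeric vector
first_pi <- unlist(toy$pi[[1]])

# One summary number per monkey, e.g. the mean across windows
toy$mean_pi <- sapply(toy$pi, function(x) mean(unlist(x)))
```

The same pattern (index with [[ ]], then unlist) should carry over to result$pi, though I have only checked it on the toy data above.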

Notes:

  1. new.pi is a data frame with 5 rows, one for each of your species.
  2. new.pi is a data frame with two columns:
    • species: a factor built from the levels of the vervetdf$species column. This is the key used to join the two tables later.
    • pi: created by nest. nest adds a new list column, named by the .key parameter, that holds the values from the nested columns. Its first argument is the data frame to nest from. Here, that is a temporary data frame made of the species column plus the transpose of corrected_pi (i.e., t(corrected_pi)), so each species gets one row with one column per window. The -species argument then selects every column except species for nesting. Note that this relies on the columns of corrected_pi being in the same order as the levels of vervetdf$species.
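
If you would rather not depend on tidyr, the same idea (one list entry per species, looked up for every monkey) can be sketched in base R. This is only an illustration on toy data: the species names, ids and pi values below are made up, pi_by_species is a helper name I introduced, and it again assumes that column i of corrected_pi belongs to the i-th level of species:

```r
# Toy stand-ins for the objects in the question (all values are made up)
species_levels <- c("sp_a", "sp_b", "sp_c")
vervetdf <- data.frame(
  ucla_id = c("m1", "m2", "m3", "m4"),
  species = factor(c("sp_b", "sp_a", "sp_b", "sp_c"), levels = species_levels)
)
corrected_pi <- matrix(
  c(0.11, 0.12,   # column 1: windows for the first species level
    0.21, 0.22,   # column 2: windows for the second species level
    0.31, 0.32),  # column 3: windows for the third species level
  nrow = 2,
  dimnames = list(c("w1", "w2"), c("pi1", "pi2", "pi3"))
)

# Assumption: column i of corrected_pi belongs to the i-th level of species
stopifnot(ncol(corrected_pi) == nlevels(vervetdf$species))

# One numeric vector per species, named by species level
pi_by_species <- lapply(seq_len(ncol(corrected_pi)), function(i) corrected_pi[, i])
names(pi_by_species) <- levels(vervetdf$species)

# Look up each monkey's species; the result is stored as a list column
vervetdf$pi <- unname(pi_by_species[as.character(vervetdf$species)])
```

Here each cell of vervetdf$pi is a plain numeric vector rather than a one-row data frame, so vervetdf$pi[[1]] gives the windows for the first monkey directly.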