MathsQuant525 MathsQuant525 - 4 months ago 16
R Question

R merge() by column names in one data frame, and a column in the other

I am trying to use the merge function in R. I have a list of column names (Stock names) in some data frame D. I then have another data frame 'Info' where these Stock names are stored in a column, with a corresponding sector name in the next column. I need to merge these sector names to a row in D.

If I were merging on 2 columns, I could use merge() directly and alter by.x and by.y and the emrged values would appear as extra columns. However, D has the name I am merging by as its column names, so I essentially want it to add on extra rows? The following code didn't work:

M<-merge(D,Info[,c("Name","Sector")],by.x=colnames(D), by.y=c("Name"))


P.S I have searched around on SO, but nobody has precisely this problem. They have similar ones, but those approaches are not valid for this.

Answer

From your description, D looks like

set.seed(1)
D <- data.frame(matrix(runif(9), nc=3))
names(D) <- sample(LETTERS, 3)

> D
          B         F         E
1 0.2655087 0.9082078 0.9446753
2 0.3721239 0.2016819 0.6607978
3 0.5728534 0.8983897 0.6291140

Where in this example, B, F, and E are stock names. Presumably info looks like this:

info <- data.frame(NAME=LETTERS, sector=letters)
> info
   NAME sector
1     A      a
2     B      b
3     C      c
4     D      d
5     E      e
6     F      f
7     G      g
..snip..

The idea is to extract the right sector from info, then rbind that vector to D. Note that rbind matches the names of the first and second objects.

merged <- rbind(D, sapply(names(D), function(x) info$sector[info$NAME==x]))
> merged
          B         F         E
1 0.2655087 0.9082078 0.9446753
2 0.3721239 0.2016819 0.6607978
3 0.5728534 0.8983897 0.6291140
4 2.0000000 6.0000000 5.0000000

sector (which was a factor type) has been converted to a float. To make the merged table look nice, you could do

rbind(format(D),
  as.character(
    sapply(names(D), function(x) info$sector[info$NAME==x])
  )
)
          B         F         E
1 0.2655087 0.9082078 0.9446753
2 0.3721239 0.2016819 0.6607978
3 0.5728534 0.8983897 0.6291140
4         b         f         e

But now your numbers are really characters, which may or may not be useful for you.