user2142810 - 1 year ago 119
R Question

# How to do vlookup and fill down (like in Excel) in R?

I have a dataset about 105000 rows and 30 columns. I have a categorical variable that I would like to assign it to a number. In Excel, I would probably do something with

`VLOOKUP`
and fill.

How would I go about doing the same thing in
`R`
?

Essentially, what I have is a
`HouseType`
variable, and I need to calculate the
`HouseTypeNo`
. Here are some sample data:

``````HouseType HouseTypeNo
Semi            1
Single          2
Row             3
Single          2
Apartment       4
Apartment       4
Row             3
``````

Answer Source

If I understand your question correctly, here are four methods to do the equivalent of Excel's `VLOOKUP` and fill down using `R`:

``````# load sample data from Q
hous <- read.table(header = TRUE,
stringsAsFactors = FALSE,
text="HouseType HouseTypeNo
Semi            1
Single          2
Row             3
Single          2
Apartment       4
Apartment       4
Row             3")

# create a toy large table with a 'HouseType' column
# but no 'HouseTypeNo' column (yet)
largetable <- data.frame(HouseType = as.character(sample(unique(hous\$HouseType), 1000, replace = TRUE)), stringsAsFactors = FALSE)

# create a lookup table to get the numbers to fill
# the large table
lookup <- unique(hous)
HouseType HouseTypeNo
1      Semi           1
2    Single           2
3       Row           3
5 Apartment           4
``````

Here are four methods to fill the `HouseTypeNo` in the `largetable` using the values in the `lookup` table:

First with `merge` in base:

``````# 1. using base
base1 <- (merge(lookup, largetable, by = 'HouseType'))
``````

A second method with named vectors in base:

``````# 2. using base and a named vector
housenames <- as.numeric(1:length(unique(hous\$HouseType)))
names(housenames) <- unique(hous\$HouseType)

base2 <- data.frame(HouseType = largetable\$HouseType,
HouseTypeNo = (housenames[largetable\$HouseType]))
``````

Third, using the `plyr` package:

``````# 3. using the plyr package
library(plyr)
plyr1 <- join(largetable, lookup, by = "HouseType")
``````

Fourth, using the `sqldf` package

``````# 4. using the sqldf package
library(sqldf)
sqldf1 <- sqldf("SELECT largetable.HouseType, lookup.HouseTypeNo
FROM largetable
INNER JOIN lookup
ON largetable.HouseType = lookup.HouseType")
``````

If it's possible that some house types in `largetable` do not exist in `lookup` then a left join would be used:

``````sqldf("select * from largetable left join lookup using (HouseType)")
``````

Corresponding changes to the other solutions would be needed too.

Is that what you wanted to do? Let me know which method you like and I'll add commentary.