user2142810 - 2 months ago 32

R Question

I have a dataset about 105000 rows and 30 columns. I have a categorical variable that I would like to assign it to a number. In Excel, I would probably do something with

`VLOOKUP`

How would I go about doing the same thing in

`R`

Essentially, what I have is a

`HouseType`

`HouseTypeNo`

`HouseType HouseTypeNo`

Semi 1

Single 2

Row 3

Single 2

Apartment 4

Apartment 4

Row 3

Answer

If I understand your question correctly, here are four methods to do the equivalent of Excel's `VLOOKUP`

and fill down using `R`

:

```
# load sample data from Q
hous <- read.table(header = TRUE,
stringsAsFactors = FALSE,
text="HouseType HouseTypeNo
Semi 1
Single 2
Row 3
Single 2
Apartment 4
Apartment 4
Row 3")
# create a toy large table with a 'HouseType' column
# but no 'HouseTypeNo' column (yet)
largetable <- data.frame(HouseType = as.character(sample(unique(hous$HouseType), 1000, replace = TRUE)), stringsAsFactors = FALSE)
# create a lookup table to get the numbers to fill
# the large table
lookup <- unique(hous)
HouseType HouseTypeNo
1 Semi 1
2 Single 2
3 Row 3
5 Apartment 4
```

Here are four methods to fill the `HouseTypeNo`

in the `largetable`

using the values in the `lookup`

table:

First with `merge`

in base:

```
# 1. using base
base1 <- (merge(lookup, largetable, by = 'HouseType'))
```

A second method with named vectors in base:

```
# 2. using base and a named vector
housenames <- as.numeric(1:length(unique(hous$HouseType)))
names(housenames) <- unique(hous$HouseType)
base2 <- data.frame(HouseType = largetable$HouseType,
HouseTypeNo = (housenames[largetable$HouseType]))
```

Third, using the `plyr`

package:

```
# 3. using the plyr package
library(plyr)
plyr1 <- join(largetable, lookup, by = "HouseType")
```

Fourth, using the `sqldf`

package

```
# 4. using the sqldf package
library(sqldf)
sqldf1 <- sqldf("SELECT largetable.HouseType, lookup.HouseTypeNo
FROM largetable
INNER JOIN lookup
ON largetable.HouseType = lookup.HouseType")
```

If it's possible that some house types in `largetable`

do not exist in `lookup`

then a left join would be used:

```
sqldf("select * from largetable left join lookup using (HouseType)")
```

Corresponding changes to the other solutions would be needed too.

Is that what you wanted to do? Let me know which method you like and I'll add commentary.