useR useR - 3 months ago 8
R Question

R: data transformation of a column in a data frame

I have a data.frame as shown below:

> a <- c(98:103, 998:1003)
> b <- 1:length(a)
> data <- data.frame(a,b)
> data
      a  b
1    98  1
2    99  2
3   100  3
4   101  4
5   102  5
6   103  6
7   998  7
8   999  8
9  1000  9
10 1001 10
11 1002 11
12 1003 12


I would like to add a column based on column a:

For a less than 100, assign "A" to the new column.

For a from 100 up to but not including 1000 (100 <= a < 1000), assign "B" to the new column.

Otherwise, assign "C".
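
To make the boundaries explicit, the three rules can be written as half-open intervals. The following is a minimal sketch using base R's cut(); it is an illustration of the rules as stated, not the code from the post. Note one assumption: it follows the wording "less than 100", so zero and negative values would also get "A", whereas a lookup set such as 1:99 would not match them.

```r
# Example data from the question
a <- c(98:103, 998:1003)
data <- data.frame(a, b = seq_along(a))

# right = FALSE closes each interval on the left:
# [-Inf, 100) -> "A", [100, 1000) -> "B", [1000, Inf) -> "C"
data$c <- as.character(cut(data$a,
                           breaks = c(-Inf, 100, 1000, Inf),
                           labels = c("A", "B", "C"),
                           right = FALSE))
```

Because cut() works on the whole column at once, no explicit loop over rows is needed.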


My approach is:

> data$c <- data$a
>
> A <- 1:99
> B <- 100:999
> for (i in 1:length(a)){
+ if (data[i,1] %in% A){
+ data[i,3] <- "A"
+ } else if (data[i,1] %in% B){
+ data[i,3] <- "B"
+ } else {data[i,3] <- "C"}
+ }
> data
      a  b c
1    98  1 A
2    99  2 A
3   100  3 B
4   101  4 B
5   102  5 B
6   103  6 B
7   998  7 B
8   999  8 B
9  1000  9 C
10 1001 10 C
11 1002 11 C
12 1003 12 C
>


My real data has over 500,000 rows. Is there a better solution?

Answer

Below is a solution using data.table. It may be especially useful if your key variable (here a) is not numeric, because it matches rows against explicit sets of values rather than numeric ranges.

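# Load data.table (install.packages("data.table") if needed)
library(data.table)

# Set up data
a <- c(98:103, 998:1003)
b <- seq_along(a)

# Sets of values to look for
A <- 1:99
B <- 100:999

# Create data table and set key
DT <- data.table(a, b)
setkey(DT, a)

# Add new variable: join on the key and assign by reference;
# values in A or B that do not occur in DT are simply ignored
DT[J(A), c := "A"]
DT[J(B), c := "B"]
# Rows matched by neither set are still NA and get the fallback label
DT[is.na(c), c := "C"]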

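If your key variable is a character column rather than a numeric one, J() can be dropped: write DT[A, ...] instead of DT[J(A), ...]. A character vector in i is joined against the key, whereas a numeric vector in i would be read as row numbers.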
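
As a rough illustration of the character-key case, here is a minimal sketch. The data, the column names id and grp, and the value sets are all hypothetical and chosen only for this example; it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical data with a character key column instead of a numeric one
DT <- data.table(id = c("x1", "x2", "y1", "z9"), b = 1:4)
setkey(DT, id)

# Hypothetical sets of key values to look for
A <- c("x1", "x2")
B <- c("y1")

# A character vector in i is joined against the key, so J() is not needed
DT[A, grp := "A"]
DT[B, grp := "B"]

# Rows matched by neither set are still NA and get the fallback label
DT[is.na(grp), grp := "C"]
```

The same pattern with J() would also work here; dropping it is only a convenience for character keys.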
