abhiieor abhiieor - 2 months ago 13
R Question

data.table/data.frame rbind not working fine

I am using data.table 1.9.6. Here is some simple code and output:

> df <- data.table(a=c(NA,NA,2,2),b=c(1,1,2,2))
> nrow(df[is.na(a)]) + nrow(df[!is.na(a)])
[1] 4
> nrow(rbind(df[is.na(a)],df[!is.na(a)]))
[1] 4
> nrow(rbind(df[is.na(a),b := a],df[!is.na(a)]))
[1] 6
> rbind(df[is.na(a),b := a],df[!is.na(a)])
    a  b
1: NA NA
2: NA NA
3:  2  2
4:  2  2
5:  2  2
6:  2  2
> rbind(df[is.na(a),a := b],df[!is.na(a)])
    a  b
1: NA NA
2: NA NA
3:  2  2
4:  2  2
5:  2  2
6:  2  2


Essentially, just rbind after is.na() and !is.na() gives me fine results, but as soon as I try to replace NA values in a column with another column's values, rbind(df[is.na(a),a := b],df[!is.na(a)]), something breaks. The rather illogical rbind(df[is.na(a),b := a],df[!is.na(a)]) also breaks. Can anyone explain what I am missing, or is this a bug?

Further, to keep things moving, I tried:

> rbind(data.frame(df[is.na(a),a := b]),data.frame(df[!is.na(a)]))
   a  b
1 NA NA
2 NA NA
3  2  2
4  2  2
5  2  2
6  2  2


So this doesn't work even after I convert it to data.frame.

Answer

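The problem is that := updates df in place for the rows matching the condition, but the expression returns the whole data.table (invisibly), not just the matching rows. rbind therefore receives all four rows of df plus the two non-NA rows, which gives six.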
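
A minimal sketch of that behaviour, assuming data.table is installed and using the same df as in the question (the names filtered and res are only for illustration):

```r
library(data.table)

df <- data.table(a = c(NA, NA, 2, 2), b = c(1, 1, 2, 2))

# A plain filter in i returns only the matching rows.
filtered <- df[is.na(a)]
nrow(filtered)                     # 2

# `:=` with a filter in i modifies only the matching rows of df in place,
# but the value of the expression is the whole (modified) table,
# returned invisibly.
res <- df[is.na(a), b := a]
nrow(res)                          # 4

# So rbind() stacks all 4 rows of df on top of the 2 non-NA rows.
nrow(rbind(res, df[!is.na(a)]))    # 6
```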

Either use this syntax, which does not update in place:

rbind(df[is.na(a),.(a,b = a)],df[!is.na(a)])
    a  b
1: NA NA
2: NA NA
3:  2  2
4:  2  2

Or use this to only update in place, with no rbind needed:

df[is.na(a),b := a]
df
    a  b
1: NA NA
2: NA NA
3:  2  2
4:  2  2
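
If the original df has to stay untouched, one option (a sketch that goes slightly beyond the answer above) is to run the in-place update on a copy made with data.table's copy(). The name df2 is hypothetical:

```r
library(data.table)

df <- data.table(a = c(NA, NA, 2, 2), b = c(1, 1, 2, 2))

# copy() makes a deep copy, so := on df2 does not touch df.
df2 <- copy(df)
df2[is.na(a), b := a]

nrow(df2)   # still 4 rows: the update changes values, it does not add rows
df$b        # the original column is unchanged
```

Assigning with df2 <- df instead of copy(df) would not be enough here, since both names would then refer to the same table and := would modify it for both.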