Lasitha Yapa Lasitha Yapa - 2 months ago 10
R Question

Similar function to SQL 'WHERE' clause in R

I have a data set assigned to a variable named 'temps', which have columns 'date', 'temperature', 'country'.

I want to do something like this, which I can do in SQL

SELECT * FROM temps WHERE country != 'mycountry'


How can I do similar selection in R?

Answer

We can use similar syntax in base R

temps[temps$country != "mycountry",]

Benchmarks

set.seed(24)
temps1 <- data.frame(country = sample(LETTERS, 1e7, replace=TRUE),
                  val = rnorm(1e7))
system.time(temps1[!temps1$country %in% "A",])
#  user  system elapsed 
#   0.92    0.11    1.04 
system.time(temps1[temps1$country != "A",])
#   user  system elapsed 
#   0.70    0.17    0.88 

If we are using package solutions

library(sqldf)
system.time(sqldf("SELECT * FROM temps1 WHERE country != 'A'"))
#   user  system elapsed 
# 12.78    0.37   13.15 

library(data.table)
system.time(setDT(temps1, key = 'country')[!("A")])
#   user  system elapsed 
#  0.62    0.19    0.37