jsta jsta - 1 month ago 20
R Question

Copy a list of data.frame(s) to sqlite database using dplyr

I want to make an sqlite database from a list of data.frame(s) using the dplyr package. It looks like the

dplyr::copy_to
function is what I need to use. I think the problem I am having is related to NSE. Also see the vignette on
dplyr
with databases
.

data(iris)
data(cars)

res <- list("iris" = iris, "cars" = cars)

my_db <- dplyr::src_sqlite(paste0(tempdir(), "/foobar.sqlite3"),
create = TRUE)

lapply(res, function(x) dplyr::copy_to(my_db, x))



Error: Table x already exists.

Answer

The reason for this is because the default table name is based off of the name of the data frame in R. When using lapply, it does not take the index name.

The documentation for dplyr::copy_to.src_sql contains:

## S3 method for class 'src_sql'
copy_to(dest, df, name = deparse(substitute(df)),
  types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL,
  analyze = TRUE, ...)

The line name = deparse(substitute(df)) shows where the table name comes from.

We can see what that turns into:

res <- list("iris" = iris, "cars" = cars)
tmp = lapply(res, function(x) print(deparse(substitute(x))))
#> [1] "X[[i]]"
#> [1] "X[[i]]"

The name for the table in the SQLite source is X[[i]]; once the first line has been executed, the table already exists.

We can resolve by either using an explicit for loop and passing the name secondarily, or by using lapply on an index number.

For instance:

res <- list("iris" = iris, "cars" = cars)
my_db <- dplyr::src_sqlite(paste0(tempdir(), "/foobar.sqlite3"),
                           create = TRUE)
lapply(seq_along(res), function(i, l){dplyr::copy_to(my_db, l[[i]], names(l)[[i]])}, l = res)
my_db %>% tbl("iris") %>% head

#> Source:   query [?? x 5]
#> Database: sqlite 3.8.6 
#> 
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl>   <chr>
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

my_db %>% tbl("cars") %>% head

#> Source:   query [?? x 2]
#> Database: sqlite 3.8.6 
#> 
#>   speed  dist
#>   <dbl> <dbl>
#> 1     4     2
#> 2     4    10
#> 3     7     4
#> 4     7    22
#> 5     8    16
#> 6     9    10