Jeff Jeff - 27 days ago 15
R Question

multiple selectInput values create unexpected dplyr (postgres) behavior

I have a lovely Shiny app which accepts selectInput values, queries a postgres DB, and outputs a graph. (It's a simple interface, but hard to reproduce here due to dplyr database connections!)

Today I changed the first selectInput value to multiple=TRUE; updated the variable being passed to the database to be %in% the list returned by the modified control; and all heck broke loose.


  1. Before I select a value the control is null, so I am greeted by the Shiny red "ERROR: RS-DBI driver ... " telling me my query "IN ()" isn't valid

  2. When only one value is selected I get a syntax error, ""locationID" IN 'A1080330' "

  3. I can solve this by putting parentheses around the list returned...
    locationID %in% (input$rtnLocid)

  4. however this creates a new "operator doesn't exist" error when I select more than one list item because of the added parentheses: " IN (('A1080330', 'B...'))



I think what's going on is the postgres driver always wants SQL list values in parentheses when using IN (some DBs may be more lenient here); adding the parentheses fixes the first selection; the added parentheses break the postgres driver again when multi-select is on.

Can anyone else using Shiny / postgres verify this behavior?

Regards,
Jeff

Update: @Steven pointed out this info link in the comments which I hadn't found when I posted: https://github.com/hadley/dplyr/issues/511

Answer

The problem is the way the query is constructed when you select only one element and use the IN operator. The dplyr translation to SQL does not add the proper parenthesis and thus, fail. This issue was discussed at length here.

One way to work around this is to pass a different instruction to filter() when length of input is equal to 1 (see example below).


Here's what is happening:

tbl(mydb, "iris") %>%
  filter(Species %in% c("setosa", "versicolor")) %>%
  .$query

Gives the proper SQL query syntax:

<Query> SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"
FROM "iris"
WHERE "Species" IN ('setosa', 'versicolor')
<PostgreSQLConnection>

And, if executed, gives the expected:

#Source: postgres 9.3.13 [elm@127.0.0.1:5432/csvdump]
#From: iris [100 x 5]
#Filter: Species %in% c("setosa", "versicolor") 
#
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#          (dbl)       (dbl)        (dbl)       (dbl)   (chr)
#1           5.1         3.5          1.4         0.2  setosa
#2           4.9         3.0          1.4         0.2  setosa
#3           4.7         3.2          1.3         0.2  setosa
#4           4.6         3.1          1.5         0.2  setosa
#5           5.0         3.6          1.4         0.2  setosa
#6           5.4         3.9          1.7         0.4  setosa
#7           4.6         3.4          1.4         0.3  setosa
#8           5.0         3.4          1.5         0.2  setosa
#9           4.4         2.9          1.4         0.2  setosa
#10          4.9         3.1          1.5         0.1  setosa
#..          ...         ...          ...         ...     ...

Let see what happens if you try to pass a single element:

tbl(mydb, "iris") %>%
  filter(Species %in% "setosa") %>%
  .$query

The query will be:

<Query> SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"
FROM "iris"
WHERE "Species" IN 'setosa'
<PostgreSQLConnection>

Which, if executed, will result in the following error:

Error in postgresqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not Retrieve the result : ERROR: syntax error at or near "'setosa'" LINE 3: WHERE "Species" IN 'setosa') AS "master" ^ )

That's because for a single element, the dplyr translation to SQL query doesn't add the proper parenthesis. Notice how it's 'setosa' instead of ('setosa').

To circumvent that, we can do:

if(length(input$Species) == 1) { 
  tbl(mydb, "iris") %>% 
    filter(Species == input$Species) %>% 
}

Which will build a syntactically valid SQL query:

<Query> SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" 
FROM "iris" 
WHERE "Species" = 'setosa' 
<PostgreSQLConnection>

The following example works around this issue. Here I simply instruct the app to pass filter(Species == ...) if input$Species is of length 1 and filter(Species %in% ...) otherwise.

enter image description here


ShinyApp

server <- function(input, output) {

  selectedQuery <- reactive({

    if(length(input$Species) == 1) { 
      tbl(mydb, "iris") %>% 
        filter(Species == input$Species) %>% 
        .$query
    }
    else(
      tbl(mydb, "iris") %>% 
        filter(Species %in% input$Species) %>% 
        .$query
      )

  })

  selectedData <- reactive({

    if(length(input$Species) == 1) {
      tbl(mydb, "iris") %>% 
        filter(Species == input$Species) %>% 
        data.frame
    }
    else(
      tbl(mydb, "iris") %>% 
        filter(Species %in% input$Species) %>% 
        data.frame
      )
  })

  output$plot <- renderPlot({
    ggplot2::qplot(Sepal.Length, Petal.Length, data = selectedData(), color = Species)
  })

  output$query <- renderPrint({
    selectedQuery()
    })
}

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      selectInput("Species", "Species", 
                  tbl(mydb, "iris") %>% 
                    data.frame %>% 
                    .$Species %>% 
                    unique, 
                  selected = "setosa", multiple = TRUE)
    ),
    mainPanel(
      textOutput("query"),
      plotOutput("plot")
      )
  )
)

shinyApp(ui = ui, server = server)
Comments