Asked by r2evans, 3 months ago

OpenCPU and jsonlite: column-based "/json" versus row-based

Is there a clean way to change the default "/json" postfix behavior for data.frames so that the output is column-based rather than row-based?

Data.frames in R, if I understand correctly, are really just named lists where each element is the same length as the others. Using jsonlite, it's simple to show the difference (trivial example, yes):

library(jsonlite)
ll <- list(xx=1:3, yy=6:8)
dd <- data.frame(xx=1:3, yy=6:8)
toJSON(dd)
# [1] "[ { \"xx\" : 1, \"yy\" : 6 }, { \"xx\" : 2, \"yy\" : 7 }, { \"xx\" : 3, \"yy\" : 8 } ]"
toJSON(ll)
# [1] "{ \"xx\" : [ 1, 2, 3 ], \"yy\" : [ 6, 7, 8 ] }"
toJSON(dd, dataframe='column')
# [1] "{ \"xx\" : [ 1, 2, 3 ], \"yy\" : [ 6, 7, 8 ] }"
toJSON(as.list(dd))
# [1] "{ \"xx\" : [ 1, 2, 3 ], \"yy\" : [ 6, 7, 8 ] }"


where the last three are identical. It's easy to force it to look the same either by using the dataframe argument to toJSON or by coercing the data.frame into a list.

Using OpenCPU's API, the calls look similar:

$ curl http://localhost:7177/ocpu/library/base/R/list/json -H "Content-Type: application/json" -d '{ "xx":[1,2,3], "yy":[6,7,8] }'
{
"xx" : [
1,
2,
3
],
"yy" : [
6,
7,
8
]
}

$ curl http://localhost:7177/ocpu/library/base/R/data.frame/json -H "Content-Type: application/json" -d '{ "xx":[1,2,3], "yy":[6,7,8] }'
[
{
"xx" : 1,
"yy" : 6
},
{
"xx" : 2,
"yy" : 7
},
{
"xx" : 3,
"yy" : 8
}
]


If I want the data.frame itself to be JSON-ified column-based then I need to coerce it to a list:

$ curl http://localhost:7177/ocpu/library/base/R/data.frame -H "Content-Type: application/json" -d '{ "xx":[1,2,3], "yy":[6,7,8] }'
/ocpu/tmp/x000a0fb8/R/.val
/ocpu/tmp/x000a0fb8/stdout
/ocpu/tmp/x000a0fb8/source
/ocpu/tmp/x000a0fb8/console
/ocpu/tmp/x000a0fb8/info

$ curl http://localhost:7177/ocpu/library/base/R/as.list/json -d "x=x000a0fb8"
{
"xx" : [
1,
2,
3
],
"yy" : [
6,
7,
8
]
}
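
The two calls above can be chained in a small script. This is only a sketch: the helper names and the OCPU variable are illustrative assumptions, and it relies on the POST response listing the value as /ocpu/tmp/<key>/R/.val, as in the output shown.

```shell
# Sketch only: helper names and the OCPU variable are illustrative assumptions.
OCPU="${OCPU:-http://localhost:7177}"

# Read the resource list printed by a POST and print the session key,
# e.g. "/ocpu/tmp/x000a0fb8/R/.val" gives "x000a0fb8".
ocpu_session_key() {
    tr -d '\r' | sed -n 's|^/ocpu/tmp/\([^/]*\)/R/\.val$|\1|p' | head -n 1
}

# POST the JSON arguments in $1 to data.frame, then pass the session key to
# as.list and ask for JSON, mirroring the two curl calls above.
ocpu_as_list() {
    key=$(curl -s "$OCPU/ocpu/library/base/R/data.frame" \
        -H "Content-Type: application/json" -d "$1" | ocpu_session_key)
    [ -n "$key" ] || return 1
    curl -s "$OCPU/ocpu/library/base/R/as.list/json" -d "x=$key"
}

# Usage (requires a running OpenCPU server):
#   ocpu_as_list '{ "xx":[1,2,3], "yy":[6,7,8] }'
```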


Three questions:


  1. Is there a way to change the default behavior of the OpenCPU auto-JSON-ification to be column-based?

  2. Is there a reason (besides "had to default to something") that it defaults to row-based? (So that I can better understand the underpinnings and efficiencies, not meant as a challenge.)

  3. This is all academic, though, since most (if not all) libraries accepting the JSON output will understand and translate between the formats transparently. Right?



(Win7 x64, R 3.0.3, opencpu 1.2.3, jsonlite 0.9.4)

(PS: Thanks, Jeroen, OpenCPU is awesome! The more I play, the more I like.)

Answer

For dataframe objects you can use HTTP GET and set the dataframe argument:

GET http://localhost:7177/ocpu/tmp/x000a0fb8/json?dataframe=rows

For example the Boston object from the MASS package is a dataframe as well:

https://cran.ocpu.io/MASS/data/Boston/json?dataframe=columns
https://cran.ocpu.io/MASS/data/Boston/json?dataframe=rows

For HTTP GET requests to a .../json endpoint, all the HTTP parameters are mapped to arguments of the toJSON function from the jsonlite package. You can also specify other toJSON arguments:

https://cran.ocpu.io/MASS/data/Boston/json?dataframe=columns&digits=4
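
When several toJSON arguments are needed, a small helper can assemble the query string. A minimal sketch, assuming a hypothetical function name (ocpu_query_url) and parameter values that need no URL encoding:

```shell
# Sketch only: ocpu_query_url is a hypothetical helper, not part of OpenCPU.
# $1 is a .../json endpoint; the remaining arguments are key=value pairs
# that the server maps to toJSON arguments. No URL encoding is performed.
ocpu_query_url() {
    endpoint=$1
    shift
    if [ "$#" -eq 0 ]; then
        printf '%s\n' "$endpoint"
    else
        # Inside the subshell, "$*" joins the pairs with the first IFS character.
        query=$(IFS='&'; printf '%s' "$*")
        printf '%s?%s\n' "$endpoint" "$query"
    fi
}

# Usage (requires network access):
#   curl -s "$(ocpu_query_url https://cran.ocpu.io/MASS/data/Boston/json dataframe=columns digits=4)"
```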

To see which arguments are available, have a look at the jsonlite manual or this post.

Note that this only works if you do the two-step procedure: first an HTTP POST on a function that returns a dataframe, followed by retrieving that object in JSON format with an HTTP GET request. You cannot specify toJSON parameters when you use the one-step shortcut where you postfix the POST request with /json, because in POST requests the HTTP parameters always get mapped to the function call.
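
The two-step procedure can be scripted roughly as follows. This is a sketch under assumptions: the helper names and the OCPU variable are made up for illustration, and the object path is taken from the line ending in /R/.val that the POST prints, with /json and the toJSON parameters appended to it.

```shell
# Sketch only: helper names and the OCPU variable are illustrative assumptions.
OCPU="${OCPU:-http://localhost:7177}"

# Pick the path of the returned object (the line ending in /R/.val)
# out of the plain-text resource list printed by the POST.
ocpu_val_path() {
    tr -d '\r' | grep '/R/\.val$' | head -n 1
}

# Build the GET URL that renders the stored object as JSON.
# $1 = object path, $2 = query string such as "dataframe=columns"
ocpu_json_url() {
    printf '%s%s/json?%s\n' "$OCPU" "$1" "$2"
}

# Step 1: POST to the function. Step 2: GET the stored value as JSON,
# where the query parameters are passed on to toJSON.
ocpu_columns() {
    path=$(curl -s "$OCPU/ocpu/library/base/R/data.frame" \
        -H "Content-Type: application/json" -d "$1" | ocpu_val_path)
    [ -n "$path" ] || return 1
    curl -s "$(ocpu_json_url "$path" "dataframe=columns")"
}

# Usage (requires a running OpenCPU server):
#   ocpu_columns '{ "xx":[1,2,3], "yy":[6,7,8] }'
```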

The reason for this default is that the row-based design seems to be the most conventional and interoperable way of encoding tabular data. The jsonlite paper/vignette goes into some more detail. Note that it also works the other way around: you don't have to call the data.frame function to create a dataframe; just posting an argument in the form:

[{"xx":1,"yy":6},{"xx":2,"yy":7},{"xx":3,"yy":8}]

will automatically turn it into a data frame:

curl https://public.opencpu.org/ocpu/library/base/R/summary/console -d object='[{"xx":1,"yy":6},{"xx":2,"yy":7},{"xx":3,"yy":8}]'