Jaime Caffarel - 2 months ago
JSON Question

How to remove \" when writing R character objects to JSON

I have a data.table like this one:

library(data.table)

test <- data.table(city = c("Berlin", "Berlin", "Berlin", "Amsterdam", "Amsterdam"),
                   key1 = c("A", "A", "A", "B", "B"),
                   value1 = c(1, 2, 3, 4, 5),
                   value2 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                   kpi = c(10, 15, 20, 25, 30))


I would like to upload these data to Elasticsearch, but with a specific structure:

library(RJSONIO)
res <- test[, .(factors = toJSON(.SD)),
            by = .(city, key1),
            .SDcols = c("value1", "kpi")]


This code creates a different JSON string in each row of the factors column. Since I want to get rid of the \n sequences that the library introduces, I can substitute them out in the assignment:

res <- test[, .(factors = gsub("\n", "", toJSON(.SD))),
            by = .(city, key1),
            .SDcols = c("value1", "kpi")]


The problem comes when I want to upload this object to Elasticsearch (I'm using the elastic package). Since R uses backslashes to escape double quotes within a character string, when I write the object using:

docs_bulk(res, "index")


It writes \" instead of " in the string fields that were created with the inner toJSON (value1 and kpi). This can also be checked by writing the object to a file:

write(toJSON(res), "~/output.json")

{
"city": [ "Berlin", "Amsterdam" ],
"key1": [ "A", "B" ],
"factors": [ "{ \"value1\": [1, 2, 3 ],\"kpi\": [10, 15, 20 ] }", "{ \"value1\": [ 4, 5 ],\"kpi\": [25, 30 ] }" ]
}


Since the serialized value1 and kpi fields start and end with \", Elasticsearch does not parse them as separate arrays. What I would want to have is something like this:

{
"city": [ "Berlin", "Amsterdam" ],
"key1": [ "A", "B" ],
"factors": [ { "value1": [1, 2, 3 ],"kpi": [10, 15, 20 ] }, { "value1": [4, 5 ],"kpi": [25, 30 ] } ]
}


I have tried several different regex combinations with gsub, but I'm unable to prevent R from writing the backslashes. My last option would be to write the object to a file and process it manually with sed, but I think there should be an easier way. Any help would be greatly appreciated.

Answer

Okay, I think this should do it. It could be less code to get to the final res object to load in bulk, but anyway:

library(elastic)
library(data.table)
library(jsonlite)

test <- data.table(city = c("Berlin", "Berlin", "Berlin", "Amsterdam", "Amsterdam"),
                   key1 = c("A", "A", "A", "B", "B"),
                   value1 = c(1, 2, 3, 4, 5),
                   value2 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                   kpi = c(10, 15, 20, 25, 30))

res <- test[, .(factors = jsonlite::toJSON(.SD, dataframe = "columns")), 
           by = .(city, key1), 
           .SDcols = c("value1", "kpi")]

res <- lapply(apply(res, 1, as.list), function(z) {
  tt <- z[!names(z) %in% "factors"]
  tt$factors <- fromJSON(z$factors)
  tt
})
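The key step here is fromJSON: parsing each factors string back into an R list means the outer serializer sees a nested structure instead of a character value, so it emits a JSON object rather than a re-escaped string. A minimal sketch of the difference, with hypothetical data:

```r
library(jsonlite)

factors_str <- '{"value1":[1,2,3],"kpi":[10,15,20]}'

# Left as a character value: its quotes get escaped on serialization
toJSON(list(factors = factors_str), auto_unbox = TRUE)
#> {"factors":"{\"value1\":[1,2,3],\"kpi\":[10,15,20]}"}

# Parsed back to a list first: serialized as a nested JSON object
toJSON(list(factors = fromJSON(factors_str)), auto_unbox = TRUE)
#> {"factors":{"value1":[1,2,3],"kpi":[10,15,20]}}
```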

docs_bulk(res, "mycoolindex")

curl 'http://localhost:9200/mycoolindex/_search?size=1' | jq .
#> {
#>   "took": 13,
#>   "timed_out": false,
#>   "_shards": {
#>     "total": 5,
#>     "successful": 5,
#>     "failed": 0
#>   },
#>   "hits": {
#>     "total": 2,
#>     "max_score": 1,
#>     "hits": [
#>       {
#>         "_index": "mycoolindex",
#>         "_type": "mycoolindex",
#>         "_id": "AVeay0KnlE0U0vVWYXkb",
#>         "_score": 1,
#>         "_source": {
#>           "city": [
#>             "Amsterdam"
#>           ],
#>           "key1": [
#>             "B"
#>           ],
#>           "factors": {
#>             "value1": [
#>               4,
#>               5
#>             ],
#>             "kpi": [
#>               25,
#>               30
#>             ]
#>           }
#>         }
#>       }
#>     ]
#>   }
#> }