JerryN JerryN - 1 month ago 11
JSON Question

convert opensensors json file to data frame (or table)

A low cost air quality sensor (AQE) sends its data to the opensensors.io server. Every x seconds it sends a string of information (timestamp, pollutant concentration, etc). The data can be retrieved structured as a json file. Opensensors terminology uses

devices
,
topics
,
organizations
, and
payloads
. I have figured out how to set up a curl handle and use the curl package to download a csv file. Here's the code

curl_download(url = myURL2, destfile = "curlDownloadTest.csv", mode = "w", handle = myCurlHandle)


An example of the downloaded data is at https://github.com/GeraldCNelson/AQEAnalysis/commit/c6ee29545d07835c5a920bf2b37625adb78462aa

I use
fromJSON
in the jsonlite package to transform this

temp <- fromJSON("curlDownloadTest.csv", simplifyDataFrame = FALSE)


The output (
temp
) is a large list with 2 elements - messages and next.
messages
contains all the data;
next
is a link to use to get the next set of data (it's not all downloaded at once).

The messages list consists of multiples sets of lists (one for each set of uploaded data); each set has five elements -
device
,
owner
,
topic
,
date
, and
payload
. Payload is a list of 3 -
encoding
(always chr utf-8),
content-type
(always chr "application/json"), and
text
. The text list looks like its in json format (here's a string fragment - "{\"serial-number\":\"egg00802aaa019b0111\",\"converted-value\":69.52,\"converted-units\":\"degF\")

I want to restructure this data into a data frame (or data table) that has
the date information as a column and the test information from payload as the remaining columns (
serial-number
,
converted value
, etc...

I can't figure out how to convert the text list in the payload list from its current (json?) structure to something I can rbind to a data frame.

Answer

Thankfully, everything is pretty uniform:

library(jsonlite)
library(dplyr)

df <- fromJSON("curlDownloadTest.csv")

bind_cols(
  select(df$messages, device, owner, topic, date),
  stream_in(textConnection(df$messages$payload$text), flatten=TRUE)
) -> df

glimpse(df)
## Observations: 742
## Variables: 14
## $ device             <chr> "egg00802aaa019b0111", "egg00802aaa019b0111", "egg00802aaa019b0111", "...
## $ owner              <chr> "wickeddevice", "wickeddevice", "wickeddevice", "wickeddevice", "wicke...
## $ topic              <chr> "/orgs/wd/aqe/temperature/egg00802aaa019b0111", "/orgs/wd/aqe/humidity...
## $ date               <chr> "2016-10-10T17:02:09.507Z", "2016-10-10T17:02:09.811Z", "2016-10-10T17...
## $ serial-number      <chr> "egg00802aaa019b0111", "egg00802aaa019b0111", "egg00802aaa019b0111", "...
## $ converted-value    <dbl> 63.20, 43.31, 0.52, -25.20, 63.70, 42.85, 0.53, -13.32, 64.01, 42.58, ...
## $ converted-units    <chr> "degF", "percent", "ppb", "ppb", "degF", "percent", "ppb", "ppb", "deg...
## $ raw-value          <dbl> 63.200000, 43.310000, 0.221252, -0.827832, 63.700000, 42.850000, 0.221...
## $ raw-instant-value  <dbl> 63.48000, 43.07000, 0.22149, -0.82785, 63.91000, 42.66000, 0.22073, -0...
## $ raw-units          <chr> "degF", "percent", "volt", "volt", "degF", "percent", "volt", "volt", ...
## $ sensor-part-number <chr> "SHT25", "SHT25", "NO2-B4-ISB", "3SP-O3-20-PCB", "SHT25", "SHT25", "NO...
## $ raw-value2         <dbl> NA, NA, 0.222732, NA, NA, NA, 0.222797, NA, NA, NA, 0.222460, NA, NA, ...
## $ raw-instant-value2 <dbl> NA, NA, 0.22330, NA, NA, NA, 0.22273, NA, NA, NA, 0.22341, NA, NA, NA,...
## $ compensated-value  <dbl> NA, NA, 0.62, -25.25, NA, NA, 0.63, -13.37, NA, NA, 0.02, -18.08, NA, ...