Googme Googme - 1 month ago 9
JSON Question

How to extract strings from rows which have .json like format?

I have imported a

.json
file using
library(jsonlite)
stream_in(file(".json"))


However, one of the columns still looks as a
.json
format.
Im not really sure how proceed in order to extact the columns
ID
and
email
from the
.json
column.

My example:

date <- as.Date(as.character( c("2015-02-13",
"2015-02-14",
"2015-02-14")))
ID <- c(1,2,3)
name <- c("John","Michael","Thomas")
drinks <- c("Beer","Coffee","Tee")
consumed <- c(2,5,3)
john<- "{\"employeID\":\"1\",\"other_details\":{\"email\":\"john@gmx.com\"},\"computer\":\"yes\"}"
michael<- "{\"employeID\":\"2\",\"other_details\":{\"email\":\"michael@yahoo.com\"},\"computer\":\"yes\"}"
thomas<- "{\"employeID\":\"3\",\"other_details\":{\"email\":\"thomas@gmail.com\"},\"computer\":\"yes\"}"
json <- c(john,michael,thomas)
df <- data.frame(date,ID,name,drinks,consumed,json)


Where the data.frame looks like that:

enter image description here

I would like to get the following format:

date ID name drinks consumed email computer
#1 2015-02-13 1 John Beer 2 john@gmx.com yes
#2 2015-02-14 2 Michael Coffee 5 michael@yahoo.com no
#3 2015-02-14 3 Thomas Tee 3 thomas@gmail.com yes


What I have tried was to was first to use the
library(jsonlite)
again in different variations but it always results in:

fromJSON(df$json[1])

Error: Argument 'txt' must be a JSON string, URL or file.


How can I extract these fields properly?

Answer

First, try:

ndjson::stream_in("filename.json")

The ndjson package is faster than jsonlite and was built for flattening (it's very task-specific and not as swiss-army-knife-ish as the highly useful jsonlite pkg).

Or, we can keep the tidyverse idioms all the way through:

library(tidyverse)

map_df(df$json, ~jsonlite::fromJSON(as.character(.))) %>% 
  bind_cols(select(df, -json)) %>% 
  mutate_if(is.factor, as.character) %>% 
  mutate_if(is.list, as.character) %>% 
  select(ID, name, drinks, consumed, everything())
## # A tibble: 3 × 8
##      ID    name drinks consumed computer employeID other_details.email       date
##   <dbl>   <chr>  <chr>    <dbl>    <chr>     <chr>               <chr>     <date>
## 1     1    John   Beer        2      yes         1        john@gmx.com 2015-02-13
## 2     2 Michael Coffee        5      yes         2   michael@yahoo.com 2015-02-14
## 3     3  Thomas    Tee        3      yes         3    thomas@gmail.com 2015-02-14

And, you get your character columns.