Oli Paul Oli Paul - 1 month ago 9
JSON Question

R: read and parse Json

If R is not suitable for this job then fair enough but I believe it should be.

I am calling an API, then dumping the results into Postman json reader. Then I get results like:

"results": [
{
"personUuid": "***",
"synopsis": {
"fullName": "***",
"headline": "***",
"location": "***",
"image": "***",
"skills": [
"*",
"*",
"*",
"*.",
"*"
],
"phoneNumbers": [
"***",
"***"
],
"emailAddresses": [
"***"
],
"networks": [
{
"name": "linkedin",
"url": "***",
"type": "canonicalUrl",
"lastAccessed": null
},
{
"name": "***",
"url": "***",
"type": "cvUrl",
"lastAccessed": "*"
},
{
"name": "*",
"url": "***",
"type": "cvUrl",
"lastAccessed": "*"
}
]
}
},
{


Firstly I'm not sure on how to import this into R as I've mainly dealt with csv's. I've seen other questions where people use Json packages to call the URL directly but that's not going to work with what I'm doing so I'd like to know how to read a csv with json in it.

I used:

x <- fromJSON(file="Z:/json.csv")


But perhaps theres a better way. Once this is done the json looks more like:

...$results[[9]]$synopsis$emailAddresses
[1] "***" "***"
[3] "***" "***"

$results[[9]]$synopsis$networks...


Then what I would like for each result is to store the headline and then email address into a data table.

I tried:

str_extract_all(x, 'emailAddresses*$')


However I figured * would represent everything between emailAddresses and the $ including new lines etc, however this doesn't work. I also find with extract when you do get * to work, it doesnt extract what * represents.

eg:

> y <- 'some text. email "oli@oli.o" other text'
> y
[1] "some text. email \"oli@oli.o\" other text"
> str_extract_all(y, 'email \"*"')
[[1]]
[1] "email \""

Answer

UPDATE 1: to read the json from a URL you can simply use the fromJSON function, passing the string with your json data url:

library(jsonlite)

url <- 'http://you.url.com/data.json'

# in this case we pass an URL to the fromJSON function instead of the actual content we want to parse
fromJSON(url, flatten=TRUE)$results[c("synopsis.headline", "synopsis.emailAddresses")] 

// end UPDATE 1

you could also pass the flatten param to fromJSON and then use the 'results' dataframe.

fromJSON(json.data, flatten=TRUE)$results[c("synopsis.headline",  
                                            "synopsis.emailAddresses")]

synopsis.headline synopsis.emailAddresses
1               ***        jane.doe@boo.com
2               ***        john.doe@foo.com

here is how I defined json.data, please note I intentionally added 1 more record to your sample input json.

 json.data <- '{
      "results":[  
        {  
          "personUuid":"***",
          "synopsis":{  
            "fullName":"***",
            "headline":"***",
            "location":"***",
            "image":"***",
            "skills":[  
              "*",
              "*",
              "*",
              "*.",
              "*"
              ],
            "phoneNumbers":[  
              "***",
              "***"
              ],
            "emailAddresses":[  
              "jane.doe@boo.com"
              ],
            "networks":[  
              {  
                "name":"linkedin",
                "url":"***",
                "type":"canonicalUrl",
                "lastAccessed":null
              },
              {  
                "name":"***",
                "url":"***",
                "type":"cvUrl",
                "lastAccessed":"*"
              },
              {  
                "name":"*",
                "url":"***",
                "type":"cvUrl",
                "lastAccessed":"*"
              }
              ]
          }
        },
        {  
          "personUuid":"***",
          "synopsis":{  
            "fullName":"***",
            "headline":"***",
            "location":"***",
            "image":"***",
            "skills":[  
              "*",
              "*",
              "*",
              "*.",
              "*"
              ],
            "phoneNumbers":[  
              "***",
              "***"
              ],
            "emailAddresses":[  
              "john.doe@foo.com"
              ],
            "networks":[  
              {  
                "name":"linkedin",
                "url":"***",
                "type":"canonicalUrl",
                "lastAccessed":null
              },
              {  
                "name":"***",
                "url":"***",
                "type":"cvUrl",
                "lastAccessed":"*"
              },
              {  
                "name":"*",
                "url":"***",
                "type":"cvUrl",
                "lastAccessed":"*"
              }
              ]
          }
        }
        ]
    }'
Comments