WorkHardWork WorkHardWork - 3 months ago 91
JSON Question

USql Call data in multidimensional JSON array

I have this JSON file in a data lake that looks like this:

{
"id":"398507",
"contenttype":"POST",
"posttype":"post",
"uri":"http://twitter.com/etc",
"title":null,
"profile":{
"@class":"PublisherV2_0",
"name":"Company",
"id":"2163171",
"profileIcon":"https://pbs.twimg.com/image",
"profileLocation":{
"@class":"DocumentLocation",
"locality":"Toronto",
"adminDistrict":"ON",
"countryRegion":"Canada",
"coordinates":{
"latitude":43.7217,
"longitude":-31.432},
"quadKey":"000000000000000"},
"displayName":"Name",
"externalId":"00000000000"},
"source":{
"name":"blogs",
"id":"18",
"param":"Twitter"},
"content":{
"text":"Description of post"},
"language":{
"name":"English",
"code":"en"},
"abstracttext":"More Text and links",
"score":{}
}
}


in order to call the data into my application, I have to turn the JSON into a string using this code:

DECLARE @input string = @"/MSEStream/{*}.json";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];


@allposts =
EXTRACT
jsonString string
FROM @input
USING Extractors.Text(delimiter:'\b', quoting:true);

@extractedrows = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS er FROM @allposts;


@result =
SELECT er["id"] AS postID,
er["contenttype"] AS contentType,
er["posttype"] AS postType,
er["uri"] AS uri,
er["title"] AS Title,
er["acquisitiondate"] AS acquisitionDate,
er["modificationdate"] AS modificationDate,
er["publicationdate"] AS publicationDate,
er["profile"] AS profile
FROM @extractedrows;

OUTPUT @result
TO "/ProcessedQueries/all_posts.csv"
USING Outputters.Csv();


This output the JSON into a .csv file that is readable and when I download the file all data is displayed properly. My problem is when I need to get the data inside profile. Because the JSON is now a string I can't seem to extract any of that data and put it into a variable to use. Is there any way to do this? or do I need to look into other options for reading the data?

Answer

You can use JsonTuple on the profile string to further extract the specific properties you want. An example of U-SQL code to process nested Json is provided in this link - https://github.com/Azure/usql/blob/master/Examples/JsonSample/JsonSample/NestedJsonParsing.usql.

You can use JsonTuple on the profile column to further extract specific nodes

E.g. use JsonTuple to get all the child nodes of the profile node and extract specific values like how you did in your code.

@childnodesofprofile = 
SELECT
   Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(profile) AS childnodes_map
FROM @result;

@values =
SELECT
   childnodes_map["name"] AS name,
   childnodes_map["id"] AS id
FROM @result;

Alternatively, if you are interested in specific values, you can also pass paramters to the JsonTuple function to get the specific nodes you want. The code below gets the locality node from the recursively nested nodes (as described by the "$..value" construct.

@locality = 
SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(profile, "$..locality").Values AS locality
FROM @result;

Other supported constructs by JsonTuple

    JsonTuple(json, "id", "name")              // field names          
    JsonTuple(json, "$.address.zip")           // nested fields        
    JsonTuple(json, "$..address")              // recursive children   
    JsonTuple(json, "$[?(@.id > 1)].id")       // path expression      
    JsonTuple(json)                            // all children

Hope this helps.