CREW CREW - 6 months ago 34
Perl Question

How would I read this data structure in Perl? Dictionary/Hash with keys containing lists containing lists. Inline::Python giving me errors

I've been struggling for about 3 weeks on this simple issue. I can't understand why and I would give anything to solve it lol.

I am trying to read values from the data structure below. The docs say it's a dictionary with keys containing lists of results of that type.

Example: I get the master query reply using an eval function. I look up the key "song_hits" to get that structure. Then I look up the key 'track' and parse it. The problem is getting to the 'track' part.

When I do it the way the Perl docs tell me to, I get: Can't locate object method "FIRSTKEY" via package "Inline::Python::Object::Data".

So I'm wondering if there's a way to read the value using the eval function to bypass ObjectData's hash key limitation, another way to read it given I know exact keys, or if I'm just doing this entirely wrong.

{
'album_hits': [
{
'album':
{
'albumArtRef': 'http://lh5.ggpht.com/DVIg4GiD6msHfgPs_Vu_2eRxCyAoz0fF...',
'albumArtist': 'J.Cole',
'albumId': 'Bfp2tuhynyqppnp6zennhmf6w3y',
'artist': 'J.Cole',
'artistId': ['Ajgnxme45wcqqv44vykrleifpji'],
'description_attribution':
{
'kind': 'sj#attribution',
'license_title': 'Creative Commons Attribution CC-BY',
'license_url': 'http://creativecommons.org/licenses/by/4.0/legalcode',
'source_title': 'Freebase',
'source_url': ''
},
'explicitType': '1',
'kind': 'sj#album',
'name': 'Work Out',
'year': 2011
},
'type': '3'
}],
'artist_hits': [
{
'artist':
{
'artistArtRef': 'http://lh3.googleusercontent.com/MJe-cDw9uQ-pUagoLlm...',
'artistArtRefs': [
{
'aspectRatio': '2',
'autogen': False,
'kind': 'sj#imageRef',
'url': 'http://lh3.googleusercontent.com/MJe-cDw9uQ-pUagoLlmKX3x_K...'
}],
'artistId': 'Ajgnxme45wcqqv44vykrleifpji',
'artist_bio_attribution':
{
'kind': 'sj#attribution',
'source_title': 'David Jeffries, Rovi'
},
'kind': 'sj#artist',
'name': 'J. Cole'
},
'type': '2'
}],
'playlist_hits': [
{
'playlist':
{
'albumArtRef': [
{
'url': 'http://lh3.googleusercontent.com/KJsAhrg8Jk_5A4xYLA68LFC...'
}],
'description': 'Workout Plan ',
'kind': 'sj#playlist',
'name': 'Workout',
'ownerName': 'Ida Sarver',
'shareToken': 'AMaBXyktyF6Yy_G-8wQy8Rru0tkueIbIFblt2h0BpkvTzHDz-fFj6P...',
'type': 'SHARED'
},
'type': '4'
}],
'situation_hits': [
{
'situation':
{
'description': 'Level up and enter beast mode with some loud, aggressive music.',
'id': 'Nrklpcyfewwrmodvtds5qlfp5ve',
'imageUrl': 'http://lh3.googleusercontent.com/Cd8WRMaG_pDwjTC_dSPIIuf...',
'title': 'Entering Beast Mode',
'wideImageUrl': 'http://lh3.googleusercontent.com/8A9S-nTb5pfJLcpS8P...'
},
'type': '7'
}],
'song_hits': [
{
'track':
{
'album': 'Work Out',
'albumArtRef': [
{
'aspectRatio': '1',
'autogen': False,
'kind': 'sj#imageRef',
'url': 'http://lh5.ggpht.com/DVIg4GiD6msHfgPs_Vu_2eRxCyAoz0fFdxj5w...'
}],
'albumArtist': 'J.Cole',
'albumAvailableForPurchase': True,
'albumId': 'Bfp2tuhynyqppnp6zennhmf6w3y',
'artist': 'J Cole',
'artistId': ['Ajgnxme45wcqqv44vykrleifpji', 'Ampniqsqcwxk7btbgh5ycujij5i'],
'composer': '',
'discNumber': 1,
'durationMillis': '234000',
'estimatedSize': '9368582',
'explicitType': '1',
'genre': 'Pop',
'kind': 'sj#track',
'nid': 'Tq3nsmzeumhilpegkimjcnbr6aq',
'primaryVideo':
{
'id': '6PN78PS_QsM',
'kind': 'sj#video',
'thumbnails': [
{
'height': 180,
'url': 'https://i.ytimg.com/vi/6PN78PS_QsM/mqdefault.jpg',
'width': 320
}]
},
'storeId': 'Tq3nsmzeumhilpegkimjcnbr6aq',
'title': 'Work Out',
'trackAvailableForPurchase': True,
'trackAvailableForSubscription': True,
'trackNumber': 1,
'trackType': '7',
'year': 2011
},
'type': '1'
}],
'station_hits': [
{
'station':
{
'compositeArtRefs': [
{
'aspectRatio': '1',
'kind': 'sj#imageRef',
'url': 'http://lh3.googleusercontent.com/3aD9mFppy6PwjADnjwv_w...'
}],
'contentTypes': ['1'],
'description': 'These riff-tastic metal tracks are perfect for getting the blood pumping.',
'imageUrls': [
{
'aspectRatio': '1',
'autogen': False,
'kind': 'sj#imageRef',
'url': 'http://lh5.ggpht.com/YNGkFdrtk43e8H941fuAHjflrNZ1CJUeqdoys...'
}],
'kind': 'sj#radioStation',
'name': 'Heavy Metal Workout',
'seed':
{
'curatedStationId': 'Lcwg73w3bd64hsrgarnorif52r',
'kind': 'sj#radioSeed',
'seedType': '9'
},
'skipEventHistory': [],
'stationSeeds': [
{
'curatedStationId': 'Lcwg73w3bd64hsrgarnorif52r',
'kind': 'sj#radioSeed',
'seedType': '9'
}]
},
'type': '6'
}],
'video_hits': [
{
'score': 629.6226806640625,
'type': '8',
'youtube_video':
{
'id': '6PN78PS_QsM',
'kind': 'sj#video',
'thumbnails': [
{
'height': 180,
'url': 'https://i.ytimg.com/vi/6PN78PS_QsM/mqdefault.jpg',
'width': 320
}],
'title': 'J. Cole - Work Out'
}
}]


}

Cleaned, but broken code with 3 weeks of different attempts: (I have tried for, foreach, while, but the furthest it would read would be either the entire unicode array, error, or an empty string)

sub search {
    my $query = shift;

    my $uri = 'googlemusic:search:' . $query;

    if (my $result = $cache->get($uri)) {
        return $result;
    }

    my $googleResult;
    my $result = {
        tracks => [],
        albums => [],
        artists => [],
    };

    eval {
        $googleResult = $googleapi->search($query, $prefs->get('max_search_items'));
    };
    if ($@) {
        $log->error("Not able to search All Access for \"$query\": $@");
        return;
    }
    #gives not an ARRAY reference error
    for my $hit (@{$googleResult->{song_hits}}) {
        push @{$result->{tracks}}, to_slim_track($hit->{track});
    }
    #works, but gives an error on the next line, 'newlist' object has no attribute 'album'
    for my $hit ({$googleResult->{album_hits}}) {
        push @{$result->{albums}}, album_to_slim_album($hit->{album});
    }
    #Perl and others recommended way, but gives Can't locate object method "FIRSTKEY" via package "Inline::Python::Object::Data"
    for my $hit (%{$googleResult->{artist_hits}}) {
        push @{$result->{artists}}, artist_to_slim_artist($hit->{artist});
    }

    # Add to the cache
    $cache->set($uri, $result, $CACHE_TIME);

    return $result;
}


I have tried reading up, but have gotten so many errors including:


  • 'key' does not exist

  • Can't use string ("track") as a HASH ref while strict refs in use

  • Type of argument to keys on reference must be unblessed hashref or arrayref



My Full Test File: http://pastebin.com/DMnDc56i
GoogleApi PM (Python GAPI Hook): https://raw.githubusercontent.com/hechtus/squeezebox-googlemusic/master/GoogleMusic/GoogleAPI.pm

Edit: Info, There were a couple of people who wanted unmaintained old code fixed, so I offered to help and got everything working besides this part.

Old Code Git: https://github.com/hechtus/squeezebox-googlemusic

Google Api Python I use: https://github.com/simon-weber/gmusicapi

Answer

Update: Added a comment on invalid JSON at the end.


I take it that the data structure shown is in $googleResult. This is 'almost' JSON and you can process it as such using modules, after a simple cleanup. I will use JSON::XS. The code below takes off after $googleResult has been acquired. (In tests I actually copied data shown in the question into a file and read it in.) I first replace ' by " and lower-case True and False, to get a valid JSON format which the module can decode.

# Other code from the question ...
use JSON::XS;
use Data::Dumper;    # provides Dumper(), used below

# For tests I loaded shown data into $googleResult (did not run this eval)
eval {
    $googleResult = $googleapi->search($query, $prefs->get('max_search_items'));
};
if ($@) {
    $log->error("Not able to search All Access for \"$query\": $@");
    return;
}

# The structure shown in the question needs a cleanup
# But this may be a road to madness, if there is more
$googleResult =~ s/'/"/g;        # ' turn off wrong editor coloring
$googleResult =~ s/False/false/g;
$googleResult =~ s/True/true/g;

my $coder = JSON::XS->new;    
# There are many options for how to set it up. Example:
# JSON::XS->new->ascii->pretty->allow_nonref;    

my $data = $coder->decode($googleResult);  
# Now this is a normal Perl data structure that we can work with. 
# Look at what's under 'album_hits' for example
my $ralbhits = $data->{'album_hits'};  
print Dumper($ralbhits);
# We get: $VAR1 = [ { 'album' => { 'albumId' => ... }, 'type' => '3' } ]
# Array reference whose sole element is a hash reference with nested hash references

# Extract the 'artist'
my $artist = $ralbhits->[0]->{'album'}->{'artist'};
print "$artist\n";

This prints J.Cole (after the dump, which I omit here). For convenience, you can first extract a part of the structure and then query it more simply. For example:

# Get the hashref for album
my $ralbum = $ralbhits->[0]->{'album'};
my $artist = $ralbum->{'artist'};

Now once the data is unpacked you can retrieve what you need, based on what artist_to_slim_artist() needs and does. This is a normal data structure to work with.

Modules for JSON parsing return Perl data structures, see Mapping in JSON::XS. Generally they will be nested, except in very simple cases. For how to work with them see perldsc, a cookbook on complex data structures.
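To make the shape concrete, here is a minimal sketch in Python (the language the data originally comes from) of walking the decoded structure. The sample is a trimmed copy of the question's data that I converted to valid JSON by hand; the variable names are only illustrative. The same array-of-hashes navigation applies to the Perl structure returned by the decoder.

```python
import json

# Trimmed copy of the question's data, already converted to valid JSON.
sample = """
{
  "song_hits": [
    {"track": {"title": "Work Out", "artist": "J Cole", "year": 2011}, "type": "1"}
  ],
  "album_hits": [
    {"album": {"name": "Work Out", "artist": "J.Cole", "year": 2011}, "type": "3"}
  ]
}
"""

data = json.loads(sample)

# Each "*_hits" value is a list; each element is a dict holding the payload
# under a key such as "track" or "album", next to a "type" key.
titles = [hit["track"]["title"] for hit in data["song_hits"]]
album_artists = [hit["album"]["artist"] for hit in data["album_hits"]]

print(titles)
print(album_artists)
```

The point is only that every level is either a list (index or iterate) or a dictionary (look up by key), which is what perldsc describes for Perl.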


The JSON object given in this example, while invalid, needed very little correction. However, it may get far more complicated. For example, a far larger document linked in a comment has these problems:

  • Name-value pairs are enclosed in ' instead of " and the values themselves contain ' (like isn't and other contractions), complicating the matching and replacement of ' pairs.

  • Invalid u" sequences at the beginning of names or values, presumably left over from Python 2 unicode string prefixes such as u'...' (fixed by removing the u)

  • Text may well contain all kinds of escapes, for example some encoding of accents, which are not valid JSON. (One in that document.) This can be found and fixed (escaped for example).

It took me a few minutes to come up with a few regexes that corrected the document at the link, close to 100kB in size, so that it parses cleanly with the above code. But the problem is that it is hard to tell what other trouble may be in the next document.
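If the text really is the printed form of a Python dictionary (an assumption on my part, based on the True/False values and the u prefixes), one alternative to regex cleanup is to let Python read it as a Python literal and write it back out as JSON. The sketch below uses only the standard library; the sample string is made up to show the apostrophe and u-prefix problems listed above.

```python
import ast
import json

# Made-up sample in the style of the problem document: single-quoted names,
# a u prefix, a Python boolean, and an apostrophe inside a value.
raw = """{'song_hits': [{'track': {'title': u"Isn't It", 'autogen': False}, 'type': u'1'}]}"""

# Naive cleanup: swap quotes and lower-case the boolean.
naive = raw.replace("'", '"').replace("False", "false")
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    # The u prefix and the apostrophe in "Isn't" both break the result.
    naive_ok = False

# Reading the text as a Python literal handles quotes, u prefixes and booleans.
# ast.literal_eval only accepts literals, so it does not execute arbitrary code.
data = ast.literal_eval(raw)
valid_json = json.dumps(data)

print(naive_ok)
print(valid_json)
```

This only helps when the input is a genuine Python literal; anything else will make ast.literal_eval raise an error, which is at least a clear signal.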

There are various modules that allow a 'relaxed' approach and will accept many such irregularities. However, by using them we agree to use invalid JSON, which is meant to be a simple and clear data format, and I wouldn't advise going down that road. Note that the nearly full specification of the format fits nicely in one clear and generously illustrated page at the above link. Also see JSON Example, for a handful of examples.

I think that the best bet is to try to clean it up. Run a decoder like in the code above and see the error message. It will pinpoint the problem exactly. Then add a regex to correct that particular violation of format. Then go again. If the various documents you may work with carry more or less the same set of problems it may well work. Or it may turn out that it is too much trouble, if new violations keep coming up, in which case you may need a different approach.
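The decode, read the error, fix, repeat cycle can be sketched as follows. This is a Python illustration using the standard json module rather than JSON::XS, and report_first_error is a hypothetical helper name; the idea is the same in Perl, where the decoder's error message reports the offending position.

```python
import json


def report_first_error(text, context=20):
    """Return None if text is valid JSON, else a description of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # Show a little of the text around the failure to suggest a fix.
        start = max(err.pos - context, 0)
        snippet = text[start:err.pos + context]
        return f"{err.msg} at line {err.lineno}, column {err.colno}: {snippet!r}"
    return None


broken = '{"autogen": False, "kind": "sj#imageRef"}'
print(report_first_error(broken))

# Apply one targeted fix for the reported violation, then check again.
# Note: a blind replace like this would also alter "False" inside string values.
fixed = broken.replace("False", "false")
print(report_first_error(fixed))
```

Each pass reports only the first violation, so the loop ends either when the helper returns None or when the list of fixes grows unreasonably long.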

Finally, I don't know how you arrived at this format from the original Python-object problem. It is possible that the format got broken somewhere in translation, though I don't see how that would happen. It is also possible that it is not meant to be JSON at all, but it is too close to JSON for that to seem likely.

Is it possible to ask for valid JSON to be provided?
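If the Python side can be changed (I am assuming it can, I have not looked at how the hook is wired), serializing there would give valid JSON from the start. A minimal sketch with the standard json module; search_as_json and the sample result are hypothetical and not part of gmusicapi or GoogleAPI.pm.

```python
import json


def search_as_json(search_result):
    """Serialize a result made of dicts, lists, strings, numbers and booleans."""
    return json.dumps(search_result)


# Hypothetical stand-in for what the Python search call returns.
result = {
    "song_hits": [
        {"track": {"title": "Work Out", "trackAvailableForPurchase": True}, "type": "1"}
    ]
}

text = search_as_json(result)
print(text)
```

The string produced this way uses double quotes and lower-case true/false, so the Perl decoder shown earlier could read it without any regex cleanup.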