Ignat Ignat - 6 months ago 38
Javascript Question

How do I use Etags for Youtube v3 Data API?

I am building an extension that makes a lot of requests. The feature I'm working on displays the total time it would take to watch a playlist. Given a playlist of 1000 videos, I have to make 40 requests just to find this information: results are limited to 50 videos per call, so that is 20 calls to /v3/playlistItems for the video IDs and 20 calls to /v3/videos for the durations. As far as I can tell, that one playlist costs me 600 quota per page load. I know, nothing to get worked up about because I'm allowed 50,000,000 quota per day, but I want to optimize early. This is also a speed issue: it takes a solid minute and a half just to get the playlist length.
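To make the request count concrete, here is a small sketch of the arithmetic. The function name `requestsForPlaylist` is made up for illustration; the only input taken from the post is the limit of 50 items per call.

```javascript
// Hypothetical helper: estimate how many API calls a full playlist scan needs
// when each call returns at most `pageSize` items.
function requestsForPlaylist(playlistSize, pageSize = 50) {
  const pages = Math.ceil(playlistSize / pageSize);
  return {
    playlistItemsCalls: pages, // one /v3/playlistItems call per page of IDs
    videosCalls: pages,        // one /v3/videos call per page of IDs
    total: pages * 2,
  };
}

console.log(requestsForPlaylist(1000).total); // 40
```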

Now, ETags. For some reason, almost every time I make a request to YouTube's Data API for videos or playlistItems I get a completely new ETag (I have had a few cases where it returned the same ETag), regardless of playlist. I haven't tried private playlists, since I haven't set up OAuth yet. I'm assuming that something is changing somewhere in the playlist, causing a new ETag very quickly. Views? playlistItems doesn't even return views!

Here are example API calls to a macaroni playlist. The ETags are always different! How am I supposed to use them if they don't work? The requests are specific, and there is no way that the length of the videos changes between requests.
The API key is omitted because you can make your own API key.

Playlist Items, give me video id's, page tokens, and Etag for playlist for items 100-150{YOUR_API_KEY}&pageToken=CGQQAA

Videos, give me durations and Etag for these video ids,3Hy5BuFTBbI,ZnlW1fSXZZM,8sb_YOrReZ4,6IN_mupBjh8,VzoqsRLY5Qk,5m8H9YrPvPA,JdRbtGdR68E,hEzPBiYPsDU,bJuioKFYv-c,1N8O8OOG2_U,QDgqSL8nU5U,gP4gB45Z52M,pI1oB2y9c0M,WZGn5Vh_mc4,A0KpbS5WjSU,b0yoIOX8Bk0,5Y7iQt7vtOE,qIijCwjUApQ,RgHjqvznjxg,QzceROWtn5o,8z0VnMQFGR8,5olHoTWB1Hw,vz0T59Ql7fQ,LhktiZYQraU,WIuuZOD9ahI,rwEHW6GRH1Q,FjT1BpKvfgo,FRZL2yaZyZk,U5-vjCDwDUU,b21Lj9bfDWc,yox3-U7r_i8,rXJ5ph83Vrs,nXrk2finMcA,VfagTkQWHuI,K_ZaRAtZQOg,_JIcREsn9pU,y9WGvudeDAM,O08jNtrieI4,9UkEzW1AY7Y,jOaBdnYsobg,y7dSbhc-8h0,IfpPiCGcF8g,2rTRmb9nKbY,bHgv3A26O6Y,hFQmV-zvcbM,Osc4y45oQxw,GHusS6Yd5A8,T2Z3CuUWUQc,OPV-DopMqxs&fields=etag%2Citems%2FcontentDetails%2Fduration&key={YOUR_API_KEY}

I want to cache this data. I'm thinking of making an extra request up front for the playlist's total number of videos, because that is directly correlated with the total length of the playlist. But that feels like a lot of logic. Which video was added or removed? How many? If a video was added to the beginning, I imagine that to optimize I have to compare the first 50 video IDs with my cached video durations. If the playlist changed somewhere in the middle, I have to keep querying. Maybe cache something else to make this easier? Multiple playlists can have the same videos, and a playlist can have the same video more than once, I dunno. Maybe there is no way around querying an entire playlist, and I should just cache the calls to /v3/videos. The thing is that I want to optimize the call to /v3/playlistItems because it is the long one (it takes 3x the time of /v3/videos).

My main questions are: what do I cache to optimize getting the playlist length, how do I do that, and what's up with the ETags?


I figured out how to cache the data a while ago, sorry!

You can make a call to /playlists to get the total count of items in a playlist. Its etag also changes if and only if the playlist itself changed, which is what I want: I only want to make new requests if the base playlist changed.
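A minimal sketch of that conditional request, under some assumptions: the function name `getPlaylistInfo`, the `cache` Map and its entry shape are hypothetical, the etag is read from the top-level `etag` field of the JSON response, and the item count is read from `contentDetails.itemCount`. `fetchFn` is injectable so the function can be exercised without hitting the network.

```javascript
// Sketch: conditional GET against /playlists using a cached etag.
// `cache` is a hypothetical Map: playlistId -> { etag, itemCount }.
async function getPlaylistInfo(playlistId, apiKey, cache, fetchFn = fetch) {
  const url = 'https://www.googleapis.com/youtube/v3/playlists'
    + '?part=contentDetails'
    + `&id=${encodeURIComponent(playlistId)}`
    + `&key=${encodeURIComponent(apiKey)}`;

  const cached = cache.get(playlistId);
  // Only send If-None-Match when there is an etag to compare against.
  const headers = cached ? { 'If-None-Match': cached.etag } : {};
  const res = await fetchFn(url, { headers });

  // 304 Not Modified: the playlist has not changed, reuse the cached entry.
  if (res.status === 304 && cached) {
    return { ...cached, changed: false };
  }
  if (!res.ok) {
    throw new Error(`YouTube API returned ${res.status}`);
  }

  const body = await res.json();
  if (!body.items || body.items.length === 0) {
    throw new Error(`Playlist not found: ${playlistId}`);
  }
  const entry = {
    etag: body.etag,
    itemCount: body.items[0].contentDetails.itemCount,
  };
  cache.set(playlistId, entry);
  return { ...entry, changed: true };
}

// Usage (inside an async function):
//   const info = await getPlaylistInfo(playlistId, apiKey, new Map());
//   if (info.changed) { /* re-scan the playlist */ }
```

When `changed` is false, everything derived from the playlist (including the formatted total length) can be served from cache without touching /playlistItems or /videos.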

A call to /playlistItems always generates a new etag, regardless of changes. I think this endpoint is meant for temporary querying to figure out metadata of a video as it relates to a playlist, not for static data lookup. Playlists are very flexible, and I think YouTube decided against caching this data since calls to /playlistItems are often made on a case-by-case basis. It's likely their backend automatically generates an etag but doesn't actually save anything for this endpoint.

So, these are the steps to get the total length of time of a playlist, plus caching:

  1. get playlist id
  2. lookup etag in cache by playlist id
  3. call /playlists with the etag in the If-None-Match header (should work even if etag is empty)
    • if the api returns 304, use cached playlist length
    • if the api returns 200, save the new etag in cache
    • You can do more caching!
  4. call /playlistItems with playlist id (with all the pageTokens)
  5. lookup each videoId in cache to get video length
    • Cache is defined as a dictionary of videoId:videoLength
    • if videoLength not found, add videoId to a videos array
    • if videoLength is found, add to a lengths array
  6. call /videos with all the video IDs that are not found in cache, up to 50 per call
    • Could be done right after /playlistItems call or when all calls are done, I think it's ok to be lazy right now and do it right after each call
    • Also you can cache video calls with etags and save that to check if the length hasn't changed, but then you would have to call the api per each video. I dunno, but I think this is over-optimizing. Still might want to keep in mind that video length can change via YouTube's editing tools when debugging
  7. (continued from 6) For each video in the response, cache the video length in a dictionary as a videoId:videoLength pair, then add length to a lengths array
  8. Reduce lengths array into a moment.js duration object
  9. Save a formatted string of the length of the playlist to cache, with the etag as the key
  10. Return the formatted string of the length of the playlist
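
Steps 5 through 8 can be sketched roughly as below. This is not the linked implementation: all function names and the `lengthCache` Map are hypothetical, plain arithmetic stands in for the moment.js duration, and the /videos response items are assumed to include `id` alongside `contentDetails.duration`. Lengths are summed over the full ID list so that a video appearing twice in a playlist is counted twice.

```javascript
// Parse an ISO 8601 duration such as "PT1H2M3S" into seconds.
// Only days, hours, minutes and seconds are handled.
function parseIsoDuration(iso) {
  const m = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/.exec(iso);
  if (!m) throw new Error(`Unsupported duration: ${iso}`);
  const [, d = 0, h = 0, min = 0, s = 0] = m;
  return Number(d) * 86400 + Number(h) * 3600 + Number(min) * 60 + Number(s);
}

// Step 5: unique video IDs whose length is not in the cache yet.
function missingIds(videoIds, lengthCache) {
  return [...new Set(videoIds)].filter((id) => !lengthCache.has(id));
}

// Step 6: /videos accepts a limited number of IDs per call, so batch them.
function chunk(items, size = 50) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Step 7: store videoId -> length (seconds) for each item in a /videos response.
function storeDurations(items, lengthCache) {
  for (const item of items) {
    lengthCache.set(item.id, parseIsoDuration(item.contentDetails.duration));
  }
}

// Step 8: sum lengths over the playlist. IDs with no cached length
// (for example, videos that /videos did not return) are skipped.
function totalSeconds(videoIds, lengthCache) {
  return videoIds.reduce((sum, id) => sum + (lengthCache.get(id) || 0), 0);
}

// Format seconds as H:MM:SS.
function formatDuration(secs) {
  const pad = (n) => String(n).padStart(2, '0');
  const h = Math.floor(secs / 3600);
  const m = Math.floor((secs % 3600) / 60);
  return `${h}:${pad(m)}:${pad(secs % 60)}`;
}

// Example with a playlist that contains video "b" twice.
const lengthCache = new Map([['a', 60]]);
const playlistIds = ['a', 'b', 'b'];
const batches = chunk(missingIds(playlistIds, lengthCache)); // [['b']]
// Pretend this came back from /videos for the first batch:
storeDurations([{ id: 'b', contentDetails: { duration: 'PT1H2M3S' } }], lengthCache);
console.log(formatDuration(totalSeconds(playlistIds, lengthCache))); // 2:05:06
```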

Here is the implementation on my GitHub.