Dac Saunders - 5 months ago 42

Python Question

I basically want to know how GAE has implemented its indexing, I'm familiar with indexing such as B+ trees and wonder if e.g. the filter() method uses a B+ tree for its implementation? Can I see this implementation in the appengine code for the SDK since it is Open Source? Is the get() and get_by_id() functions implemented using hashing to make it O(1)`Is the filter function O(log(n)) since one might think it uses a B+ tree where lookups are O(log(n))?

Thanks for any insight

Answer

the long answer is my talk from a few years ago: Under the Covers of the Google App Engine Datastore. you'll also be interested in the Bigtable paper and Megastore paper.

the short answer is, query filtering is implemented via both single-property and multi-property (aka composite) indices. the query planner picks one or more to scan and merge densely, ie all or almost all scanned index rows correspond to a valid result for the query. details in my talk.

get() is implemented as a bigtable single row lookup. it's somewhere between O(1) and O(log(n)), with small constant factors since the log(n) part usually happens entirely in memory. details in the bigtable paper.