xybrek - 22 days ago
Java Question

Cache GAE Entity to Memcache

This is my code to put Entities to the GAE Datastore:

public Key put(Object object) {
    Key result = null;
    Iterable<Entity> entities = marshall(object);
    List<Key> keys = _ds.put(entities);
    if (isCached(object)) { // check if class is annotated with @Cached
        // TODO: How should I cache the Entity before put or after put?
    }
    assert list(entities).size() == keys.size();
    result = Iterables.getLast(keys);
    return result;
}



  • I want to know in what ways I can do caching for this?

  • Should the whole Entity be cached or just fields?

  • Should it be cached before or after the Datastore put?

  • Is there a default cache expiration, or should it be explicitly defined?



And here is my code to get Entities from the Datastore:

public <T> T get(Class<T> clazz, String key) {
    T result = null;
    try {
        String kind = getKind(clazz);
        if (isCached(clazz)) { // check if class is annotated with @Cached
            // TODO: How should I get cache?
        }
        Entity e = _ds.get(KeyStructure.createKey(kind, key));
        result = createInstance(clazz);
        unmarshaller().unmarshall(result, e);
    } catch (EntityNotFoundException e1) {
        e1.printStackTrace();
    }
    return result;
}



  • BTW, a side question: is it good practice to catch the exception from the Datastore get() call here, or should I just let the app that uses this code handle it?


Answer

I want to know in what ways I can do caching for this?

Caching is a complicated subject, so please do not take this as absolute truth. For this simple put / get pair you would want to set the cache during put and, during get, check the cache first and fall back to the Datastore only on a miss (the usual memcache pattern).

This method is called a write-through cache, meaning that both the cache and the persistent storage are updated before confirming the operation.
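
As a rough illustration of the put / get pairing described above, here is a minimal sketch. It assumes two in-memory maps standing in for the Datastore and memcache; the class name WriteThroughStore and its methods are hypothetical and are not part of the GAE API or of the code in the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: two in-memory maps play the roles of the Datastore and memcache.
class WriteThroughStore {
    private final Map<String, String> datastore = new HashMap<>();
    private final Map<String, String> cache = new HashMap<>();
    private int datastoreReads = 0;

    // Write-through: persist first, then mirror the value into the cache.
    void put(String key, String value) {
        datastore.put(key, value);
        cache.put(key, value);
    }

    // Check the cache first; on a miss, read the datastore and repopulate the cache.
    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        datastoreReads++;
        String stored = datastore.get(key);
        if (stored != null) {
            cache.put(key, stored);
        }
        return stored;
    }

    // Simulates memcache dropping an entry, for example under memory pressure.
    void evict(String key) {
        cache.remove(key);
    }

    // Number of reads that had to go to the datastore because the cache missed.
    int datastoreReads() {
        return datastoreReads;
    }
}
```

A get right after a put is served from the cache, and a get after an eviction falls back to the datastore once and then repopulates the cache.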

Should the whole Entity be cached or just fields?

An entity typically consists only of its serializable fields, so I usually cache the whole thing.

Should it be cached before or after the Datastore put?

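Typically after, as you don't want the cache to return something that wasn't successfully committed to the datastore. This way, if the datastore call fails, nothing new is cached and the get operation still returns the correct state of the datastore. If only the memcache call fails, a first-time write simply falls through to the datastore on the next get, although an older cached copy of an updated entity could linger until it is evicted or expires.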
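
To make the ordering argument concrete, here is a small sketch under the same assumption of in-memory stand-ins. The class OrderedPut and the failNextPut flag are hypothetical names used only to simulate a failed commit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the Datastore and memcache; failNextPut simulates a failed commit.
class OrderedPut {
    final Map<String, String> datastore = new HashMap<>();
    final Map<String, String> cache = new HashMap<>();
    boolean failNextPut = false;

    private void datastorePut(String key, String value) {
        if (failNextPut) {
            failNextPut = false;
            throw new IllegalStateException("simulated datastore failure");
        }
        datastore.put(key, value);
    }

    // The cache is only touched once the datastore write has returned normally.
    void put(String key, String value) {
        datastorePut(key, value); // throws before the cache is updated
        cache.put(key, value);
    }
}
```

If the simulated datastore write throws, the cache keeps whatever it held before, so it never gets ahead of the persistent state.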

Is there a default cache expiration, or should it be explicitly defined?

Not specifying a time-to-live (TTL) or specifying a TTL of 0 means that memcache won't drop the key unless it's under memory pressure (which can happen quite a lot). Setting a TTL means that memcache will keep the data around at most that many seconds. Nothing will guarantee that the data will be there - you should always treat memcache as an unreliable store.
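
The TTL behaviour can be sketched with a toy cache. This assumes a hypothetical TtlCache class with a manual clock so the example is deterministic; it is not the GAE MemcacheService. With the real API, as far as I know, the expiry is passed as an Expiration (for example Expiration.byDeltaSeconds) to MemcacheService.put, but check the docs for your SDK version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory cache with per-entry expiry; not the GAE MemcacheService.
class TtlCache {
    private static final class Entry {
        final String value;
        final long expiresAtMillis; // Long.MAX_VALUE means "no TTL"

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private long nowMillis = 0; // manual clock so the example is deterministic

    void advanceClock(long millis) {
        nowMillis += millis;
    }

    // ttlSeconds == 0 means no expiry, mirroring the behaviour described above.
    void put(String key, String value, int ttlSeconds) {
        long expiresAt = ttlSeconds == 0 ? Long.MAX_VALUE : nowMillis + ttlSeconds * 1000L;
        entries.put(key, new Entry(value, expiresAt));
    }

    // Returns null on a miss or after expiry; callers must be ready for either.
    String get(String key) {
        Entry e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (nowMillis >= e.expiresAtMillis) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }
}
```

Because get can return null at any time, the calling code should always be able to fall back to the datastore.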

P.S.: Objectify can perform this simple caching for you automatically. It's a fantastic library and I highly recommend it over using raw datastore.