Matt S. Matt S. - 6 months ago 28
Python Question

Python's ChainMap for Java?

I have a deeply nested configuration hassle.

The problem happens to be in machine learning, where an end-user calling a cross-validation routine, may, or may not specify any of various parameters (e.g. "randomSeed" = 17)

Either way, the parameters then have to be passed first to the cross-validation algorithm, and then on to a first machine learning algorithm. The machine learning algorithm, must be able to set and pass on other parameters, all without the initial user knowing.

Most all of the consumers in the chain of parameter users expect a java Map interface to be doing the look-up from.

Flattening the keys into one library is unattractive for performance reasons -- both CPU and memory -- (the 'root key-name' space) will be used without modification many thousands of times, and each time a number of additional parameters need to be specified before the bundle is passed along.

A decent analog is how the PATH variable works, each element in the path being a directory (key-namespace). When a query is made against the PATH variable (eg. you type 'emacs' at the command line), it looks in each directory (unnamed namespace of keys) for that file-name (specified value) in order, until it either finds it, or fails to find it. If it finds it, you get to execute the specific contents of the executable file it found (get the value of the parameter set). If you have a PATH variable from another, you can append a new directory (anonymous key-space ) in front of it as you pass that PATH variable setting along to a new end-user, without modifying the previous directories (preferences).

Given the name-space on the configuration parameters is effectively flat, a solution like Python's ChainMap would be perfect (eg example usage) but I'm finding no equivalent solution in Java?

Answer

Over the weekend I went ahead and created a ChainMap implementation as well; thanks to Java 8 it's a surprisingly small class. My implementation is slightly different than yours; it doesn't attempt to mirror Python's behavior and instead follows the Map interface's specifications. Notably:

  • Lookup order is insertion order; the first map passed to the constructor takes precedence over the following maps.
  • .containsValue() doesn't match values that are masked by earlier maps.
  • .put() returns the previous value of the chain map, even if that value was in a later map.
  • .remove() removes the key from all maps, not just the first map or the visible entry. From the Javadoc: "The map will not contain a mapping for the specified key once the call returns."
  • Similarly .clear() clears all maps, not just the top map.
  • Implements .equals() and .hashCode() on the basis of its entry set, so that it is equal to other Map implementations.

I also did not implement push/pop behavior as it felt like an anti-pattern; ChainMap is already an O(1) view into a series of maps, you can simply construct additional ChainMaps with the maps you want as needed.

Obviously, if your implementation works for your use case, that's great. But it violates the Map contract in several places; I'd strongly suggest removing implements Map<K, V> and just let it be a standalone class.

Many of the class's methods are nice one-liners, e.g.:

@Override
public int size() {
  return keySet().size();
}

@Override
public boolean isEmpty() {
  return !chain.stream().filter(map -> !map.isEmpty()).findFirst().isPresent();
}

@Override
public boolean containsKey(Object key) {
  return chain.stream().filter(map -> map.containsKey(key)).findFirst().isPresent();
}

@Override
public boolean containsValue(Object value) {
  return entrySet().stream()
    .filter(e -> value == e.getValue() || (value != null && value.equals(e.getValue())))
    .findFirst().isPresent();
}

@Override
public V get(Object key) {
  return chain.stream().filter(map -> map.containsKey(key))
    .findFirst().map(map -> map.get(key)).orElse(null);
}

I've written some tests to verify the class's behavior as well. Additional test cases are welcome.


I also extended your idea of using Maps.asMap() to create an immutable view of a collection of maps; if you don't need mutation this will work nicely. (As I learned, you have to use the three-argument form of .reduce() to get the generics to behave).

public static <K, V> Map<K, V> immutableChainView(
    Iterable<? extends Map<? extends K, ? extends V>> maps) {
  return StreamSupport.stream(maps.spliterator(), false).reduce(
    (Map<K,V>)ImmutableMap.<K,V>of(),
    (a, b) -> Maps.asMap(Sets.union(a.keySet(), b.keySet()),
                         k -> a.containsKey(k) ? a.get(k) : b.get(k)),
    (a, b) -> Maps.asMap(Sets.union(a.keySet(), b.keySet()),
                         k -> a.containsKey(k) ? a.get(k) : b.get(k)));
    }