billc.cn - 2 months ago
Java Question

Java library function to turn arbitrary String into XML ID

I think this must exist somewhere, but it's very difficult to search for.

The library can be the JDK, Guava, Commons Lang, an XML processing library, or any other reasonably well-known library.

The behavior can be stripping or escaping, but for a bunch of unique, human-readable names with no special characters, the escaping result should also be unique and reasonably human-readable.

Thanks.

Answer

You most probably do not want to escape the string (which is generally reversible), and instead want to "sanitize" the string (retain only a subset of its original characters, those that are safe, possibly making it impossible to recover the original string). As you mentioned in comments, IDs can be quite picky.
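
For contrast, here is a minimal sketch of what the escaping route could look like. The `_xHHHH_` scheme and the names `XmlIdEscaper`, `escape` and `unescape` are assumptions made up for illustration, not an existing library API. ASCII letters and digits pass through unchanged, and every other UTF-16 code unit (including '_' itself and a leading digit) is written as its four-digit hex code, so the original string can always be recovered. An empty input still yields an empty string, which is not a valid ID.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlIdEscaper {
    // Matches one escape sequence produced by escape(): "_x" + 4 hex digits + "_"
    private static final Pattern ESCAPED = Pattern.compile("_x([0-9A-F]{4})_");

    // Hypothetical reversible scheme: ASCII letters and non-leading digits pass
    // through; every other UTF-16 code unit (including '_') becomes _xHHHH_.
    public static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean digit = c >= '0' && c <= '9';
            // a leading digit is escaped too, because an XML name cannot start with one
            if (letter || (digit && i > 0)) {
                out.append(c);
            } else {
                out.append(String.format("_x%04X_", (int) c));
            }
        }
        return out.toString();
    }

    // Inverse of escape(): turns every _xHHHH_ sequence back into its character.
    public static String unescape(String id) {
        Matcher m = ESCAPED.matcher(id);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(id, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(id, last, id.length());
        return out.toString();
    }
}
```

With this sketch, `XmlIdEscaper.escape("hi there sally!")` gives `hi_x0020_there_x0020_sally_x0021_`. That is unique and reversible, but hardly human-readable, which is why sanitizing is usually the better fit here.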

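So we choose a safe range and replace each run of characters outside it with a single '-', dropping any '-' left at either end. Additionally, if the result does not start with a letter, we prepend an 'i' to make it compliant (an XML ID cannot start with a digit or '-').

public static String toSafeId(String s) {
    // replace each run of characters outside [a-zA-Z0-9] with a single '-',
    // then strip a leading or trailing '-' left over from the input's edges
    s = s.replaceAll("[^a-zA-Z0-9]+", "-").replaceAll("^-|-$", "");
    // prefix an 'i' when the result is empty or starts with a digit
    return !s.isEmpty() && Character.isLetter(s.charAt(0)) ? s : "i" + s;
}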

Note that this does not enforce uniqueness. To enforce it, wrap it up with a Set:

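import java.util.HashSet;
import java.util.Set;

public class XmlIdGenerator {
    // IDs already handed out; must be initialized, or generate() throws a NullPointerException
    private final Set<String> used = new HashSet<>();

    // provides a unique ID, appending "-1", "-2", ... when the sanitized name is already taken
    // (assumes toSafeId from above is defined in this class)
    public String generate(String s) {
        String base = toSafeId(s);
        String id = base;
        for (int i = 1; used.contains(id); i++) {
            id = base + "-" + i;
        }
        used.add(id);
        return id;
    }
}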

Use as:

XmlIdGenerator gen = new XmlIdGenerator(); // build a new one for each document
String oneId = gen.generate("   hi there sally!");      // -> "hi-there-sally"
String anotherId = gen.generate(" hi there.. sally?");  // -> "hi-there-sally-1"
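
For anyone who wants to paste and run this, here is a self-contained sketch that combines the two pieces into one class. The class name `XmlIdDemo` is made up for illustration, and the helper strips a leading or trailing '-' so that the results match the comments above. It also shows two edge cases, a name starting with a digit and a name with no usable characters at all.

```java
import java.util.HashSet;
import java.util.Set;

final class XmlIdDemo {
    // IDs handed out so far for the current document
    private final Set<String> used = new HashSet<>();

    // Sanitizes a name: runs of characters outside [a-zA-Z0-9] become '-',
    // edge dashes are dropped, and an 'i' is prefixed if no letter comes first.
    public static String toSafeId(String s) {
        String id = s.replaceAll("[^a-zA-Z0-9]+", "-").replaceAll("^-|-$", "");
        return !id.isEmpty() && Character.isLetter(id.charAt(0)) ? id : "i" + id;
    }

    // Returns the sanitized name, with "-1", "-2", ... appended on collisions.
    public String generate(String s) {
        String base = toSafeId(s);
        String id = base;
        for (int i = 1; used.contains(id); i++) {
            id = base + "-" + i;
        }
        used.add(id);
        return id;
    }

    public static void main(String[] args) {
        XmlIdDemo gen = new XmlIdDemo();
        System.out.println(gen.generate("   hi there sally!"));  // hi-there-sally
        System.out.println(gen.generate(" hi there.. sally?"));  // hi-there-sally-1
        System.out.println(gen.generate("42 things"));           // i42-things
        System.out.println(gen.generate("!!!"));                 // i
    }
}
```

Note that the numeric suffix depends on the order in which names are passed in, so IDs are only stable across runs if the names are processed in the same order.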