oucil oucil - 3 months ago 4
MySQL Question

Is there a formula to find shortest length required to achieve uniqueness across a set

I'd like to be able to calculate the length of the shortest sub-string required to achieve complete uniqueness.

Lets say I have a varying length list of 32 character UUIDs, but what I'd like to achieve is shortenening them during reference to only be as long as is required to achieve uniqueness in their set. For instance, if I have the following set of UUID's (pipes inserted to illustrate the answer)...


I would only require the first three characters to achieve the same level of uniqueness as if I had used the full 32 character strings.

I'm curious if there is a formula for reaching that value. I know that I could put this in a couple nested loops, but I'd like to know if there is a more elegant or programmatic way of achieving this.

Edit: Just to be clear, the pipes are only to illustrate that I can achieve uniqueness after only 3 characters. The result of the formula/method should be an array of equal length with only the shortest strings derived from the given set, in this case, the first three chars only. Imagine that I want to use these in a URL, and that I can't have any ambiguity, but still want to be able to reference the same records as if I used the full string in each case.

EDIT2: Actually... as I think about it, no need for a result array, only an integer, the min length required in characters.


I managed to create some codes to achieve that. Take a look:

  • Code 1:
function check_un($array){
    $array = array_values($array);
    $len = 1;
    $tmp = array();
    for($i = 0; $i < strlen($array[0]); $i++){
        if( count(array_unique( $tmp = array_map(function($v) use($len){ return substr($v, 0, $len); }, $array ) )) != count($array) ){
    return $tmp; // this was set in the array_map part
  • Code 2:
function check_un($array){
    $arr = $array;
    $len = 1;

    $tmp = array();

    while (list($key, $value) = each($arr)) {
        $v = substr($value, 0, $len);
        if (isset($tmp[$v])) {
            $tmp = array();
            reset($arr); // start again
        $tmp[$v] = true;
    $tmp = array_keys($tmp);
    return $tmp;

There used to be a code 3 (the first I tried), but it's only available in the edit history.

You can test them with this:

$values = array(
    //,'42807082e1f445e795aaaaaaaaaaaaa' // add this to test with more letters

$val = check_un($values);

The result (for both cases):

    [0] => 428
    [1] => 723
    [2] => b65
    [3] => 3c4
    [4] => 09e
    [5] => 011
    [6] => 1f1
    [7] => 1ac
    [8] => 42f
    [9] => 83a

See them in action here:

You can change the returned value to get only the $len variable.