romellem romellem - 15 days ago 9
PHP Question

Why does this random string generator perform so poorly?

I found this bit of PHP code for generating random strings (alphabetical, alphanumeric, numeric, and hexadecimal).

<?php
function random($length = 8, $seeds = 'alpha') {
// Possible seeds
$seedings['alpha'] = 'abcdefghijklmnopqrstuvwqyz';
$seedings['numeric'] = '0123456789';
$seedings['alphanum'] = 'abcdefghijklmnopqrstuvwqyz0123456789';
$seedings['hexidec'] = '0123456789abcdef';

// Choose seed
if (isset($seedings[$seeds])) {
$seeds = $seedings[$seeds];
}

// Seed generator
list($usec, $sec) = explode(' ', microtime());
$seed = (float) $sec + ((float) $usec * 100000);
mt_srand($seed);

// Generate
$str = '';
$seeds_count = strlen($seeds);

for ($i = 0; $length > $i; $i++) {
$str .= $seeds{mt_rand(0, $seeds_count - 1)};
}

return $str;
}
?>


If I run this function with the default arguments (so it is generating 8 character strings, alphabetical only) and generate 1,000,000 strings, I'd think my collision rate would be low:

26^8 = 208,827,064,576
1,000,000 / 208,827,064,576 ~= 0.0004%


In actuality, when I run that on my machine, I get a 90% collision rate! Only 10% of my generated strings are unique.

Actually, it is suspiciously close to 10%. Generating multiple sets of 1,000,000 random strings, I find that each set generates...


  1. 100,032 unique strings

  2. 100,035 unique strings

  3. 100,032 unique strings

  4. 100,028 unique strings

  5. 100,030 unique strings

  6. you get the idea...



So what gives? Obviously it has to do with how I'm seeding
mt_srand
, or how php implements
mt_rand
, or something else.

So...

Why doesn't this code generate useful random strings?



And what would be a better approach?


Answer

Don't set the seed unless you know what you're doing, from the manual:

Note: There is no need to seed the random number generator with srand() or mt_srand() as this is done automatically.

The following code gets me almost a set of 100% unique strings

<?php
  function random($length = 8, $charset = 'alpha'){
    $list = [
      'alpha' => 'abcdefghijklmnopqrstuvwqyz',
      'numeric' => '0123456789',
      'alphanum' => 'abcdefghijklmnopqrstuvwqyz0123456789',
      'hexidec' => '0123456789abcdef'
    ];

    if(!isset($list[$charset])){
      trigger_error("Invalid charset '$charset', allowed sets: '".implode(', ', array_keys($list))."'", E_USER_NOTICE);
      $charset = 'alpha';
    }

    $str   = '';
    $max   = strlen($list[$charset]) - 1;

    for ($i = 0; $length > $i; $i++) {
      $str .= $list[$charset][mt_rand(0, $max)];
    }

    return $str;
  }

  $loop = 1000000;

  for($i=0;$i<$loop;$i++){
    $arr[random()] = true;
  }

  echo $loop - count($arr), " dupes found in list.";
?>
Comments