cresjoy cresjoy - 4 months ago 14
Java Question

Matching names with corresponding 32 bit unsigned integers

Consider the following problem http://potw.quinnftw.com/problem/2015/1/


Will has been feeling pretty lonely lately. In order to help himself he creates a dating website. Being a computer scientist, he comes up with an interesting way of computing how well one person would match with another.

Based on the information entered, Will creates a 32 bit unsigned integer which acts as a mask for the person's data. In order to compute compatibility between a couple Will checks how many similar bits they have. The more similarity the better the match. One also has to keep in mind though, opposites attract.

Your goal is to design an algorithm that displays the couple with the highest compatibility level, and the lowest compatibility level.


The problem in summary is to type in some integer
N
, and then after type in
N
amount of
names
with corresponding 32 bit unsigned integers.

The goal is to design an algorithm that displays the couple with the highest compatibility level, and the lowest compatibility level.

You can view the details in the link. It isn't too much of a read.

Consider the sample input

5
Danniel 116077289
Bowman 2316887705
Cleveland 2186347654
Marinda 2662238983
Viviana 3393681530


And the output

Bowman Cleveland
Marinda Viviana


I'm having a bit of a hard time understanding the problem. Is it telling me to match based on occurrences of numbers? Basically am I tracking how many integers in one int is the same in another names int? Is the output simply achieved by taking the pair of names with the most common numbers? and then the second pair with the least common numbers?

Any insight how I can understand this problem would be nice. My impression of it so far is to simply match two names with the most occurrences of a number. However the 32 bit part is confusing me, I'm not sure what this part means.

I would like to do this with javascript or java, but I don't think coding it will be too much of an issue once I understand exactly what the question is asking.

Thank you!

Consider one of the solutions here http://potw.quinnftw.com/solution/17/

The part I am having trouble understanding is this for loop

public static int getScore(long x, long y) {
int score = 0;
for (long i = x ^ y; i > 0; i = i >> 1)
if (i-1 == (i & (i-1)))
score++;
return 32 - score;
}


I`m confused about why I am using xOR here and diving by 2 each time I iterate the loop.

Answer

If you read the problem carefully, this is a question about "how many identical bits do two 32 bit numbers have). You have to examine the number in binary form (where you can look at each bit) and compare bits between the numbers that represent each person's characteristics.

So, one person's number might be represented in bary as

01001101110011010011010100101111 

and another as

01111100010011010011010100010111 

To see how well they match, you have to see how many bits have the same value where the bit in the same corresponding position is both 0 or both 1. Add one to the matching score for each bit that has the same value.

Since this is some sort of learning exercise, I won't just hand you the code, but will point you to a reference on manipulating bits in Javascript.


As it turns out, there are multiple phases to this problem:

  1. Compare the mask number for two people and generate a match score.
  2. For all possible user combinations, compute all match scores.
  3. Find the highest match score, add them to the final results, remove them from all the match scores you have so far and find the next highest match. Repeat until there are 0 or 1 people left.

Output from my solution (which has not yet been linked so the OP can work the problem themselves):

Visually in Binary
00000110111010110011001011101001 Danniel
10001010000110001110011010011001 Bowman
10000010010100010000010010000110 Cleveland
10011110101011101000101100000111 Marinda
11001010010001110111100001111010 Viviana

All Comparisons (sorted)
Bowman,Cleveland:19
Danniel,Viviana:17
Danniel,Bowman:16
Cleveland,Viviana:16
Danniel,Cleveland:15
Danniel,Marinda:15
Bowman,Marinda:15
Bowman,Viviana:15
Cleveland,Marinda:14
Marinda,Viviana:12

Best Matches
Bowman,Cleveland: 19
Danniel,Viviana: 17

Here's a hint for comparing if a given bit is the same in two numbers:

var bitmask = 1;      // start with lowest bit
var num1 = 116077289;
var num2 = 2316887705;
if ((num1 & bitmask) === (num2 & bitmask)) {
    // the bit represented by bitmask is the same in both numbers
}

Next hint: If bitmask is 2, then you're comparing the second bit. If bitmask is 4, you're comparing the third bit, 8 then the fourth bit, 16, the fifth bit and so on up to the 32nd bit. Add a counter and you can count the bits that match.


Here's a working solution:

// function used to make display prettier
function zeroPadLeft(str, len) {
    while (str.length < len) {
        str = "0" + str;
    }
    return str;
}

function compareBits(num1, num2) {
    // score is the number of matching bits
    var score = 0;
    // start with first bit
    var mask = 1;
    // create rotating mask to individually compare each of the lowest 32 bits
    for (var i = 0; i < 32; i++) {
        // if this bit has the same value, increase the score
        if ((num1 & mask) === (num2 & mask)) {
            ++score;
        }
        // advance mask to next bit with shift left operator
        mask = mask << 1;
    }
    return score;
}

// input data
var data = [
    {name:"Danniel", value:116077289}, 
    {name:"Bowman", value:2316887705}, 
    {name:"Cleveland", value:2186347654}, 
    {name:"Marinda", value:2662238982}, 
    {name:"Viviana", value:3393681530}
];

// show the starting data in binary so we can see a visual representation of the actual bits
log("<b>Visually in Binary</b>");
data.forEach(function (item) {
    log(zeroPadLeft(item.value.toString(2), 32) + "  " + item.name);
});

// record the score of all possible combinations in the scores array of objects
log("<hr>");
log("<b>All Comparisons</b>");
var scores = [];
for (var j = 0; j < data.length; j++) {
    for (var k = j + 1; k < data.length; k++) {
        var score = compareBits(data[j].value, data[k].value);
        // record the score and two names as an object inserted into an array
        scores.push({
            name1: data[j].name,
            name2: data[k].name,
            score: score
        })
    }
}

// sort by best score to make it easier to find the highest score
scores.sort(function (a, b) {
    return b.score - a.score;
});
// output sorted scores so we can see them visually
scores.forEach(function (item) {
    log(item.name1 + "," + item.name2 + ":" + item.score);
});

// now find the top scores with no person repeated
log("<hr>");
log("<b>Best Matches</b>");
// namesUsed keeps track of which names have already found a high score so we don't use them again
var namesUsed = {};
while (scores.length > 0) {
    var bestItem = scores.shift();
    // if either of these names has already been used, then skip this score
    if (namesUsed[bestItem.name1] || namesUsed[bestItem.name2]) {
        continue;
    }
    log(bestItem.name1 + "," + bestItem.name2 + ": " + bestItem.score);
    namesUsed[bestItem.name1] = true;
    namesUsed[bestItem.name2] = true;
}
body {font-family: "Courier New";}
<script src="https://dl.dropboxusercontent.com/u/7909102/log.js"></script>

Explanation of comparing bits

The key part is counting the bits that are the same in two numbers:

function compareBits(num1, num2) {
    // score is the number of matching bits
    var score = 0;
    // start with first bit
    var mask = 1;
    // create rotating mask to individually compare each of the lowest 32 bits
    for (var i = 0; i < 32; i++) {
        // if this bit has the same value, increase the score
        if ((num1 & mask) === (num2 & mask)) {
            ++score;
        }
        // advance mask to next bit with shift left operator
        mask = mask << 1;
    }
    return score;
}

This is what I think is the simplest to understand implementation (not the fastest). Basically what it does is define a mask number with an initial value of 1. When we logically AND the mask number with each of our values, we isolate a single bit in each number. We can then compare that remaining single bit to see if they are equal. If so, increase our score. Then, shift the mask left by one place so we can look at the next bit. Repeat that 32 times and we've compared each of the 32 bits, counting how many have the same value.


If you want to see how obtuse bit manipulation can get, here's a pretty fast algorithm called the Hamming weight implementation:

function countSimilarBitsHamming(num1, num2) {
    // xor sets a bit to 0 if both are the same and 1 if different
    // so if we xor and then negate, we get bits that are the same
    var sameBits = ((~(num1 ^ num2)) & 0xFFFFFFFF) >>> 0;
    sameBits = sameBits - ((sameBits >> 1) & 0x55555555);
    sameBits = (sameBits & 0x33333333) + ((sameBits >> 2) & 0x33333333);
    return (((sameBits + (sameBits >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

Here's an even slower implementation, but the bit counting part is a simple string count:

function countBitsString(num1, num2) {
    // xor sets a bit to 0 if both are the same and 1 if different
    // so if we xor and then negate, we get bits that are the same
    var sameBits = ((~(num1 ^ num2)) & 0xFFFFFFFF) >>> 0;
    var str = sameBits.toString(2).replace(/0/g, "");
    return str.length;
}

Steps:

  1. num1 XOR num2 - This generates a result where a bit is set to 0 if both corresponding bits in num1 and num2 had the same value or 1 if they were different values.
  2. That XOR value is the inverse of what we want so we negate it with the ~ operator. Now we have a 1 wherever the two numbers had equal bits.
  3. Because there are more than 32 bits in the negated value, we and it with the max 32-bit value to zero out the higher bits.
  4. Then because Javascript only deals with signed values, but we want unsigned values, we use a trick of doing a zero fill right shift, but with a 0 shift value >>> 0 and this will clear the sign bit.
  5. Now there is a list of matching bits.
  6. Then, a really simple way to count them is to convert to string represeantation of binary and remove the zeroes. All that will be left is the 1s in the string so just see how long the string is and that's how many 1s where in the string.
Comments