KyBe KyBe - 1 month ago 16
Scala Question

Hamming Distance on binary Vector in Scala

I would like to have a fast implementation of Hamming distance on binary vectors.
I tested it on

rather than
thinking that it would be faster but it's not the case.
If someone can explain me this behaviour and/or advise me on a better implementation.

def hammingDistanceI(v1:Array[Int], v2:Array[Int]) = {{case(a,b) => a!=b}
def hammingDistanceB(v1:Array[Byte], v2:Array[Byte]) = {{case(a,b) => a!=b}

def speedMeasureByte(v:Array[Byte], nbIte:Int) = {
val t0 = System.nanoTime
for(i<-0 to nbIte-1) hammingDistanceB(v,v)
val t1 = System.nanoTime

def speedMeasureInt(v:Array[Int], nbIte:Int) = {
val t0 = System.nanoTime
for(i<-0 to nbIte-1) hammingDistanceI(v,v)
val t1 = System.nanoTime

val v1Int = Array.fill(100)(Random.nextInt(2))
val v1Byte =

val (tInt, tByte) = (speedMeasureInt(v1Int,1000000),

// tInt = 1636 ms
// tByte = 3307 ms


I am not sure why byte implementation is slower than the other, but suspect it has to do with the way != is implemented - cpu registers are better equipped to deal with four-byte sequences nowadays than with single bytes.

The above is just my guess though, don't bet your house on it.

As for a faster implementation, if your use case is such, where single nanoseconds matter, you'll have to abandon the elegance of scala collections and stick with the old good loops:

 def hd(a: Array[Int], b: Array[Int]) { 
   var c = 0
   var i = 0
   while(i < a.length) { c += a(i)^b(i); i+=1 }

This should be several hundred times faster on average than your implementation.