KyBe - 11 months ago 73

Scala Question

I would like to have a fast implementation of Hamming distance on binary vectors.

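I tested it on `Array[Byte]` and `Array[Int]`, and the `Byte` version turns out to be roughly twice as slow (see the timings at the end of the code).

Can someone explain this behaviour and/or advise me on a better implementation?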

```
import scala.util.Random

def hammingDistanceI(v1: Array[Int], v2: Array[Int]) = {
  v1.zip(v2).count { case (a, b) => a != b }
}

def hammingDistanceB(v1: Array[Byte], v2: Array[Byte]) = {
  v1.zip(v2).count { case (a, b) => a != b }
}

// Returns the elapsed time in milliseconds for nbIte calls.
def speedMeasureByte(v: Array[Byte], nbIte: Int) = {
  val t0 = System.nanoTime
  for (i <- 0 to nbIte - 1) hammingDistanceB(v, v)
  val t1 = System.nanoTime
  (t1 - t0) / 1000000
}

// Returns the elapsed time in milliseconds for nbIte calls.
def speedMeasureInt(v: Array[Int], nbIte: Int) = {
  val t0 = System.nanoTime
  for (i <- 0 to nbIte - 1) hammingDistanceI(v, v)
  val t1 = System.nanoTime
  (t1 - t0) / 1000000
}

val v1Int = Array.fill(100)(Random.nextInt(2))
val v1Byte = v1Int.map(_.toByte)

val (tInt, tByte) = (speedMeasureInt(v1Int, 1000000),
                     speedMeasureByte(v1Byte, 1000000))

// tInt = 1636 ms
// tByte = 3307 ms
```

Answer Source

I am not sure why the byte implementation is slower than the other one, but I suspect it has to do with the way `!=` is implemented: CPU registers are better equipped to deal with four-byte sequences nowadays than with single bytes.

The above is just my guess though, don't bet your house on it.

As for a faster implementation: if your use case is one where single nanoseconds matter, you'll have to abandon the elegance of Scala collections and stick with good old loops:

```
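// Assumes both arrays have the same length and contain only 0s and 1s,
// so XOR-ing two elements yields 1 exactly when they differ.
def hd(a: Array[Int], b: Array[Int]): Int = {
  var c = 0
  var i = 0
  while (i < a.length) { c += a(i) ^ b(i); i += 1 }
  c
}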
```

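This should be several hundred times faster on average than your implementation, mainly because it no longer allocates the intermediate array of tuples that `zip` builds on every call.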
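If the vectors really are binary, another option worth trying is to pack them into `Long` words and let `java.lang.Long.bitCount` count the differing bits 64 at a time. The following is only a sketch under that assumption: the names `pack` and `hammingPacked` are made up for illustration, and I have not benchmarked it against the loop above.

```scala
// Hypothetical helpers, not from the original post.

// Pack a 0/1 vector into 64-bit words, one bit per element.
def pack(bits: Array[Int]): Array[Long] = {
  val words = new Array[Long]((bits.length + 63) / 64)
  var i = 0
  while (i < bits.length) {
    // Word index is i / 64, bit position inside the word is i % 64.
    if (bits(i) != 0) words(i >> 6) |= 1L << (i & 63)
    i += 1
  }
  words
}

// XOR leaves a 1 bit wherever the two inputs differ;
// bitCount then counts those bits for a whole word at once.
def hammingPacked(a: Array[Long], b: Array[Long]): Int = {
  require(a.length == b.length, "vectors must have the same packed length")
  var c = 0
  var i = 0
  while (i < a.length) {
    c += java.lang.Long.bitCount(a(i) ^ b(i))
    i += 1
  }
  c
}
```

The packing step costs something, so this only pays off if each vector is packed once and then compared many times, e.g. `hammingPacked(pack(v1), pack(v2))` with the packed arrays kept around between calls.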