Mike Lavender - 1 year ago 106

Scala Question

I am curious as to how I can write a generic method to calculate standard deviation and variance in Scala. I have a generic method for calculating the mean (stolen from here: Writing a generic mean function in Scala).

I have tried to convert the mean calculation to get standard deviation and variance but it looks wrong to me. Generics are WAY beyond my skills in Scala programming at the moment.

The code for calculating the mean, standard deviation and variance is this:

```
package ca.mikelavender

import scala.math.{Fractional, Integral, Numeric, _}

package object genericstats {

  def stdDev[T: Numeric](xs: Iterable[T]): Double = sqrt(variance(xs))

  def variance[T: Numeric](xs: Iterable[T]): Double = implicitly[Numeric[T]] match {
    case num: Fractional[_] => {
      val avg = mean(xs)
      num.toDouble(
        xs.foldLeft(num.zero)((b, a) =>
          num.plus(b, num.times(num.minus(a, avg), num.minus(a, avg))))) /
        xs.size
    }
    case num: Integral[_] => {
      val avg = mean(xs)
      num.toDouble(
        xs.foldLeft(num.zero)((b, a) =>
          num.plus(b, num.times(num.minus(a, avg), num.minus(a, avg))))) /
        xs.size
    }
  }

  /**
   * http://stackoverflow.com/questions/6188990/writing-a-generic-mean-function-in-scala
   */
  def mean[T: Numeric](xs: Iterable[T]): T = implicitly[Numeric[T]] match {
    case num: Fractional[_] => import num._; xs.sum / fromInt(xs.size)
    case num: Integral[_] => import num._; xs.sum / fromInt(xs.size)
    case _ => sys.error("Undivisable numeric!")
  }

}
```

I feel like the match in the variance method is not needed or could be more elegant. The duplicated code seems very wrong to me; I should be able to use the match just to get the numeric type and then pass it on to a single block of code that does the calculation.

The other thing I don't like is that it always returns a `Double`.

So, are there any suggestions on how to improve the code and make it prettier?

Answer Source

The goal of a type class like `Numeric` is to provide a set of operations for a type so that you can write code that works generically on any types that have an instance of the type class. `Numeric` provides one set of operations, and its subclasses `Integral` and `Fractional` additionally provide more specific ones (but they also characterize fewer types). If you don't need these more specific operations, you can simply work at the level of `Numeric`, but unfortunately in this case you do.

Let's start with `mean`. The problem here is that division means different things for integral and fractional types, and isn't provided at all for types that are only `Numeric`. The answer you've linked from Daniel gets around this issue by dispatching on the runtime type of the `Numeric` instance, and just crashing (at runtime) if the instance isn't either a `Fractional` or `Integral`.

I'm going to disagree with Daniel (or at least Daniel five years ago) and say this isn't really a great approach—it's both papering over a real difference and throwing out a lot of type safety at the same time. There are three better solutions in my view.

You might decide that taking the mean isn't meaningful for integral types, since integral division loses the fractional part of the result, and only provide it for fractional types:

```
def mean[T: Fractional](xs: Iterable[T]): T = {
  val T = implicitly[Fractional[T]]
  T.div(xs.sum, T.fromInt(xs.size))
}
```

Or with the nice implicit syntax:

```
def mean[T: Fractional](xs: Iterable[T]): T = {
  val T = implicitly[Fractional[T]]
  import T._
  xs.sum / T.fromInt(xs.size)
}
```

One last syntactic point: if I find I have to write `implicitly[SomeTypeClass[A]]` to get a reference to a type class instance, I tend to desugar the context bound (the `[A: SomeTypeClass]` part) to clean things up a bit:

```
def mean[T](xs: Iterable[T])(implicit T: Fractional[T]): T =
  T.div(xs.sum, T.fromInt(xs.size))
```

This is entirely a matter of taste, though.
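
The question also objects to `variance` always returning `Double`. Restricting to `Fractional`, as in the first approach, makes a generic return type possible for the variance as well. The following is a minimal sketch under that assumption; the wrapper object `FractionalStats` is a hypothetical name introduced only for illustration. A generic `stdDev` is deliberately left out, because `Fractional` provides no square-root operation.

```scala
// Hypothetical wrapper object, used only to keep the sketch self-contained.
object FractionalStats {

  // Mean for fractional types only, using the desugared implicit parameter.
  def mean[T](xs: Iterable[T])(implicit T: Fractional[T]): T =
    T.div(xs.sum, T.fromInt(xs.size))

  // Population variance (divides by the number of elements), returned as T.
  def variance[T](xs: Iterable[T])(implicit T: Fractional[T]): T = {
    val avg = mean(xs)
    // Squared deviation of each element from the mean.
    val squares = xs.map { x =>
      val d = T.minus(x, avg)
      T.times(d, d)
    }
    T.div(squares.sum, T.fromInt(xs.size))
  }
}
```

With this sketch, `FractionalStats.variance(List(BigDecimal(1), BigDecimal(3)))` stays a `BigDecimal`, while a call with a `List[Int]` is rejected at compile time because there is no `Fractional[Int]` instance.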

You could also make `mean` return a concrete fractional type like `Double`, and simply convert the `Numeric` values to that type before performing the operation:

```
def mean[T](xs: Iterable[T])(implicit T: Numeric[T]): Double =
  T.toDouble(xs.sum) / xs.size
```

Or, equivalently but with the `toDouble` syntax for `Numeric`:

```
import Numeric.Implicits._
def mean[T: Numeric](xs: Iterable[T]): Double = xs.sum.toDouble / xs.size
```

This provides correct results for both integral and fractional types (up to the precision of `Double`), but at the expense of making your operation less generic.
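
To make the trade-off concrete, here is a small sketch of the `Double`-returning `mean` applied to integral input. The wrapper object `DoubleMean` is a hypothetical name used only to keep the example self-contained.

```scala
import Numeric.Implicits._

// Hypothetical wrapper object, used only for illustration.
object DoubleMean {
  // Sum in the original type, then convert to Double before dividing,
  // so integral input is not truncated by integer division.
  def mean[T: Numeric](xs: Iterable[T]): Double = xs.sum.toDouble / xs.size
}
```

`DoubleMean.mean(List(1, 2))` gives `1.5`, where integer division of the sum by the size would give `1`. The precision caveat shows up with large values: `DoubleMean.mean(List(9007199254740993L))` gives `9007199254740992.0`, because 2^53 + 1 has no exact `Double` representation.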

Lastly you could create a new type class that provides a shared division operation for `Fractional` and `Integral`:

```
trait Divisible[T] {
  def div(x: T, y: T): T
}
object Divisible {
  implicit def divisibleFromIntegral[T](implicit T: Integral[T]): Divisible[T] =
    new Divisible[T] {
      def div(x: T, y: T): T = T.quot(x, y)
    }
  implicit def divisibleFromFractional[T](implicit T: Fractional[T]): Divisible[T] =
    new Divisible[T] {
      def div(x: T, y: T): T = T.div(x, y)
    }
}
```

And then:

```
def mean[T: Numeric: Divisible](xs: Iterable[T]): T =
  implicitly[Divisible[T]].div(xs.sum, implicitly[Numeric[T]].fromInt(xs.size))
```

This is essentially a more principled version of the original `mean`: instead of dispatching on subtype at runtime, you're characterizing the subtypes with a new type class. There's more code, but no possibility of runtime errors (unless of course `xs` is empty, etc., but that's an orthogonal problem that all of these approaches run into).
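
As a rough illustration of how the `Divisible` approach behaves for each kind of type, the sketch below repeats the type class and adds a `meanOption` helper for the empty case. Both `meanOption` and the wrapper object `DivisibleStats` are hypothetical names added here, not part of the answer's code.

```scala
trait Divisible[T] {
  def div(x: T, y: T): T
}

object Divisible {
  // Integral types divide with truncating quotient.
  implicit def divisibleFromIntegral[T](implicit T: Integral[T]): Divisible[T] =
    new Divisible[T] { def div(x: T, y: T): T = T.quot(x, y) }

  // Fractional types divide exactly (up to the type's precision).
  implicit def divisibleFromFractional[T](implicit T: Fractional[T]): Divisible[T] =
    new Divisible[T] { def div(x: T, y: T): T = T.div(x, y) }
}

// Hypothetical wrapper object, used only for illustration.
object DivisibleStats {
  def mean[T: Numeric: Divisible](xs: Iterable[T]): T =
    implicitly[Divisible[T]].div(xs.sum, implicitly[Numeric[T]].fromInt(xs.size))

  // Hypothetical helper: one way to handle the empty-collection case,
  // returning None instead of dividing by zero.
  def meanOption[T: Numeric: Divisible](xs: Iterable[T]): Option[T] =
    if (xs.isEmpty) None else Some(mean(xs))
}
```

Here `DivisibleStats.mean(List(1, 2, 4))` is `2` (the quotient of 7 by 3), while `DivisibleStats.mean(List(1.0, 2.0))` is `1.5`. The instance is selected at compile time from the element type, so a type with neither an `Integral` nor a `Fractional` instance fails to compile rather than failing at runtime.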

Of these three approaches, I'd probably choose the second, which in your case seems especially appropriate since your `variance` and `stdDev` already return `Double`. In that case the entire thing would look like this:

```
import Numeric.Implicits._
def mean[T: Numeric](xs: Iterable[T]): Double = xs.sum.toDouble / xs.size
def variance[T: Numeric](xs: Iterable[T]): Double = {
  val avg = mean(xs)
  xs.map(_.toDouble).map(a => math.pow(a - avg, 2)).sum / xs.size
}
def stdDev[T: Numeric](xs: Iterable[T]): Double = math.sqrt(variance(xs))
```
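
As a quick sanity check, the same three functions can be exercised on a small data set whose statistics are easy to verify by hand. The wrapper object `Stats` is a hypothetical name used only so the sketch compiles as a standalone file.

```scala
import Numeric.Implicits._

// Hypothetical wrapper object around the three functions above.
object Stats {
  def mean[T: Numeric](xs: Iterable[T]): Double = xs.sum.toDouble / xs.size

  // Divides by xs.size, so this is the population variance.
  def variance[T: Numeric](xs: Iterable[T]): Double = {
    val avg = mean(xs)
    xs.map(_.toDouble).map(a => math.pow(a - avg, 2)).sum / xs.size
  }

  def stdDev[T: Numeric](xs: Iterable[T]): Double = math.sqrt(variance(xs))
}
```

For `List(2, 4, 4, 4, 5, 5, 7, 9)` the sum is 40 and the squared deviations from the mean add up to 32, so `Stats.mean` gives `5.0`, `Stats.variance` gives `4.0`, and `Stats.stdDev` gives `2.0`. The same calls work unchanged on a `List[Double]` or `List[Long]`.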

…and you're done.

In real code I'd probably look at a library like Spire instead of using the standard library's type classes, though.