Giovanni Azua - 27 days ago

Saddle Frame: What's the most idiomatic way to count NaN values?

I build a Saddle Frame like so:

import org.saddle._
import scala.util.Random

val rowIx = Index(0 until 200)
val colIx = Index(0 until 100)

// create example having 15% of NaNs
val nanPerc = 0.15
val nanLength = math.round(nanPerc*rowIx.length*colIx.length).toInt
val nanInd = Random.shuffle(0 until rowIx.length*colIx.length).take(nanLength)
val rawMat = mat.rand(rowIx.length, colIx.length)
// contents gives a single array in row major
val rawMatContents = rawMat.contents
nanInd foreach { i => rawMatContents.update(i, Double.NaN) }

val df = Frame(rawMat, rowIx, colIx)

// now I'd like to test that the number of NaNs is correct, but most
// functions for this purpose in Frame, e.g. countif, exclude NaNs
df.???


What's the most idiomatic (Scala, Saddle) way to count the number of NaNs?
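As a Saddle-independent sanity check, the setup above can be mirrored on a plain row-major Array[Double]. This is a sketch for illustration only: the object name NanSetupCheck and its methods are hypothetical, not part of Saddle. It shows the expected count, round(0.15 * 200 * 100) = 3000, and why the predicate has to be isNaN rather than an equality test.

```scala
import scala.util.Random

// Hypothetical plain-Scala mirror of the question's setup (no Saddle involved):
// a flat, row-major array of rows * cols random doubles with ~15% set to NaN.
object NanSetupCheck {
  val rows = 200
  val cols = 100
  val nanPerc = 0.15

  // Expected number of NaNs: round(0.15 * 200 * 100) = 3000
  val nanLength: Int = math.round(nanPerc * rows * cols).toInt

  def buildContents(rng: Random): Array[Double] = {
    val contents = Array.fill(rows * cols)(rng.nextDouble())
    // Shuffled range => distinct indices, so exactly nanLength cells become NaN
    val nanInd = rng.shuffle((0 until rows * cols).toVector).take(nanLength)
    nanInd.foreach(i => contents(i) = Double.NaN)
    contents
  }

  // NaN != NaN, so a test like `_ == Double.NaN` never matches; isNaN is required.
  def countNaN(xs: Array[Double]): Int = xs.count(_.isNaN)
}
```

Because the injected indices are distinct, an exact count equal to nanLength is a valid test.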

Answer

Frame.countif is implemented as:

def countif(test: T => Boolean)(implicit ev: S2Stats): Series[CX, Int] = frame.reduce(_.countif(test))

while Vec.countif is implemented as:

def countif(test: Double => Boolean): Int = r.filterFoldLeft(t => sd.notMissing(t) && test(t))(0)((a,b) => a + 1)
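A small plain-Scala model of the two methods quoted above makes the problem concrete: the notMissing filter runs before test, so countif can never see a NaN, not even with a test of _.isNaN. This is a sketch, not Saddle code: filterFoldLeft and notMissing below are hypothetical stand-ins written over Array[Double], assuming that for Double values Saddle treats NaN as "missing".

```scala
// Hypothetical plain-Scala model of Vec.countif (not the Saddle implementation).
object CountIfModel {
  // Stand-in for Vec.filterFoldLeft: fold only over elements that satisfy pred.
  def filterFoldLeft[B](xs: Array[Double])(pred: Double => Boolean)(init: B)(f: (B, Double) => B): B =
    xs.iterator.filter(pred).foldLeft(init)(f)

  // Stand-in for sd.notMissing, assuming "missing" means NaN for Double.
  def notMissing(d: Double): Boolean = !d.isNaN

  // Mirrors the quoted countif: missing values are dropped before test is applied.
  def countif(xs: Array[Double])(test: Double => Boolean): Int =
    filterFoldLeft(xs)(t => notMissing(t) && test(t))(0)((a, _) => a + 1)
}
```

With two NaNs and two numbers, countif(xs)(_ => true) counts only the two numbers, and countif(xs)(_.isNaN) counts nothing.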

We can use the same fold, but drop test and invert the missing-value check so that only NaNs pass the filter:

vec.filterFoldLeft(x => x.isNaN)(0)((a, b) => a + 1)
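The inverted fold can be checked outside Saddle with a plain-Scala model. Here filterFoldLeft is a hypothetical stand-in over Array[Double] with the same curried shape (pred)(init)(f) as the call above, not the Saddle method itself.

```scala
// Hypothetical plain-Scala model of the vec-level NaN count (no Saddle dependency).
object VecNanCount {
  // Stand-in for Vec.filterFoldLeft: fold only over elements that satisfy pred.
  def filterFoldLeft[B](xs: Array[Double])(pred: Double => Boolean)(init: B)(f: (B, Double) => B): B =
    xs.iterator.filter(pred).foldLeft(init)(f)

  // Keep only NaNs, then add 1 per kept element; the element value itself is ignored.
  def countNaN(xs: Array[Double]): Int =
    filterFoldLeft(xs)(x => x.isNaN)(0)((a, _) => a + 1)
}
```

Since the fold never reads the element (b in the answer's lambda), the result is the same as counting with xs.count(_.isNaN); the fold form simply stays close to how countif is written.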

To apply this to every column of a Frame (giving one count per column, like Frame.countif):

frame.reduce(_.filterFoldLeft(x => x.isNaN)(0)((a, b) => a + 1))
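frame.reduce yields one count per column, so the single total the question wants to test is the sum of those counts. The sketch below models that in plain Scala; representing a frame as a Seq of (column key, column values) pairs is an assumption for illustration, and FrameNanCount with its methods is hypothetical, not Saddle API.

```scala
// Hypothetical plain-Scala model of per-column NaN counting (no Saddle dependency).
object FrameNanCount {
  // Per-column count; equivalent to the filterFoldLeft form above.
  def countNaN(col: Array[Double]): Int = col.count(_.isNaN)

  // Like frame.reduce: one result per column, keyed by the column key.
  def nanPerColumn[C](cols: Seq[(C, Array[Double])]): Seq[(C, Int)] =
    cols.map { case (key, values) => key -> countNaN(values) }

  // Whole-frame total: sum of the per-column counts.
  def totalNaN[C](cols: Seq[(C, Array[Double])]): Int =
    nanPerColumn(cols).map(_._2).sum
}
```

In the question's setup, this total should equal nanLength (3000 for a 200 x 100 frame at 15%).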