dave - 10 months ago 100

Scala Question

I'm looking for a good open source library for Scala for math and statistics. Ideally something like Apache Commons Math or Colt, but implemented in Scala.

Can anyone point me in the right direction?

Answer

Yes, there are some:

The ScalaLab project aims to provide an efficient scientific programming environment for the Java Virtual Machine. Its scripting language is based on the Scala programming language, enhanced with high-level scientific operators and an integrated environment that provides a Matlab-like working style.

The scripting code is extremely fast, close to Java (sometimes slower, sometimes faster), and usually faster than equivalent Matlab .m scripts!

A high performance numeric linear algebra library for Scala, with rich Matlab-like operators on vectors and matrices; a library of numerical routines; support for plotting.
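To give a feel for what "Matlab-like operators on vectors and matrices" means in Scala, here is a minimal plain-Scala sketch using symbolic method names as infix operators. The Vec and Mat types are hypothetical illustrations, not the library's actual API, and they make no attempt at the performance the library advertises.

```scala
// Hypothetical dense vector with Matlab-like infix operators (not the library's API).
final case class Vec(data: Vector[Double]) {
  // Element-wise addition, like `a + b` in Matlab.
  def +(that: Vec): Vec = {
    require(data.length == that.data.length, "length mismatch")
    Vec(data.zip(that.data).map { case (x, y) => x + y })
  }

  // Scalar multiplication, like `a * 2` in Matlab.
  def *(k: Double): Vec = Vec(data.map(_ * k))

  // Inner product of two vectors of equal length.
  def dot(that: Vec): Double = {
    require(data.length == that.data.length, "length mismatch")
    data.zip(that.data).map { case (x, y) => x * y }.sum
  }
}

// Hypothetical row-major dense matrix.
final case class Mat(rows: Vector[Vector[Double]]) {
  // Matrix-vector product, like `A * x` in Matlab: one dot product per row.
  def *(v: Vec): Vec = Vec(rows.map(r => Vec(r).dot(v)))
}
```

For example, with A = Mat(Vector(Vector(1.0, 2.0), Vector(3.0, 4.0))) and x = Vec(Vector(1.0, 1.0)), A * x yields Vec(Vector(3.0, 7.0)), and x + x * 2.0 yields Vec(Vector(3.0, 3.0)) because Scala gives `*` higher precedence than `+`.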

FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
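As a rough illustration of what "factor graphs" and "inference" mean here, the sketch below scores Boolean variables with factors and computes an exact marginal by brute-force enumeration. It is a hypothetical toy, not FACTORIE's API; real toolkits use far more scalable inference than enumerating every assignment.

```scala
// Hypothetical factor over Boolean variables (not FACTORIE's API).
// `vars` documents which variables the factor touches; `score` returns a
// non-negative weight for a full assignment.
final case class Factor(vars: Seq[String], score: Map[String, Boolean] => Double)

object FactorGraph {
  // Enumerate every true/false assignment of the given variables.
  def assignments(vars: List[String]): List[Map[String, Boolean]] = vars match {
    case Nil => List(Map.empty)
    case v :: rest =>
      for {
        tail  <- assignments(rest)
        value <- List(false, true)
      } yield tail + (v -> value)
  }

  // Exact P(variable = true): the joint probability is the normalized
  // product of all factor scores, summed over matching assignments.
  def marginal(vars: List[String], factors: Seq[Factor], variable: String): Double = {
    val weighted = assignments(vars).map(a => a -> factors.map(_.score(a)).product)
    val z = weighted.map(_._2).sum // normalizing constant
    weighted.collect { case (a, w) if a(variable) => w }.sum / z
  }
}
```

With a unary factor on "rain" that favors false (3.0 versus 1.0) and a pairwise factor that favors "rain" and "wet" agreeing (4.0 versus 1.0), the four assignments get weights 12, 3, 1 and 4, so the marginal P(wet = true) is 7/20 = 0.35.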

By Twitter, for graph processing:

Cassovary is designed from the ground up to efficiently handle graphs with billions of edges. It comes with some common node and graph data structures and traversal algorithms. A typical usage is to do large-scale graph mining and analysis.

At Twitter, Cassovary forms the bottom layer of a stack that we use to power many of our graph-based features, including "Who to Follow" and "Similar to." We also use it for relevance in Twitter Search and the algorithms that determine which Promoted Products users will see. Over time, we hope to bring more non-proprietary logic from some of those product features into Cassovary.
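A typical building block for this kind of graph mining is a traversal over a compact adjacency representation. The sketch below stores out-edges as one Array[Int] per node and runs a breadth-first search; the ArrayGraph class is hypothetical and is not Cassovary's API, though the idea of compact array-based adjacency is the general technique such libraries use to keep billions of edges in memory.

```scala
import scala.collection.mutable

// Hypothetical compact directed graph (not Cassovary's API): node ids are
// 0 until n, and out-edges are one Array[Int] per node to keep overhead low.
final class ArrayGraph(val outEdges: Array[Array[Int]]) {
  def nodeCount: Int = outEdges.length

  // Breadth-first traversal returning the hop distance from `source` to each
  // node; nodes that cannot be reached keep distance -1.
  def bfsDistances(source: Int): Array[Int] = {
    val dist = Array.fill(nodeCount)(-1)
    val queue = mutable.Queue(source)
    dist(source) = 0
    while (queue.nonEmpty) {
      val u = queue.dequeue()
      for (v <- outEdges(u) if dist(v) == -1) {
        dist(v) = dist(u) + 1
        queue.enqueue(v)
      }
    }
    dist
  }
}
```

For the graph 0 -> {1, 2}, 1 -> 3, 2 -> 3, 4 -> 0, bfsDistances(0) gives 0, 1, 1, 2, -1: node 4 points into the graph but nothing points to it.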

Abstract algebra library from Twitter:

Code is targeted at building aggregation systems (via Scalding or Storm). It was originally developed as part of Scalding's Matrix API, where Matrices had values which are elements of Monoids, Groups, or Rings. Subsequently, it was clear that the code had broader application within Scalding and on other projects within Twitter.
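The reason monoids matter for aggregation is that an associative combine with an identity element lets partial results be computed on different machines and merged in any grouping. The sketch below shows that idea with a hypothetical minimal Monoid type class; it is written in the spirit of the library described above but is not its actual API.

```scala
// Hypothetical minimal Monoid type class (not the library's actual API):
// `combine` must be associative and `empty` must be its identity.
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

object Monoid {
  implicit val intSum: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  // Maps merge key by key, combining values that share a key with the value monoid.
  implicit def mapMonoid[K, V](implicit m: Monoid[V]): Monoid[Map[K, V]] =
    new Monoid[Map[K, V]] {
      def empty: Map[K, V] = Map.empty
      def combine(x: Map[K, V], y: Map[K, V]): Map[K, V] =
        y.foldLeft(x) { case (acc, (k, v)) =>
          acc.updated(k, acc.get(k).fold(v)(m.combine(_, v)))
        }
    }

  // Aggregate any collection using only the monoid operations.
  def sum[A](xs: Iterable[A])(implicit m: Monoid[A]): A =
    xs.foldLeft(m.empty)(m.combine)
}
```

Word counts illustrate the point: each partition sums its own Map(word -> 1) values, and the partial maps are then summed again with the same operation to get the global count.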

Note: the following has experimental status.

sb_probdsl offers simple discrete probabilistic programming support using Scala's new delimited continuations support.
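To show what discrete probabilistic programming looks like, here is a sketch of a distribution monad. This is a swapped-in technique: sb_probdsl itself relies on delimited continuations, whereas this sketch uses the more common monadic encoding, in which for-comprehensions compose random choices. The Dist type is hypothetical.

```scala
// Hypothetical discrete distribution as a list of (outcome, probability) pairs.
// Duplicate outcomes are allowed; queries sum over them.
final case class Dist[A](outcomes: List[(A, Double)]) {
  def map[B](f: A => B): Dist[B] =
    Dist(outcomes.map { case (a, p) => (f(a), p) })

  // Chain a dependent random choice: probabilities multiply along each path.
  def flatMap[B](f: A => Dist[B]): Dist[B] =
    Dist(outcomes.flatMap { case (a, p) =>
      f(a).outcomes.map { case (b, q) => (b, p * q) }
    })

  // Total probability of all outcomes satisfying `pred`.
  def probabilityOf(pred: A => Boolean): Double =
    outcomes.collect { case (a, p) if pred(a) => p }.sum
}

object Dist {
  def uniform[A](xs: A*): Dist[A] = Dist(xs.toList.map(x => (x, 1.0 / xs.length)))
}
```

For instance, `for { a <- Dist.uniform(1, 2, 3, 4, 5, 6); b <- Dist.uniform(1, 2, 3, 4, 5, 6) } yield a + b` models the sum of two dice, and probabilityOf(_ == 7) on it is approximately 1/6.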

A Markov Chain library for Scala

Markov chains represent stochastic processes where the probability distribution of the next step depends non-trivially on the current step, but does not depend on previous steps. Give this library some training data and it will generate new random data that statistically resembles it.
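The "train, then generate" workflow can be sketched in a few lines: count observed transitions, then sample each next token in proportion to those counts, looking only at the current token. The MarkovChain class below is a hypothetical illustration, not the API of the library mentioned above.

```scala
import scala.util.Random

// Hypothetical first-order Markov chain over tokens (not the library's API).
final class MarkovChain[A](transitions: Map[A, Map[A, Int]]) {
  // Sample a successor of `current` proportionally to its observed count,
  // or None if `current` was never followed by anything in the training data.
  def next(current: A, rng: Random): Option[A] =
    transitions.get(current).map { counts =>
      val options = counts.toList
      val cumulative = options.map(_._2).scanLeft(0)(_ + _).tail
      val target = rng.nextInt(cumulative.last)
      options(cumulative.indexWhere(target < _))._1
    }

  // Generate up to `length` tokens starting from `start`, stopping early at a dead end.
  def generate(start: A, length: Int, rng: Random): List[A] =
    Iterator.iterate(Option(start))(opt => opt.flatMap(cur => next(cur, rng)))
      .takeWhile(_.isDefined)
      .map(_.get)
      .take(length)
      .toList
}

object MarkovChain {
  // Count how often each token is immediately followed by each other token.
  def train[A](data: Seq[A]): MarkovChain[A] = {
    val pairs = data.zip(data.drop(1))
    val counts = pairs.groupBy(_._1).map { case (from, ps) =>
      from -> ps.groupBy(_._2).map { case (to, xs) => to -> xs.size }
    }
    new MarkovChain(counts)
  }
}
```

Every adjacent pair in the generated output is a transition that occurred in the training data, which is what makes the output statistically resemble it.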

Signal/Collect is a programming model and framework for large-scale graph processing. The model is expressive enough to concisely formulate many iterated and data-flow algorithms on graphs, while allowing the framework to transparently parallelize the processing.
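To make the model concrete, the sketch below imitates it synchronously on a single thread: in each step every vertex signals along its out-edges, every vertex collects the signals it received into a new state, and the loop stops when no state changes. The example computes single-source shortest paths (signal = own distance plus edge weight, collect = minimum) and assumes non-negative integer weights. Everything here is hypothetical and is not the framework's API; the real framework parallelizes and schedules this work transparently.

```scala
// Hypothetical synchronous imitation of the signal/collect model (not the framework's API).
object SignalCollectSketch {
  final case class Edge(from: Int, to: Int, weight: Int)

  val Infinity: Int = Int.MaxValue

  def shortestPaths(vertexCount: Int, edges: Seq[Edge], source: Int): Vector[Int] = {
    var state = Vector.tabulate(vertexCount)(v => if (v == source) 0 else Infinity)
    var changed = true
    while (changed) {
      // Signal: each reached vertex sends (distance + weight) along every out-edge.
      val signals: Map[Int, Seq[Int]] = edges
        .filter(e => state(e.from) != Infinity)
        .map(e => e.to -> (state(e.from) + e.weight))
        .groupBy(_._1)
        .map { case (v, ss) => v -> ss.map(_._2) }
      // Collect: each vertex keeps the minimum of its old state and its incoming signals.
      val next = state.zipWithIndex.map { case (old, v) =>
        signals.get(v).fold(old)(ss => math.min(old, ss.min))
      }
      changed = next != state
      state = next
    }
    state
  }
}
```

On edges 0 -> 1 (weight 4), 0 -> 2 (1), 2 -> 1 (2) and 1 -> 3 (1), starting from vertex 0, the states settle at distances 0, 3, 1, 4, with any unconnected vertex left at Infinity.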

Includes statistics and utility packages. It contains very basic, well-known things, such as the mean and standard deviation.
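For reference, the basic statistics mentioned here are short to write by hand. The helpers below are a hypothetical sketch, not the package's API; note that sampleStdDev divides by n - 1 (sample standard deviation), which is an assumption about which variant you want.

```scala
// Hypothetical helpers for basic descriptive statistics (not the package's API).
object BasicStats {
  // Arithmetic mean.
  def mean(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "mean of an empty sequence is undefined")
    xs.sum / xs.length
  }

  // Sample standard deviation: square root of the sum of squared
  // deviations from the mean, divided by n - 1.
  def sampleStdDev(xs: Seq[Double]): Double = {
    require(xs.length >= 2, "need at least two values")
    val m = mean(xs)
    math.sqrt(xs.map(x => (x - m) * (x - m)).sum / (xs.length - 1))
  }
}
```

For Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0), mean returns 5.0 and sampleStdDev returns sqrt(32 / 7), about 2.14; dividing by n instead would give exactly 2.0.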

While it is not a library, it could help you a lot with dealing with probabilities.