John John - 3 years ago 233
Scala Question

Functional programming with efficient data structures

((This question is not a duplicate of this one which asks how persistent data structures work, because mine is about how non-persistent data structures can be used in a functional way. I thank my lucky stars to already have a proper answer before all the simpletons and douches out there could close the question.))

This is a general question about functional programming but I'm also interested what answer it has in specific languages.

I only have beginner's knowledge about functional languages, so bear with me.

It is my understanding that functional languages put a focus on different data structures than imperative languages, as they like immutability: The persistent data structures.

For example, they all have an immutable list concept where you can form a new lists

x :: l
y :: l
from an existing list
and two new items
without all elements of
needing to be copied. This is likely implemented by the new list object internally pointing to the old one as the tail.

In imperative languages, such a data structure is rarely used, as they don't provide as good locality of reference as c-style arrays do.

In general, finding data structures that support the functional style is an endeavor of it's own, so it would be great if one wouldn't always have to do that.

Now here's an idea how one could use all classical data structures in functional programming if there's the right language support for it.

In general, a data structure in an imperative language has modifying operations defined on it (in pseudo-code):


The functional way of writing this is

newData = modified(data, someArgument)

The general problem is that this normally requires copying the data structure - except if the language could know that
will in fact not be used by anything else: Then, the modification could be done in the form of mutating the original and no one could tell the difference.

There is a large class of cases where the language could infer that property of "never used elsewhere": When the first argument to
is an unbound value, as in this example:

newData = modified(modified(data, someArgument))

may be used elsewhere, but
modified(data, someArgument)
clearly isn't.

This is what in C++ is called an "rvalue", and in the latest incarnation of C++, which ironically isn't functional at all otherwise, one can overload on such rvalues.

For example, one can write:

Data modified(Data const& data) { // returns a modified copy }
Data modified(Data && data) { // returns the modified original }

That means that in C++, one can actually take any mutable efficient data structure and convert it to having an immutable api that can be used in a purely functional way just as efficiently as the imperative version would be.

(There's the caveat that in C++ still sometimes casting is necessary to force the rvalue overload. And of course care need to be taken on implementing such data structures, ie. when using the rvalue overloads. That could probably be improved on though.)

Now my question:

Do actual functional languages have a similar mechanism? Or is this not necessary for some other reason?

(I tagged some specific languages I'm particularly interested in.)

Nio Nio
Answer Source

I'm pretty sure that a feature like alias analysis (checking if data is used elsewhere) is not part of the Scala compiler (nor part of other FP languages like Haskell and Clojure). The collections API in Scala (for example) is explicitly split into immutable and mutable packages. The immutable data structures use the concept of structural sharing to negate the need to copy data and therefore reduce the overhead (in terms of the amount of temporal data) of working with immutable structures.

As you already pointed out, methods like cons :: create a new immutable structure which, under the hood, contains a reference to any existing immutable data (rather than making a copy of it).

Conversions between mutable and immutable types (in Scala for example) make a copy of the mutable data (usually in a lazy fashion), rather than using any mechanisms such as checking whether the mutable structure is not referred to anywhere else and allowing it to be mutated.

When I first moved over from Java to Scala I initially thought that the (often) great deal of temporal data that must get created when working with immutable structures might be a performance constraint and involve some clever techniques to actually allow mutation where it was safe to do so but this is not so because the idea is that immutable data never points to younger values. Since younger values do not exist at the time when an old value is created, it cannot be pointed to at the time of creation, and since values are never modified, neither can it be pointed to later. The outcome is that FP languages like Scala/Haskell are able to generate all this temporal data since the garbage collectors are able to remove it in a very short time frame.

In a nutshell, Scala/Haskell (I'm not sure about F#) do not allow mutation of immutable structures because the state of runtime environments like the current JVM's have very efficient garbage collection and therefore temporal data can be removed very quickly. Of course, as I'm sure you are aware, an immutable structure containing mutable elements is perfectly possible in FP languages like Scala but although the mutable elements can be changed, the immutable container cannot ie. elements can neither be added/removed.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download