F# and Purity

Purely functional programming, according to Wikipedia,

designates a programming paradigm that treats all computation as the evaluation of mathematical functions. Purely functional programming may also be defined by forbidding changing state and mutable data.

The importance of the purely functional approach cannot be overstated: it eliminates entire classes of mutation-related bugs; it encourages composable abstractions; pure code can be reasoned about equationally; and the explicit segregation of side effects acts as a forcing function for better design. At the same time, immutable persistent data structures can be updated cheaply, since unchanged parts are shared, and can be accessed safely from multiple threads. Pure programs also admit compile-time optimizations that would be tricky to achieve in the presence of mutable state.
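
As a small example of why persistent structures are cheap to update, consing onto an immutable F# list reuses the existing list wholesale instead of copying it:

let xs = [ 2; 3; 4 ]
let ys = 1 :: xs    // ys shares xs as its tail; nothing is copied
let zs = 0 :: xs    // xs is also shared with zs, safely, even across threads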

Benefits aside, purity is an attribute of expressions and their formal semantics. A pure program still ends up executing on regular computer hardware, where it can exhibit very real effects: high CPU utilization, excessive heap allocations, stack overflows, division-by-zero errors, and so on. Thus the connection of pure code to functions of the mathematical variety should not be taken literally; otherwise all correct Fibonacci implementations would be equal by virtue of extensionality. Being able to reason about how code, pure or otherwise, executes on the underlying hardware is critically important for the working programmer.
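
To make the point concrete, consider two Fibonacci implementations that agree on every input, and are therefore extensionally equal, yet behave very differently at runtime (a minimal sketch; the names are illustrative):

let rec fibNaive (n : int) : int =
    // exponentially many recursive calls
    if n < 2 then n
    else fibNaive (n - 1) + fibNaive (n - 2)

let fibIter (n : int) : int =
    // linear number of iterations, constant stack space
    let mutable a = 0
    let mutable b = 1
    for _ in 1 .. n do
        let next = a + b
        a <- b
        b <- next
    a

Both return 9227465 when applied to 35, but the naive version makes close to 30 million recursive calls to get there.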

An ML Tradition

Transforming purely functional declarations into equivalent, optimal imperative code is a holy grail, still very much a topic of active research. In the general case, though, performance can only be attained by forsaking purity. This is very much compatible with the ML philosophy, which has traditionally embraced imperative features in its language core. Quoting from Real World OCaml:

Pure code is the default in OCaml, and for good reason—it’s generally easier to reason about, less error prone and more composable. But imperative code is of fundamental importance to any practical programming language, because real-world tasks require that you interact with the outside world, which is by its nature imperative. Imperative programming can also be important for performance. While pure code is quite efficient in OCaml, there are many algorithms that can only be implemented efficiently using imperative techniques.

OCaml offers a happy compromise here, making it easy and natural to program in a pure style, but also providing great support for imperative programming.

This also applies particularly well to F#, an OCaml dialect for the .NET framework. In my personal experience, the performance gains from switching from purely functional to imperative code in F# can be dramatic. Strategically adopting imperative implementations in performance-critical components is often vital to ensuring the sustainability of a system.

Imperative Programming Done Right

So what does this all mean? Should we just give up on purely functional code in the interest of performance? Is F# really just a way to write C# programs in ML-style syntax?

Well, not really. I do claim, though, that there is a sweet spot where impure features can be used to combine the benefits of pure FP with much of the efficiency of procedural code. Defining that sweet spot is hard; however, I cannot think of a more concise example than Nessos Streams, which provides a highly efficient functional streams implementation in just over 40 lines of code. Other examples include FParsec and Hopac, whose core implementations are written in C# out of performance considerations.
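
For a taste of what such a sweet spot can look like, here is a drastically simplified sketch of the push-based idea behind libraries of this kind (an illustration, not the actual Nessos Streams API): a stream is just a function that pushes elements into a continuation, so an entire pipeline fuses into a single loop with no intermediate collections.

type Stream<'T> = ('T -> unit) -> unit

let ofArray (xs : 'T []) : Stream<'T> =
    fun k -> for x in xs do k x

let map (f : 'T -> 'U) (stream : Stream<'T>) : Stream<'U> =
    fun k -> stream (fun x -> k (f x))

let filter (pred : 'T -> bool) (stream : Stream<'T>) : Stream<'T> =
    fun k -> stream (fun x -> if pred x then k x)

let sumInts (stream : Stream<int>) : int =
    // local mutation that never escapes the function
    let acc = ref 0
    stream (fun x -> acc := !acc + x)
    !acc

// the whole pipeline executes as a single pass over the source array
let total =
    [| 1 .. 100 |]
    |> ofArray
    |> filter (fun x -> x % 2 = 0)
    |> map (fun x -> x * x)
    |> sumInts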

If one had to nominate a key guiding principle for using imperative features in F# code, it would surely be referential transparency. Roughly speaking, a function is referentially transparent if it behaves like a pure function, even if its implementation uses imperative constructs. For example, the function

let incr (cell : int ref) : unit = cell := !cell + 1

is not referentially transparent, because substituting a call to incr with the result of the computation is not, in general, a behaviour-preserving transformation.
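
To see the failure of substitution concretely, compare a snippet that calls incr with one where each call has been replaced by its result (a throwaway illustration):

let withCalls =
    let cell = ref 0
    incr cell
    incr cell
    !cell                 // evaluates to 2

let withCallsSubstituted =
    let cell = ref 0
    ()                    // the first call replaced by its result
    ()                    // the second call replaced by its result
    !cell                 // evaluates to 0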

We can still use imperative features to define functions that are in fact referentially transparent. A good example is implementing groupBy for lists. A purely functional groupBy could be written as follows:

let groupBy (proj : 'T -> 'Key) (ts : 'T list) : ('Key * 'T list) list =
    let folder (map : Map<'Key, 'T list>) (t : 'T) =
        // prepend the element to its group within the persistent map
        let key = proj t
        let grouped = defaultArg (Map.tryFind key map) []
        Map.add key (t :: grouped) map

    ts
    |> List.fold folder Map.empty
    |> Map.toList
    |> List.map (fun (k,v) -> k, List.rev v) // restore the original per-group element order
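
For illustration, grouping the numbers 1 through 10 by their remainder modulo 3 yields the following (output as printed by F# Interactive):

groupBy (fun n -> n % 3) [1 .. 10]
// val it : (int * int list) list = [(0, [3; 6; 9]); (1, [1; 4; 7; 10]); (2, [2; 5; 8])]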

This implementation may be pure, but performance-wise it is not ideal. We can, however, produce a more efficient, imperative equivalent of the same function:

open System.Collections.Generic

let groupBy' (proj : 'T -> 'Key) (ts : 'T list) : ('Key * 'T list) list =
    let dict = new Dictionary<'Key, ResizeArray<'T>>()
    for t in ts do
        let key = proj t
        let mutable array = null
        if not <| dict.TryGetValue(key, &array) then
            // first occurrence of this key: allocate its group
            array <- new ResizeArray<'T>()
            dict.Add(key, array)
        array.Add t

    dict
    |> Seq.map (function KeyValue(k,v) -> k, Seq.toList v)
    |> Seq.toList

The second function provides order-of-magnitude performance gains, yet it is also referentially transparent: it can be substituted with the pure implementation, or with the resulting value, without affecting the behaviour of the overall program. In fact, this approach very much reflects how the core library implements its own functions.
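
As a quick sanity check (hypothetical usage), the function is observably pure from the caller's point of view: it never mutates its argument, repeated calls with the same arguments produce the same result, and the internal Dictionary never escapes the function:

let input = [1 .. 10]
let r1 = groupBy' (fun n -> n % 3) input
let r2 = groupBy' (fun n -> n % 3) input
// r1 = r2 evaluates to true, and input is left untouched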

Conclusion

Purely functional programming in F# provides a very effective means of reasoning about code; however, in the interest of efficiency, it is often preferable to leverage the imperative aspects of the language. When done carefully, this can result in better-performing code that sacrifices few, if any, of the benefits of the purely functional approach.

In fact, my claim is that this is very much the idiomatic way of using ML in general and F# in particular. Pretending that F# is a purely functional language à la Haskell is highly counterproductive, particularly when authoring production-grade projects.


7 thoughts on “F# and Purity”

  1. Yes, it is important to understand that purity or referential transparency is an extensional, not an intensional, property.

    It seems to me that a lot of people who learn FP go through a phase where they write everything in a “functional way”, which means avoiding any and all basic effects. Perhaps one (of many) reason(s) for this is confusion about the nature of purity. IOW, thinking that purity is an intensional property.

    This is usually fine, because, in the small, as long as it doesn’t change the signatures of functions, it doesn’t really matter. However, it seems to me that there are two kinds of architectural anti-patterns resulting from an intensional understanding of purity.

    The first of those anti-patterns is what I would describe as “naïve FP”. An example of this would be the Elm architecture. In “naïve FP” one avoids both basic effects and encodings of effects, such as lenses and monads. The typical result is a lot of boilerplate and occasional embarrassments like accidentally having effects that actually break things: https://github.com/elm-lang/elm-compiler/issues/889

    The other anti-pattern is the opposite of naïve FP. I haven’t given it a name, but it is characterized by complex encodings of effects, in the name of purity, for little added benefit. I believe Jon Sterling talks about the same point here: https://www.reddit.com/r/haskell/comments/3lwom1/prezi_uses_haskell/cvaenzx/

    Aside from the problem with the complexity itself, it also seems that such a focus on intensional purity often misses the point. I haven’t actually spent that much time with it, but it seems to me that PureScript’s Halogen is a good example of this. The second sentence in the introduction gives away the mistake:

    “Each component is a self contained unit that has its own state, and re-renders when the state changes.”

    A central idea of the design is that components hold state… but shouldn’t that be something to be avoided? IOW, the better direction would be to start thinking about how to avoid having state in components in the first place. This way components and compositions of components become simpler and easier to reason about. As much as I hate the boilerplate in Elm and Redux, for example, that is one idea they got basically right. I don’t really want to talk about UI architectures here, so I’ll just say that from the premise of having state in components, Halogen then goes on to competently create a highly elaborate encoding of effects using a wide variety of FP buzzwords (lenses, free monads, query algebras, coproducts, etc.).

    So, to conclude, it is important to understand that purity is extensional and that an intensional understanding of purity can easily lead one astray. In the large, it is important to understand which parts of your program you want to keep referentially transparent, where you might want to be able to use effects, and whether encoding those effects brings significant benefits or not.

  2. Great post Eirik. Also great comment Vesa. I can’t elaborate as well as you all can, but I can at least share some thoughts.

    I’ve been through the naïve path. It has taken quite some time and experience to finally break free from the thought that everything needs to be pure all the time. It all depends on the problem and how much actual benefit you gain. Creating abstractions over an API to give a sense of purity may end up being a lot more work than just using the original API or even creating a thin DSL. Not only that, these abstractions can also hurt interop with external libs that use the original API when you later want to integrate them into your project. It takes a bit of intuition to know where some of the boundaries are. I’m still learning.

  3. Eirik, I think you’re consistently hitting the nail on the head by looking at functional programming through the lens of the ML style. Check out this post by Jon Sterling at Carnegie Mellon, which outlines a similar idea to your post: http://www.jonmsterling.com/posts/2015-06-14-functionality-mutability-non-determinism.html

    Especially note the section starting with the following paragraph:

    > A technique which is employed every day in functional ML developments is to hide imperative and nondeterministic code behind an abstraction boundary

  4. Great examples of wrapping mutability in nice interfaces in F#. Production Haskell often uses the same techniques. The main differences are that the type system allows programmers to be specific about what effects are introduced where, and that in many cases the compiler can confirm our claim that those effects do not escape a particular function.

    1. eirik says:

      In Haskell proper you are either allowed to have all effects or nothing at all. There is interesting research in type systems that allow more fine-grained control over what effects a program admits.
