TypeShape: Practical Generic Programming in F#

Last week I announced a new library, TypeShape, with claims that it provides a practical way of doing generic programming in F#. I’m following up with this blog post to elaborate on why I believe this to be genuinely useful, and how it could benefit the day-to-day life of the working .NET developer.

The pain of Reflection

Almost everybody who has worked with .NET will at some point need to dabble in the murky ways of reflection. Reflection is needed in scenarios where we need to access data in an indirect fashion, or where circumvention of the type system is necessary.

For example, assume that we have defined the following static method

type Foo =
    static member Bar<'T>(?optionalParam : 'T) : unit =
        printfn "Invoked with parameter %A" optionalParam

Assume now that we would like to invoke that method with a value whose type cannot be known at compile time. In other words, we want to define a function

val invokeUntyped : obj -> unit

which takes an input of type obj and invokes the generic method using the underlying type of the object instance. How do we do this? By using reflection of course!

open System.Reflection

let invokeUntyped (value:obj) =
    // Step 1: get the underlying System.Type for the value
    let t = value.GetType()
    // Step 2: locate the required method and apply the type argument
    let methodInfo =
        typeof<Foo>
            .GetMethod("Bar", BindingFlags.Public ||| BindingFlags.Static)
            .MakeGenericMethod [|t|]

    // Step 3: since the parameter is optional, it must be wrapped
    let optTy = typedefof<_ option>.MakeGenericType [|t|]
    let optCtor = optTy.GetConstructor [|t|]
    let optVal = optCtor.Invoke [|value|]

    // Step 4: invoke the method with constructed optional parameter
    methodInfo.Invoke(null, [|optVal|]) :?> unit

This is cumbersome code to implement and is highly susceptible to breakage; even minor changes to the method signature will result in runtime errors. What’s more, reflection-based implementations are known to be significantly slower than their IL counterparts.
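
To look at Step 3 in isolation, here is a minimal sketch of wrapping a boxed value in an option of its runtime type. The name wrapInOption is hypothetical and introduced only for illustration; the sketch assumes a non-null input and relies on the public single-argument constructor of the F# option type.

```fsharp
open System

/// Hypothetical helper: wraps a boxed value as `Some value`,
/// where the option is typed by the value's runtime type.
let wrapInOption (value : obj) : obj =
    let t = value.GetType()
    // close the open generic definition Option<_> over the runtime type
    let optTy = typedefof<_ option>.MakeGenericType [|t|]
    // locate the constructor that takes the wrapped value
    let optCtor = optTy.GetConstructor [|t|]
    optCtor.Invoke [|value|]

// once the type is statically known again, the result can be downcast
let wrapped = wrapInOption (box 42) :?> int option
```

Note that nothing here is checked by the compiler: passing the wrong type to GetConstructor or Invoke only fails once the code runs.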

Using TypeShape

The TypeShape library can be used to implement the same functionality, but in a safer and more readable fashion:

open TypeShape

let invokeUntyped' (value:obj) =
    let shape = TypeShape.Create (value.GetType())
    shape.Accept { new ITypeShapeVisitor<unit> with
        member __.Visit<'T> () = Foo.Bar(value :?> 'T)}

Let’s have a look at the code, line by line.

The first line takes the underlying type of the input value and uses that to create an object of type TypeShape. This object encapsulates essential information on the type of the object.

The second line accepts an object expression of type ITypeShapeVisitor, which in turn invokes the method Foo.Bar. The second line is an instance of what is known as the visitor pattern, a design pattern commonly found in object-oriented programming. In this case, our visitor takes no arguments other than a type variable 'T. Passing this visitor to the TypeShape instance will have it invoked using the object type as argument, hence the downcast is expected to be successful. Importantly, the invocation is performed normally, thus any disagreement in the method signature will be picked up by the compiler.

In other words, TypeShape lets us introduce type variables into scope using the relatively concise approach of F# object expressions.

Nothing magical

The implementation of TypeShape is surprisingly simple to define:

open System

type ITypeShapeVisitor<'R> =
    abstract Visit<'T> : unit -> 'R

[<AbstractClass>]
type TypeShape() =
    abstract Type : Type
    abstract Accept : ITypeShapeVisitor<'R> -> 'R

type TypeShape<'T>() =
    inherit TypeShape()
    override __.Type = typeof<'T>
    override __.Accept v = v.Visit<'T>()

type TypeShape with
    static member Create(t : Type) =
        let tsTy = typedefof<TypeShape<_>>.MakeGenericType [|t|]
        Activator.CreateInstance tsTy :?> TypeShape

In essence, TypeShape uses a minimal amount of reflection to bootstrap typed instances, then takes advantage of the ordinary .NET type system to access type information on-demand. TypeShape instances encapsulate and bear witness to types that may not be known at compile time.
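
As a self-contained illustration, the sketch below repeats the minimal definitions and uses them to build a typed empty list from a System.Type that is only known at runtime. The names createShape and emptyListOf are hypothetical; createShape stands in for TypeShape.Create so that the snippet does not need a type augmentation.

```fsharp
open System

type ITypeShapeVisitor<'R> =
    abstract Visit<'T> : unit -> 'R

[<AbstractClass>]
type TypeShape() =
    abstract Type : Type
    abstract Accept : ITypeShapeVisitor<'R> -> 'R

type TypeShape<'T>() =
    inherit TypeShape()
    override __.Type = typeof<'T>
    override __.Accept v = v.Visit<'T>()

/// Hypothetical stand-in for TypeShape.Create: the only use of reflection
let createShape (t : Type) : TypeShape =
    let tsTy = typedefof<TypeShape<_>>.MakeGenericType [|t|]
    Activator.CreateInstance tsTy :?> TypeShape

/// Hypothetical example: builds an empty 'T list for a runtime type
let emptyListOf (t : Type) : obj =
    (createShape t).Accept
        { new ITypeShapeVisitor<obj> with
            // 'T is bound to the runtime type by the shape instance
            member __.Visit<'T> () = box (List.empty<'T>) }

let ints = emptyListOf typeof<int>
```

Inside Visit the type variable behaves like any other generic parameter, so the body is checked by the compiler in the usual way.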

Going Further

Let’s take a look at a different application: suppose we have a tuple whose precise type cannot be known at compile time. A common example of this is the object returned by the ShapeCombination active pattern in the F# quotations module. Suppose we would like to extract either or both of the items contained in the tuple. Here’s how it could be done using reflection:

let extractTupleElements (value:obj) =
    let t = value.GetType()
    if not t.IsGenericType || t.GetGenericTypeDefinition() <> typedefof<_ * _> then
        invalidArg "value" "not a tuple type!"
    let m_Item1 = t.GetProperty("Item1")
    let m_Item2 = t.GetProperty("Item2")
    m_Item1.GetValue(value), m_Item2.GetValue(value)

Again, the same application could be simplified using the TypeShape library:

let extractTupleElements' (value : obj) =
    match TypeShape.Create (value.GetType()) with
    | Shape.Tuple2 (s : IShapeTuple2) ->
        s.Accept {
            new ITuple2Visitor<obj * obj> with
                member __.Visit<'T, 'S>() =
                    let t,s = value :?> 'T * 'S
                    box t, box s
        }

    | _ -> invalidArg "value" "not a tuple type!"

In this case, we use the included Shape.(|Tuple2|_|) active pattern that checks against our shape being a 2-tuple. If successful, it returns an instance of type IShapeTuple2 that accepts a different visitor, ITuple2Visitor, which introduces the tuple element types in scope.

Similarly, here’s how we can check whether an unknown F# map contains a particular key:

let mapContainsKeyUntyped (key:obj) (map:obj) =
    match TypeShape.Create(map.GetType()) with
    | Shape.FSharpMap (s : IShapeFSharpMap) ->
        s.Accept {
            new IFSharpMapVisitor<bool> with
                member __.Visit<'K,'V when 'K : comparison> () =
                    (map :?> Map<'K,'V>).ContainsKey(key :?> 'K)
        }

    | _ -> invalidArg "map" "not an F# map!"

Generic Programming

TypeShape active patterns can be used to orchestrate what could be considered as generic programming. For instance, take this value printer generator:

let rec mkPrinter<'T> () : 'T -> string = mkPrinterUntyped typeof<'T> :?> _
and private mkPrinterUntyped (t : Type) : obj =
    match TypeShape.Create t with
    | Shape.Unit -> box(fun () -> "()")
    | Shape.Bool -> box(sprintf "%b")
    | Shape.Int32 -> box(sprintf "%d")
    | Shape.String -> box(sprintf "\"%s\"")
    | Shape.FSharpOption s ->
        s.Accept {
            new IFSharpOptionVisitor<obj> with
                member __.Visit<'T> () =
                    let tp = mkPrinter<'T>()
                    box(function None -> "None" | Some t -> sprintf "Some (%s)" (tp t))
        }

    | Shape.Tuple2 s ->
        s.Accept {
            new ITuple2Visitor<obj> with
                member __.Visit<'T, 'S> () =
                    let tp = mkPrinter<'T>()
                    let sp = mkPrinter<'S>()
                    box(fun (t : 'T, s : 'S) -> sprintf "(%s, %s)" (tp t) (sp s))
        }

    | Shape.FSharpList s ->
        s.Accept {
            new IFSharpListVisitor<obj> with
                member __.Visit<'T> () =
                    let tp = mkPrinter<'T>()
                    box(fun ts -> ts |> List.map tp |> String.concat "; " |> sprintf "[%s]")
        }

    | Shape.FSharpSet s ->
        s.Accept {
            new IFSharpSetVisitor<obj> with
                member __.Visit<'T when 'T : comparison> () =
                    let tp = mkPrinter<'T>()
                    box(fun (s:Set<'T>) -> s |> Seq.map tp |> String.concat "; " |> sprintf "set [%s]")
        }

    | _ -> failwithf "unsupported type '%O'" t

The implementation can be used to generate printers for anything within the prescribed algebra of types:

let printer = mkPrinter<(bool * string) option * Set<int * string list option>>()

More importantly, any reflection code will only be executed at generation time, meaning that generated printers execute very efficiently:

// Real: 00:00:00.561, CPU: 00:00:00.562, GC gen0: 32, gen1: 0, gen2: 0
for i = 1 to 1000 do ignore <| sprintf "%A" value
// Real: 00:00:00.010, CPU: 00:00:00.000, GC gen0: 1, gen1: 0, gen2: 0
for i = 1 to 1000 do ignore <| printer value

This technique is being utilized in libraries such as FsPickler and FSharp.AWS.DynamoDB, and is an important contributor to their performance.

Conclusion

If your project relies heavily on reflection, you should consider giving TypeShape a try. It could improve readability and maintainability of your code, and may in some cases lead to better performance. Check it out at https://github.com/eiriktsarpalis/TypeShape and please submit your feedback and/or bug reports!

Reconciling Stacktraces with Computation Expressions

Computation expressions are an excellent F# language feature. They outline the F# way of defining monads but are immensely more flexible. Applications include Asynchronous workflows, sequence/list comprehensions and our very own cloud workflows. Overall, it’s a feature that has seen lots of use (and misuse!) by the F# community. More importantly, it can all be done at the library level without needing to modify the language itself.

For a more in-depth introduction to computation expressions, I refer you to Scott Wlaschin’s excellent series on the subject.

The Problem

Computation expressions may be great, but they’re not without problems. One of the more common annoyances is exception stacktraces; generated expressions are desugared into nested lambda invocations, which means that their corresponding stacktraces are often unreadable.

To make matters worse, implementations such as async have their own exception handling logic implemented, which often leads to stacktraces being completely erased as a side-effect of this manipulation.

Let’s make the problem a bit more concrete by presenting an example:

let rec factorial n = async {
    if n = 0 then return failwith "bug!"
    else
        let! pd = factorial (n - 1)
        return n * pd
}

Async.RunSynchronously(factorial 5)

If we attempt to execute the code, we get the following stacktrace (as of F# 4.0):

System.Exception: bug!
   at FSI_0008.factorial@128-2.Invoke(Unit unitVar) in C:\Users\eirik\Desktop\meta2.fsx:line 128
   at Microsoft.FSharp.Control.AsyncBuilderImpl.callA@851.Invoke(AsyncParams`1 args)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.FSharp.Control.AsyncBuilderImpl.commit[a](Result`1 res)
   at Microsoft.FSharp.Control.CancellationTokenOps.RunSynchronously[a](CancellationToken token, FSharpAsync`1 computation, FSharpOption`1 timeout)
   at Microsoft.FSharp.Control.FSharpAsync.RunSynchronously[T](FSharpAsync`1 computation, FSharpOption`1 timeout, FSharpOption`1 cancellationToken)
   at <StartupCode$FSI_0009>.$FSI_0009.main@() in C:\Users\eirik\Desktop\meta2.fsx:line 135
Stopped due to error

Notice that method invocation information is missing: factorial was called recursively 5 times before the error occurred. For reference, compare this against a native implementation of the same function:

let rec factorial n =
    if n = 0 then failwith "bug!"
    else
        n * factorial(n - 1)

factorial 5

which gives the stacktrace

System.Exception: bug!
   at FSI_0013.factorial(Int32 n) in C:\Users\eirik\Desktop\meta1.fsx:line 38
   at FSI_0013.factorial(Int32 n) in C:\Users\eirik\Desktop\meta1.fsx:line 38
   at FSI_0013.factorial(Int32 n) in C:\Users\eirik\Desktop\meta1.fsx:line 38
   at FSI_0013.factorial(Int32 n) in C:\Users\eirik\Desktop\meta1.fsx:line 38
   at FSI_0013.factorial(Int32 n) in C:\Users\eirik\Desktop\meta1.fsx:line 38
   at FSI_0013.factorial(Int32 n) in C:\Users\eirik\Desktop\meta1.fsx:line 38
   at <StartupCode$FSI_0014>.$FSI_0014.main@() in C:\Users\eirik\Desktop\meta1.fsx:line 42
Stopped due to error

At first glance, this may seem like an insignificant shortcoming. In reality though, it can be the source of a lot of confusion. Asynchronous code tends to interface with system APIs that are often sources of exceptions. In many real-world applications it is impossible to determine the source of an exception simply by looking at the generated stacktrace.

We need to improve on this; the goal of this post is to identify workarounds using the current state of F# and to propose potential additions for future releases of the language.

Defining a Continuation Monad

For the purposes of this exercise, we will need to define a continuation monad on which to build our improvements.

/// Continuation workflow that accepts a pair of
/// success and exception continuations
type Cont<'T> = ('T -> unit) -> (exn -> unit) -> unit

/// return, the monadic unit
let ret (t : 'T) : Cont<'T> = fun sc _ -> sc t

/// monadic bind combinator
let (>>=) (f : Cont<'T>) (g : 'T -> Cont<'S>) : Cont<'S> =
    fun sc ec ->
        let sc' (t : 'T) =
            match (try Choice1Of2 (g t) with e -> Choice2Of2 e) with
            | Choice1Of2 g -> g sc ec
            | Choice2Of2 e -> ec e

        f sc' ec

Let’s now define our continuation builder:

type ContBuilder() =
    member __.Zero() = ret ()
    member __.Return t = ret t
    member __.Bind(f, g) = f >>= g
    member __.Delay(f : unit -> Cont<'T>) : Cont<'T> = ret () >>= f

let cont = new ContBuilder()

and a run function for executing our continuation workflows:

let run (cont : Cont<'T>) : 'T =
    let result = ref Unchecked.defaultof<'T>
    cont (fun t -> result := t) raise
    !result

This is a toy implementation that closely mimics the implementation of F# Async. As can be expected, we can observe the same stacktrace issue as in async.
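
For reference, here is a self-contained sketch that puts the pieces together and runs two workflows, one that succeeds and one that fails. It repeats the definitions above with two small liberties that are my own assumptions: Zero is omitted because the examples do not need it, and the result cell is accessed through .Value rather than the := and ! operators.

```fsharp
type Cont<'T> = ('T -> unit) -> (exn -> unit) -> unit

let ret (t : 'T) : Cont<'T> = fun sc _ -> sc t

let (>>=) (f : Cont<'T>) (g : 'T -> Cont<'S>) : Cont<'S> =
    fun sc ec ->
        let sc' (t : 'T) =
            // exceptions raised while building the next step go to ec
            match (try Choice1Of2 (g t) with e -> Choice2Of2 e) with
            | Choice1Of2 g -> g sc ec
            | Choice2Of2 e -> ec e
        f sc' ec

type ContBuilder() =
    member __.Return t = ret t
    member __.Bind(f, g) = f >>= g
    member __.Delay(f : unit -> Cont<'T>) : Cont<'T> = ret () >>= f

let cont = ContBuilder()

let run (c : Cont<'T>) : 'T =
    let result = ref Unchecked.defaultof<'T>
    c (fun t -> result.Value <- t) raise
    result.Value

let rec factorial n = cont {
    if n = 0 then return 1
    else
        let! pd = factorial (n - 1)
        return n * pd
}

// successful workflow: the value flows through the success continuation
let value = run (factorial 5)

// failing workflow: the exception reaches the caller via `raise`
let failing : Cont<int> = cont { return failwith "bug!" }
let caught =
    try
        run failing |> ignore
        false
    with e -> e.Message = "bug!"
```

The exception does surface in the caller, but by then the chain of nested workflow invocations that led to it is no longer reflected in its stacktrace.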

Using Symbolic Stacktraces

Before we can move on, we need to give the definition of SymbolicException. This provides a way of appending symbolic entries to the stacktrace of an exception, which can then be re-raised as a regular .NET exception. This borrows from a technique also used here.

type SymbolicException =
    {
        Source : Exception
        Stacktrace : string list
    }

[<CompilationRepresentation(CompilationRepresentationFlags.ModuleSuffix)>]
module SymbolicException =

    open System.Reflection

    /// clones an exception to avoid mutation issues related to the stacktrace
    let private clone (e : #exn) =
        let bf = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter()
        use m = new System.IO.MemoryStream()
        bf.Serialize(m, e)
        m.Position <- 0L
        bf.Deserialize m :?> exn

    let private remoteStackTraceField =
        let getField name = typeof<System.Exception>.GetField(name, BindingFlags.Instance ||| BindingFlags.NonPublic)
        match getField "remote_stack_trace" with
        | null -> getField "_remoteStackTraceString"
        | f -> f

    /// appends a line to the symbolic stacktrace
    let append (line : string) (se : SymbolicException) =
        { se with Stacktrace = line :: se.Stacktrace }

    /// Raises exception with its appended symbolic stacktrace
    let raise (se : SymbolicException) =
        let e' = clone se.Source
        let stacktrace =
            seq { yield e'.StackTrace ; yield! List.rev se.Stacktrace }
            |> String.concat Environment.NewLine

        remoteStackTraceField.SetValue(e', stacktrace + Environment.NewLine)
        raise e'

    /// Captures an exception into a SymbolicException instance
    let capture (e : exn) = { Source = clone e ; Stacktrace = [] }
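
The bookkeeping in append and raise can be seen in isolation with a simplified model. The sketch below replaces the exception with a plain message string, so no cloning or reflection is involved; SymbolicTrace, appendLine and render are hypothetical stand-ins and not part of the module above.

```fsharp
open System

/// Hypothetical, exception-free model of SymbolicException
type SymbolicTrace = { Message : string ; Stacktrace : string list }

/// entries are prepended, so the most recent one sits at the head
let appendLine (line : string) (tr : SymbolicTrace) =
    { tr with Stacktrace = line :: tr.Stacktrace }

/// entries are reversed on rendering, so they read innermost first
let render (tr : SymbolicTrace) =
    seq { yield tr.Message ; yield! List.rev tr.Stacktrace }
    |> String.concat Environment.NewLine

let rendered =
    { Message = "bug!" ; Stacktrace = [] }
    |> appendLine "   at inner"
    |> appendLine "   at outer"
    |> render
```

Prepending keeps each append cheap, and the single reversal at the end restores the order expected of a stacktrace, with the innermost frame first.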

Symbolic Stacktraces & Computation Expressions

Now that we have established a technique for creating symbolic stacktraces, let’s see how we can make use of this in the context of computation expressions.

First, let’s examine the structure of a typical entry in a .NET stacktrace. Assuming our previous factorial example,

let rec factorial n =
    if n = 0 then failwith "bug!"
    else
        n * factorial(n - 1)

The stacktrace entry corresponding to the recursive call appears like so:

at FSI_0013.factorial(Int32 n) in C:\Users\eirik\Desktop\meta1.fsx:line 38

This evidently contains two pieces of information:

  1. The signature of the method that was invoked.
  2. The source code location of the call site. This assumes the presence of a corresponding symbols file.

Let’s see how we could recover similar data using expression builders. First off, recovering the invoked method name. Consider the following modification to the continuation builder:

type ContBuilder() =
    ...
    member __.Delay(f : unit -> Cont<'T>) : Cont<'T> =
        printfn "%O" (f.GetType())
        ret () >>= f

which once executed on factorial yields the output FSI_0026+factorial@124-11. This (somewhat) recovers the method signature of the factorial method.

In similar fashion, we can recover information on the call site by recovering the type name of the continuation parameter in Bind:

type ContBuilder() =
    ...
    member __.Bind(f : Cont<'T>, g : 'T -> Cont<'S>) : Cont<'S> =
        printfn "%O" (g.GetType())
        f >>= g

This yields FSI_0058+factorial@129-30, which (somewhat) recovers the location of the call site (containing method and line number).

Updating the Continuation Builder

Let’s now use this information to define our updated continuation builder. First, we need to update our continuation workflow type:

type Cont<'T> =
    {
        Definition : Type option
        Body : ('T -> unit) -> (SymbolicException -> unit) -> unit
    }

It has now been changed to thread symbolic exceptions in its exception continuation and to carry the defining type for the instance, if applicable. Let’s now continue with the builder definition:

type ContBuilder() =
    let protect f x = try Choice1Of2 (f x) with e -> Choice2Of2 e
    let mkCont def bd = { Body = bd ; Definition = def }

    member __.Return(t : 'T) = mkCont None (fun sc _ -> sc t)
    member __.Zero() = __.Return()
    member __.Delay(f : unit -> Cont<'T>) : Cont<'T> =
        let def = f.GetType()
        mkCont (Some def) (fun sc ec ->
            let sc' t =
                match protect f () with
                | Choice1Of2 g -> g.Body sc ec
                | Choice2Of2 e -> ec (SymbolicException.capture e)

            __.Zero().Body sc' ec)

The definition of Delay has been changed so that instances it defines are branded with a definition type. Continuing, let’s see now how we can make use of this metadata in the implementation of Bind:

type ContBuilder() =
    ...
    member __.Bind(f : Cont<'T>, g : 'T -> Cont<'S>) : Cont<'S> =
        mkCont None (fun sc ec ->
            let sc' (t : 'T) =
                match protect g t with
                | Choice1Of2 g -> g.Body sc ec
                | Choice2Of2 e -> ec (SymbolicException.capture e)

            let ec' (se : SymbolicException) =
                match f.Definition with
                | None -> ec se
                | Some def ->
                    let callSite = g.GetType()
                    let stackMsg = sprintf "   at %O in %O" def callSite
                    ec (SymbolicException.append stackMsg se)

            f.Body sc' ec')

In this updated definition of Bind, we attach an entry to the symbolic stacktrace of the exception continuation whenever binding to a “method”, that is an expression defined using Delay.

Finally, we give the updated definition of run:

let run (cont : Cont<'T>) =
    let result = ref Unchecked.defaultof<'T>
    let sc (t : 'T) = result := t
    let ec se =
        match cont.Definition with
        | None -> SymbolicException.raise se
        | Some def ->
            let stackMsg = sprintf "   at %O in Cont.run" def
            se |> SymbolicException.append stackMsg |> SymbolicException.raise

    cont.Body sc ec
    !result

Let’s now verify that the implementation works by re-running factorial:

System.Exception: bug!
   at FSI_0058.factorial@126-29.Invoke(Unit unitVar) in C:\Users\eirik\Desktop\meta2.fsx:line 126
   at FSI_0047.ContBuilder.protect[b,c](FSharpFunc`2 f, b x) in C:\Users\eirik\Desktop\meta2.fsx:line 54
   at FSI_0058+factorial@126-29 in FSI_0058+factorial@129-30
   at FSI_0058+factorial@126-29 in FSI_0058+factorial@129-30
   at FSI_0058+factorial@126-29 in FSI_0058+factorial@129-30
   at FSI_0058+factorial@126-29 in FSI_0058+factorial@129-30
   at FSI_0058+factorial@126-29 in FSI_0058+factorial@129-30
   at FSI_0078+factorial@132-31 in Cont.run
   at FSI_0022.SymbolicExceptionModule.raise[a](SymbolicException se) in C:\Users\eirik\Desktop\meta2.fsx:line 40
   at FSI_0060.run[T](Cont`1 cont) in C:\Users\eirik\Desktop\meta2.fsx:line 101
   at <StartupCode$FSI_0061>.$FSI_0061.main@() in C:\Users\eirik\Desktop\meta2.fsx:line 124

It is left as an exercise to the reader to improve formatting of symbolic stack entries by performing further parsing of the internal type names.

Implementing ReturnFrom

Let’s now have a go at implementing ReturnFrom, the tail call keyword. By virtue of its type signature, the call site location cannot be recovered in this context:

type ContBuilder() =
    ...
    member __.ReturnFrom (f : Cont<'T>) =
        match f.Definition with
        | None -> f
        | Some df ->
            { f with Body = fun sc ec ->
                    let ec' (se : SymbolicException) =
                        let stackMsg = sprintf "   at %O" df
                        ec (SymbolicException.append stackMsg se)

                    f.Body sc ec' }

Again, we verify by running

let rec odd (n : int) = 
    cont {
        if n = 0 then return false
        else
            return! even (n - 1)
    }

and even (n : int) =
    cont {
        if n = 0 then return failwith "bug!"
        else
            return! odd (n - 1)
    }

odd 5 |> Cont.run

which yields

System.Exception: bug!
   at FSI_0011.even@149-3.Invoke(Unit unitVar) in C:\Users\eirik\Desktop\meta2.fsx:line 149
   at FSI_0002.ContBuilder.protect[a,b](FSharpFunc`2 f, a x) in C:\Users\eirik\Desktop\meta2.fsx:line 54
   at FSI_0011+even@149-3
   at FSI_0011+odd@142-3
   at FSI_0011+even@149-3
   at FSI_0011+odd@142-3
   at FSI_0011+even@149-3
   at FSI_0011+odd@142-3 in Cont.run
   at FSI_0002.SymbolicExceptionModule.raise[a](SymbolicException se) in C:\Users\eirik\Desktop\meta2.fsx:line 40
   at FSI_0002.Cont.run[T](Cont`1 cont) in C:\Users\eirik\Desktop\meta2.fsx:line 111
   at <StartupCode$FSI_0011>.$FSI_0011.main@()
Stopped due to error

Drawbacks

The above approach has a few clear drawbacks:

  1. The information relayed to symbolic stacktraces is of low quality compared to .NET stacktraces. Ideally we would like to recover the MethodInfo that corresponds to the call, but that does not appear to be possible.
  2. The implementation makes light use of reflection in places. This may affect performance unless care is taken, and could hurt compatibility with certain platforms.
  3. It is impossible to distinguish between actual method calls and inlined expressions. This can result in noisy stacktraces in places.

Future Directions

The issues discussed above raise the question: what could be added to the F# language itself to improve support for stacktraces? It’s evident that the required information is known by the compiler. So here’s a proposal:

First, let’s outline a type that encodes an entry from a stacktrace:

type Location = { File : string ; Line : int ; Column : int }
type MethodInvocationInfo = { Method : MethodInfo ; Location : Location }

Now, consider the following hypothetical computation expression implementation.

type Cont<'T> = ('T -> unit) -> (SymbolicException -> unit) -> unit

type ContBuilder() =
    let protect f x = try Choice1Of2 (f x) with e -> Choice2Of2 e
    let fmt (m : MethodInvocationInfo) = sprintf "   at %O in %s:line %d" m.Method m.Location.File m.Location.Line
    member __.Bind(data : MethodInvocationInfo, f : Cont<'T>, g : 'T -> Cont<'S>) : Cont<'S> =
        fun sc ec ->
            let sc' (t : 'T) =
                match protect g t with
                | Choice1Of2 g -> g sc ec
                | Choice2Of2 e -> ec (SymbolicException.capture e)

            let ec' (e : SymbolicException) = ec (SymbolicException.append (fmt data) e)

            f sc' ec'

    member __.ReturnFrom(data : MethodInvocationInfo, f : Cont<'T>) =
        fun sc ec ->
            let ec' (e : SymbolicException) = ec (SymbolicException.append (fmt data) e)
            f sc ec'

Under the proposed change, it should be possible to define overloads for Bind, Return, ReturnFrom and Yield that accept an additional method invocation metadata parameter. These would be used by the compiler only in cases where the corresponding keywords involve a method invocation on the right-hand side. In other words, the line

let! x = async { return 1 }

would desugar to a call to the standard Bind overload, whereas

let f i = async { return i }
let! x = f 1

would translate to the new overload, passing metadata on the invocation of function f at compile time.

Conclusions

Readable exception stacktraces are a feature sorely missed in F# computation expressions. Adopting improvements in this direction would greatly improve the debugging experience of asynchronous workflows and other computation expressions, including mbrace. It would be great to see such improvements added to the next release of F#.

Deploying .NET code instantly using Vagabond

 

This is post #30 of the English 2014 F# advent calendar. Thanks to Sergey Tihon for inviting me and suggesting the topic. Be sure to check out all the other awesome posts! In this post I will be describing Vagabond, a dependency management library of mine. I will try to walk through some of the subtleties in the .NET framework that drove the library implementation, in an attempt to make it more accessible. All code and examples presented can be found in this repository.

Prelude

It is often claimed in functional programming that functions are treated as values. This is a valid assumption in simple applications, permitting powerful patterns when writing code. But is this a general truth, or simply a linguistic abstraction? If functions really are values, one would expect that they exhibit all properties normally expected of values. For instance, it should be possible to serialise functions and transmit them to a remote process for use. But wait, does serialising a function even make sense? In principle it should: we know for a fact that all code is essentially data. But the answer really depends on language/runtime support, serialization facilities and how lambdas are represented. Our goal here is to consider the .NET framework and F# in particular. How does F#/.NET represent lambdas? We’ll attempt to illustrate by example.

‘Thunk Server’

Our goal is to define a ‘thunk server’, an actor implementation that receives arbitrary thunks (lambdas of type unit -> 'T) which it executes replying with the result. For this exercise we will be using Thespian, a distributed actor framework for F#. The choice of actor framework is not particularly important; Thespian should be easy to grasp however, since its programming model closely follows that of the F# MailboxProcessor. We start off with the simplest imaginable conception of how a thunk server should be implemented:

open Nessos.Thespian

// Actor API: a thunk and reply channel in which to send the result
type ThunkMessage = (unit -> obj) * IReplyChannel<Choice<obj, exn>>

// actor body definition
let rec serverLoop (self : Actor<ThunkMessage>) : Async<unit> =
    async {
        // receive next message
        let! thunk, reply = self.Receive()
        // execute, catching any exception
        let result : Choice<obj, exn> =
            try thunk () |> Choice1Of2
            with e -> Choice2Of2 e
        // reply
        do! reply.Reply result
        return! serverLoop self
    }

// initialize an actor instance in the current process
let server : Actor<ThunkMessage> = Actor.Start "thunkServer" serverLoop

We can now interface with our thunk server like so:

/// submit a thunk for evaluation to target actor ref
let evaluate (server : ActorRef<ThunkMessage>) (thunk : unit -> 'T) : 'T =
    // post to actor and wait for reply
    let result = server <!= fun replyChannel -> (fun () -> thunk () :> obj), replyChannel
    // downcast if value, raise if exception
    match result with
    | Choice1Of2 o -> o :?> 'T
    | Choice2Of2 e -> raise e

evaluate server.Ref (fun () -> 1 + 1) // returns 2

But is this implementation correct? Remember, our goal is to submit lambdas to a remote process for execution. Using the companion project we can test precisely this scenario:

// spawn a local windowed process that runs the actor
let server : ActorRef<ThunkMessage> = spawnWindow ()

Let’s attempt to submit some code to the server:

// load a third party library and submit code for evaluation
#r "ThirdPartyLibrary.dll"
evaluate server ThirdPartyLibrary.thirdPartyLambda

Evaluation fails with the following error:

System.IO.FileNotFoundException: Could not load file or assembly 'ThirdPartyLibrary, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. The system cannot find the file specified.
   at System.Reflection.RuntimeAssembly.GetType(RuntimeAssembly assembly, String name, Boolean throwOnError, Boolean ignoreCase, ObjectHandleOnStack type)
   at System.Reflection.RuntimeAssembly.GetType(String name, Boolean throwOnError, Boolean ignoreCase)
   at Nessos.FsPickler.ReflectionCache.loadMemberInfo(FSharpOption`1 tyConv, FSharpFunc`2 loadAssembly, FSharpFunc`2 getMethodSignature, CompositeMemberInfo mI)

Why does this happen? Because lambdas in F# are not exactly values. Rather, they are instances of compiler generated classes that inherit the abstract type FSharpFunc. Needless to say, a lambda cannot be successfully deserialised in a process unless its particular underlying class is loaded in the current application domain.
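
This can be observed directly by inspecting the runtime type of a lambda. The sketch below is illustrative only; closureTypeOf is a hypothetical helper name, and the exact name of the generated class depends on the compiler and on where the lambda is defined.

```fsharp
open System

/// Hypothetical helper: returns the runtime type of a thunk
let closureTypeOf (f : unit -> 'T) : Type = f.GetType()

let closureType = closureTypeOf (fun () -> 1 + 1)

// The runtime type is assignable to `unit -> int` (FSharpFunc<unit,int>)
// but is not that type itself, since FSharpFunc is abstract.
let isGeneratedSubclass =
    typeof<unit -> int>.IsAssignableFrom closureType
    && closureType <> typeof<unit -> int>

// The assembly that has to be loadable wherever the lambda is deserialised
let definingAssembly = closureType.Assembly

printfn "%s, defined in %s" closureType.FullName definingAssembly.FullName
```

The assembly printed at the end is the one that a remote process would need before it could deserialise the lambda.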

Fixing the problem

So how can we fix the problem? A popular approach is to distribute code using expression trees. Expression trees are clearly values, hence can be easily serialised and evaluated in the target machine. However, this is not a satisfactory solution: expression trees do not support type definitions; evaluation of expression trees is not always efficient; expression trees can still reference code in out-of-reach assemblies. We need to do better than this! Our experience illustrates that a proper serialisation format for lambdas must also include their dependent assemblies. So let’s try to do precisely that:

open System.IO
open System.Reflection

// Raw assembly container
type AssemblyPackage =
    {
        FullName : string
        AssemblyImage : byte []
    }

// transitively traverse assembly dependencies for given object
let gatherDependencies (object:obj) : AssemblyPackage list =
    let rec aux (gathered : Map<string, Assembly>) (remaining : Assembly list) =
        match remaining with
        // ignored assembly
        | a :: rest when gathered.ContainsKey a.FullName || a.GlobalAssemblyCache -> aux gathered rest
        // came across new assembly, add to state and include transitive dependencies to inputs
        | a :: rest ->
            let dependencies = a.GetReferencedAssemblies() |> Seq.map Assembly.Load |> Seq.toList
            aux (Map.add a.FullName a gathered) (dependencies @ rest)
        // traversal complete, create assembly packages
        | [] ->
            gathered
            |> Seq.map (fun (KeyValue(_,a)) ->
                                { FullName = a.FullName
                                  AssemblyImage = File.ReadAllBytes a.Location })
            |> Seq.toList

    aux Map.empty [object.GetType().Assembly]

// loads raw assemblies in current application domain
let loadRawAssemblies (pkgs : AssemblyPackage list) =
    pkgs |> List.iter (fun pkg -> Assembly.Load pkg.AssemblyImage |> ignore)

The code above uses reflection to traverse the assembly dependencies for a given object. The raw binary assembly images are then read from disk to be submitted and loaded by the remote recipient. We can now revise our thunk server implementation as follows:

// our actor API is augmented with an additional message
type ThunkMessage =
    | RunThunk of (unit -> obj) * IReplyChannel<Choice<obj, exn>>
    | LoadAssemblies of AssemblyPackage list

let rec serverLoop (self : Actor<ThunkMessage>) : Async<unit> =
    async {
        let! msg = self.Receive()
        match msg with
        | RunThunk(thunk, reply) ->
            let result : Choice<obj, exn> =
                try thunk () |> Choice1Of2
                with e -> Choice2Of2 e

            do! reply.Reply result
        | LoadAssemblies assemblies ->
            loadRawAssemblies assemblies

        return! serverLoop self
    }

/// submit a thunk for evaluation to target actor ref
let evaluate (server : ActorRef<ThunkMessage>) (thunk : unit -> 'T) =
    // traverse and upload dependencies
    server <-- LoadAssemblies (gatherDependencies thunk)
    // assembly upload complete, send thunk for execution
    let result = server <!= fun replyChannel -> RunThunk ((fun () -> thunk () :> obj), replyChannel)
    match result with
    | Choice1Of2 o -> o :?> 'T
    | Choice2Of2 e -> raise e

And we are done! We can use the companion project to verify that this indeed resolves the previously failing deployment scenario. Clearly, this should cover the case of assemblies missing from the remote process. But does it work with lambdas defined in F# interactive? Let’s try it out:

> evaluate localServer (fun () -> 1 + 1);;
System.NotSupportedException: The invoked member is not supported in a dynamic assembly.
   at System.Reflection.Emit.InternalAssemblyBuilder.get_Location()
   at ThunkServer.Naive2.aux@36-1.Invoke(KeyValuePair`2 _arg1)

Huh? What’s a dynamic assembly? And why does it appear here? Reading from MSDN:

(…) a set of managed types in the System.Reflection.Emit namespace that allow a compiler or tool to emit metadata and Microsoft intermediate language (MSIL) at run time (…)

Among such compilers is F# interactive, which uses a single dynamic assembly for emitting all code defined in a single session. This can be verified by writing

> (fun i -> i + 1).GetType().Assembly ;;
val it : System.Reflection.Assembly =
  FSI-ASSEMBLY, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null
    {IsDynamic = true;
     Location = ?;}

We have reached a dead end, it seems. Clearly, dynamic assemblies cannot be distributed to remote processes. And even if they could, we would face more problems on account of dynamic assemblies expanding over time. What is there to do?

Enter Vagabond

Vagabond is an automated dependency management framework. It is capable of converting dynamic assemblies into standalone, distributable assemblies. This is achieved using Mono.Cecil and a modification of AssemblySaver, a parser for dynamic assemblies.

Basic usage

Having worked through the above examples, the Vagabond API should come across as natural. We can begin using the library by initialising a Vagabond state object:

open Nessos.Vagabond

let vagabond : Vagabond = Vagabond.Initialize(cacheDirectory = "/tmp/vagabond")

This will be our entry point for all interactions with the library. For starters, let us compute the dependencies for a lambda defined in F# interactive:

> let deps = vagabond.ComputeObjectDependencies((fun i -> i + 1), permitCompilation = true) ;;
val deps : System.Reflection.Assembly list =
  [FSI-ASSEMBLY_17a1c27a-5c8f-4acb-ae1a-5aea74027854_1, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]

This will return a single dependent assembly that is non-dynamic, generated by Vagabond. The numerical suffix indicates that this is the first slice produced of that particular dynamic assembly. Vagabond uses a slicing scheme to address the issue of dynamic assemblies that expand over time. To create an exportable assembly package, we simply call

let pkg : AssemblyPackage = vagabond.CreateAssemblyPackage(assembly, includeAssemblyImage = true)

Packages can be loaded to the current application domain using the same object:

let result : AssemblyLoadInfo = vagabond.LoadAssemblyPackage(pkg)

As soon as assembly dependencies have been uploaded to the recipient, communication can be established. Importantly, this must be done using the serialiser instance provided by the Vagabond object.

vagabond.Serializer : FsPicklerSerializer

This makes use of purpose-built FsPickler functionality to bridge between local dynamic assemblies and exported slices.

Putting it all together

Let’s now try to correctly implement our thunk server. We first define our updated actor body:

type ThunkMessage =
    // Evaluate thunk
    | RunThunk of (unit -> obj) * IReplyChannel<Choice<obj, exn>>
    // Query remote process on assembly load state
    | GetAssemblyLoadState of AssemblyId list * IReplyChannel<AssemblyLoadInfo list>
    // Submit assembly packages for loading at remote party
    | UploadAssemblies of AssemblyPackage list * IReplyChannel<AssemblyLoadInfo list>

let rec serverLoop (self : Actor<ThunkMessage>) : Async<unit> =
    async {
        let! msg = self.Receive()
        match msg with
        | RunThunk(thunk, reply) ->
            let result : Choice<obj, exn> =
                try thunk () |> Choice1Of2
                with e -> Choice2Of2 e

            do! reply.Reply result
        | GetAssemblyLoadState(assemblyIds, reply) ->
            // query local vagabond object on load state
            let info = vagabond.GetAssemblyLoadInfo assemblyIds
            do! reply.Reply info

        | UploadAssemblies(pkgs, reply) ->
            // load packages using local vagabond object
            let results = vagabond.LoadAssemblyPackages(pkgs)
            do! reply.Reply results

        return! serverLoop self
    }

/// create a local thunk server instance with given name
let createServer name = Actor.Start name serverLoop |> Actor.ref

It remains to define the client side of the assembly upload logic. To make sure that assemblies are not needlessly uploaded, we could either implement our own upload protocol using the previously described Vagabond API, or we could simply use the built-in protocol and only specify the communication implementation:

/// submit a thunk for evaluation to target actor ref
let evaluate (server : ActorRef<ThunkMessage>) (thunk : unit -> 'T) =
    // receiver implementation ; only specifies how to communicate with remote party
    let receiver =
        {
            new IRemoteAssemblyReceiver with
                member __.GetLoadedAssemblyInfo(ids: AssemblyId list) =
                    server.PostWithReply(fun reply -> GetAssemblyLoadState(ids, reply))

                member __.PushAssemblies (pkgs: AssemblyPackage list) =
                    server.PostWithReply(fun reply -> UploadAssemblies(pkgs, reply))
        }

    // submit assemblies using the receiver implementation and the built-in upload protocol
    vagabond.SubmitObjectDependencies(receiver, thunk, permitCompilation = true)
    |> Async.RunSynchronously
    |> ignore

    // dependency upload complete, send thunk for execution
    let result = server <!= fun replyChannel -> RunThunk ((fun () -> thunk () :> obj), replyChannel)
    match result with
    | Choice1Of2 o -> o :?> 'T
    | Choice2Of2 e -> raise e

Using the companion project we can now test our implementation. The examples below all work in F# interactive.

// spawns a windowed console application that hosts a single thunk server instance
let server : ActorRef<ThunkMessage> = ThunkServer.spawnWindow()

evaluate server (fun () -> 1 + 1)
evaluate server (fun () -> printfn "Remote side-effect")
evaluate server (fun () -> do failwith "boom!")

Deploying actors

An application of particular interest is the ability to remotely deploy actor definitions straight from F# interactive, just by using our simple thunk server:

// deploy an actor body remotely using thunk server
let deployActor name (body : Actor<'T> -> Async<unit>) : ActorRef<'T> =
    evaluate server (fun () -> let actor = Actor.Start name body in actor.Ref)

Let’s try this out by implementing a simple counter actor. All type definitions and logic can be declared straight from F# interactive.

type Counter =
    | IncrementBy of int
    | GetCount of IReplyChannel<int>

let rec body count (self : Actor<Counter>) = async {
    let! msg = self.Receive()
    match msg with
    | IncrementBy i -> return! body (count + i) self
    | GetCount rc ->
        do! rc.Reply count
        return! body count self
}

// deploy to thunk server, receive remote actor ref
let ref : ActorRef<Counter> = deployActor "counter" (body 0)

We can now test the deployment simply by interfacing with the actor ref we have received. Side effects can be added in the actor body to verify that code indeed runs in the remote window.

ref <-- IncrementBy 1
ref <-- IncrementBy 2
ref <-- IncrementBy 3

ref <!= GetCount // 6

Further Applications

Vagabond enables on-the-fly deployment of code, be it to a collocated process, your next-door server or an Azure-hosted cluster. It is applicable not only to F# and its shell, but should work with any .NET language/REPL such as the Roslyn-based scriptcs. The MBrace framework already makes use of the library, enabling instant deployment of distributed algorithms to the cloud. We plan to incorporate Vagabond functionality with Thespian as well. Vagabond is a powerful library from which most distributed frameworks running on .NET could benefit. I hope that today’s exposition will encourage a more widespread adoption.

A declarative argument parser for F#

When it comes to command line argument parsing in the F# projects I work on, my library of choice so far has been the argument parser available with the F# PowerPack. While ArgParser is a simple implementation that works well, I always felt that it didn’t offer as declarative an experience as I would like: extracting the parsed results can only be done through the use of side-effects.

It becomes even uglier when you need to combine this with configuration provided from App.Config, resulting in arduous code that (in most cases) adheres to the following pattern:

  1. Parse configuration file for parameter “foo”.
  2. Parse command line arguments for parameter “foo”.
  3. Let the command line value override the configuration file value, if both are given.
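
Stripped of the actual parsing, the third step boils down to something like the sketch below. The function name `resolve` and its two option-typed inputs are hypothetical; they stand in for whatever steps 1 and 2 managed to extract for a given parameter.

```fsharp
// hypothetical helper: combine the values found for one parameter,
// where None means that the source did not specify it
let resolve (fromConfigFile : 'T option) (fromCommandLine : 'T option) : 'T option =
    match fromCommandLine with
    | Some _ -> fromCommandLine   // the command line takes precedence
    | None -> fromConfigFile      // otherwise fall back to the config file

printfn "%A" (resolve (Some "/var/run") None)            // Some "/var/run"
printfn "%A" (resolve (Some "/var/run") (Some "/tmp"))   // Some "/tmp"
```

Writing this once is trivial; repeating it by hand for every parameter of every application is what gets tedious.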

I felt that this configuration parsing scheme is a pattern that can and should be handled transparently. After a bit of experimentation, I ended up with a library of my own, which you can find on GitHub. What follows is an informal walkthrough of what it does.

The Basic Idea

The library is based on the simple observation that configuration parameters can be naturally described using discriminated unions. For instance:
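
The original listing did not survive here. Judging from the command line syntax and the `Argument list` output type mentioned in this section, the union plausibly looked something like the following sketch; the case names are inferred from the `--working-directory`, `--listener` and `--detach` switches and should be read as assumptions.

```fsharp
// Assumed shape of the argument template (a reconstruction, not the
// original listing): one union case per configuration parameter
type Argument =
    | Working_Directory of string
    | Listener of string * int
    | Detach

// the value one would expect the command line
//   --working-directory /var/run --listener localhost 8080 --detach
// to be parsed into
let expected : Argument list =
    [ Working_Directory "/var/run" ; Listener ("localhost", 8080) ; Detach ]
```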

UnionArgParser takes such discriminated unions and generates a corresponding argument parsing scheme. For example, the parser generated from the above template recognizes the syntax

--working-directory /var/run --listener localhost 8080 --detach

yielding outputs of type Argument list. The syntax is inferred from the union type without the need to specify any additional metadata.

The parser will also look for the following keys in the AppSettings section of the application config file:

As mentioned previously, command line arguments will override their corresponding configuration file entries. This default behaviour can be changed, however.

Usage

A minimal example using the above union can be written as follows:

While getting a single list of all parsed results might be useful for some cases, it is more likely that you need to query the results for specific parameters:

Querying using quotations enables a simple and type safe way to deconstruct parse results into their constituent values.

Customization

The parsing behaviour of the configuration parameters can be customized by attaching attributes to the union cases:

In this case,

  • Mandatory: parser will fail if no configuration for this parameter is given.
  • NoCommandLine: restricts this parameter to the AppSettings section.
  • AltCommandLine: specifies an alternative command line switch.

The following attributes are also available:

  • NoAppConfig: restricts to command line.
  • Rest: all remaining command line args are consumed by this parameter.
  • Hidden: do not display in the help text.
  • GatherAllSources: command line does not override AppSettings.
  • ParseCSV: AppSettings entries are given as comma separated values.
  • CustomAppSettings: sets a custom key name for AppSettings.

Post Processing

It should be noted here that arbitrary unions are not supported by the parser. Union cases can only contain fields of certain primitive types, that is int, bool, string and float. This means of course that user-defined parsers are not supported. For configuration inputs that are non-trivial, a post-process facility is provided.

This construct is useful since exception handling is performed by the arg parser itself.

Final Remarks

I have used this library extensively in quite a few of my projects and have so far been satisfied with its results. As always, you are welcome to try it out for yourselves and submit your feedback. I anticipate that some people might complain that this isn’t a very idiomatic implementation, since most of it depends on reflection. Using parser records would have been much more straightforward implementation-wise, but there is a certain elegance in declaring unions of parameters that simply cannot be ignored 🙂

EDIT: A NuGet package has now been uploaded.

Parametric open recursion, Pt. 2

In my previous post, I demonstrated how one can define poly-variadic fixpoint combinators in a strict language like F# using references. Today, we are going to expand on these ideas to construct a fixpoint combinator that allows open recursion over an indexed family of functions:

In this implementation, recursive bindings are made possible by early memoization of function references. This implies that an equality semantics is required for the domain of parameters. Also, just as in the poly-variadic case, the evaluation of references is delayed through the use of eta expansion.
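
A minimal sketch of a combinator along these lines is given below. It is not the original listing: the names `parametricFix`, `cache` and `lookup` are my own, and a plain `Dictionary` stands in for whatever memoization scheme one prefers. The parity example merely illustrates that the body runs once per parameter.

```fsharp
open System.Collections.Generic

// Sketch of a parametric fixpoint (names are illustrative assumptions).
// The key type 'P must have sensible equality and hashing.
let parametricFix (f : ('P -> 'A -> 'B) -> 'P -> 'A -> 'B) : 'P -> 'A -> 'B =
    let cache = Dictionary<'P, ('A -> 'B) ref>()
    let rec lookup (p : 'P) : 'A -> 'B =
        match cache.TryGetValue p with
        | true, cell ->
            // eta expansion: the cell is dereferenced only on application
            fun a -> cell.Value a
        | false, _ ->
            // early memoization: register the cell before precomputing,
            // so that recursive references to p resolve to the same cell
            let cell = ref (fun _ -> failwith "reference used during precomputation")
            cache.Add(p, cell)
            cell.Value <- f lookup p
            cell.Value
    lookup

// usage: a family of two mutually dependent functions indexed by Parity
type Parity = Even | Odd

let precomputations = ref 0

let check : Parity -> int -> bool =
    parametricFix (fun self parity ->
        // stands in for an expensive precomputation step
        precomputations.Value <- precomputations.Value + 1
        match parity with
        | Even ->
            let isOdd = self Odd
            fun n -> n = 0 || isOdd (n - 1)
        | Odd ->
            let isEven = self Even
            fun n -> n <> 0 && isEven (n - 1))

printfn "%b %b" (check Even 10) (check Odd 7)   // true true
printfn "%d" precomputations.Value              // 2
```

However many times `check` is applied to non-negative inputs, the body passed to the combinator is evaluated exactly once for `Even` and once for `Odd`.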

Example: compiling regular expressions

Consider the following regular expression language:

Suppose I need to define a function Regex<'T> -> 'T [] -> bool which, given a regular expression, precomputes its recognizing predicate. There are many ways to write this, but the parametric combinator provides a particularly elegant solution:

Needless to say, the above will not work with the traditional Y combinator.

Extending the combinator

In the previous entry I described how the poly-variadic fixpoint can be extended beyond function spaces to any type that supports the notion of “delayed dereferencing”. The same pattern is observed with the parametric fixpoint. In fact, this construct can be generalized to a point where its type signature becomes essentially the same as that of the traditional Y combinator:

A similar version to the above can be found in the implementation of FsCoreSerializer.

Parametric open recursion, Pt. 1

Consider the following situation: I need to define a function Param -> a -> b so that given a parameter p : Param my algorithm precomputes a function a -> b. Emphasis here goes to precomputation, which you can assume is an expensive operation. There are a few examples of such a pattern: this could be a regular expression parser (or indeed any parser), a denotational semantics, or, as it happens in my case of interest, a function that takes a type and returns a serializer for that type.

All this might sound a bit trivial, but what happens if I need to add recursion into the mix? What if I needed to define the Kleene star in terms of itself or if I needed to define serialization rules for (mutually) recursive types? One would be inclined to point out that this is easily solvable by applying the traditional fixpoint combinator, but this does not play well with our original assumption: the so-called precomputation stage will have to be performed every time a recursive call is made, which renders it neither efficient nor a precomputation.

Poly-variadic fix-point combinators

It is a well-known fact that mutual recursion can be encoded into the traditional Y combinator. This idea is reflected in certain constructs that allow the definition of mutually recursive functions without utilizing explicit language support for such. These are known as poly-variadic fixpoint combinators. An interesting example lies with Haskell, whose non-strict semantics permit a very succinct implementation:

How do you define the same combinator in a strict language like F#? There are a few ways you could go about doing that, like wrapping around lazy types or fixing the arity of your defined functions, but it turns out you can preserve the original type signature with a bit of cheating:

This really works, and you can try it out for yourself at F# snippets. In fact, as my colleague Nick Palladinos has pointed out, it is significantly more efficient than your standard Y combinator (even though it remains orders of magnitude slower than idiomatic tail recursion).

A key observation to be made about this implementation is the use of “eta expansion” over the function references. The resulting functions have the property of preserving the content of the enclosed ref cell at the time of invocation. I would call this transformation “delayed dereferencing”, but I leave it to the more PLT inclined to suggest a better name for this. It could be further noted that similar combinators are possible in other domains that support this notion of delay, such as tuples of functions or even lazy types.
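
To make the role of eta expansion concrete, here is a minimal sketch of a ref-based poly-variadic combinator. It is a reconstruction under my own naming (`polyFix`, `cells`, `delayed`) rather than the snippet referred to above, but it follows the same idea: allocate one ref cell per function, hand out eta-expanded wrappers, then tie the knot.

```fsharp
// Sketch of a poly-variadic fixpoint using ref cells
// (illustrative names, not the original snippet)
let polyFix (fs : (('A -> 'B) list -> 'A -> 'B) list) : ('A -> 'B) list =
    // one placeholder cell per function to be defined
    let cells =
        fs |> List.map (fun _ -> ref (fun (_ : 'A) -> (failwith "not yet defined" : 'B)))
    // eta expansion: each wrapper reads its cell only when invoked,
    // by which time the cell holds the real function
    let delayed = cells |> List.map (fun cell -> fun a -> cell.Value a)
    // tie the knot: every definition receives the full list of wrappers
    List.zip cells fs |> List.iter (fun (cell, f) -> cell.Value <- f delayed)
    delayed

// usage: mutually recursive even/odd without 'let rec ... and ...'
let evenOdd : (int -> bool) list =
    polyFix [
        (fun self n -> n = 0 || List.item 1 self (n - 1))    // even refers to odd
        (fun self n -> n <> 0 && List.item 0 self (n - 1))   // odd refers to even
    ]

let isEven = List.item 0 evenOdd
let isOdd = List.item 1 evenOdd

printfn "%b %b" (isEven 10) (isOdd 7)   // true true
```

Had `delayed` been built by dereferencing the cells eagerly, every wrapper would have captured the failing placeholder instead of the final definition.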

In my next entry I will be describing how the above ideas can be applied to solve the parametric recursion problem in F#.

Serializing F# types

One of the many performance challenges we have faced during our development of {m}brace is that of serialization. As you might have heard[1,2,3], {m}brace is a declarative cloud programming framework (currently under development) that aspires to become the “next big thing” in big data analytics.

Serialization performance of F# types with the available .NET formatters is not exactly ideal, and this problem is amplified in our setting where transmission of arbitrary user-defined objects is commonplace. Some cases like trees are simply too slow, whereas others such as huge lists are outright fatal, leading to potential stack overflows.

Our solution to this problem is an implementation that at its core combines two ideas already out there:

  1. The excellent serializer by Anton Tayanovskyy (http://fssnip.net/6u) that compositionally constructs its formatter functions by recursively traversing the reflected types of input objects.
  2. The FsReflect library by Kurt Schelfthout (https://bitbucket.org/kurt/fsreflect) that uses dynamic methods to make the F# reflection API blazing fast.

The combination of the two yields a remarkably fast serializer when it comes to large tree structures and quotations. I have observed it running 10-20x faster than BinaryFormatter or NetDataContractSerializer. Additionally, this implementation supports object caching, fallback serialization for non-F# types, dynamic resolution of nested untyped fields and declaration of custom formatters. All round, this works as a general-purpose binary serializer.

I have named it quite unoriginally FsCoreSerializer and the source code has been uploaded to github. As always, your remarks and feedback would be very much appreciated.

EDIT: A NuGet package is now available.
