Deploying .NET code instantly using Vagabond

This is post #30 of the English 2014 F# advent calendar. Thanks to Sergey Tihon for inviting me and suggesting the topic. Be sure to check out all the other awesome posts! In this post I will be describing Vagabond, a dependency management library of mine. I will try to walk through some of the subtleties in the .NET framework that drove the library implementation, in an attempt to make it more accessible. All code and examples presented can be found in this repository.

Prelude

It is often claimed in functional programming that functions are treated as values. This is a valid claim in simple applications, permitting powerful patterns when writing code. But is it a general truth, or simply a linguistic abstraction? If functions really are values, one would expect them to exhibit all the properties normally expected of values. For instance, it should be possible to serialise functions and transmit them to a remote process for use. But wait, does serialising a function even make sense? In principle it should; we know for a fact that all code is essentially data. But the answer really depends on language/runtime support, serialization facilities and how lambdas are represented. Our goal here is to consider the .NET framework and F# in particular. How does F#/.NET represent lambdas? We’ll attempt to illustrate by example.

‘Thunk Server’

Our goal is to define a ‘thunk server’, an actor implementation that receives arbitrary thunks (lambdas of type unit -> 'T), executes them and replies with the result. For this exercise we will be using Thespian, a distributed actor framework for F#. The choice of actor framework is not particularly important; Thespian should be easy to grasp, however, since its programming model closely follows that of the F# MailboxProcessor. We start off with the simplest imaginable conception of how a thunk server should be implemented:

open Nessos.Thespian

// Actor API: a thunk and reply channel in which to send the result
type ThunkMessage = (unit -> obj) * IReplyChannel<Choice<obj, exn>>

// actor body definition
let rec serverLoop (self : Actor<ThunkMessage>) : Async<unit> =
    async {
        // receive next message
        let! thunk, reply = self.Receive()
        // execute, catching any exception
        let result : Choice<obj, exn> =
            try thunk () |> Choice1Of2
            with e -> Choice2Of2 e
        // reply
        do! reply.Reply result
        return! serverLoop self
    }

// initialize an actor instance in the current process
let server : Actor<ThunkMessage> = Actor.Start "thunkServer" serverLoop

We can now interface with our thunk server like so:

/// submit a thunk for evaluation to target actor ref
let evaluate (server : ActorRef<ThunkMessage>) (thunk : unit -> 'T) : 'T =
    // post to actor and wait for reply
    let result = server <!= fun replyChannel -> (fun () -> thunk () :> obj), replyChannel
    // downcast if value, raise if exception
    match result with
    | Choice1Of2 o -> o :?> 'T
    | Choice2Of2 e -> raise e

evaluate server.Ref (fun () -> 1 + 1) // returns 2

But is this implementation correct? Remember, our goal is to submit lambdas to a remote process for execution. Using the companion project we can test precisely this scenario:

// spawn a local windowed process that runs the actor
let server : ActorRef<ThunkMessage> = spawnWindow ()

Let’s attempt to submit some code to the server:

// load a third party library and submit code for evaluation
#r "ThirdPartyLibrary.dll"
evaluate server ThirdPartyLibrary.thirdPartyLambda

Evaluation fails with the following error:

System.IO.FileNotFoundException: Could not load file or assembly 'ThirdPartyLibrary, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. The system cannot find the file specified.
   at System.Reflection.RuntimeAssembly.GetType(RuntimeAssembly assembly, String name, Boolean throwOnError, Boolean ignoreCase, ObjectHandleOnStack type)
   at System.Reflection.RuntimeAssembly.GetType(String name, Boolean throwOnError, Boolean ignoreCase)
   at Nessos.FsPickler.ReflectionCache.loadMemberInfo(FSharpOption`1 tyConv, FSharpFunc`2 loadAssembly, FSharpFunc`2 getMethodSignature, CompositeMemberInfo mI)

Why does this happen? Because lambdas in F# are not exactly values. Rather, they are instances of compiler-generated classes that inherit the abstract type FSharpFunc. Needless to say, a lambda cannot be successfully deserialised in a process unless its particular underlying class is loaded in the current application domain.
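As a quick illustration (a hypothetical snippet; the generated class name varies between compilations), inspecting a lambda's runtime type reveals a compiler-generated closure class:

let f = fun x -> x + 1
// the runtime type is a compiler-generated closure class; its base type is FSharpFunc<int, int>
printfn "%O inherits %O" (f.GetType()) (f.GetType().BaseType)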

Fixing the problem

So how can we fix the problem? A popular approach is to distribute code using expression trees. Expression trees are clearly values, hence can easily be serialised and evaluated on the target machine. However, this is not a satisfactory solution: expression trees do not support type definitions; evaluation of expression trees is not always efficient; and expression trees can still reference code in out-of-reach assemblies. We need to do better than this! Our experience illustrates that a proper serialisation format for lambdas must also include their dependent assemblies. So let’s try to do precisely that:

open System.IO
open System.Reflection

// Raw assembly container
type AssemblyPackage =
    {
        FullName : string
        AssemblyImage : byte []
    }

// transitively traverse assembly dependencies for given object
let gatherDependencies (object:obj) : AssemblyPackage list =
    let rec aux (gathered : Map<string, Assembly>) (remaining : Assembly list) =
        match remaining with
        // skip assemblies already gathered or residing in the GAC
        | a :: rest when gathered.ContainsKey a.FullName || a.GlobalAssemblyCache -> aux gathered rest
        // came across new assembly, add to state and include transitive dependencies to inputs
        | a :: rest ->
            let dependencies = a.GetReferencedAssemblies() |> Seq.map Assembly.Load |> Seq.toList
            aux (Map.add a.FullName a gathered) (dependencies @ rest)
        // traversal complete, create assembly packages
        | [] ->
            gathered
            |> Seq.map (fun (KeyValue(_,a)) ->
                                { FullName = a.FullName
                                  AssemblyImage = File.ReadAllBytes a.Location })
            |> Seq.toList

    aux Map.empty [object.GetType().Assembly]

// loads raw assemblies in current application domain
let loadRawAssemblies (pkgs : AssemblyPackage list) =
    pkgs |> List.iter (fun pkg -> Assembly.Load pkg.AssemblyImage |> ignore)

The code above uses reflection to traverse the assembly dependencies for a given object. The raw binary assembly images are then read from disk to be submitted and loaded by the remote recipient. We can now revise our thunk server implementation as follows:

// our actor API is augmented with an additional message
type ThunkMessage =
    | RunThunk of (unit -> obj) * IReplyChannel<Choice<obj, exn>>
    | LoadAssemblies of AssemblyPackage list

let rec serverLoop (self : Actor<ThunkMessage>) : Async<unit> =
    async {
        let! msg = self.Receive()
        match msg with
        | RunThunk(thunk, reply) ->
            let result : Choice<obj, exn> =
                try thunk () |> Choice1Of2
                with e -> Choice2Of2 e

            do! reply.Reply result
        | LoadAssemblies assemblies ->
            loadRawAssemblies assemblies

        return! serverLoop self
    }

/// submit a thunk for evaluation to target actor ref
let evaluate (server : ActorRef<ThunkMessage>) (thunk : unit -> 'T) =
    // traverse and upload dependencies
    server <-- LoadAssemblies (gatherDependencies thunk)
    // assembly upload complete, send thunk for execution
    let result = server <!= fun replyChannel -> RunThunk ((fun () -> thunk () :> obj), replyChannel)
    match result with
    | Choice1Of2 o -> o :?> 'T
    | Choice2Of2 e -> raise e

And we are done! We can use the companion project to verify that this indeed resolves the previously failing deployment scenario. Clearly, this should cover the case of assemblies missing from the remote process. But does it work with lambdas defined in F# interactive? Let’s try it out:

> evaluate localServer (fun () -> 1 + 1);;
System.NotSupportedException: The invoked member is not supported in a dynamic assembly.
   at System.Reflection.Emit.InternalAssemblyBuilder.get_Location()
   at ThunkServer.Naive2.aux@36-1.Invoke(KeyValuePair`2 _arg1)

Huh? What’s a dynamic assembly? And why does it appear here? Reading from MSDN:

(…) a set of managed types in the System.Reflection.Emit namespace that allow a compiler or tool to emit metadata and Microsoft intermediate language (MSIL) at run time (…)

Among such compilers is F# interactive, which uses a single dynamic assembly for emitting all code defined in a single session. This can be verified by writing

> (fun i -> i + 1).GetType().Assembly ;;
val it : System.Reflection.Assembly =
  FSI-ASSEMBLY, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null
    {IsDynamic = true;
     Location = ?;}

We have reached a dead end, it seems. Clearly, dynamic assemblies cannot be distributed to remote processes. And even if they could, we would face more problems on account of dynamic assemblies expanding over time. What is there to do?

Enter Vagabond

Vagabond is an automated dependency management framework. It is capable of converting dynamic assemblies into standalone, distributable assemblies. This is achieved using Mono.Cecil and a modification of AssemblySaver, a parser for dynamic assemblies.

Basic usage

Having worked through the above examples, the Vagabond API should feel natural. We can begin using the library by initialising a Vagabond state object:

open Nessos.Vagabond

let vagabond : Vagabond = Vagabond.Initialize(cacheDirectory = "/tmp/vagabond")

This will be our entry point for all interactions with the library. For starters, let us compute the dependencies for a lambda defined in F# interactive:

> let deps = vagabond.ComputeObjectDependencies((fun i -> i + 1), permitCompilation = true) ;;
val deps : System.Reflection.Assembly list =
  [FSI-ASSEMBLY_17a1c27a-5c8f-4acb-ae1a-5aea74027854_1, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]

This returns a single dependency: a non-dynamic assembly generated by Vagabond. The numerical suffix indicates that this is the first slice produced from that particular dynamic assembly. Vagabond uses a slicing scheme to address the issue of dynamic assemblies that expand over time. To create an exportable assembly package, we simply call

let pkg : AssemblyPackage = vagabond.CreateAssemblyPackage(assembly, includeAssemblyImage = true)

Packages can be loaded into the current application domain using the same object:

let result : AssemblyLoadInfo = vagabond.LoadAssemblyPackage(pkg)

As soon as assembly dependencies have been uploaded to the recipient, communication can be established. Importantly, this must be done using the serialiser instance provided by the Vagabond object.

vagabond.Serializer : FsPicklerSerializer

This makes use of purpose-built FsPickler functionality to bridge between local dynamic assemblies and exported slices.
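For instance (a sketch, assuming FsPickler's Pickle/UnPickle helpers), a value that captures FSI-emitted code can be roundtripped once the corresponding slices have been loaded on both sides:

// pickle a value that references code emitted by F# interactive
let pickle : byte [] = vagabond.Serializer.Pickle (fun x -> x + 1)
// the remote party deserialises using its own Vagabond-provided serialiser
let f : int -> int = vagabond.Serializer.UnPickle pickle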

Putting it all together

Let’s now try to correctly implement our thunk server. We first define our updated actor body:

type ThunkMessage =
    // Evaluate thunk
    | RunThunk of (unit -> obj) * IReplyChannel<Choice<obj, exn>>
    // Query remote process on assembly load state
    | GetAssemblyLoadState of AssemblyId list * IReplyChannel<AssemblyLoadInfo list>
    // Submit assembly packages for loading at remote party
    | UploadAssemblies of AssemblyPackage list * IReplyChannel<AssemblyLoadInfo list>

let rec serverLoop (self : Actor<ThunkMessage>) : Async<unit> =
    async {
        let! msg = self.Receive()
        match msg with
        | RunThunk(thunk, reply) ->
            let result : Choice<obj, exn> =
                try thunk () |> Choice1Of2
                with e -> Choice2Of2 e

            do! reply.Reply result
        | GetAssemblyLoadState(assemblyIds, reply) ->
            // query local vagabond object on load state
            let info = vagabond.GetAssemblyLoadInfo assemblyIds
            do! reply.Reply info

        | UploadAssemblies(pkgs, reply) ->
            // load packages using local vagabond object
            let results = vagabond.LoadAssemblyPackages(pkgs)
            do! reply.Reply results

        return! serverLoop self
    }

/// create a local thunk server instance with given name
let createServer name = Actor.Start name serverLoop |> Actor.ref

It remains to define the client side of the assembly upload logic. To make sure that assemblies are not needlessly uploaded, we could either implement our own upload protocol using the previously described Vagabond API, or we could simply use the built-in protocol and only specify the communication implementation:

/// submit a thunk for evaluation to target actor ref
let evaluate (server : ActorRef<ThunkMessage>) (thunk : unit -> 'T) =
    // receiver implementation ; only specifies how to communicate with remote party
    let receiver =
        {
            new IRemoteAssemblyReceiver with
                member __.GetLoadedAssemblyInfo(ids: AssemblyId list) =
                    server.PostWithReply(fun reply -> GetAssemblyLoadState(ids, reply))

                member __.PushAssemblies (pkgs: AssemblyPackage list) =
                    server.PostWithReply(fun reply -> UploadAssemblies(pkgs, reply))
        }

    // submit assemblies using the receiver implementation and the built-in upload protocol
    vagabond.SubmitObjectDependencies(receiver, thunk, permitCompilation = true)
    |> Async.RunSynchronously
    |> ignore

    // dependency upload complete, send thunk for execution
    let result = server <!= fun replyChannel -> RunThunk ((fun () -> thunk () :> obj), replyChannel)
    match result with
    | Choice1Of2 o -> o :?> 'T
    | Choice2Of2 e -> raise e

Using the companion project we can now test our implementation. The examples below all work in F# interactive.

// spawns a windowed console application that hosts a single thunk server instance
let server : ActorRef<ThunkMessage> = ThunkServer.spawnWindow()

evaluate server (fun () -> 1 + 1)
evaluate server (fun () -> printfn "Remote side-effect")
evaluate server (fun () -> do failwith "boom!")

Deploying actors

An application of particular interest is the ability to remotely deploy actor definitions straight from F# interactive, just by using our simple thunk server:

// deploy an actor body remotely using thunk server
let deployActor name (body : Actor<'T> -> Async<unit>) : ActorRef<'T> =
    evaluate server (fun () -> let actor = Actor.Start name body in actor.Ref)

Let’s try this out by implementing a simple counter actor. All type definitions and logic can be declared straight from F# interactive.

type Counter =
    | IncrementBy of int
    | GetCount of IReplyChannel<int>

let rec body count (self : Actor<Counter>) = async {
    let! msg = self.Receive()
    match msg with
    | IncrementBy i -> return! body (count + i) self
    | GetCount rc ->
        do! rc.Reply count
        return! body count self
}

// deploy to thunk server, receive remote actor ref
let ref : ActorRef<Counter> = deployActor "counter" (body 0)

We can now test the deployment simply by interfacing with the actor ref we have received. Side effects can be added in the actor body to verify that code indeed runs in the remote window.

ref <-- IncrementBy 1
ref <-- IncrementBy 2
ref <-- IncrementBy 3

ref <!= GetCount // 6

Further Applications

Vagabond enables on-the-fly deployment of code, be it to a collocated process, your next-door server or an Azure-hosted cluster. It is applicable not only to F# and its shell, but should work with any .NET language/REPL, such as the Roslyn-based scriptcs. The MBrace framework already makes use of the library, enabling instant deployment of distributed algorithms to the cloud. We plan to incorporate Vagabond functionality into Thespian as well. Vagabond is a powerful library from which most distributed frameworks running on .NET could benefit. I hope that today’s exposition will encourage more widespread adoption.

A declarative argument parser for F#

When it comes to command line argument parsing in the F# projects I work on, my library of choice so far has been the argument parser available with the F# PowerPack. While ArgParser is a simple implementation that works well, I always felt that it didn’t offer as declarative an experience as I would have liked: extracting the parsed results can only be done through the use of side effects.

It becomes even uglier when you need to combine this with configuration provided from App.Config, resulting in arduous code that (in most cases) adheres to the following pattern:

  1. Parse configuration file for parameter “foo”.
  2. Parse command line arguments for parameter “foo”.
  3. The configuration file entry is overridden if the parameter also appears on the command line.

I felt that this configuration parsing scheme is a pattern that can and should be handled transparently. After a bit of experimentation, I ended up with a library of my own, which you can find on GitHub. What follows is an informal walkthrough of what it does.

The Basic Idea

The library is based on the simple observation that configuration parameters can be naturally described using discriminated unions. For instance:
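(The union shown here is a reconstruction; the case names are chosen to match the command line syntax given further down.)

type Argument =
    | Working_Directory of string   // --working-directory <path>
    | Listener of string * int      // --listener <host> <port>
    | Detach                        // --detach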

UnionArgParser takes such discriminated unions and generates a corresponding argument parsing scheme. For example, the parser generated from the above template recognizes the syntax

--working-directory /var/run --listener localhost 8080 --detach

yielding outputs of type Argument list. The syntax is inferred from the union type without the need to specify any additional metadata.

The parser will also look for corresponding keys in the AppSettings section of the application config file.

As mentioned previously, command line arguments will override their corresponding configuration file entries. This default behaviour can be changed, however.

Usage

A minimal example using the above union can be written as follows:
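(A sketch; the namespace and member names such as Parse and GetAllResults are assumed and may differ slightly in the released library.)

open Nessos.UnionArgParser

// build a parser for the Argument union
let parser = UnionArgParser<Argument>()

// parse the command line (and, by default, the AppSettings section)
let results = parser.Parse()

// obtain all parsed parameters as a plain list
let all : Argument list = results.GetAllResults()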

While getting a single list of all parsed results might be useful for some cases, it is more likely that you need to query the results for specific parameters:
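(Again a sketch, with member names assumed.)

// query parse results using quotations of the union case constructors
let dir : string = results.GetResult <@ Working_Directory @>
let listeners : (string * int) list = results.GetResults <@ Listener @>
let detach : bool = results.Contains <@ Detach @>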

Querying using quotations enables a simple and type safe way to deconstruct parse results into their constituent values.

Customization

The parsing behaviour of the configuration parameters can be customized by attaching attributes to the union cases:
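(A sketch; the case names here are invented for illustration.)

type Argument =
    | [<Mandatory>] Data_Directory of string
    | [<NoCommandLine>] Connection_String of string
    | [<AltCommandLine("-p")>] Port of int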

In this case,

  • Mandatory: parser will fail if no configuration for this parameter is given.
  • NoCommandLine: restricts this parameter to the AppSettings section.
  • AltCommandLine: specifies an alternative command line switch.

The following attributes are also available:

  • NoAppConfig: restricts to command line.
  • Rest: all remaining command line args are consumed by this parameter.
  • Hidden: do not display in the help text.
  • GatherAllSources: command line does not override AppSettings.
  • ParseCSV: AppSettings entries are given as comma separated values.
  • CustomAppSettings: sets a custom key name for AppSettings.

Post Processing

It should be noted here that arbitrary unions are not supported by the parser. Union cases can only contain fields of certain primitive types, namely int, bool, string and float. This means, of course, that user-defined parsers are not supported. For configuration inputs that are non-trivial, a post-process facility is provided.
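(A sketch of how this might look; the PostProcessResult member name is assumed.)

open System.IO

// parse the raw string into a DirectoryInfo; any exception raised by the
// post-processing function is handled and reported by the argument parser
let workingDir : DirectoryInfo =
    results.PostProcessResult(<@ Working_Directory @>, fun path -> DirectoryInfo path)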

This construct is useful since exception handling is performed by the arg parser itself.

Final Remarks

I have used this library extensively in quite a few of my projects and have so far been satisfied with its results. As always, you are welcome to try it out for yourselves and submit your feedback. I anticipate that some people might complain that this isn’t a very idiomatic implementation, since most of it depends on reflection. Using parser records would have been much more straightforward implementation-wise, but there is a certain elegance in declaring unions of parameters that simply cannot be ignored :-)

EDIT: A NuGet package has now been uploaded.

Parametric open recursion, Pt. 2

In my previous post, I demonstrated how one can define poly-variadic fixpoint combinators in a strict language like F# using references. Today, we are going to expand on these ideas to construct a fixpoint combinator that allows open recursion over an indexed family of functions:
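(The combinator below, here named parametricFix, is a sketch reconstructed from the description that follows; the original implementation may differ in detail.)

open System.Collections.Generic

// parametric fixpoint: open recursion over a family of functions indexed by a parameter 'P.
// a function reference is memoised per parameter *before* its body is computed,
// and dereferencing is delayed through eta expansion.
let parametricFix (f : ('P -> 'a -> 'b) -> 'P -> 'a -> 'b) : 'P -> 'a -> 'b =
    let cache = Dictionary<'P, ('a -> 'b) ref> ()
    let rec resolve (p : 'P) : 'a -> 'b =
        match cache.TryGetValue p with
        | true, fref -> fun x -> fref.Value x   // delayed dereferencing
        | false, _ ->
            let fref = ref (fun _ -> failwith "uninitialized recursive reference")
            cache.Add(p, fref)                  // memoise early, so recursive occurrences of p resolve here
            fref.Value <- f resolve p           // perform the precomputation exactly once per parameter
            fun x -> fref.Value x
    resolve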

In this implementation, recursive bindings are made possible by early memoization of function references. This implies that an equality semantics is required for the domain of parameters. Also, just as in the poly-variadic case, the evaluation of references is delayed through the use of eta expansion.

Example: compiling regular expressions

Consider the following regular expression language:
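(A reconstruction; the original constructor names may differ.)

type Regex<'T> =
    | Empty                           // matches nothing
    | Epsilon                         // matches the empty sequence
    | Char of 'T                      // matches a single token
    | Or of Regex<'T> * Regex<'T>     // alternation
    | Seq of Regex<'T> * Regex<'T>    // concatenation
    | Star of Regex<'T>               // Kleene star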

Suppose I need to define a function Regex<'T> -> 'T [] -> bool which, given a regular expression, precomputes its recognizing predicate. There are many ways to write this, but the parametric combinator provides a particularly elegant solution:
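(A sketch, assuming the Regex<'T> reconstruction and the parametricFix combinator from above: each sub-expression is compiled exactly once into a recognizer that maps an input to the list of possible remaining suffixes.)

let compile (regex : Regex<'T>) : 'T [] -> bool =
    // precompute, once per sub-expression, a recognizer of type 'T list -> 'T list list
    // that returns every possible remainder after matching a prefix of its input
    let recognize =
        parametricFix (fun recurse rgx ->
            match rgx with
            | Empty -> fun _ -> []
            | Epsilon -> fun input -> [input]
            | Char c -> fun input -> (match input with x :: rest when x = c -> [rest] | _ -> [])
            | Or (r1, r2) -> fun input -> recurse r1 input @ recurse r2 input
            | Seq (r1, r2) -> fun input -> recurse r1 input |> List.collect (recurse r2)
            | Star r ->
                // note the recursive occurrence of the very same parameter 'rgx';
                // dropping non-consuming matches guarantees termination
                fun input ->
                    input :: (recurse r input
                              |> List.filter (fun rest -> rest <> input)
                              |> List.collect (recurse rgx)))

    fun input -> recognize regex (List.ofArray input) |> List.exists List.isEmpty

// compile (Seq (Char 'a', Star (Char 'b'))) [| 'a'; 'b'; 'b' |] evaluates to true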

Needless to say, the above will not work with the traditional Y combinator.

Extending the combinator

In the previous entry I described how the poly-variadic fixpoint can be extended beyond function spaces to any type that supports the notion of “delayed dereferencing”. The same pattern is observed with the parametric fixpoint. In fact, this construct can be generalized to a point where its type signature becomes essentially the same as that of the traditional Y combinator:
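(Roughly, a signature of the following shape; this is a reconstruction, and the memoization scheme additionally requires equality on the recursion parameter.)

val fix : (('a -> 'b) -> 'a -> 'b) -> ('a -> 'b) when 'a : equality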

A similar version to the above can be found in the implementation of FsCoreSerializer.

Parametric open recursion, Pt. 1

Consider the following situation: I need to define a function Param -> a -> b so that, given a parameter p : Param, my algorithm precomputes a function a -> b. Emphasis here goes to precomputation, which you can assume is an expensive operation. There are a few examples of such a pattern: this could be a regular expression parser (or indeed any parser), a denotational semantics, or, as it happens in my case of interest, a function that takes a type and returns a serializer for that type.

All this might sound a bit trivial, but what happens if I need to add recursion into the mix? What if I needed to define the Kleene star in terms of itself, or to define serialization rules for (mutually) recursive types? One would be inclined to point out that this is easily solvable by applying the traditional fixpoint combinator, but this does not play well with our original assumption: the so-called precomputation stage would have to be performed every time a recursive call is made, which renders it neither efficient nor a precomputation.

Poly-variadic fix-point combinators

It is a well-known fact that mutual recursion can be encoded into the traditional Y combinator. This idea is reflected in certain constructs that allow the definition of mutually recursive functions without explicit language support for them. These are known as poly-variadic fixpoint combinators. An interesting example lies with Haskell, whose non-strict semantics permit a very succinct implementation.

How do you define the same combinator in a strict language like F#? There are a few ways you could go about doing that, like wrapping around lazy types or fixing the arity of your defined functions, but it turns out you can preserve the original type signature with a bit of cheating:
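(A sketch of the trick; the snippet linked from the post may differ in detail: allocate a reference cell per definition, expose only eta-expanded views of those cells, then back-patch each cell with its definition closed over the complete list.)

// poly-variadic fixpoint: ties the knot across a list of mutually recursive definitions
let fixList (defs : (('a -> 'b) list -> 'a -> 'b) list) : ('a -> 'b) list =
    // one reference cell per definition, initialised with a placeholder
    let refs = defs |> List.map (fun _ -> ref (fun _ -> failwith "uninitialized"))
    // eta-expanded views: dereferencing is delayed until invocation time
    let fs = refs |> List.map (fun r -> fun x -> r.Value x)
    // back-patch each cell with its definition, closed over the complete list of fixed points
    List.iter2 (fun r def -> r.Value <- def fs) refs defs
    fs

// example: mutually recursive even/odd predicates without 'let rec ... and'
let both =
    fixList
        [ (fun fs n -> n = 0 || fs.[1] (n - 1))    // even, defined in terms of odd
          (fun fs n -> n <> 0 && fs.[0] (n - 1)) ] // odd, defined in terms of even

both.[0] 10 // true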

This really works, and you can try it out for yourself at F# snippets. In fact, as my colleague Nick Palladinos has pointed out, it is significantly more efficient than your standard Y combinator (even though it remains orders of magnitude slower than idiomatic tail recursion).

A key observation to be made about this implementation is the use of “eta expansion” over the function references. The resulting functions read the content of the enclosed ref cell at the time of invocation, rather than at the time of their definition. I would call this transformation “delayed dereferencing”, but I leave it to the more PLT-inclined to suggest a better name for it. It could be further noted that similar combinators are possible in other domains that support this notion of delay, such as tuples of functions or even lazy types.

In my next entry I will be describing how the above ideas can be applied to solve the parametric recursion problem in F#.

Serializing F# types

One of the many performance challenges we have faced during our development of {m}brace is that of serialization. As you might have heard [1,2,3], {m}brace is a declarative cloud programming framework (currently under development) that aspires to become the “next big thing” in big data analytics.

Serialization performance of F# types with the available .NET formatters is not exactly ideal, and this problem is amplified in our setting, where transmission of arbitrary user-defined objects is commonplace. Some cases, like trees, are simply too slow, whereas others, such as huge lists, are outright fatal, leading to potential stack overflows.

Our solution to this problem is an implementation that at its core combines two ideas already out there:

  1. The excellent serializer by Anton Tayanovskyy (http://fssnip.net/6u) that compositionally constructs its formatter functions by recursively traversing the reflected types of input objects.
  2. The FsReflect library by Kurt Schelfthout (https://bitbucket.org/kurt/fsreflect) that uses dynamic methods to make the F# reflection API blazing fast.

The combination of the two yields a remarkably fast serializer when it comes to large tree structures and quotations. I have observed performance up to 10-20x faster than BinaryFormatter or NetDataContractSerializer. Additionally, this implementation supports object caching, fallback serialization for non-F# types, dynamic resolution of nested untyped fields and declaration of custom formatters. All round, this works as a general-purpose binary serializer.

I have named it, quite unoriginally, FsCoreSerializer, and the source code has been uploaded to GitHub. As always, your remarks and feedback would be very much appreciated.

EDIT: A NuGet package is now available.
