Bugfree.dk – Ronnie Holm's blog

Not anti-anything, just pro-quality

Using a generic command-line runner for utility tasks

Posted by Ronnie Holm on 23rd November 2011

Most enterprise projects have one or more console applications for utility tasks such as cleaning up or importing data into the database. These utilities tend to be project-specific and small in terms of code size, so instead of maintaining several small assemblies, it makes sense to combine them into a single assembly. A generic runner reads the name of the utility (the command) and its arguments from the command line and uses the command pattern to create and execute it.

For the generic runner to work, each command has to fulfill the contract.

public enum ExitCode {
    Success = 0,
    Failure
};

public interface ICommand {
    string Usage { get; }
    string Description { get; }
    ExitCode Execute(string[] args);
}

I want the runner to adhere to the open/closed principle: it should be possible to add new commands without altering its core delegation logic. This requires the use of reflection to retrieve and instantiate a command based on command-line arguments.

class Program {
    static IEnumerable<ICommand> GetCommands() {
        var iCommand = typeof (ICommand);
        // Only concrete classes can be instantiated, so skip interfaces and abstract types
        return System.Reflection.Assembly.GetExecutingAssembly().GetTypes()
            .Where(t => iCommand.IsAssignableFrom(t) && t.IsClass && !t.IsAbstract)
            .Select(t => (ICommand)Activator.CreateInstance(t));
    }

    static void DisplayHelp() {
        Console.WriteLine("Console [Command] [Arg1] [Arg2] [ArgN]\n\n");
        GetCommands().ToList().ForEach(command =>
            Console.WriteLine(command.Usage + "\n" + command.Description + "\n\n"));
    }

    static int Main(string[] args) {
        if (args.Length == 0) {
            DisplayHelp();
            return (int)ExitCode.Failure;
        }

        var commandName = args[0];
        var command = GetCommands().Where(t => t.GetType().Name == commandName).SingleOrDefault();
        if (command == null)
            throw new ArgumentException(string.Format("Command '{0}' not found", commandName));

        var executeArguments = new List<string>(args);
        executeArguments.RemoveAt(0);

        var exitCode = command.Execute(executeArguments.ToArray());
        return (int)exitCode;
    }
}

A trivial example of a command that adds two numbers would be the following:

// $> GenericRunner.exe Calculator 2 3 => 2 + 3 = 5
public class Calculator : ICommand {
    public string Usage {
        get { return "Calculator [Op1] [Op2]"; }
    }

    public string Description {
        get { return "World's simplest calculator"; }
    }

    public ExitCode Execute(string[] args) {
        try {
            Console.WriteLine(
                string.Format(
                    "{0} + {1} = {2}",
                    args[0], args[1], int.Parse(args[0]) + int.Parse(args[1])));
            return ExitCode.Success;
        } catch (Exception e) {
            Console.WriteLine(e.ToString());
            return ExitCode.Failure;
        }
    }
}

Now multiple smaller assemblies can be grouped into one, with a description of all commands automatically being assembled, and without commands interfering (too much) with each other.
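
To illustrate the open/closed aspect, here is a minimal, self-contained sketch of adding a second command without touching the lookup logic. The Echo command and the CommandLocator helper are hypothetical names introduced for this example; the contract is repeated so the sketch compiles on its own.

```csharp
using System;
using System.Linq;
using System.Reflection;

public enum ExitCode { Success = 0, Failure }

public interface ICommand {
    string Usage { get; }
    string Description { get; }
    ExitCode Execute(string[] args);
}

// Hypothetical second command: adding this class is the only change needed.
public class Echo : ICommand {
    public string Usage { get { return "Echo [Text1] [TextN]"; } }
    public string Description { get { return "Prints its arguments"; } }

    public ExitCode Execute(string[] args) {
        Console.WriteLine(string.Join(" ", args));
        return ExitCode.Success;
    }
}

// Hypothetical helper mirroring the runner's reflection-based lookup.
public static class CommandLocator {
    public static ICommand Find(Assembly assembly, string commandName) {
        return assembly.GetTypes()
            .Where(t => typeof(ICommand).IsAssignableFrom(t) && t.IsClass && !t.IsAbstract)
            .Where(t => t.Name == commandName)
            .Select(t => (ICommand)Activator.CreateInstance(t))
            .SingleOrDefault();
    }
}

public static class Demo {
    public static void Main() {
        var command = CommandLocator.Find(typeof(Echo).Assembly, "Echo");
        var exitCode = command.Execute(new[] { "hello", "world" }); // prints "hello world"
        Console.WriteLine((int)exitCode);                           // prints 0
    }
}
```

Because the lookup matches on the type name, each command class needs a name that is unique within the assembly.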

Posted in .Net, SharePoint

Adding event receivers to SharePoint lists on the fly

Posted by Ronnie Holm on 19th November 2011

In versioning attachments in a SharePoint list using snapshotting, an event receiver was responsible for the heavy lifting. To enable versioning of a list, I could therefore have associated the receiver with a list by adding the usual registration XML to a feature. But versioning is a truly reusable building block that shouldn’t be restricted to lists that are known when the feature is created. A better solution would be to extend the SharePoint list settings page for all lists on a site on which the versioning feature is enabled. The user may then activate or deactivate attachment versioning on the fly.

This would involve adding or removing event receivers from a list as the user enables or disables versioning. The following extension method is one way to accomplish the addition part in a type-safe manner:

// definition
public static class SPListExtensions {
    public static void RegisterEventReceiver<TReceiver>(this SPList list,
            SPEventReceiverType receiverType,
            int sequenceNumber) where TReceiver : SPItemEventReceiver {
        var assemblyName = typeof(TReceiver).Assembly.FullName;
        var className = typeof(TReceiver).FullName;

        (from SPEventReceiverDefinition definition in list.EventReceivers
         where definition.Assembly == assemblyName &&
               definition.Class == className &&
               definition.Type == receiverType
         select list.EventReceivers[definition.Id])
        .ToList()
        .ForEach(receiverToDelete => receiverToDelete.Delete());

        var receiver = list.EventReceivers.Add();
        receiver.Type = receiverType;
        receiver.Assembly = assemblyName;
        receiver.Class = className;
        receiver.SequenceNumber = sequenceNumber;
        receiver.Update();
        list.Update();
    }
}

// use
list.RegisterEventReceiver<ListAttachmentVersioningEventReceiver>(
    SPEventReceiverType.ItemAdded, 10000);
list.RegisterEventReceiver<ListAttachmentVersioningEventReceiver>(
    SPEventReceiverType.ItemUpdated, 10001);

Under rare circumstances the (assembly, class, type) tuple may not be unique, i.e., the same receiver may be registered multiple times, albeit with different sequence numbers. In practice I never found any use for this functionality, though, which is why I didn’t include the sequence number in the where clause above, causing all registrations matching the tuple to be removed.

Posted in .Net, SharePoint

Versioning attachments in a SharePoint list using snapshotting

Posted by Ronnie Holm on 17th November 2011

(See also the F# implementation and adding event receivers to a list on the fly.)

Both SharePoint 2007 and 2010 support versioning for list items but not their attachments. No matter which version of a list item I look at, its attachments will always be the most recent. The attachment support seems to have been bolted on as an afterthought, resulting in behavior that’s counter-intuitive for developers as well as end-users. With SharePoint 2007 (and 2010), Microsoft suggests using a document library for proper attachment versioning. But I can’t substitute one with the other, since a list item may hold any number of attachments and an item in a document library may hold just one.

Existing solution

I counted on someone else having experienced a similar pain and come up with a workable cure. But except for Tim Ebenezer’s solution, the search came up empty. Tim, on the other hand, has done a great job of seamlessly integrating his attachment versioning feature into SharePoint. When I activate the feature on a site, it adds a versioning menu item to the list settings page for every list on the site. Unfortunately the core versioning logic, storing attachments in a shadow library using an event receiver, isn’t particularly robust. Among other use cases, it doesn’t properly deal with a user first deleting an attachment and then, some versions later, adding an attachment with the same name.

I therefore set out to implement my own solution based on Tim’s ideas, hooking into the synchronous ItemAdding, ItemUpdating, ItemAttachmentAdding, and ItemAttachmentDeleting events and maintaining a shadow library of versions. This approach, however, quickly turned into a painful one. When the synchronous events run, nothing has yet been written to the database – at this stage a new item doesn’t even have its Id set, and merely determining the number of attachments added and how far I’ve come with the processing is tricky.

The next challenge I encountered was that event handlers cannot easily share state across multiple calls because SharePoint creates a new instance of the receiver class for every event handled. Processing multiple attachments requires a counter into the array of attachments to keep track of which ones I’d copied to the shadow list. I’d have to resort to some storage outside the receiver object, keeping in mind that the receiver might execute concurrently. But which storage should I use? Session state may have been disabled, and polluting one of the property bags stored in the content database is messy and also not thread-safe.

Overall, with the synchronous approach too much work has to go into tracking the state of the versioning process.

New solution

A synchronous solution is very hard to get right because it’s forced to work at the level of individual attachments. SharePoint doesn’t have a synchronous event that fires after all attachments have been processed. After all, why provide such an event when everything has already happened? Thinking instead in terms of the asynchronous events of ItemUpdated and ItemAdded, I have exactly what’s needed to snapshot all attachments in one batch, making versioning a lot simpler. When these events fire, the item and its attachments have already been written to the database and I can focus on how to generate the snapshots (copying attachments back and forth between lists) and not worry about what the user actually did to the attachments from one version to the next.

// Prerequisites:
// 1. Create a Document Library named ShadowLibrary on the same site as the list to version
// 2. Add a field (column) named CustomVersion of type string to the list to version
public class ListAttachmentVersioningEventReceiver : SPItemEventReceiver {
    private const string CustomVersion = "CustomVersion";
    private const string ShadowLibrary = "ShadowLibrary";

    public override void ItemAdded(SPItemEventProperties properties) {
        base.ItemAdded(properties);
        SetCustomVersionLabel(properties.ListItem);
        CreateSnapshot(properties);
    }

    public override void ItemUpdated(SPItemEventProperties properties) {
        base.ItemUpdated(properties);

        var item = properties.ListItem;
        if (RollbackHappened(item)) {
            RestoreSnapshot(properties);
            SetCustomVersionLabel(item);
            CreateSnapshot(properties);
        }
        else {
            CreateSnapshot(properties);
            SetCustomVersionLabel(item);
        }
    }

    private void CreateSnapshot(SPItemEventProperties properties) {
        using (var site = properties.OpenWeb()) {
            var item = properties.ListItem;
            var shadowLibrary = site.Lists[ShadowLibrary] as SPDocumentLibrary;
            var path = string.Format("Versions/{0}/{1}", item.ID, GetOfficialVersionLabel(item));
            var shadowFolder = CreateFolderPath(shadowLibrary, path);

            foreach (string fileName in item.Attachments) {
                SPFile existingFile = item.ParentList.ParentWeb.GetFile(item.Attachments.UrlPrefix + fileName);
                SPFile newFile = shadowFolder.Files.Add(fileName, existingFile.OpenBinaryStream());
                newFile.Item.Update();
            }
        }
    }

    private bool RollbackHappened(SPListItem item) {
        var culture = CultureInfo.InvariantCulture;
        var currentVersion = float.Parse(GetOfficialVersionLabel(item), culture);
        var lastVersion = float.Parse(GetCustomVersionLabel(item), culture);
        return currentVersion > lastVersion + 1;
    }

    private void RestoreSnapshot(SPItemEventProperties properties) {
        var item = properties.ListItem;
        var restoreVersion = GetCustomVersionLabel(item);
        EventFiringEnabled = false;

        item.Attachments.Cast<string>().ToList().ForEach(attachment => item.Attachments.Delete(attachment));
        using (var site = properties.OpenWeb()) {
            var path = string.Format("Versions/{0}/{1}", item.ID, restoreVersion);
            var shadowLibrary = site.Lists[ShadowLibrary] as SPDocumentLibrary;
            var source = CreateFolderPath(shadowLibrary, path);

            foreach (SPFile file in source.Files)
                item.Attachments.Add(file.Name, file.OpenBinary());
        }

        item.SystemUpdate(false);
        EventFiringEnabled = true;
    }

    // can only get folder creation to work with Document Libraries
    private SPFolder CreateFolderPath(SPDocumentLibrary list, string path) {
        return CreateFolderPathRecursive(list.RootFolder, path.Split('/').ToList());
    }

    private SPFolder CreateFolderPathRecursive(SPFolder folder, IList<string> pathComponents) {
        if (pathComponents.Count == 0)
            return folder;

        SPFolder newFolder;
        try {
            newFolder = folder.SubFolders[pathComponents.First()];
        }
        catch (ArgumentException) {
            newFolder = folder.SubFolders.Add(pathComponents.First());
        }

        pathComponents.RemoveAt(0);
        return CreateFolderPathRecursive(newFolder, pathComponents);
    }

    private void SetCustomVersionLabel(SPListItem item) {
        EventFiringEnabled = false;
        item[CustomVersion] = GetOfficialVersionLabel(item);
        item.SystemUpdate(false);
        EventFiringEnabled = true;
    }

    private string GetCustomVersionLabel(SPItem item) { return item[CustomVersion] as string; }
    private string GetOfficialVersionLabel(SPListItem item) { return item.Versions[0].VersionLabel; }
}

When a list item is saved, I take a snapshot of the attachments, storing them in a folder structure like Versions/{Id}/{VersionNumber}/{Attachments} in the shadow document library. When a list item is restored to a previous version, existing attachments are first deleted before the ones from the snapshot are added back in, creating a new version of the list item.

Restoring previous versions also has a counter-intuitive meaning in SharePoint. Suppose that in one version of a list item I store a key in the item’s property bag. I’d then expect the property bag values to be specific to this version. But behind the scenes restore seems to work by cloning the current version and then copying only the values of the fields from the restore version to the new one. In other words, I can’t use the item’s property bag to store version-specific information, such as a version tag to detect when a restore has occurred. I also can’t use the Modified field because SharePoint sets it to the time of the restore. To carry over version information I have to create and maintain a field of my own. Hence the CustomVersion field on the list to version.
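
The detection rule can be tried out in isolation. The following is a minimal sketch that lifts the comparison from RollbackHappened into a plain function taking the two labels as strings; it assumes major versions only (labels such as "3.0"), and the VersionLabels class name is made up for this example.

```csharp
using System;
using System.Globalization;

public static class VersionLabels {
    // officialLabel: the label SharePoint assigned to the newest version.
    // customLabel: the value of the CustomVersion field, which a restore
    // copies over from the restored version like any other field.
    public static bool RollbackHappened(string officialLabel, string customLabel) {
        var culture = CultureInfo.InvariantCulture;
        var current = float.Parse(officialLabel, culture);
        var last = float.Parse(customLabel, culture);
        return current > last + 1;
    }

    public static void Main() {
        // Ordinary edit of version 2.0: the new version 3.0 still carries "2.0".
        Console.WriteLine(RollbackHappened("3.0", "2.0")); // False
        // Restore of version 2.0 while at 4.0: the new version 5.0 carries "2.0".
        Console.WriteLine(RollbackHappened("5.0", "2.0")); // True
    }
}
```

After an ordinary edit the custom label trails the official one by exactly one, so any larger gap indicates that the field values came from an older version.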

Remember that because the ItemUpdated and ItemAdded events execute asynchronously, all the snapshotting logic executes on a background thread, after control has returned to the user. Should an error occur at this point, the user will never see it and the snapshot may be left in an incomplete state. On the other hand, this approach scales well and doesn’t have to be fast because no user is awaiting the result.

Lastly, there’s one place in SharePoint where the versioning abstraction leaks through. It’s in the list item version dialog which displays older versions and enables restore to any previous version. The dialog will always show the most recent attachments.

Improvements

I could use the ETag property of an SPFile object to implement a more efficient differential snapshotting algorithm that would conserve storage space. Compressing attachments before storing them in the shadow library might also be an option, although then I’d have to promote the ETag value to a shadow library field before compressing.
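
As a rough sketch of the differential idea, the comparison step could look like the following. It works on plain dictionaries mapping attachment names to ETag strings rather than on SPFile objects, and the DifferentialSnapshot class, its method, and the ETag values are assumptions made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DifferentialSnapshot {
    // Returns the names of attachments that are new or whose ETag changed
    // since the previous snapshot; only these need to be copied again.
    public static List<string> ChangedAttachments(
            IDictionary<string, string> previousETags,
            IDictionary<string, string> currentETags) {
        return currentETags
            .Where(current => {
                string previous;
                return !previousETags.TryGetValue(current.Key, out previous)
                    || previous != current.Value;
            })
            .Select(current => current.Key)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main() {
        var previous = new Dictionary<string, string> {
            { "a.txt", "etag-1" }, { "b.txt", "etag-2" }
        };
        var current = new Dictionary<string, string> {
            { "a.txt", "etag-1" }, { "b.txt", "etag-3" }, { "c.txt", "etag-4" }
        };
        Console.WriteLine(string.Join(", ", ChangedAttachments(previous, current))); // b.txt, c.txt
    }
}
```

A restore would then have to walk back through earlier snapshots to find the unchanged files, and deleted attachments would need to be recorded separately, which is the price of the saved storage space.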

Posted in .Net, SharePoint

Handy SharePoint 2010 extension methods for list definitions

Posted by Ronnie Holm on 15th November 2011

A quick word on organizing extension methods: I usually collect them in an Extensions folder, appending Extensions to the name of the class being extended and keeping with the one class per file convention. For brevity I’ve left out the using and the namespace part below.

SPListCollection extensions

In SharePoint 2010 the TryGetList method has been added to the SPListCollection class. The method returns either an SPList instance matching the display name or null. Oftentimes, however, you want to do a lookup based on the internal name. Here’s an extension method that adheres to the semantics of TryGetList, but using the internal name. It relies on the fact that the Name of a list’s RootFolder is actually the list’s internal name:

// definition
public static class SPListCollectionExtensions {
    public static SPList TryGetListByInternalName(this SPListCollection lists, string internalName) {
        return (from SPList l in lists
            where l.RootFolder.Name == internalName
            select l).SingleOrDefault();
    }
}

// use
if (site.Lists.TryGetListByInternalName(internalListName) == null)
   // list not found

SPFieldCollection extensions

Using the CreateNewField method of the SPFieldCollection you can add new fields to a list. The particularly annoying aspect of this method, however, is that when you want to continue working with its result, oftentimes you have to cast it to one of the SPField subclasses. But since the SPFieldType, provided as one of the arguments when adding a field, closely relates to the actual SPField return type, an extension method is able to do the casting. This’ll expose mismatches at compile time instead of at runtime.

All it takes is for us to map out the relation between SPField and SPFieldType:

// definition
public static class SPFieldCollectionExtensions {
    public static TSPField CreateField<TSPField>(this SPFieldCollection fields,
            string internalName, string displayName) where TSPField : SPField {
        var spFieldToFieldType = new Dictionary<Type, SPFieldType> {
            { typeof(SPFieldDateTime), SPFieldType.DateTime },
            { typeof(SPFieldNumber), SPFieldType.Number },
            { typeof(SPFieldUser), SPFieldType.User },
            { typeof(SPFieldBoolean), SPFieldType.Boolean },
            { typeof(SPFieldMultiLineText), SPFieldType.Note },
            { typeof(SPFieldText), SPFieldType.Text }
        };

        var fieldType = spFieldToFieldType[typeof(TSPField)];
        var list = fields.List;
        var field = list.Fields[list.Fields.Add(internalName, fieldType, false)];
        field.Title = displayName;
        field.Update();
        return field as TSPField;
    }
}

// use
l.Fields.CreateField<SPFieldBoolean>(internalName, "displayName");

Taking the CreateField extension method one step further, oftentimes you want to set properties besides internal name and display name. For that purpose I’ve defined a CreateField method that accepts an Action<TSPField>. This allows you to reuse common property settings across fields for brevity and consistency while at the same time maintaining strong typing.

// definition
public static TSPField CreateField<TSPField>(this SPFieldCollection fields,
        string internalName, string displayName,
        Action<TSPField> setAdditionalProperties) where TSPField : SPField {
    var newField = CreateField<TSPField>(fields, internalName, displayName);
    setAdditionalProperties(newField);
    newField.Update();
    return newField;
}

// use
public static Action<SPFieldMultiLineText> RichTextProperties = f => {
    f.RichText = true;
    f.RichTextMode = SPRichTextMode.FullHtml;
};

l.Fields.CreateField<SPFieldBoolean>(internalName, "displayName", f => f.Required = true);
l.Fields.CreateField(internalName, "displayName", RichTextProperties);

In the second call, which passes RichTextProperties, you can leave out the type argument because the compiler infers it based on the type of the Action delegate.
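
The inference rule is independent of SharePoint, so it can be demonstrated with stand-in types. In this minimal sketch, Field, BooleanField, MultiLineTextField, and FieldFactory are hypothetical substitutes for SPField, its subclasses, and the CreateField extension method.

```csharp
using System;

// Stand-ins for SPField and two of its subclasses, so the sketch runs without SharePoint.
public class Field { public string Title; }
public class BooleanField : Field { public bool Required; }
public class MultiLineTextField : Field { public bool RichText; }

public static class FieldFactory {
    public static TField Create<TField>(string title, Action<TField> setAdditionalProperties)
            where TField : Field, new() {
        var field = new TField { Title = title };
        setAdditionalProperties(field);
        return field;
    }
}

public static class Demo {
    public static readonly Action<MultiLineTextField> RichTextProperties = f => f.RichText = true;

    public static void Main() {
        // Untyped lambda: the compiler has nothing to infer TField from,
        // so the type argument must be spelled out.
        var required = FieldFactory.Create<BooleanField>("Done", f => f.Required = true);

        // Typed delegate: TField is inferred as MultiLineTextField.
        var comment = FieldFactory.Create("Comment", RichTextProperties);

        Console.WriteLine(required.GetType().Name + " " + comment.GetType().Name);
        // prints "BooleanField MultiLineTextField"
    }
}
```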

Similar to CreateField, I’ve defined two additional extension methods for creating lookup fields:

// definition
public static TSPField CreateLookup<TSPField>(this SPFieldCollection fields,
        string lookupListName, string internalName,
        string displayName) where TSPField : SPFieldLookup {
    var currentList = fields.List;
    var lookupList = currentList.ParentWeb.Lists.TryGetListByInternalName(lookupListName);
    var newField = currentList.Fields[currentList.Fields.AddLookup(internalName, lookupList.ID, false)];
    newField.Title = displayName;
    newField.Update();
    return newField as TSPField;
}

public static TSPField CreateLookup<TSPField>(this SPFieldCollection fields,
        string lookupListName, string internalName, string displayName,
        Action<TSPField> setAdditionalProperties) where TSPField : SPFieldLookup {
    var newField = CreateLookup<TSPField>(fields, lookupListName, internalName, displayName);
    setAdditionalProperties(newField);
    newField.Update();
    return newField;
}

// use
l.Fields.CreateLookup<SPFieldLookup>(lookupListName, internalName, displayName, f => f.AllowMultipleValues = true);

These extension methods make using the SharePoint API more type-safe and concise, and defining lists using these methods and the template approach saves me from writing a lot of repetitive code.

Posted in .Net, SharePoint

Notes from Geek Night talk on Advanced Windsor Tricks

Posted by Ronnie Holm on 19th December 2010

My notes from a talk on Windsor by Mogens Heller Grabe which I attended this week. Slides and code are available through Dropbox, but I don’t know for how long.

  • Use an Inversion of Control (IoC) container like Windsor to create an architecture that responds well to change, i.e., an architecture that promotes looser coupling and higher cohesion
  • The ability to write unit tests against a code base is a good measure of its degree of coupling
  • By having components depend on interfaces, you’re free to switch the implementation at runtime, introducing flexibility into the software
  • Avoid having concrete components talk to each other. It makes it hard for the two to vary independently, e.g., adding logging later can only be done by tearing components apart and putting them back together
  • At runtime the IoC container recursively composes these smaller components into an object graph, e.g., by passing concrete implementations through the constructors to satisfy the dependencies
  • One alternative to using an IoC container is using a service locator. Each component’s constructor would then ask the locator for its dependent components. While this approach works, it makes testing components in isolation difficult because all dependencies are now locked away inside the service locator. Instead, by having all dependencies supplied in the constructor, unit tests can directly supply fake implementations
  • When software is only used in one environment it tends to be fairly inflexible because it’s only suited for that one purpose. As soon as you design it to be used in two or more environments it becomes more flexible. Think of different environments as unit test, staging, production, and so on
  • The idea is for components to not have to be modified depending on where they’re running. It’s the IoC container that dynamically ties components together based on the environment
  • Suppose you didn’t use an IoC container. Then ultimately the top-level object would have to pass concrete instances down the chain. The top-level object may well be the UI layer. But having the UI layer know about data access components, service components, and logging components isn’t ideal. Instead use an IoC container, which is nothing more than a factory for components
  • With the Windsor IoC container (and most others), the usage pattern typically involves three stages
    • Register: tie together interfaces with concrete implementations
    • Resolve: return concrete implementations of interfaces required to satisfy the dependencies of an object
    • Release: dispose of concrete implementations
  • Most people only use a fraction of the functionality of an IoC container. Instead of taking a dependency on a full-blown container, you could easily roll your own IoC container that implements the core functionality in a few dozen lines of code. In an associated .NET Rocks and DnrTV episode James Kovacs elaborates on his original article
  • The goal of using a container isn’t to get rid of calls to the new operator. It’s to not use the new operator for the parts of an application where flexibility is required
  • Simple forms of Aspect Oriented Programming are possible using interceptors
  • Windsor supports the decorator pattern so you can have one component wrap another at runtime. This lets you implement features such as logging without actually modifying the original component. It’s an example of adhering to the open/closed principle by which classes should be open for extension, but closed for modification. In other words: don’t edit the original source code to introduce new behavior. Instead use methods of composition
  • Avoid configuring Windsor through XML and instead use the fluent interface, perhaps in a separate assembly so only it needs to be redeployed when the configuration changes. When a value, such as a connection string, needs to be configurable, create a section in app.config and add the value there. Windsor will know how to locate the value when instantiating objects
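
To make the register and resolve stages and the decorator idea from the notes concrete, here is a minimal roll-your-own container. It is a sketch only: TinyContainer and the demo types are hypothetical and are not Windsor’s API, lifetimes and the release stage are left out, and each class is assumed to have exactly one public constructor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal container; not Windsor's API.
public class TinyContainer {
    private readonly Dictionary<Type, Func<object>> registrations =
        new Dictionary<Type, Func<object>>();

    // Register: tie an interface to a concrete implementation.
    public void Register<TService, TImplementation>() where TImplementation : TService {
        registrations[typeof(TService)] = () => Create(typeof(TImplementation));
    }

    // Register a factory, e.g. to wrap one component in a decorator.
    public void Register<TService>(Func<TinyContainer, TService> factory) where TService : class {
        registrations[typeof(TService)] = () => factory(this);
    }

    // Resolve: build the object graph recursively through constructors.
    public TService Resolve<TService>() {
        return (TService)Resolve(typeof(TService));
    }

    private object Resolve(Type service) {
        Func<object> factory;
        return registrations.TryGetValue(service, out factory) ? factory() : Create(service);
    }

    private object Create(Type implementation) {
        var constructor = implementation.GetConstructors().Single();
        var arguments = constructor.GetParameters()
            .Select(p => Resolve(p.ParameterType))
            .ToArray();
        return constructor.Invoke(arguments);
    }
}

public interface IClock { int Hour { get; } }
public interface IGreeter { string Greet(string name); }

public class FixedClock : IClock { public int Hour { get { return 9; } } }

public class Greeter : IGreeter {
    private readonly IClock clock;
    public Greeter(IClock clock) { this.clock = clock; }
    public string Greet(string name) {
        return (clock.Hour < 12 ? "Good morning, " : "Hello, ") + name;
    }
}

// Decorator: adds logging without modifying Greeter.
public class LoggingGreeter : IGreeter {
    private readonly IGreeter inner;
    public LoggingGreeter(IGreeter inner) { this.inner = inner; }
    public string Greet(string name) {
        Console.WriteLine("Greet called with " + name);
        return inner.Greet(name);
    }
}

public static class Demo {
    public static void Main() {
        var container = new TinyContainer();
        container.Register<IClock, FixedClock>();
        container.Register<IGreeter>(c => new LoggingGreeter(c.Resolve<Greeter>()));

        // Prints "Greet called with world" followed by "Good morning, world".
        Console.WriteLine(container.Resolve<IGreeter>().Greet("world"));
    }
}
```

Swapping FixedClock for another IClock implementation, for instance in a unit test, only touches the registration and none of the components.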

Posted in .Net

Notes from Geek Night talk on SOA Done Right Using NServiceBus

Posted by Ronnie Holm on 11th December 2010

My notes from a talk on NServiceBus by Mogens Heller Grabe which I attended this week. Slides and code are available through Dropbox, but I don’t know for how long.

  • Enterprise service bus
    • An architectural pattern rather than a specific technology or product
    • The ethernet of SOA
  • Types of service bus
  • The current version 2.0 of NServiceBus is free whereas the binary version 2.5 will contain limitations. The limitations can be disabled by downloading and compiling the source code yourself
  • NServiceBus is designed to address the fallacies of distributed computing
    • It primarily addresses the fallacies through the use of reliable one-way messaging
    • Reliable means messages are stored in a queue, such as MSMQ, which is part of all recent Windows installations. The assumption being that the queue is always present and ready to receive messages
    • One-way means messages are asynchronous and fire-and-forget. The destination service doesn’t have to be online when a message goes into its queue, which makes services temporally independent. At some point in the future the service may post a reply to the client’s queue
    • Messaging means that any action is accomplished by posting a self-contained message, containing the type of operation and its parameters, to the queue of a service
    • With NServiceBus, a class is marked with an interface to describe that it’s a type of message. Instances of the class are then serialized using a standard XML or binary serializer and stored in a queue
  • NServiceBus service != traditional web service
    • An NServiceBus service is a class that implements an interface
    • Using a message queue is more reliable when the service has to make calls to other services as part of its operation. With a traditional web service, calling other services from within makes the service slower and less reliable. If only one dependent service is down it’s like with old Christmas lights wired in series: the entire stack may be down. Queues on the other hand introduce a safe place, a stabilization point, to temporarily store messages
    • A service is typically configured with an input queue that identifies it to others and an error queue that messages are routed to after some number of retries. In addition, each service may be configured with a number of threads to use to parallelize message processing
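
The ideas of one-way messaging, an input queue, and an error queue can be sketched with in-memory queues. None of the types below are NServiceBus API: IMessage, CreateOrder, and Service are hypothetical stand-ins, the queues are not durable like MSMQ, and processing is single-threaded.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins; none of these types are NServiceBus API.
public interface IMessage { }

// A self-contained message: the type names the operation, the fields are its parameters.
public class CreateOrder : IMessage {
    public int OrderId;
}

public class Service {
    private readonly Action<IMessage> handler;
    private readonly int maxAttempts;

    public readonly Queue<IMessage> InputQueue = new Queue<IMessage>();
    public readonly Queue<IMessage> ErrorQueue = new Queue<IMessage>();

    public Service(Action<IMessage> handler, int maxAttempts) {
        this.handler = handler;
        this.maxAttempts = maxAttempts;
    }

    // One-way: the sender only enqueues and never waits for a result.
    public void Send(IMessage message) { InputQueue.Enqueue(message); }

    // Runs whenever the service is online; a message that keeps failing
    // is moved to the error queue after maxAttempts tries.
    public void ProcessPending() {
        while (InputQueue.Count > 0) {
            var message = InputQueue.Dequeue();
            if (!TryHandle(message))
                ErrorQueue.Enqueue(message);
        }
    }

    private bool TryHandle(IMessage message) {
        for (var attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                handler(message);
                return true;
            } catch (Exception) {
                // Swallow the failure and try again.
            }
        }
        return false;
    }
}

public static class Demo {
    public static void Main() {
        var handled = new List<int>();
        var service = new Service(m => {
            var order = (CreateOrder)m;
            if (order.OrderId < 0) throw new ArgumentException("Invalid order id");
            handled.Add(order.OrderId);
        }, 3);

        service.Send(new CreateOrder { OrderId = 1 });   // service "offline": nothing runs yet
        service.Send(new CreateOrder { OrderId = -1 });
        service.ProcessPending();                        // service comes online

        Console.WriteLine(handled.Count);                // prints 1
        Console.WriteLine(service.ErrorQueue.Count);     // prints 1
    }
}
```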

Posted in .Net

Notes from Tech Talk on Advanced .NET debugging with Windbg

Posted by Ronnie Holm on 28th November 2010

My notes from a talk on Windbg by Brian Rasmussen which I attended this week. The talk was recorded and parts one and two are available through Channel 9.

  • Windbg isn’t a replacement for VS, but VS doesn’t handle some advanced cases
  • Windbg is a free user mode/kernel mode debugger which is part of the Debugging Tools for Windows
  • Customers may not be happy installing VS to debug code in production since it installs a lot of components and requires restarts
    • Windbg requires only a simple installation once you extract the redistributable from the Debugging Tools for Windows
  • Loading SOS.dll from the Microsoft .NET folder into Windbg makes it understand .NET
    • With SOS.dll loaded, you can look into the CLR and its data structures
    • Make sure to load the right SOS.dll for your runtime
    • SOS.dll is also available for the Silverlight runtime
  • Debugger Markup Language support is available for version 4 of SOS.dll
    • Provides hyperlinks in the command output of SOS
  • Like with the VS debugger, you can insert Debugger.Break() in your demo app and have Windbg halt on it
    • Release builds can be debugged with Windbg
    • Symbols can be generated for release builds too. Like with debug builds, symbols are stored in a separate file
    • Release builds make debugging harder because the jitter kicks in and modifies the code
    • The 64 bit calling convention of passing arguments via registers makes it harder to locate information when debugging
  • Windbg generally needs symbols loaded, although it’s less important when debugging managed code
    • Use MS’ public symbol server to load symbols on demand
    • Set the _NT_SYMBOL_PATH environment variable to point to your symbols (will affect VS as well)
    • Or use the .symfix command from within Windbg
  • Popular extensions to Windbg: SOSEX and Psscor2 (replacement for SOS, useful for ASP.NET debugging)
  • Create dump file to analyze: use task manager or ADPlus or ProcDump from Sysinternals, which can dump based on triggers
  • ADPlus collects crash dumps or hang dumps
    • Hang dumps can be captured from the same process multiple times and may be useful when debugging deadlocks or resource leaks
    • When you capture a hang dump, the process is halted for the dump period and is then restarted
  • A *32 process in task manager ends up as a 64 bit dump when you dump it from there
    • WoW64.dll is involved when dumping from the task manager
    • Not what you typically want because you don’t get full access to 32 bit process information
  • A .NET application is hosted within the CLR which is itself hosted within a regular Windows process
    • Looking at memory usage with the task manager therefore doesn’t tell you much about the .NET part

Posted in .Net, Windows

Demystifying LINQ to Objects

Posted by Ronnie Holm on 19th October 2010

To improve my understanding of LINQ, I’ve long wanted to learn how LINQ to Objects works under the covers. I’ll do so by relating query expression syntax to method invocation syntax and lambda expressions to generic delegates. Since many of the building blocks that make up LINQ are nothing more than syntactic sugar around .NET 2.0 constructs, learning LINQ in terms of .NET 2.0 is one way to get into the more functional spirit of C#.

To set the stage, I’ve created an array of people to query.

    Person[] people = new[] { new Person { Age = 1, ...}, ... };

It’s worth noting that all arrays implicitly derive from the Array class. It’s a special class that only the compiler and the runtime may derive from. Doing so yourself is rewarded with a compiler error stating simply that the class cannot be derived from. What’s also not apparent from the definition of the Array class is that, as of .NET 2.0, arrays also implement IList<T>, ICollection<T>, and IEnumerable<T>, the generic counterparts of the non-generic interfaces in the declaration.

    public abstract class Array :
        ICloneable, IList, ICollection,
        IEnumerable, IStructuralComparable, IStructuralEquatable

The runtime takes special care of extending the Array type with implementations of the generic interfaces based on the type of object that the array stores. That’s why the interfaces are invisible in the documentation. Any array of type T — in this case Person — will therefore implement, among others, the IEnumerable<T> interface. It’s the implementation of this interface on a class that enables LINQ to Objects to query any array or collection.
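
To see the runtime-provided interfaces for yourself, here’s a minimal sketch that checks them with the is operator. I’m using an int[] rather than the Person[] from above to keep it self-contained, and the class and method names are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayInterfaces {
    // True when the generic interfaces are present on an int[] at runtime,
    // even though the declaration of Array doesn't list them.
    public static bool ArrayImplementsGenericInterfaces() {
        object numbers = new[] { 1, 2, 3 };
        return numbers is IEnumerable<int>
            && numbers is ICollection<int>
            && numbers is IList<int>;
    }

    public static void Main() {
        Console.WriteLine(ArrayImplementsGenericInterfaces()); // True
    }
}
```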

Query expression syntax

Let’s create a simple query using query expression syntax. To support it, additional keywords have been added to C# to express common query operators. Not every operator has a keyword, though: Sum, Take, and Skip are examples of query operators that query expression syntax doesn’t support. In those cases you can combine query expression with method invocation syntax or write everything using the latter.

    var q = from p in people
            where p.Age > 25
            select p;
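
Combining the two syntaxes could look something like the sketch below. The Person class, the sample data, and the method names are assumptions of mine, since the post elides the actual people; the point is only that the query expression is wrapped in parentheses and the operators without a keyword are invoked on the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Person class used in the post.
public class Person {
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class MixedSyntax {
    // Made-up sample data for the illustration.
    public static Person[] SamplePeople() {
        return new[] {
            new Person { Name = "Ann", Age = 31 },
            new Person { Name = "Bob", Age = 19 },
            new Person { Name = "Cy", Age = 42 },
            new Person { Name = "Di", Age = 27 }
        };
    }

    // Query expression for filtering and ordering, method invocation for Skip and Take.
    public static List<string> SecondAndThirdOver25(IEnumerable<Person> people) {
        return (from p in people
                where p.Age > 25
                orderby p.Age
                select p.Name)
            .Skip(1)
            .Take(2)
            .ToList();
    }

    // Sum has no query expression keyword either.
    public static int TotalAgeOver25(IEnumerable<Person> people) {
        return (from p in people where p.Age > 25 select p.Age).Sum();
    }

    public static void Main() {
        Person[] people = SamplePeople();
        Console.WriteLine(string.Join(",", SecondAndThirdOver25(people).ToArray())); // Ann,Cy
        Console.WriteLine(TotalAgeOver25(people)); // 100
    }
}
```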

Method invocation syntax

To make matters transparent, let’s not rely on type inference and the var keyword for typing the result. At the same time, let’s assume the role of the compiler and translate the query expression syntax into method invocation syntax with lambda expressions. Each clause of the query expression, such as where and select, translates to a corresponding standard query operator invoked as a method on the collection, oftentimes with a lambda expression as the argument. Both representations are semantically equivalent, but for this simple query method invocation syntax appears more complex. That isn’t generally true of either representation: which one reads better depends on the query, as you can try out with a tool like LINQPad.

    IEnumerable<Person> r = people
        .Where(p => p.Age > 25)
        .Select(p => p);

The concise nature of the code is due to the compiler inferring the type arguments to Where and Select and the parameter type of each lambda expression. In this case, because the collection stores objects of type Person, the type arguments of the generic methods as well as the parameter of each lambda expression are of type Person. To make these types explicit, you can substitute real types for the generic ones in the definition of Where and Select provided later.

    IEnumerable<Person> s = people
        .Where<Person>((Person p) => p.Age > 25)
        .Select<Person, Person>((Person p) => p);

Generic delegates

With the type specifications made explicit, it’s more obvious how lambda expressions are compatible with delegates. A lambda expression is nothing more than a short-hand notation for a delegate, a type-safe pointer to a piece of code, to be passed into a method. Hence, a lambda expression can be substituted with a .NET 2.0 anonymous method by wrapping it in additional ceremony.

    IEnumerable<Person> t = people
        .Where<Person>(delegate(Person p) { return p.Age > 25; })
        .Select<Person, Person>(delegate(Person p) { return p; });

To make delegates type safe, their definitions include the return type and the types of the arguments passed into them. This is unfortunate since the standard query operators must be able to work on any type of object. LINQ therefore relies on generic delegates in the definition of its operators. As with other generic types, the compiler and runtime work together to generate real delegates based on the specified, or inferred, return type and types of arguments. Delegates like the commented ones below for Where and Select are what the runtime effectively generates.

    // delegate Boolean WhereDelegate(Person p);
    // delegate Person SelectDelegate(Person p);
    bool WhereClause(Person p) { return p.Age > 25; }
    Person SelectClause(Person p) { return p; }

    IEnumerable<Person> u = people
        .Where<Person>(WhereClause)
        .Select<Person, Person>(SelectClause);

LINQ relies on a set of generic delegates defined in the .NET framework. These delegates come in two flavors: those that return void, named Action, and those that don’t, named Func. You can use these delegates in your own code to not only parameterize methods by value but by functionality.

    delegate void Action();
    delegate void Action<T>(T obj);
    delegate void Action<T1, T2>(T1 arg1, T2 arg2);
    // up to four arguments in .NET 3.5, sixteen in .NET 4

    delegate TResult Func<TResult>();
    delegate TResult Func<T, TResult>(T arg);
    delegate TResult Func<T1, T2, TResult>(T1 arg1, T2 arg2);
    // up to four arguments in .NET 3.5, sixteen in .NET 4
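
As a rough illustration of parameterizing a method by functionality, here’s a sketch of a helper that takes its filter, its transformation, and its output step as delegates. Process and EvenSquares are names I made up for the example; only Func and Action come from the framework.

```csharp
using System;
using System.Collections.Generic;

public static class DelegateDemo {
    // Hypothetical helper: hands transform(item) to sink for every item that passes keep.
    public static void Process<T, TResult>(
        IEnumerable<T> items,
        Func<T, bool> keep,
        Func<T, TResult> transform,
        Action<TResult> sink) {
        foreach (T item in items)
            if (keep(item))
                sink(transform(item));
    }

    // The behavior is supplied by the caller as three lambda expressions.
    public static List<string> EvenSquares(IEnumerable<int> numbers) {
        var results = new List<string>();
        Process(numbers, n => n % 2 == 0, n => (n * n).ToString(), s => results.Add(s));
        return results;
    }

    public static void Main() {
        List<string> squares = EvenSquares(new[] { 1, 2, 3, 4, 5 });
        Console.WriteLine(string.Join(",", squares.ToArray())); // 4,16
    }
}
```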

Extension methods

Each of the 50 or so standard query operators is defined on the Enumerable class with the this modifier on their first IEnumerable<T> argument. This makes them extension methods to every object that implements IEnumerable<T>. From the definition of the Where operator below, it then follows that when you write "Where(p => p.Age > 25)", the compiler infers that because Where is called on a collection of type Person, p must also be of type Person. Furthermore, because the result of the comparison is a bool, the lambda expression is compatible with a delegate that accepts a type Person and returns a type bool. In other words, the matching Where has a signature like the commented one below.

The purpose of the Where operator is to filter a collection based on a predicate, the mathematical term for a function that returns true or false. The people that satisfy the predicate are therefore returned as an IEnumerable<Person>. However, the yield keyword adds an interesting twist to the return of Where and Select and other operators that return IEnumerable<T>. Instead of the operator iterating the source collection and returning every element of the result at once, the yield keyword instructs the compiler to emit a state machine so the operator can keep track of how far through the source collection it is, and return one resulting element at a time. The effect is lazy evaluation of LINQ to Objects queries.

In the example, the output of Where goes into Select, whose output may go to the console using a foreach. To understand yield, think of the foreach that writes to the console as pulling on a string of objects connecting Select to Where to the source collection. Each iteration of the loop pulls the string just enough to get at the next resulting element. This may in turn require Where to iterate multiple times until the predicate is once again satisfied.

    // IEnumerable<Person> Where<Person>(
    //     IEnumerable<Person> source,
    //     Func<Person, bool> predicate)

    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, bool> predicate) {
        foreach (TSource element in source) {
            if (predicate(element))
                yield return element;
        }
    }
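
One way to observe the lazy evaluation is to count how often the predicate runs. The sketch below uses the framework’s own Where and Select on a made-up array of ages; LazyDemo and CountPredicateCalls are hypothetical names. It records the count right after building the query, after pulling a single result, and after enumerating the whole query a second time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo {
    // Returns the number of predicate calls at three points in time.
    public static int[] CountPredicateCalls() {
        int calls = 0;
        int[] ages = { 10, 30, 20, 40 };

        IEnumerable<int> query = ages
            .Where(age => { calls++; return age > 25; })
            .Select(age => age);
        int afterBuild = calls;   // building the query evaluates nothing

        using (IEnumerator<int> e = query.GetEnumerator()) {
            e.MoveNext();         // pulls 10 (rejected) and 30 (accepted), then stops
        }
        int afterFirst = calls;

        query.ToList();           // a fresh enumeration visits all four elements
        int afterAll = calls;

        return new[] { afterBuild, afterFirst, afterAll };
    }

    public static void Main() {
        int[] counts = CountPredicateCalls();
        Console.WriteLine("{0} {1} {2}", counts[0], counts[1], counts[2]); // 0 2 6
    }
}
```

Note that enumerating the query twice runs the predicate over the source twice, which is worth keeping in mind when the predicate or the source is expensive.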

The Select operator is an example of one whose input type may differ from its output type. By its very nature, it projects one type onto another. It can even project onto an anonymous type, like in "Select(p => new { x = p.Name })", but then the var keyword must be used for the variable holding the result, since the type has no name you can write. In the example I project from Person to Person, which makes the matching signature of Select like the commented one below.

    // IEnumerable<Person> Select<Person, Person>(
    //     this IEnumerable<Person> source,
    //     Func<Person, Person> selector)

    public static IEnumerable<TResult> Select<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TResult> selector) {
        foreach (TSource element in source)
            yield return selector(element);
    }
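
For a projection where TSource and TResult differ, consider this sketch that projects onto an anonymous type. The Person class here is a hypothetical stand-in with just a Name and an Age, and the property names of the anonymous type are my own choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Person class used in the post.
public class Person {
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class ProjectionDemo {
    public static List<string> Describe(IEnumerable<Person> people) {
        // The element type of 'projected' has no name, so var is the only option.
        var projected = people.Select(p => new { Name = p.Name, IsAdult = p.Age >= 18 });

        var lines = new List<string>();
        foreach (var item in projected)
            lines.Add(item.Name + ": " + (item.IsAdult ? "adult" : "minor"));
        return lines;
    }

    public static void Main() {
        var people = new[] {
            new Person { Name = "Ann", Age = 31 },
            new Person { Name = "Bob", Age = 12 }
        };
        foreach (string line in Describe(people))
            Console.WriteLine(line); // Ann: adult, then Bob: minor
    }
}
```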

The ability to chain methods together to form a data pipeline is what adds real power to LINQ. Although the idea of chaining methods together is old, the traditional approach of each method returning a new or mutated instance works best for objects of your own. Imagine adding to the IEnumerable<T> interface a set of query operators. Then every collection would have to provide implementations for Where, Select, and so on. Also, it wouldn’t be possible to add new operators in vNext of .NET without breaking existing code implementing the interface.

Extension methods solve this issue by providing the necessary syntactic sugar to make chaining of methods on a collection feel like chaining on an object of your own. Behind the scenes, the compiler rewrites calls to extension methods into ordinary static method calls.

    var v = Enumerable.Select(
        Enumerable.Where(people, p => p.Age > 25),
        p => p);

Note how, without the use of extension methods, operations must be specified in the opposite order of that in which they’re executed. This doesn’t read as nicely as the “infix” syntax when many operations are chained together.
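
The same mechanism lets you add operators of your own without touching IEnumerable<T>. Here’s a sketch of a made-up operator, TakeEvery, which isn’t part of the framework, called both as an extension method and as the static method the compiler turns it into.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MyOperators {
    // Hypothetical operator: yields every n-th element, starting with the first.
    public static IEnumerable<T> TakeEvery<T>(this IEnumerable<T> source, int n) {
        int index = 0;
        foreach (T element in source) {
            if (index % n == 0)
                yield return element;
            index++;
        }
    }
}

public static class ChainingDemo {
    private static readonly int[] Numbers = { 1, 2, 3, 4, 5 };

    // Reads left to right, in the order the operations execute.
    public static int[] Chained() {
        return Numbers.Where(x => x > 1).TakeEvery(2).ToArray();
    }

    // The equivalent static calls, which read inside out.
    public static int[] Nested() {
        return MyOperators.TakeEvery(Enumerable.Where(Numbers, x => x > 1), 2).ToArray();
    }

    public static void Main() {
        Console.WriteLine(Chained().SequenceEqual(Nested())); // True
        Console.WriteLine(string.Join(",", Chained().Select(x => x.ToString()).ToArray())); // 2,4
    }
}
```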


Posted in .Net | Comments Off

Unit testing LINQ to SQL using TypeMock

Posted by Ronnie Holm on 4th May 2010

Recent months have brought about a proliferation of mocking frameworks that mock what more traditional frameworks like Rhino Mocks cannot. Instead of creating and loading a mock implementation at runtime, the new breed of mocking frameworks hooks into the CLR to intercept and redirect calls. This opens up virtually every aspect of a class to mocking, which is useful for testing code not written with explicit testability in mind. Until recently, TypeMock was the only mocking framework around that took the latter approach, but it’s now being challenged by Moles from Microsoft Research and JustMock from Telerik.

Why traditional dependency-breaking techniques fall short

After watching a screencast on how to use Moles to unit test LINQ to SQL without hitting the database, I thought it would be interesting to do the same with TypeMock. But first, let’s make sure we understand why traditional dependency-breaking techniques fall short in testing LINQ to SQL. Assuming we want to put a repository under test, our goal is to mock how it accesses the database. Here’s a simple implementation of a repository that queries the Employee table of the AdventureWorks database:

    public class EmployeeRepository {
        public List<Employee> GetEmployeesByHireDate(DateTime start, DateTime end) {
            using (var ctx = new AdventureWorksDataContext())
                return (from e in ctx.Employees
                        where e.HireDate >= start && e.HireDate <= end
                        select e).ToList();
        }
    }

All calls to the database are routed through the AdventureWorksDataContext generated by Visual Studio. To mock access to the database, we therefore have to mock part of the data context. Easier said than done, though, for the context doesn’t expose an interface that a fake can implement. In addition, the tables are accessed through properties on the context that return a type of Table<TEntity>. Unfortunately, the constructor of Table<TEntity> is internal and the class itself is sealed, eliminating the hope of instantiating or subclassing the type by traditional means:

    public sealed class Table<TEntity> : IQueryProvider,
            ITable, IListSource, ITable<TEntity>, IQueryable<TEntity>,
            IEnumerable<TEntity>, IQueryable, IEnumerable
            where TEntity : class {
        internal Table(DataContext context, MetaTable metaTable) {
            ...
        }
    }

For an example of how the data context itself creates an instance of Table<TEntity>, take a look at the Employees property on the AdventureWorksDataContext. It relies on the GetTable<Employee> method on the DataContext class to create an instance of Table<Employee>. Despite its constructors being internal, the GetTable<TEntity> method has no trouble constructing an instance of the Table<TEntity> type, as they both reside in the System.Data.Linq assembly:

    public partial class AdventureWorksDataContext : DataContext {
        public Table<Employee> Employees {
            get {
                return GetTable<Employee>();
            }
        }
    }

How to break the unbreakable

The design of LINQ to SQL leaves us short of a traditional testing seam, as Michael Feathers would phrase it: a place at which we can alter the behavior of a program without editing in that place. This explains why, with LINQ to SQL, traditionally we’ve had to test against a real database with all its constraints, making our tests brittle, slow, and painful to write and maintain. With the new breed of mocking frameworks the issues of not being able to subclass or not being able to call an internal constructor go away (and new issues take their place). Regardless, here’s how to write a unit test for the EmployeeRepository that doesn’t hit the database:

    [TestClass]
    public class EmployeeRepositoryTest {
        private EmployeeRepository _repository;

        [TestInitialize]
        public void Initialize() {
            _repository = new EmployeeRepository();

            var fakeEmployees = new List<Employee> {
                new Employee {EmployeeID = 1, HireDate = new DateTime(2004, 12, 1)},
                new Employee {EmployeeID = 2, HireDate = new DateTime(2006, 7, 1)},
                new Employee {EmployeeID = 3, HireDate = new DateTime(2009, 3, 1)}
            }.AsQueryable();

            var fakeDataContext = Isolate.Fake.Instance<AdventureWorksDataContext>();
            Isolate.Swap.NextInstance<AdventureWorksDataContext>().With(fakeDataContext);

            // var fakeEmployeeTable = Isolate.Fake.Instance<Table<Employee>>();
            // Isolate.WhenCalled(() => fakeDataContext.Employees).WillReturn(fakeEmployeeTable);
            // Isolate.WhenCalled(() => fakeEmployeeTable).WillReturnCollectionValuesOf(fakeEmployees);
            // or by transitivity
            Isolate.WhenCalled(() => fakeDataContext.Employees).WillReturnCollectionValuesOf(fakeEmployees);
        }

        [TestMethod]
        public void GetEmployeesByHireDate_should_return_hires_from_2008_until_present() {
            var employees = _repository.GetEmployeesByHireDate(new DateTime(2008, 1, 1), DateTime.Now);
            Assert.AreEqual(1, employees.Count());
            Assert.AreEqual(3, employees[0].EmployeeID);
        }
    }

The test method itself looks exactly as if we’d been testing against a real database. The difference lies in the Initialize method, where we set up the fake data context and database contents. We instruct TypeMock to return the fake context in place of the real one inside EmployeeRepository. And whenever someone calls the Employees property on the fake context, we have TypeMock intercept the call and return a fake collection of type IQueryable<Employee>. We could’ve returned an instance of Table<Employee>, which implements IQueryable<Employee>, but in this case returning the collection is simpler and sufficient. Had we had more methods on our repository, we likely would’ve added additional rows to the fake Employee table and populated more of its columns.


Posted in .Net | 3 Comments »

The given-expect testing pattern

Posted by Ronnie Holm on 25th April 2010

I was watching Brett Schuchert’s TDD screencast on implementing the shunting yard algorithm in C#. In it Brett builds up his tests in a style I hadn’t come across before. Each test is expressed as a given-expect statement, a pattern that is particularly useful in situations in which a class has a main method that accepts an open-ended number of dissimilar inputs.

I found the given-expect pattern useful in testing a piece of code that I was working on this week. I was refactoring and adding tests around an ASP.NET control adapter that makes SharePoint 2007 pages more XHTML compliant. I wanted to reuse the transformations outside the control adapter and hence ended up moving the transformation logic to a new class. It accepts possibly malformed HTML and relies on the heuristics of the HTML Agility Pack to build a DOM from it. I can then query the DOM, looking for known violations, and patch them before returning XHTML to the caller.

    public class HtmlToXHtmlTransformer {
        private readonly HtmlDocument _document;

        public HtmlToXHtmlTransformer(string html) {
            _document = new HtmlDocument();
            _document.DetectEncoding(new StringReader(html));
            _document.LoadHtml(html);
        }

        private void Transform(string xpath, Action<HtmlNode> nodeMatch) {
            var nodes = _document.DocumentNode.SelectNodes(xpath);
            if (nodes != null)
                foreach (var node in nodes)
                    nodeMatch.Invoke(node);
        }

        private void FixDuplicateBorderAttributeOnSPGridViewControl() {
            Transform("//table[count(@border)=2]", node => node.Attributes.Remove("border"));
        }

        public string Transform() {
            FixDuplicateBorderAttributeOnSPGridViewControl();
            _document.OptionWriteEmptyNodes = true;
            return _document.DocumentNode.WriteTo();
        }
    }

The complete HtmlToXHtmlTransformer collects a dozen transformations. Its Transform method is what we want to call with various HTML fragments to verify that they come out as XHTML. For this purpose, we might do the tests as Visual Studio data-driven tests that read their input and output from a text file. But in most cases I prefer traditional tests, so I can describe the purpose of a test with a descriptive method name and possibly a comment.

    [TestClass]
    public class HtmlToXHtmlTransformerTest {
        private string _result;

        [TestMethod]
        public void Must_selfclose_nodes_when_allowed() {
            Given("<br>");
            Expect("<br />");
        }

        [TestMethod]
        public void Must_remove_duplicate_border_on_SPGridView_control() {
            Given(@"<table border=""0"" border=""0""></table>");
            Expect(@"<table border=""0""></table>");
        }

        private void Expect(string xhtml) {
            Assert.AreEqual(xhtml, _result);
        }

        private void Given(string html) {
            var transformer = new HtmlToXHtmlTransformer(html);
            _result = transformer.Transform();
        }
    }

I particularly like the clarity of the given-expect pattern and find that for a reasonable number of tests it’s a viable alternative to data-driven tests. I do, however, recognize the value of data-driven tests in situations where a non-developer wants to test a class, though at the unit test level I’ve never experienced this. It’s more characteristic of FitNesse for acceptance testing. However you unit test, just make sure your tests run with a minimum of effort on your part and that they run fast.


Posted in .Net, SharePoint | Comments Off