Bugfree.dk – Ronnie Holm's blog

Not anti-anything, just pro-quality

CruiseControl.net and build queues

Posted by Ronnie on 19th November 2006

(I’ve created a Download page for the tool described in this post, with a short description of prerequisites and how to install it.)

For team development a way is needed to ensure that team members doesn’t inadvertently break the shared code, e.g., a developer forgets to commit all files related to a change, which is something not easily fixable by other developers.

Also, the team might want to run post-compile tasks, such as unit tests, on a dedicated machine to avoid the works-on-my-machine syndrome.

Ideally, the project’s code should always go from one stable state to another so that a developer doesn’t get broken code the next time he synchronizes with source control.

One way to achieve this is by means of continuous integration of changes into the code. Either by rolling your own build script that monitors source control for changes to trigger a build and inform developers of the result or by leveraging an existing continuous integration solution, such as CruiseControl.net (ccnet): a continuous integration server, an asp.net based front-end, and a desktop client.

Although CruiseControl (both the Java and .Net incarnation) appears to be a popular choice among developers, I find it somewhat immature: it violates the dry principle in its projects configuration file and, worst of all, it doesn’t support build queues to ensure that only one of several projects are actually building at once.

It’s not a problem for ccnet itself to perform parallel continuous integration. However, the build chain is no stronger than the weakest link, which in my case is devenv.exe, the Visual Studio executable used by ccnet to build the application (by running devenv.exe from the command line with a couple of arguments, you can leverage the build system within Visual Studio from a script).

The downside to using devenv.exe, however, is that all running instances share a temporary build folder for compiler output. Thus, when ccnet does continuous integration of branches in parallel, the instances of devenv.exe spawned by ccnet compete for write access to identically named files, namely the executables generated by the compiler.

Not surprisingly, this results in build failures for one or more ccnet projects, although there is nothing wrong with the code.

In my experience from the trenches this happens too often to be ignored for any serious use of ccnet: It’s just too common for a developer to cross-commit a bugfix to several branches or for independent developers to commit to their working branches in between when ccnet checks the branch for updates.

Having these false negatives make the team members not trust ccnet and not take proper action whenever they see a build failure, thus rendering continuous integration useless.

One solution to the above file locking problem is to serialize the builds manually by building branches at predefined intervals, thus interleaving the builds so that they shouldn’t be able to overlap. As a result the developer may have to wait a fair amount of time to be sure his latest commit doesn’t breaks anything. Also, if a build has not finished when the next is scheduled to start, it may cause a cascade of broken builds to occur until the interleaving is properly realigned.

Another solution is to head over to Jay Flowers. He has branched off the official 1.0 version of ccnet and created a ccnet that allows for a lot more constraints than just serializing builds. It happens at the expense of running the non-official, not-recent version of ccnet, though.

My solution to the parallel build problem was to write a front-end for ccnet that implements build queues without the knowledge of ccnet, i.e., by means of checking for changes on a configurable number of branches. If a change is detected, the branch is enqueued. Then when all branches are checked, the tool force builds each branch, monitoring the state of the ccnet server to ensure that only one branch is building at a time.

With this solution you integrate the nice features of ccnet, such as the nice webgui for displaying what went wrong with a build and the systray application that goes green, yellow, or red depending on the state of your builds, with rapid feedback in a way that is compatible with Visual Studio’s integrated build engine.

The biggest downside, however, is that you can no longer see the changes that triggered a build from the web gui as ccnet is no longer in charge of cvs operations.

The tool is roughly 100 lines of IronPython code and works by spawning cvs.exe, synchronizing a number of working copies with the cvs repository, and in the process monitors the output of cvs.exe for changes, additions, and removals of files to the working copies.

The state of CruiseControl is determined by screen scraping the webgui and builds are forced by simulating a click on the “build” button for the ccnet project in the webgui for which a change was detected.

The url of the ccnet webgui, the path to the cvs.exe, and mappings between the path on disk where each working copy resides and the corresponding name of the ccnet project are stored in the tools configuration file, which is itself Python code loaded into the tool on startup.

Using Python as a language for your configuration file, you don’t have to invent your own language or use verbose xml, which requires you to write code for parsing it.

PS: I recently stumbled across the Sequential Task Plug In that I want to try out sometime.

Update, April 16: It has come to my attention that there is a way to minimize duplication in the configuration file. It involves “… users to set up DTD entities within the CCNet configuration” as mentionen here. I haven’t tried it, though.

Update, August 10: CruiseControl.net 1.3 now comes with a build-in queue feature.

  • Share/Bookmark

Tags: , , ,
Posted in .Net, Python | No Comments »

Getting into Python loops

Posted by Ronnie on 13th January 2006

Looping is such a commonly used construct that I thought I’d do an informal comparison of the flavors of loops provided. For comparison, I define a function for scaling a list using (1) list comprehension, (2) the map function, (3) the for loop, and finally, just for the fun of it, (4) a recursive function. Mathematically, scaling a lists involves multiplying each elements by a scale factor, e.g., applying a scale factor of 10 to [1, 2, 3] yields [10, 20, 30].

The following code benchmarks each method, running it 500 times in a row. Furthermore, to ensure that the results are independent of the length of the list, I test each function on lists of sizes 1000 to 30000, in 100 increments, looking for a clear growth patterns (but found none).

from time import time
def scale_using_recursion(s, l):
    if len(l) == 0:
        return []
    else:
        return [(l[0] * s)] + scale_using_recursion(s, l[1:])

def scale_using_loop(s, l):
    r = []
    for e in l:
        r.append(e * s)
    return r

def scale_using_map(s, l):
    return map(lambda e: e * s, l)

def scale_using_list_comprehension(s, l):
    return [e * s for e in l]

def run(func, iterations, list_size):
    l = [e for e in range(list_size)]
    t = time()
    [func(10, l) for i in range(iterations)]
    return (time() - t) / iterations

def time_stuff():
    iterations = 500

    for list_size in range(1000, 30000, 100):
        t1 = run(scale_using_list_comprehension, iterations, list_size)
        t2 = run(scale_using_map, iterations, list_size)
        t3 = run(scale_using_loop, iterations, list_size)
        print("%s %s %s %s") % (list_size, t1/t1, t2/t1, t3/t1)

if __name__ == "__main__":
    time_stuff()

The benchmark was performed under MS Windows, running Python 2.4.2. To make the results independent of the speed of the hardware, I compare each scaling method to the winner of the race, the list comprehension approach.

scale_using_list_comprehension:  1.0
scale_using_loop:                1.7
scale_using_map:                 2.1
(scale_using_recursion:         76.0)

Not surprisingly, the recursive approach is slow and infeasible for real use (and only tested on a list size of 500 to get an indication of relative performance). Generally, recursive functions are of limited use in Python because the interpreter doesn’t optimize for tail recursion; a technique commonly used in functional languages that causes a tail recursion to get transformed into an iteration, making a recursive and an interative approach equally fast.

Lacking this features, Python blows up with a “RuntimeError: maximum recursion depth exceeded”, when it reaches the default recursion depth of 500. We could increase the number for certain situations, of course, but we would never truly be in safe range of the RuntimeError, and performing the way it does, it doesn’t really matter.

I was surprised to learn that map and for loops are roughly 70% and 110% slower, respectively, than list comprehension. I can’t really put my finger on what makes them so slow.

Although this isn’t the most scientific of studies, it does indicate that list comprehension is the way to go, if possible. Luckily, in most cases, list comprehension is also the shortest, the most elegant, and the most readable of the approaches.

  • Share/Bookmark

Tags:
Posted in Python | No Comments »

Being a functional Pythonian

Posted by Ronnie on 4th January 2006

In my first blog post, I’ll give an example of how functional programming (FP) in Python results in shorter, cleaner, and more concise code, compared to the equivalent in an imperative language like C#.

I’ll use a real-world build tool I’ve written as the driving example. Typical tasks for such a tool would include, but not be limited to, checking out sources from cvs, compiling the sources, and running NUnit against a number of generated targets.

Each such task may be defined as a function, taking any number of arguments and returning a list of the form:

[status, [(command, output), (...)]]

where status indicates binary success, command is the command run to complete the task and output is the output (stdout/stderr) of running the command. There may be more than one (command, output) tuple inside the sublist, if the task requires more than one command to complete.

In FP the list and tuple are fundamental and very powerful concepts, because they can easily be used to define complex structures for you to use when passing data around. They also facilitate easy processing using list comprehension, the functions build into the language, or functions of your own, providing elegant syntax for common operations, such as insertion, deletion, searching, slicing, etc.

These lists and tuples are more dynamic in nature than their array and struct counterparts in, say, C#. In fact, rather than creating user defined types, a list, or a tuple, or a combination often suffices, eliminating the need to write a lot of boilerplate code.

Now, to provide an example where Python’s FP support really kicks in with the build tool, is in the implementation of the code responsible for running NUnit against a number of targets:

def NUnit(nunitconsole, projectsOrAssemblies):
    results = [Run("\"" + nunitconsole + "\" " + source)
              for source in projectsOrAssemblies]
    commands = [command[1][0] for command in results]
    outputs = [output[1][1] for output in results]
    failures = Sigma(re.findall("Failures: (\d+)", str(outputs)))
    return [not bool(int(failures)), zip(commands, outputs)]

This function relies heavily on the use of lists and list comprehension, rather than imperative style loops (which, of course, is also possible in Python). List comprehension is a construct that allows you to easily create a new list as a result of some processing on the elements of another.

A highlevel walk-through of the function: first the NUnit executable is invoked one target at a time. The Run function itself returns a list of the form [exitcode, (command, output)]; hence with n targets to test, the commands and outputs variables end up containing n commands and their output. The Failures variable counts the total number of failures accumulated across all targets, extracted as a list of strings matching the regular expression and summarized by the Sigma function.

Python doesn’t allow summing over a list of strings using its build-in Sum function. However, you can easily implement a Sum function that does in something along the lines of:

def Sigma(list):
    return reduce(lambda a, b: int(a) + int(b), list)

where reduce is a function present in most, if not all, FP languages, which reduces a list of values to a single value by applying a user-defined operation to pairwise values of the list.

Finally, the zip function is a build-in function, merging the elements of two lists into one.

Another important construct in FP is the lambda notation. It enables you to create an anonymous function, a function without an explicitly name, leading us to another key characteristic of FP: functions are treated as first-class citizens, in the sense that they may be passed around as just another variable, e.g., you can create a list of functions with parameters and then do the actual invocation at a later stage.

Using deferred invocation, the build tool manages the overall build process; in a config file, similar to a makefile, a list of lambdas are defined, each defining a function that executes a task. This provides us with a generic way of annotating a task and for each task to be run by a central runner, which takes care of logging description, command, and outcome to a html report. The prefs used as parameters to each task is an associated array of associated arrays of … well as many levels as you require.

In effect the config file defines a domain specific language, using regular Python syntax, which can easily be loaded into what corresponds to the make utility by an import statement:

prefs = {
          "cvs" :
          {
            "cvs" : "C:\\Program Files\\cvsnt\\cvs.exe",
            "workingDirectory" : "C:\\Inetpub\\wwwroot",
            "root" : ":pserver:foo:bar@1.2.3.4:/foobar",
            "module" : "baz",
            "revision" : "HEAD"
          },
          "nunit" :
          {
            "nunitconsole" : "C:\\...\\nunit-console.exe",
            "projectsOrAssemblies" : [C:\\Inetpub\\...\\foobar.nunit" ]
          },
          ...
        }

tasks = [ ("Checkout Foobar from cvs",
           lambda: Task.CvsCheckout(prefs["cvs"]["cvs"],
                                    prefs["cvs"]["workingDirectory"],
                                    prefs["cvs"]["root"],
                                    prefs["cvs"]["module"],
                                    prefs["cvs"]["revision"])),
          ...,
          ("Running NUnit",
           lambda: Task.NUnit(prefs["nunit"]["nunitconsole"],
                              prefs["nunit"]["projectsOrAssemblies"]))
        ]

This example, using the basic concepts of FP such as lists, list comprehension, tuples, build-in functions, and nice syntax, hopefully provided you with a glimpse of how a lot can be achieved in relatively few lines of Python code. Also, if you really want to explore FP from a Python perspective, I suggest you take a look at the Xoltar Toolkit (unfortunately, though, it isn’t under active development), described in David Mertz’s columns More Functional Programming in Python and Even More Functional Programming in Python

Bottom line is that the use of FP style in your programs, although it takes time getting use to, leads to fewer lines of code, which leads to fewer places where things can go wrong.

This is not to say that your entire program should be written using the FP style (as is the case with Haskell). However, learning the basic concepts, and how Python exposes these to the developer, you’ll know which patterns to look for in your code that may be better solved using FP constructs.

  • Share/Bookmark

Tags: , ,
Posted in Python | No Comments »