Bugfree.dk – Ronnie Holm's blog

Not anti-anything, just pro-quality

Archive for January, 2006

Getting into Python loops

Posted by Ronnie Holm on 13th January 2006

Looping is such a commonly used construct that I thought I’d do an informal comparison of the flavors of loops provided. For comparison, I define a function for scaling a list using (1) list comprehension, (2) the map function, (3) the for loop, and finally, just for the fun of it, (4) a recursive function. Mathematically, scaling a lists involves multiplying each elements by a scale factor, e.g., applying a scale factor of 10 to [1, 2, 3] yields [10, 20, 30].

The following code benchmarks each method, running it 500 times in a row. Furthermore, to ensure that the results are independent of the length of the list, I test each function on lists of sizes 1000 to 30000, in 100 increments, looking for a clear growth patterns (but found none).

from time import time
def scale_using_recursion(s, l):
    if len(l) == 0:
        return []
    else:
        return [(l[0] * s)] + scale_using_recursion(s, l[1:])

def scale_using_loop(s, l):
    r = []
    for e in l:
        r.append(e * s)
    return r

def scale_using_map(s, l):
    return map(lambda e: e * s, l)

def scale_using_list_comprehension(s, l):
    return [e * s for e in l]

def run(func, iterations, list_size):
    l = [e for e in range(list_size)]
    t = time()
    [func(10, l) for i in range(iterations)]
    return (time() - t) / iterations

def time_stuff():
    iterations = 500

    for list_size in range(1000, 30000, 100):
        t1 = run(scale_using_list_comprehension, iterations, list_size)
        t2 = run(scale_using_map, iterations, list_size)
        t3 = run(scale_using_loop, iterations, list_size)
        print("%s %s %s %s") % (list_size, t1/t1, t2/t1, t3/t1)

if __name__ == "__main__":
    time_stuff()

The benchmark was performed under MS Windows, running Python 2.4.2. To make the results independent of the speed of the hardware, I compare each scaling method to the winner of the race, the list comprehension approach.

scale_using_list_comprehension:  1.0
scale_using_loop:                1.7
scale_using_map:                 2.1
(scale_using_recursion:         76.0)

Not surprisingly, the recursive approach is slow and infeasible for real use (and only tested on a list size of 500 to get an indication of relative performance). Generally, recursive functions are of limited use in Python because the interpreter doesn’t optimize for tail recursion; a technique commonly used in functional languages that causes a tail recursion to get transformed into an iteration, making a recursive and an interative approach equally fast.

Lacking this features, Python blows up with a “RuntimeError: maximum recursion depth exceeded”, when it reaches the default recursion depth of 500. We could increase the number for certain situations, of course, but we would never truly be in safe range of the RuntimeError, and performing the way it does, it doesn’t really matter.

I was surprised to learn that map and for loops are roughly 70% and 110% slower, respectively, than list comprehension. I can’t really put my finger on what makes them so slow.

Although this isn’t the most scientific of studies, it does indicate that list comprehension is the way to go, if possible. Luckily, in most cases, list comprehension is also the shortest, the most elegant, and the most readable of the approaches.

  • Share/Bookmark

Tags:
Posted in Python | No Comments »

Being a functional Pythonian

Posted by Ronnie Holm on 4th January 2006

In my first blog post, I’ll give an example of how functional programming (FP) in Python results in shorter, cleaner, and more concise code, compared to the equivalent in an imperative language like C#.

I’ll use a real-world build tool I’ve written as the driving example. Typical tasks for such a tool would include, but not be limited to, checking out sources from cvs, compiling the sources, and running NUnit against a number of generated targets.

Each such task may be defined as a function, taking any number of arguments and returning a list of the form:

[status, [(command, output), (...)]]

where status indicates binary success, command is the command run to complete the task and output is the output (stdout/stderr) of running the command. There may be more than one (command, output) tuple inside the sublist, if the task requires more than one command to complete.

In FP the list and tuple are fundamental and very powerful concepts, because they can easily be used to define complex structures for you to use when passing data around. They also facilitate easy processing using list comprehension, the functions build into the language, or functions of your own, providing elegant syntax for common operations, such as insertion, deletion, searching, slicing, etc.

These lists and tuples are more dynamic in nature than their array and struct counterparts in, say, C#. In fact, rather than creating user defined types, a list, or a tuple, or a combination often suffices, eliminating the need to write a lot of boilerplate code.

Now, to provide an example where Python’s FP support really kicks in with the build tool, is in the implementation of the code responsible for running NUnit against a number of targets:

def NUnit(nunitconsole, projectsOrAssemblies):
    results = [Run("\"" + nunitconsole + "\" " + source)
              for source in projectsOrAssemblies]
    commands = [command[1][0] for command in results]
    outputs = [output[1][1] for output in results]
    failures = Sigma(re.findall("Failures: (\d+)", str(outputs)))
    return [not bool(int(failures)), zip(commands, outputs)]

This function relies heavily on the use of lists and list comprehension, rather than imperative style loops (which, of course, is also possible in Python). List comprehension is a construct that allows you to easily create a new list as a result of some processing on the elements of another.

A highlevel walk-through of the function: first the NUnit executable is invoked one target at a time. The Run function itself returns a list of the form [exitcode, (command, output)]; hence with n targets to test, the commands and outputs variables end up containing n commands and their output. The Failures variable counts the total number of failures accumulated across all targets, extracted as a list of strings matching the regular expression and summarized by the Sigma function.

Python doesn’t allow summing over a list of strings using its build-in Sum function. However, you can easily implement a Sum function that does in something along the lines of:

def Sigma(list):
    return reduce(lambda a, b: int(a) + int(b), list)

where reduce is a function present in most, if not all, FP languages, which reduces a list of values to a single value by applying a user-defined operation to pairwise values of the list.

Finally, the zip function is a build-in function, merging the elements of two lists into one.

Another important construct in FP is the lambda notation. It enables you to create an anonymous function, a function without an explicitly name, leading us to another key characteristic of FP: functions are treated as first-class citizens, in the sense that they may be passed around as just another variable, e.g., you can create a list of functions with parameters and then do the actual invocation at a later stage.

Using deferred invocation, the build tool manages the overall build process; in a config file, similar to a makefile, a list of lambdas are defined, each defining a function that executes a task. This provides us with a generic way of annotating a task and for each task to be run by a central runner, which takes care of logging description, command, and outcome to a html report. The prefs used as parameters to each task is an associated array of associated arrays of … well as many levels as you require.

In effect the config file defines a domain specific language, using regular Python syntax, which can easily be loaded into what corresponds to the make utility by an import statement:

prefs = {
          "cvs" :
          {
            "cvs" : "C:\\Program Files\\cvsnt\\cvs.exe",
            "workingDirectory" : "C:\\Inetpub\\wwwroot",
            "root" : ":pserver:foo:bar@1.2.3.4:/foobar",
            "module" : "baz",
            "revision" : "HEAD"
          },
          "nunit" :
          {
            "nunitconsole" : "C:\\...\\nunit-console.exe",
            "projectsOrAssemblies" : [C:\\Inetpub\\...\\foobar.nunit" ]
          },
          ...
        }

tasks = [ ("Checkout Foobar from cvs",
           lambda: Task.CvsCheckout(prefs["cvs"]["cvs"],
                                    prefs["cvs"]["workingDirectory"],
                                    prefs["cvs"]["root"],
                                    prefs["cvs"]["module"],
                                    prefs["cvs"]["revision"])),
          ...,
          ("Running NUnit",
           lambda: Task.NUnit(prefs["nunit"]["nunitconsole"],
                              prefs["nunit"]["projectsOrAssemblies"]))
        ]

This example, using the basic concepts of FP such as lists, list comprehension, tuples, build-in functions, and nice syntax, hopefully provided you with a glimpse of how a lot can be achieved in relatively few lines of Python code. Also, if you really want to explore FP from a Python perspective, I suggest you take a look at the Xoltar Toolkit (unfortunately, though, it isn’t under active development), described in David Mertz’s columns More Functional Programming in Python and Even More Functional Programming in Python

Bottom line is that the use of FP style in your programs, although it takes time getting use to, leads to fewer lines of code, which leads to fewer places where things can go wrong.

This is not to say that your entire program should be written using the FP style (as is the case with Haskell). However, learning the basic concepts, and how Python exposes these to the developer, you’ll know which patterns to look for in your code that may be better solved using FP constructs.

  • Share/Bookmark

Tags: , ,
Posted in Python | No Comments »