Bugfree.dk – Ronnie Holm's blog

Not anti-anything, just pro-quality

Computer activation with Asp.Net MVC

Posted by Ronnie Holm on 22nd October 2008

Download source or watch demo.

(Other posts in this series include: Kernel space traffic shaping with Linux and User space traffic shaping with Ruby that touch exclusively on Linux issues.)

This post covers an Asp.Net MVC application (for an overview of Asp.Net MVC, the Herding Code guys recently interviewed Phil Haack) I wrote that automates the collection of information about computers and their owners on a local area network. The idea is to have the Internet gateway match the identity of each computer that wants to send traffic through it against a white list. Should the request originate from an unknown computer, the gateway redirects that computer to a web application for Internet access activation. Each user then has to create an account and provide verifiable contact information and/or associate their computer with an existing account.

With the contact information of every connected user, maintaining a network with 331 connected apartments, 371 users, and 513 privately owned computers becomes more manageable. Now you can easily email all users about general issues or individual users about specific issues, such as wireless routers exposing rogue DHCP servers or viruses slowing down the network.

Without being able to contact a user, the best you can hope for is to use the MAC address to locate and disable the port on the switch to which the user is connected. But when network access is terminated without warning or explanation, the user has no way of knowing what hit him. Now when the user contacts you, figuring out if he’s experiencing network issues because of a closed port or something entirely different quickly becomes a challenge, and one that doesn’t scale well.

That’s where the Asp.Net MVC application comes in. To get a feel for how the software accomplishes its task, you should watch the minute and a half demo. It portrays the browsing experience of a user connecting his computer to the network for the first time and attempting to browse the web. Behind the scenes the Linux gateway uses its Netfilter capabilities to efficiently match the MAC address of each request against the white list. With no match for the newly connected computer, Netfilter redirects the user’s browser to the web application as soon as the user attempts to browse the web. Then, every so often, using a Ruby script, the gateway queries the web server for updates to the white list and carries over the changes to Netfilter.
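The white list synchronisation can be pictured as a set difference between what Netfilter currently allows and what the web server reports. The Ruby below is only a sketch of that idea, not the script running on the gateway: the method name whitelist_changes is made up for illustration, and how the two lists are fetched and how the changes are applied to Netfilter is left out.

```ruby
require 'set'

# Hypothetical helper: compare the MAC addresses currently allowed on the
# gateway with the list the web server reports, and work out what to change.
# MAC addresses are lower-cased first so that case differences don't count.
def whitelist_changes(current_macs, desired_macs)
  current = Set.new(current_macs.map(&:downcase))
  desired = Set.new(desired_macs.map(&:downcase))

  {
    add: (desired - current).sort,    # newly activated computers
    remove: (current - desired).sort  # computers no longer on the white list
  }
end
```

Applying only the differences means computers that are already activated are left untouched on each run.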

Database schema

The web application is based on the idea that a user is responsible for zero or more computers. Focusing on the web application, rather than the Linux part, the rest of this post highlights a few features that use and manipulate the data structures stored in an MS SQL Server Express database:

Reverse lookup of MAC

Establishing an ownership between a user and a computer is done via the MAC address of the computer. Now, it wouldn’t be particularly user friendly if the user had to go about locating the MAC address and manually typing it into a web form. Instead, the web application looks up the user’s MAC address within the ARP cache of the web server.

Because the user’s computer and the web server are both on the same local network and because they’re already exchanging data (the user is visiting a web page hosted on the server), the server has already resolved the computer’s MAC address through ARP and holds it in its ARP cache. The data entry form is therefore able to present the user with a control that makes associating computers with the account easy. By reverse lookup, the computer currently browsing the web application is indicated by “Now logged in from here”.

Email verification

Most users don’t mind handing over basic information such as their email address. And as users are blocked from accessing the Internet, at the time of entry we only validate the email address for syntactic correctness.

A few users, however, repeatedly entered fake email addresses to gain Internet access. Consequently, we send out an email to the address, leaving open a 24 hour window for the user to follow a link. Otherwise, the software sends out a reminder email, marks the account as inactive, and redirects the user to the web application.

Validating the syntax of email addresses is done using regular expressions. As with most complex regular expressions, they’re hard to read and hard to verify. They do, however, stand the test against a database of hundreds of email addresses. To come full circle, though, what is needed is a way to match the domain part of the email address against the DNS MX record of the domain. Unfortunately, MX lookup isn’t built into the .Net framework.
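To illustrate the two checks, here is a minimal sketch in Ruby, the language of the gateway scripts in this series, rather than the C# of the web application. The regular expression is a deliberately simple stand-in and not the one used by the application, and the method names are made up for illustration. The MX lookup uses Resolv::DNS from Ruby's standard library and needs a working resolver when called.

```ruby
require 'resolv'

# Simplified stand-in pattern: something@label.label.tld, no whitespace.
# It is far less thorough than a production-grade expression.
EMAIL_SYNTAX = /\A[^@\s]+@((?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,})\z/

# Cheap check performed at the time of entry.
def syntactically_valid?(address)
  EMAIL_SYNTAX.match?(address)
end

# Returns the domain part, or nil if the address doesn't match the pattern.
def domain_of(address)
  match = EMAIL_SYNTAX.match(address)
  match && match[1].downcase
end

# Returns the mail exchangers of a domain, best preference first.
# An empty list suggests the domain can't receive mail. Requires DNS access.
def mx_hosts(domain)
  Resolv::DNS.open do |dns|
    dns.getresources(domain, Resolv::DNS::Resource::IN::MX)
       .sort_by(&:preference)
       .map { |mx| mx.exchange.to_s }
  end
end
```

A passing MX lookup still doesn't prove that the mailbox exists, so a verification email remains necessary.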

Unhandled exception emailing error handler

Error handling deserves a post of its own. Suffice it to say that to stay on top of any application, unhandled exceptions should be logged, or in this case, emailed to a designated address. Inevitably, users of your software will use it in unanticipated ways and so a global exception handler provides invaluable insight to learn from.


(Summary part of example email. Click here for complete output.)

Download and maybe try out

The web application was developed around February 2008, about the time the Asp.Net MVC framework went into Preview 2 and had started gaining momentum. Unfortunately, because it was developed against such an early release, the web application no longer runs on a computer with .Net 3.5 SP1 installed.

There’s no problem compiling the application because the Preview 2 bits are in the bin folder of the application. But running the application generates an exception stating:

   Could not load type 'System.Web.HttpContextWrapper2' from
   assembly 'System.Web.Abstractions' [located in the GAC]

That’s because the version of System.Web.Abstractions.dll that ships with .Net 3.5 SP1 no longer holds the HttpContextWrapper2 class. On a machine without .Net 3.5 SP1, however, Cassini runs the application just fine. Before running it, though, remember to create a database from the schema in Database.sql and point to that database from web.config.

Lastly, I should stress that the software is a prototype and served as a way for me to wrap my head around ASP.Net MVC and LINQ. So, the code may not win a beauty contest.

Conclusion

The activation software went online in late March 2008. During the first couple of weeks I had to put in a couple of bug fixes based on what I learned from the unhandled exception emails. Since then, however, the software has served the purpose it was charged with.

The only issue that I haven’t been able to resolve is why, on rare occasions, redirecting a computer confuses its browser: if you visit a page and are redirected to the web application, for a short time after, the page keeps resolving to the web application. Most likely it’s a residual effect of the packet rewriting taking place on the gateway.

Another thing I wish I’d implemented was tracing to better understand how a couple of users got to experience a few exceptions that I don’t know how to interpret from the unhandled exception emails alone.

Posted in .Net, Linux | 1 Comment »

User space traffic shaping with Ruby

Posted by Ronnie Holm on 12th April 2008

Download Netwatch-1.0.zip.

In my Kernel space traffic shaping with Linux post, I came to the conclusion that none of the traffic shaping algorithms within the Linux kernel was suitable for my needs. The effects of running the traffic shaping algorithms were too hard to quantify and coming up with the right set of parameters to go with each algorithm was challenging.

So I decided to come up with my own shaping algorithm, running in user space because it’s simpler working from there. I also wanted to use Ruby to learn the language and because of Ruby’s good properties as an integration platform. Lastly, I wanted the shaper to perform deferred shaping based on the traffic patterns observed over a period of hours or days rather than the more or less immediate shaping carried out by the kernel-based algorithms.

Before getting into the details, I should mention that these concepts have been successfully applied to shaping traffic on our 100/100 Mbps Internet connection, shared by some 330 apartments, for well over a year. We’re no longer experiencing issues with network congestion and now that we use a payload agnostic shaper, we no longer have to combat the ever more sophisticated attempts of P2P software to camouflage its traffic.

Architectural overview

At an overview level the shaper is composed of a number of subsystems. At the top are the WRR scheduler, the ARP cache, and the RRD database system, which retrieve and store metadata about computers and their traffic.

(Figure: overview of the shaper’s subsystems, shaperoverview.png.)

Based on input from these subsystems, the shaper’s decision engine evaluates each computer’s bandwidth usage against a set of rules. Should at least one rule be violated, e.g., too much traffic over some defined period of time, the shaper calls out to another subsystem that determines how to handle the violation. In this case Netfilter is called upon to take action, blocking the computer from accessing the Internet and redirecting it to an information page.

Weighted Round Robin scheduler

Within the shaper, kernel and user space form a symbiosis through the WRR scheduler. As part of WRR’s inner workings, the scheduler counts the bytes transmitted on a per IP basis. So although we don’t use WRR for shaping, per se, we do use it to track the byte counters of each computer.

Getting WRR to reveal this information is done through the tc (for traffic control) command. As outlined in the Kernel space traffic shaping with Linux post, only outgoing traffic can be shaped by WRR (any algorithm really). Hence, tc is called once for eth0 and once for eth1, and by parsing the results we know how much traffic has entered and exited each computer between this and the previous call to tc.

For each computer the output has the form below. Of particular interest are the address and the bytes fields:

> tc class show dev eth0
class wrr 8001:1fb parent 8001:
  (address: 192.168.1.231)
  (total weight: 0.872749) (current position: 4) (counters: 1 2 : 3 4)
  Pars 1: (weight 0.872749) (decr: 1e-10) (incr: 7.5e-11) (min: 0.1) (max: 1)
  Pars 2: (weight 1) (decr: 0) (incr: 0) (min: 1) (max: 1)
  (bytes: 4546184) (packets: 55373)
...

The address is dynamically assigned through DHCP and is therefore subject to change. Also, the byte counters aren’t retained across restarts, so we need to draw on additional subsystems to align the WRR output with a computer’s unique identity across time.
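As a minimal sketch of the parsing step (not the code from Netwatch itself), the Ruby below pulls the address and bytes fields out of output shaped like the sample above. The method name parse_wrr_classes and the constant TC_SAMPLE are made up for illustration, and the sketch assumes each class lists its address before its bytes.

```ruby
# Sample copied from the tc output shown above.
TC_SAMPLE = <<~OUTPUT
  class wrr 8001:1fb parent 8001:
    (address: 192.168.1.231)
    (total weight: 0.872749) (current position: 4) (counters: 1 2 : 3 4)
    Pars 1: (weight 0.872749) (decr: 1e-10) (incr: 7.5e-11) (min: 0.1) (max: 1)
    Pars 2: (weight 1) (decr: 0) (incr: 0) (min: 1) (max: 1)
    (bytes: 4546184) (packets: 55373)
OUTPUT

# Returns a hash mapping each IP address to its byte counter.
def parse_wrr_classes(tc_output)
  counters = {}
  current_ip = nil

  tc_output.each_line do |line|
    if (match = line.match(/\(address: (\d+\.\d+\.\d+\.\d+)\)/))
      current_ip = match[1]                 # start of a new class
    elsif current_ip && (match = line.match(/\(bytes: (\d+)\)/))
      counters[current_ip] = match[1].to_i  # counter belonging to that class
      current_ip = nil
    end
  end

  counters
end

p parse_wrr_classes(TC_SAMPLE)
```

Running the parser once per interface gives one hash for each traffic direction.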

Address Resolution Protocol

In a DHCP based environment the IP address of a machine may change over time. So to ensure that traffic is always attributed to the correct physical machine, the WRR byte counters aren’t tied directly to the IP address when stored. Instead, we use the ARP cache to look up the corresponding MAC address, which is assumed to be static.

This lookup is done by maintaining an in-memory set of IP/MAC mappings, populated by parsing the output of the ip command:

> ip neigh
192.168.1.231 dev eth1 lladdr 00:50:8d:68:50:75 REACHABLE
192.168.3.117 dev eth1 lladdr 00:11:d8:8f:0e:3b REACHABLE
...

Thus, combining WRR with ARP, the shaper is able to associate byte counters with the MAC addresses of LAN-connected computers. Restarting the machine, the shaper, or Netfilter, however, will still cause traffic shaping data accumulated over time to be lost.
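The lookup and the re-keying of counters can be sketched as follows. This is an illustration written against the ip neigh sample above rather than the shaper's actual code; the method names and the NEIGH_SAMPLE constant are hypothetical. Entries without an lladdr field are skipped.

```ruby
# Sample copied from the ip neigh output shown above.
NEIGH_SAMPLE = <<~OUTPUT
  192.168.1.231 dev eth1 lladdr 00:50:8d:68:50:75 REACHABLE
  192.168.3.117 dev eth1 lladdr 00:11:d8:8f:0e:3b REACHABLE
OUTPUT

# Returns a hash mapping IP address to lower-cased MAC address.
def parse_neighbours(ip_neigh_output)
  ip_neigh_output.each_line.each_with_object({}) do |line, table|
    match = line.match(/\A(\S+) dev \S+ lladdr ((?:\h{2}:){5}\h{2})\b/)
    table[match[1]] = match[2].downcase if match
  end
end

# Re-keys per-IP byte counters by MAC address. Counters for addresses
# missing from the ARP cache are dropped, since they can't be attributed.
def counters_by_mac(counters_by_ip, ip_to_mac)
  counters_by_ip.each_with_object({}) do |(ip, bytes), result|
    mac = ip_to_mac[ip]
    result[mac] = bytes if mac
  end
end

ip_to_mac = parse_neighbours(NEIGH_SAMPLE)
p counters_by_mac({ '192.168.1.231' => 4546184 }, ip_to_mac)
```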

Round Robin Databases

We could’ve opted for persistence to a text file, a hierarchical XML structure, or a relational database, but time series data doesn’t lend itself well to these traditional approaches, at least not without preprocessing. With time series data such as the 32-bit integer byte counters for each computer, counter wrap is a frequent event (it occurs every 4 GB of transferred data). Restarting the machine, the shaper, or Netfilter is also a common event, and it’ll most likely cause an outlier to be recorded because all counters are reset.

Logic for making sure these events don’t pollute our database with erroneous measurements is one of the defining characteristics of a time series database system. In addition, querying data, such as summing within a period of time and making sure the sum isn’t affected by the above events, is what a time series database is good at.
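To make the counter-wrap problem concrete, the sketch below shows the kind of arithmetic a time series database performs on our behalf. It illustrates the idea only and is not how the shaper or the RRD tools are implemented; the name counter_delta is made up.

```ruby
# A 32-bit counter runs from 0 to 2**32 - 1 and then starts over.
COUNTER_MODULUS = 2**32

# Bytes transferred between two readings of a 32-bit counter,
# assuming the counter wrapped at most once in between.
def counter_delta(previous, current)
  return current - previous if current >= previous

  # The counter passed its maximum and started over from zero.
  current + COUNTER_MODULUS - previous
end

puts counter_delta(100, 250)
puts counter_delta(4_294_967_000, 704)
```

Note that a counter reset after a restart looks just like a wrap to this function and would be recorded as a huge transfer, which is exactly the kind of outlier that has to be filtered out.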

On Linux-based systems, Tobias Oetiker’s Round Robin Database tools are the de facto tools for storing, querying, and graphing time series data, and therefore the ones used by the shaper.

The idea is that, using the RRD tools, each computer gets its own database, named after the corresponding MAC address, describing its traffic over time. So querying the database of each computer can tell us how much data was sent and received over some period of time. Using the RRD tools for this task eliminates the need for us to deal with outliers, missing values, counter-wraps, and so forth. All the shaper has to do is record the value of the counters at regular intervals and RRD makes sure data is consistent within the database.

Decision engine

To put the shaper online, it’s run from a Bash script containing an infinite loop that (1) reads and parses the WRR output, (2) reads and parses the ARP cache entries, (3) writes the byte counters to the RRD databases, (4) uses the RRD querying tool to sum the data based on the rules specified, possibly causing a violation event to fire, and finally (5) goes to sleep for some period of time before starting all over.

Whenever a rule is violated, the action taken may be whatever can be expressed through Ruby code or through a callout. This may involve sending an email to the network administrator or disabling Internet access for the computer in question.

Within our configuration, we defined a set of rules that state that within a four hour sliding window a computer is allowed to upload no more than 5GB and download 10GB of data. Similarly, during a seven day sliding window, a computer is allowed to upload no more than 30GB and download 60GB of data (the exact quotas and periods obviously depend on the network capacity, users online, their usage patterns, and so forth).
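The rules above could be expressed along the following lines. This is a sketch under assumptions, not the shaper's rule engine: it sums in-memory samples rather than querying the RRD databases, it takes 1 GB to mean 1024**3 bytes, and the Rule, Sample, and violated_rules names are made up for illustration. Times are plain integers counting seconds.

```ruby
GB = 1024**3 # assumption: binary gigabytes

Rule = Struct.new(:name, :window_seconds, :max_upload, :max_download)
Sample = Struct.new(:time, :uploaded, :downloaded)

# The quotas quoted in the text.
RULES = [
  Rule.new('four hours', 4 * 3600, 5 * GB, 10 * GB),
  Rule.new('seven days', 7 * 24 * 3600, 30 * GB, 60 * GB)
].freeze

# Returns the rules a computer violates at time `now`, given its samples
# of bytes uploaded and downloaded per measuring interval.
def violated_rules(samples, now, rules = RULES)
  rules.select do |rule|
    recent = samples.select { |s| s.time > now - rule.window_seconds }
    recent.sum(&:uploaded) > rule.max_upload ||
      recent.sum(&:downloaded) > rule.max_download
  end
end
```

Because the windows slide, a blocked computer automatically falls back within quota once enough old samples leave the window.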

Upon violation of a rule, we employ Netfilter to redirect the computer to an information page stating that the computer is blocked from accessing the Internet, along with its four hour and seven day traffic totals.

Conclusion

Observing the proof of concept shaper in action, the biggest problem seems to be that a few users modify their MAC address to get a bigger piece of the bandwidth pie. If we wanted to, though, we could counteract MAC changes by introducing another layer and instead tying traffic to the port on the switch to which the computer is connected.

As far as the available source code goes, it should probably only be used as a starting point for building your own system.

Posted in Linux, Ruby | 5 Comments »

Kernel space traffic shaping with Linux

Posted by Ronnie Holm on 18th November 2007

Where I live, we have a Linux box sitting between the Internet and the local area network, providing Internet access to some 330 apartments.

With this many users, and even more computers, access to the Internet through our 30/30 Mbit/s connection quickly turned bandwidth into a scarce resource. Not so much because users surf the web or check their email, but because P2P clients are in common use. No amount of bandwidth can really satisfy this kind of software, which calls for a way to distribute bandwidth between users or computers that is fairer than first come, first served.

For some time we counteracted the effect of P2P clients on our bandwidth by using ipp2p on top of Netfilter. Unfortunately, popular P2P clients are able to sneak below the ipp2p radar using HTTP as their transport and obfuscating their traffic.

Thus, we turned our attention to the queuing disciplines described in the Linux Advanced Routing & Traffic Control Howto. The idea behind a queuing discipline, or qdisc for short, is to apply some processing on the queue of packets in the kernel waiting to be sent (either from the network interface on the local area network side to the network interface on the Internet side or vice versa). Generally speaking, processing involves moving around packets in the send queue to allow for some packets to go out on the wire before others, based on the algorithm of the qdisc.

The net effect is that users whose packets get moved to the front of the queue will experience a lower latency, higher bandwidth connection. Conversely, owners of packets in the back of the queue may have their packets delayed to the point where their TCP/IP stack is forced to decrease the speed with which data is sent, simply because the receiver reports that not all packets arrived on schedule.

Experimenting with the simpler qdiscs, such as Token Bucket Filter (TBF), Stochastic Fairness Queuing (SFQ), and Class Based Queuing (CBQ), we found it hard to quantify their effects on bandwidth consumption, partly because these qdiscs aren’t intended for shaping individual computers, but rather a group of computers sharing some network usage characteristic. Sure, SFQ shapes each connection, but a computer may have any number of open connections, so shaping each connection independently is no good at limiting P2P traffic.

Therefore, we turned our attention to the Weighted Round Robin qdisc (which requires kernel patching and compilation). As opposed to the other qdiscs, WRR has the ability to shape traffic from individual computers by creating a CBQ for each one. Applied to the send queue of each network interface, WRR will assign an inbound and an outbound weight to each computer. Furthermore, the weight is adjusted as a function of the amount of data transmitted within some quantum of time. Then, when the demand for bandwidth exceeds what’s available, the computer with the highest weight gets to go first.

On paper this is a great idea, but like with the other qdiscs, we found it hard to balance the various parameters and measure the net effect.

In conclusion, a great deal of time went into experimenting with the various qdiscs, even in combination with ipp2p. But eventually we decided that none of the qdiscs were up to solving our network congestion problem. The research effort wasn’t all in vain, though, because we found that WRR provides us with a cost effective way of counting incoming and outgoing bytes on a computer by computer basis. Thus, in a way all this helped crystallize the idea of building a payload agnostic, user space traffic shaper in Ruby.

Posted in Linux | 3 Comments »

Tapping the power of PlayStation 3

Posted by Ronnie Holm on 15th April 2007

I recently had the fortunate opportunity to spend a few days with the PlayStation 3 (PS3). Its hardware configuration is truly amazing: a PowerPC based Cell processor running at 3.2 GHz with eight cores (although only seven are in active use, to decrease the number of faulty units coming off the assembly line) and an NVidia based graphics card.

Of course, all that power is there to be tapped primarily by games. However, with the firmware upgrade to version 1.6, Sony added a menu item for easily joining the Folding@home distributed computing project: a project that does molecular simulation across hundreds of thousands of computers in the hope of identifying the underlying causes of diseases such as Alzheimer’s, Parkinson’s, and cancer.

Assuming the PS3 has network access (it comes with built-in Wi-Fi, by the way), selecting the menu item causes the console to download the Folding@home client and start processing. That way, when you’re not using the console for gaming, you can donate the spare cycles to something useful. At least to me, participating in the Folding@home project is more meaningful than, say, aiding the search for extraterrestrial intelligence.

According to the Folding@home FAQ, the PS3 has been benchmarked at 10x the speed of a regular computer. The downside, however, is that “… the PS3 [client] takes the middle ground between GPU’s (extreme speed, but at limited types of WU’s [work units]) and CPU’s (less speed, but more flexibility in types of WU’s)”. Perhaps because the processor “… lack most of the general-purpose features that you normally expect in a processor”, as described here.

Another interesting aspect of the PS3 is the support for running other operating systems, particularly Linux. From the menu, you can repartition the standard 60 GB SATA hard disk. The operating system of the PS3 is run from internal memory for faster boot times and it’s only using the disk for storing game state and other data not vital for booting the PS3 kernel. Thus, installing Linux, you shouldn’t have to worry about accidentally wiping out the factory operating system.

Installing Linux on the thing is still something left to try. But it seems like buying a USB mouse and keyboard could transform the PS3 into a regular desktop computer, in turn making the price more digestible.

Update, September: I successfully downloaded and installed Yellow Dog Linux on a PS3. Following the installation guidelines, it went surprisingly smoothly. After installing, you can add the Fedora software archives to the list of possible installation sources for Yum. This’ll make even more software available without having to compile it yourself. Personally, my most used application is the VLC media player, which, at a mere double click, is able to play just about any video format in full screen mode.

Posted in Uncategorized | Comments Off