Functional Testing, BDD and domain-specific languages

I love Test Driven Development (TDD). If you look back through the posts on this blog that soon becomes apparent. I’m pretty comfortable with using TDD techniques at all levels of a solution, from the tiniest code snippet to multiply-redundant collaborating systems. Of course, the difficulty of actually coding the tests in a test-driven design can vary widely based on the context and complexity of the interaction being tested, and everyone has a threshold at which they decide that the effort is not worth it.

The big difference between testing small code snippets and large systems is the combinatorial growth of test cases. To fully test at a higher (or perhaps more appropriately, outer) layer would require an effectively infinite number of tests. By a happy coincidence, though, if you have good test coverage at the lower/inner layers of the code, you can rely on them to do their job, so you only need to test the additional behaviour at each layer. Even with this simplifying assumption the problem does not fully go away. Very often the nature of outer layers implies a broader range of inputs and outputs, and a greater dependence on internal state from previous actions. It can still seem to need a ridiculous number of tests to properly cover a complex application. And worst of all, these tests are often boring. Slogging through acres of largely similar sequences, differing only in small details, can be a real turn-off for developers, so once again, it can feel that the effort is not worth it.

If fully covering the outer layers of a system, with tests for every combination of data, every sequence of actions, and every broken configuration is prohibitively expensive, time-consuming and downright boring, then it makes sense to be smart about it, and prioritise those tests which have the highest value. Value in this sense is a loose term encompassing the cost of failures, the importance of features, and the scale of use. A scenario with a high business value when it succeeds, a high cost when it fails, and frequent use by large numbers of people would be an obvious choice for an outer layer test. Something of little value, which nobody cares about and which is hardly ever used, would be much further down the list.

And this leads neatly to the concept of automated “functional testing”: functional tests being outer layer tests which exercise these important, valuable interactions with the system as a whole. Arguably there is a qualitative difference between unit tests of internal components and functional tests of outer layers. Internal components have an internal function, relate mostly to other internal components, and their interactions are designed by developers for code purposes. This makes it relatively easy for developers to decide what to test, so specifying tests in similar language to the implementation code is a straightforward and effective way to get the tests written. Outer layers and whole applications have an external function, and business drivers for what they do and how they do it. This can be less amenable to describing tests in the language used by programmers. Add to this the potentially tedious nature of these external, functional, tests and it’s easy to see why they sometimes get overlooked, despite their potentially high business value.

Many attempts have been made over the years to come up with ways to get users and business experts involved in writing and maintaining external functional test cases. My friend, some-time colleague and agile expert Steve Cresswell has recently blogged about a comparison of Behaviour-Driven Development (BDD) tools. It’s an interesting article, but I can’t help thinking that there is another important dimension to these tools which also needs to be unpicked.

Along this dimension, testing tools range from:

  • “raw program code”, with no assistance from a framework;
  • “library-style” frameworks, which just add some extra features to a programming language using its existing extension mechanisms;
  • “internal DSL” (Domain Specific Language) frameworks, which use language meta-programming features to re-work the language into something more suitable for expressing business test cases;
  • “external languages”, which are parsed (and typically interpreted, though compilation is also an option) by some specialist software;
  • through to “non-textual” tools, where the test specification is stored and managed in another way, for example by recording activity or entering details in a spreadsheet.

In my experience, most “TDD” tools (JUnit and the like) sit comfortably in the “library-style” group. Test case specification is done in a general-purpose programming language, with additions to help with common test activities such as assertions and collecting test statistics. Likewise, most “BDD” tools (Cucumber and the like) are largely in the “external language” group. This is slightly complicated by the need to drop “down” to a general-purpose language for the details of particular actions and assertions.
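
In Ruby terms, the “library-style” group looks something like minitest (bundled with modern Rubies): plain Ruby classes and methods, with the framework contributing only assertion helpers and test-running machinery. A tiny sketch, in which the class name and the list of networks are invented for illustration:

```ruby
require 'minitest/autorun'

# "Library-style": the test is ordinary Ruby code, written in the same
# language and idioms as the implementation; the framework only adds
# assertions and result collection.
class PaymentServerTest < Minitest::Test
  SUPPORTED_NETWORKS = [:visa, :mc] # invented stand-in for real config

  def test_visa_is_supported
    assert_includes SUPPORTED_NETWORKS, :visa
  end

  def test_amex_is_not_supported
    refute_includes SUPPORTED_NETWORKS, :amex
  end
end
```

Easy for developers, but exactly as close to the implementation language as the essay suggests: there is nothing business-readable here.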

Aside from the “raw program code” option which, by definition, has no frameworks, the great bulk of test tools occupy the “library style” and “external language” groups, with a small but significant number of tools in the “non-textual” group. I find it somewhat surprising how few there seem to be in the “internal DSL” category, especially given how popular internal DSL approaches are for other domains such as web applications. There are some, of course, (coulda, for example) which claim to be internal DSLs for BDD-style testing, but there is still a more subtle issue with these and with many external languages.

The biggest problem I have with most BDD test frameworks is related to the concept of “comments” in programming languages. Once upon a time, when I was new to programming, it was widely assumed that adding comments to code was very important. I recall several courses which stressed this so much that a significant portion of the mark depended on comments. Student (and by implication, junior developer) code would sometimes contain more comments than executable code, just to be on the safe side. Over the years it seems that the prevailing wisdom has changed. Sure, there are still books and courses which emphasise comments, but many experienced developers try to avoid the need for comments wherever possible, using techniques such as extracting blocks of code to named methods and grouping free-floating variables into semantically meaningful data structures, as well as brute-force approaches such as just deleting comments.
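
As a sketch of what I mean by those techniques (all the names and numbers here are invented for illustration): the comment becomes a method name, and the free-floating variables become a small data structure.

```ruby
# Before: a comment props up an opaque calculation.
#   # work out the price including tax and bulk discount
#   p = c * 1.2 * (n > 10 ? 0.9 : 1.0)

# After: the comment's job is done by names that are part of the
# executable (and therefore testable) code.
Order = Struct.new(:unit_cost, :quantity)

TAX_RATE       = 1.2
BULK_DISCOUNT  = 0.9
BULK_THRESHOLD = 10

def discounted_price_with_tax(order)
  discount = order.quantity > BULK_THRESHOLD ? BULK_DISCOUNT : 1.0
  order.unit_cost * TAX_RATE * discount
end

puts discounted_price_with_tax(Order.new(100.0, 20))
```

Unlike the comment, the names here are exercised every time the code runs, so they cannot silently drift out of date in quite the same way.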

The move away from comments has developed gradually, as more and more programmers have found themselves working on code produced by the above-mentioned comment-happy processes. The more you do this, the more you realise that the natural churn and change of code has an unpleasant effect on comments. By their nature comments are not part of the executable code (I am specifically excluding “parsed comments” which function like hints to a compiler or interpreter here). This in turn means that comments cannot be tested, and thus cannot automatically be protected against errors and regressions. Add to this the common pressure to get the code working as quickly and efficiently as possible, and you can see that comments often go unchanged even when the code being commented is radically different. This effect then snowballs – the more that comments get out of step with the code, the less value they provide, and so the less effort is made to update them. Soon enough most (or all!) comments are at best a useless waste of space, and at worst dangerously misleading.

What does this have to do with BDD languages? If you look in detail at examples of statements in many BDD languages, they have a general form something like keyword "some literal string or regular expression". Typical keywords are things like “Given”, “When”, “Then”, “And”, and so on. For example: Given a payment server with support for Visa and Mastercard. This seems lovely and friendly, phrased in business terms. But let’s dig into how this is commonly implemented. Very likely, somewhere in the code will be a “step” definition.

An example from spinach, which is written in Ruby, might be:

  step 'a payment server with support for Visa and Mastercard' do
    @server = deploy_server(:payment, 8099)
    @server.add_module(@@modules[:visa])
    @server.add_module(@@modules[:mc])
    @server.start
  end

This also looks lovely, linking the business terminology to concrete program actions in a nice simple manner. However, this apparent simplicity hides the fact that the textual step name has no actual relationship to the concrete code. Sure, the text is used in at least two places, so it seems as if the system is on the case to prevent typos and accidental mis-edits, but it still says nothing about what the step code actually does. As an extreme example, suppose we changed the step code to be:

  step 'a payment server with support for Visa and Mastercard' do
    Dir.foreach('/') {|f| File.delete(f) if f != '.' && f != '..'}
  end

The test specifications in business language would still be sensible, but running the tests would potentially delete a whole heap of files!

For a less extreme example, imagine that the system has grown a broad BDD test suite, with many cases which use “Given a payment server with support for Visa and Mastercard”. Now we need to add support for PayPal. There are several options including:

  • copy the existing step to a new one with a different name and an extra line to install a PayPal module (including going through all the test code to decide which tests should use the old step and which should use the new one);
  • add the extra line to the existing step and modify the name to include PayPal, then change all the references to the new name;
  • or just add the extra module and leave the step name the same.

The last option happens more often than you might think. Just as with the comments, the BDD step name is not part of the test code, and is not itself tested, so there is nothing in the system to keep it in step with the implementation. And the further this goes, the less the business language of the test specifications can be trusted.
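
To make that last option concrete, the spinach step from earlier might drift into something like the following. This is a sketch: Server, deploy_server and MODULES are invented stand-ins so the fragment can run on its own, outside spinach.

```ruby
# Minimal stand-ins so the step body can run outside spinach;
# Server, deploy_server and MODULES are invented for this sketch.
Server = Struct.new(:modules) do
  def add_module(mod)
    modules << mod
  end

  def start
    self
  end
end

def deploy_server(_kind, _port)
  Server.new([])
end

STEPS = {}
def step(name, &body)
  STEPS[name] = body
end

MODULES = { visa: :visa, mc: :mc, paypal: :paypal }

# The name still promises "Visa and Mastercard", but the body now
# quietly installs PayPal as well - nothing checks for the mismatch.
step 'a payment server with support for Visa and Mastercard' do
  @server = deploy_server(:payment, 8099)
  @server.add_module(MODULES[:visa])
  @server.add_module(MODULES[:mc])
  @server.add_module(MODULES[:paypal]) # invisible from the step name
  @server.start
end

STEPS.each_value(&:call)
p @server.modules
```

Every scenario built on this step now silently tests a three-network server, while the business-readable text still describes a two-network one.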

I have worked on a few projects which used these BDD approaches, and all of them fell foul of this problem to some degree. Once this “rot” takes hold, it seems almost inevitable that the BDD specifications either become just another set of “unit” tests, only understood by developers who can dig into the steps to work out what is really happening, or they are progressively abandoned. It seems unlikely that many projects would be willing to take on the costs of going through a large test suite and checking a bunch of arbitrary text that’s not part of the deliverable, just to see if it makes sense.

I wonder if the main reason that this is not seen to be more of a problem is that so few projects have progressed with BDD through several iterations of the system, and are still using the approach.

So, is there any way out of this trap? For some cases, using regular expressions for step name matching can help. This can provide a form of parameter-passing to steps, increasing reuse for multiple purposes and reducing churn in the test specifications. This does not solve the overall problem, though, as it still has chunks of un-parsed literal text. For that we would need a more sweeping change.
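
A toy version of that regex matching, to show the parameter-passing. The matcher here is invented for this sketch; real tools such as Cucumber do something similar, passing capture groups into the step block as arguments.

```ruby
# Each step definition is a pattern plus a block; captured groups from
# the pattern become the block's parameters.
STEP_DEFS = []
def step(pattern, &body)
  STEP_DEFS << [pattern, body]
end

def run_step(line)
  pattern, body = STEP_DEFS.find { |(pat, _)| pat.match?(line) }
  raise "no step matches: #{line.inspect}" unless pattern
  body.call(*pattern.match(line).captures)
end

# One definition now serves many spec lines.
step(/^a payment server with support for (.+)$/) do |networks|
  networks.split(/,\s*|\s+and\s+/).map { |n| n.downcase.to_sym }
end

p run_step('a payment server with support for Visa and Mastercard')
p run_step('a payment server with support for Visa, Mastercard and PayPal')
```

The captured text is at least parsed, but everything outside the capture group is still an un-parsed literal, with all the comment-rot risks that implies.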

Which brings me back to internal and external DSLs. To my mind, the only way to address this issue in the longer term is to define the test specifications in a much more flexible language, one which suits both the domain being tested and the domain of testing itself. My aim would be to avoid the need for those comment-like chunks of un-parsed literal text, and align the programming language with the business language well enough that expressions in the business language make sense as programming constructs. If done well, this should allow test specifications to be changed at a business level and still be a valid and correct expression of the desired actions and assertions.
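
A very rough Ruby sketch of the kind of thing I mean (every name here is invented): the business phrase is built from constructs the language itself checks, rather than from an un-parsed string.

```ruby
# Instead of matching an opaque string, the spec is made of real method
# calls, so it cannot silently drift away from what it actually does.
class PaymentServerSpec
  attr_reader :networks

  def initialize
    @networks = []
  end

  def supporting(*networks)
    @networks.concat(networks)
    self
  end
end

def given_a_payment_server
  PaymentServerSpec.new
end

# Reads close to "Given a payment server supporting Visa and Mastercard",
# but adding PayPal would have to change this line, not just hidden code.
server = given_a_payment_server.supporting(:visa, :mastercard)
p server.networks
```

This is still a long way from a full business-readable language, but the point is that every word in the specification is executable, and therefore testable.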

Although some modern programming languages are relatively adept at metaprogramming and internal DSLs (Ruby is well-known for this), mostly they don’t go far enough. Arguably a better choice would be to base the DSL on one of the languages which have very little by way of their own syntax, and thus are more flexible in adapting to a domain. Languages such as LISP and FORTH have hardly any syntax, and both have been used for describing complex domains. It used to be said that the task of programming in FORTH was mainly one of writing a language in which the solution is trivial to express.

I’m afraid I don’t have a final answer to this, though. I have no tool, language or framework to sell, just a hope that somebody, somewhere, is working on this kind of next-generation BDD approach.

For more information about low-syntax languages, you might want to read some of my articles on Raspberry Alpha Omega, such as What is a High Level Language, More Language Thoughts, and Delimiter-Free Languages.

Also, for Paul Marrington’s take on internal vs external DSLs for testing, see BDD – Use a DSL or natural language?.

Programmer Pairing with a Tester

James Bach’s Blog » Blog Archive » Programmer Pairing with a Tester.

The stupid cookie law is dead at last

Needs no comment from me…

The stupid cookie law is dead at last | Silktide blog.

This blog may be a bit quiet, I’m busy elsewhere

Sure, quiet is relative. Over the years I have gone through enthusiastic patches and months with nothing but the occasional scrap of a link. At the moment, though, the quietness here has a reason: I’m too busy having fun messing with software and hardware on my Raspberry Pi.

If you don’t know already, Raspberry Pi is a credit-card sized computer with an ARM core, 256MB of RAM (recent ones have 512MB), HDMI 1080p video, USB, SD Card and ethernet on board. It also has a bunch of general purpose I/O pins and only takes a couple of watts of power, so it can even run from a USB lead. There’s a build of Debian Linux which runs on it so you can do stuff in Ruby, Python or whatever, and a growing collection of interesting hardware to plug in to it. People are using it for tiny network devices (media players running XBMC, for example), teaching children to program, and a whole host of embedded and robotics projects.

If you think that still sounds meh, then consider the price. You can buy a 512MB model right now for less than £40!

I’m having great fun trying things out with mine, and blogging about it at http://raspberryalphaomega.org.uk/.

Charles Moore on Portability

Portability

Don’t try for platform portability. Most platform differences concern hardware interfaces. These are intrinsically different. Any attempt to make them appear the same achieves the lowest common denominator. That is, ignores the features that made the hardware attractive in the first place.

Achieve portability by factoring out code that is identical. Accept that different systems will be different.

via http://www.colorforth.com/binding.html.

Tracking configuration changes in Jenkins

Continuous Integration is a pretty common concept these days. The idea of a “robot buddy” which builds and runs a bunch of tests across a whole codebase every time a change is checked in to the source code repository seems a generally good idea. There is a range of possibilities for how to achieve this, and one of the most popular is Jenkins, the open source fork of Hudson, which was originally a community project but is now owned by Oracle.

Although Jenkins is a useful tool, it can be a bit fiddly to set up, with a lot of web-form-filling to configure it for your projects once the basic web app is installed. Most projects I have worked on which use Jenkins have started by working through this setup using trial and error, then stuck with something which seemed to work. In an optimistic world this is OK, but it fails whenever there is a problem with the machine running the CI service, if another one is required, or if someone makes a change which breaks the configuration and the team needs to quickly roll back to a working config. In short it’s like software development used to be before we all used version control and unit tests!

With all this in mind, I was interested to read an article about a Jenkins plugin for version-controlling the configuration. This could be a step in the right direction.

Tracking configuration changes in Jenkins – cburgmer's posterous.

If you use Jenkins this seems a very good thing to try out. I still feel, though, that all this configuration does not really belong in a CI tool. It’s really just some instructions about how to build, deploy, and test the product, and to my mind that’s the kind of stuff which belongs with the code itself.

Imagine how simple things would be if there was no need for a whole CI “server”, but we could just use a simple git hook which called something managed, edited, tested and checked in with the code itself.
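
As a sketch of what that could look like (the repository path, branch and script name are all invented): a post-receive hook that does nothing but check out the pushed code and run build instructions which live in the repository itself.

```ruby
#!/usr/bin/env ruby
# Hypothetical git post-receive hook: the "CI configuration" is just a
# script under version control, managed and tested like any other code.
require 'tmpdir'

REPO = '/srv/git/myproject.git' # invented path

def run_checked(*cmd)
  system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  true
end

def build_latest(repo)
  Dir.mktmpdir('ci-') do |workdir|
    run_checked('git', "--git-dir=#{repo}", "--work-tree=#{workdir}",
                'checkout', '-f', 'master')
    Dir.chdir(workdir) { run_checked('./ci/build_and_test.sh') }
  end
end

# Guarded so the sketch is harmless anywhere except a real git server.
build_latest(REPO) if File.directory?(REPO)
```

With this shape, changing how the product is built or tested is a commit like any other, and rolling back a broken build configuration is just a revert.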

Kent Beck on incremental degradation (“defactoring”) as a design tool

Thanks to @AdamWhittingham for pointing out a great post from Kent Beck in which he suggests an “if you can’t make it better, make it worse” approach to incremental development. This is a habit that has been a part of my development process for a long while, and I have needed to explain it to mildly puzzled colleagues several times.

An analogy I have often used is that of Origami, the Japanese art of paper folding. The end results can be intricate and beautiful, but the individual steps often seem to go back on themselves, folding and unfolding, pulling and pushing the same section until it ends up in the precisely correct configuration. Just as folding and unfolding a piece of paper leaves a crease which affects how the paper folds in future, factoring code one way then defactoring it to another leaves memory and improved understanding in the team which affects how the code might be changed in the future.

Don’t be afraid to inline and unroll, what you learn may surprise you.

Baruco 2012: Micro-Service Architecture, by Fred George

A fascinating presentation from Barcelona Ruby Conference. Fred George talks through the history and examples of his thinking about system architectures composed of micro services.

I found this particularly interesting as it has so many resonances with systems I have designed and worked on, even addressing some of the tricky temporal issues which Fred has avoided.

Thanks to Steve Cresswell for pointing this one out to me.

Assembla on Premature Integration or: How we learned to stop worrying and ship software every day

An excellent article from Michael Chletsos and Titas Norkunas at Assembla, which reminded me how important it is to keep anything which might fail or need rework off the master branch.

It’s a truism about software development that you never know where the bugs will be until you find them. This can be a real problem if you find bugs in an integrated delivery, as it prevents the whole bunch from shipping. Assembla have some interesting stats about how they have been able to release code much more frequently by doing as much testing and development as possible on side-branches.

Read more at: Avoiding Premature Integration or: How we learned to stop worrying and ship software every day.

Do You Really Want to be Doing this When You’re 50?

I just read an article (Do You Really Want to be Doing this When You’re 50?) from James Hague, who describes himself as a “Recovering Programmer”.

I understand his experience, and his reasons for deciding that it’s not the job for him. I even like that he has blogged about it.

What I really don’t like, though, is the sweeping generalisation that this is a universally-applicable, age-related issue. Software development is already a field riddled with ageism from employers and management (see Silicon Valley’s Dark Secret: It’s All About Age from TechCrunch), and the last thing we need is an assumption that one person’s dissatisfaction with a particular job means that, in James Hague’s closing words, “large scale, high stress coding? I may have to admit that’s a young man’s game”.

A programmer in his or her 40s or 50s who has kept up with the evolution of the field can be one of the most awesome and effective developers you’ll ever meet.

Experimenting with VMware CloudFoundry

Yesterday evening I went along to the Ipswich Ruby User Group, where Dan Higham gave an enthusiastic presentation about VMware CloudFoundry. The product looked interesting enough (and appropriate enough to my current project) that I decided to spend a few hours evaluating it. On the whole I’m impressed.

After poking around the web site a bit I decided to download the “micro cloud foundry” version, which does not need dedicated hardware, as it runs as a virtual machine on a development box. Once I had passed the first hurdle of requiring registration before even starting a download, then waited for about 40 minutes for the VM image to download, I had a go at running it up.

Now, I must say that I went a bit “off road” at this point. Naturally enough the web site recommends only VMware virtual containers for the task, but the last time I used VMware Player (admittedly a few years ago) I found it clumsy and intrusive, and did not want to clutter my relatively clean development box. I already have virtualbox installed, and use it every day, so I thought I’d see how well the image runs in this container. Starting the VM was just a matter of telling virtualbox about the disk image, allocating some memory (I have a fair amount on this machine, so I gave it 4G to start with) and kicking it off.

Initially, everything seemed great. The VM started OK, and presented a series of menus pretty much as described in the “getting started guide”. I had been warned that setting it up might take a while, so I was not worried that it twiddled around for 15 minutes or so before telling me that it was ready. The next step according to the guide was to go to the system where the application source code is developed and enter vmc target http://api.YOURMICROCLOUDNAME.cloudfoundry.me (the micro cloud foundry VM has no command line, it’s all remotely administered) to connect the management client to the VM. This was a bit of a gotcha, as the “vmc” command needs a separate installation, found elsewhere on the site. Essentially “vmc” is a ruby tool, installed as a gem. In this case I already had ruby installed (it’s a ruby project I’m working on), so it was just a matter of sudo gem install vmc. Once I had installed vmc, I tried to use it to set the target as suggested, but the request was rejected with a somewhat confusing error message. On the surface it did not appear to be a network issue – I could happily “ping” the pseudo-domain, but vmc would not connect. After some looking around, both on the web, and digging in the “advanced” menus of the micro cloud foundry VM, I eventually realised that the error message in question was not actually coming from the micro cloud foundry at all but from a server running on my development system!

To make sense of why, I need to describe a bit about my development set up. The basic hardware is a generic Dell desktop with its supplied Windows 7 64-bit OS. I don’t particularly like using Windows for development (and did not want to wipe the machine, because I also need to use Windows-only software for tasks such as video and audio production), so I do all my development on one of a selection of virtual Ubuntu machines running on virtualbox. This is great in so many ways. I have VM images optimised for different work profiles, and run them from a SSD for speed. Best of all, I can save the virtual machine state (all my running servers, open windows and whatnot) when I stop work, and even eject the SSD and take it to another machine to carry on if needs be.

So, the problem I was seeing was due to an interaction between the “clever” way that micro cloud foundry sets up a global dynamic dns for the VM, and the default virtualbox network settings. To cut a long story short, both my development VM and the micro cloud foundry VM were running in the same virtualbox, and both using the default “NAT” setting for the network adapter. Somewhat oddly, virtualbox gives all its VM images the same IP address, and all the incoming packets were going to the development VM. More poking around the web, and I found that a solution is to set up two network adapters in virtualbox for the micro cloud foundry. Set the first one to “bridge” mode, so it gets a sensible IP address and can receive its own incoming packets, and set the second one to NAT, so it can make requests out to the internet. I left the development VM with just a “NAT” connector, and it seems happy to connect to both the web and to the micro cloud foundry VM via the dynamic dns lookup.

Of course, it was not all plain sailing from there, though. The first issue was that I kept getting VCAP ROUTER: 404 - DESTINATION NOT FOUND as a response: a message that was obviously coming from somewhere in cloud foundry, but which gave no obvious hint what was wrong. After a lot of trying stuff and searching VMware support, FAQ and Stack Overflow, I came to the conclusion that this is largely an intermittent problem. After a while things just seemed to work better. My guess is that when the micro cloud foundry VM first starts it tries to load dependencies and apply updates in the background. This is probably a quick process inside VMware’s own network, but out here at the end of a wet bit of string, things take a while to download. Eventually, though, things settled down and I was able to deploy some of the simple examples. Hooray! I have subsequently found that the micro cloud foundry VM needs a few tens of minutes to settle down in a similar way every time it is started from cold. Good job I can pause the virtual machine in virtualbox.

The process for deploying (and re-deploying) applications which use supported languages and frameworks is largely smooth and pleasant. It does not use version control (like Heroku, for example) but a specific set of tools which deploy from a checked-out workspace. If you want to deploy direct from VCS, it’s easy enough to attach a little deploy script to a hook, though.

Once I got past playing with the examples, I tried to deploy one of my own apps. It’s written in Ruby, uses Sinatra as a web framework and Bundler for dependency management, so it should be supported. But it does not work at all on cloud foundry. It works fine when I run “rackup” on my development box, and it works fine when I deploy it to Dreamhost, but on cloud foundry – nothing. Now, I can understand that there may be all sorts of reasons why it might not work (the apparent lack of a file system on the cloud foundry deployment, for one), but my big problem is that I have so far not discovered any way of finding what is actually wrong. An HTTP request just gives “not found” (no specific errors, stack traces or anything useful). Typing vmc logs MYAPP correctly shows the bundled gems being loaded, and the incoming HTTP requests reaching WEBrick, but no errors or other diagnostic output. I can only assume that the auto-configuration for a Sinatra app has not worked for my app, but there seems to be no way of finding out why.

To me, this lack of debuggability is the single biggest problem with cloud foundry. I hope it is just that I have not found out how to do it. If there is really no way at all of finding out what is going on on the virtual server we are back to “suck it and see” guesswork, which is so bad as to be unusable. I am simply not willing to spend hours (days? weeks?) changing random bits of my code and re-deploying to see if anything works.

If anyone reading this knows a way to find out what cloud foundry expects from a Sinatra app, and how to get it to tell me what is going on, please let me know. If not, I may have to abandon using cloud foundry for this project, and that would be a real shame.

The 2012 JavaZone video is out and it’s absolutely brilliant

The JavaZone conference has a reputation for the quality and cleverness of its promotional videos, but this year’s takes it to a whole new level.

Don’t watch this if you are offended by some (OK, quite a lot of) swearing. It may be coarse, but it’s very much in keeping with the style they have chosen.

4 minutes of pure movie deliciousness.

JavaZone 2012: The Java Heist

Gojko Adzic – BDD: Busting the myths

I just watched an interesting and entertaining talk about behaviour-driven development (BDD or ATDD): Gojko Adzic – BDD: Busting the myths on Vimeo.

Well worth a look, particularly if you are worried that BDD or automated acceptance tests do not make sense, or are not helping your business.

Thinking of homeschooling?

Peter Kim of CollegeAtHome.com has put together a neat infographic highlighting the differences between home schooling and the public school system in the USA. Seems pretty convincing. I wonder what the equivalent numbers might be here in the UK?

Homeschool Domination

Graphic included with permission.

This Is All Your App Is: a Collection of Tiny Details

A nice analysis of the effect of detail choices on overall usability (of cat feeders in this case) from Jeff Atwood.

Coding Horror: This Is All Your App Is: a Collection of Tiny Details.

Want Car Wars On Kickstarter?

As some of you may know, I have a soft spot for games – tabletop games rather than computer games, mostly. One of my all time favourites is Car Wars from Steve Jackson Games. Originally from the 1980s, it hit the sweet spot of being both a fun battle game with your friends and a geeky challenge trying to come up with the best vehicle designs between games. It’s not a card game, but it beat the “deck building” collectible card game trend by over a decade.

It’s hard to get a good game of Car Wars these days. The most recent version, from 2002, had a lot of problems and was never fully supported, which turned off a lot of players. Happily, there may be light at the end of the tunnel for Car Wars fans.

Following the astonishing success of Steve Jackson’s crowd-funding of the “big box” revival of his classic game OGRE, Steve is considering working on a new version of Car Wars. If you join the OGRE kickstarter program to the tune of $23 (USA) or $30 (rest of the world) you can help us send a clear message that we want a Car Wars revival and a decent new version. Plus, you get an exclusive T-shirt that says so.

Read more at:

Daily Illuminator: Want Car Wars On Kickstarter?.

But hurry, there’s only a few days left!

Full disclosure: I am part of an unpaid team of keen game-playing volunteers (“The Men in Black”) who go to conventions, clubs and stores to play and teach Steve Jackson games, but I also play a lot of other stuff too.

“Decisions, Decisions” a great presentation about software design

Following a recommendation from Jon Woods, I just checked out Decisions, Decisions, a recorded QCon presentation. Well worth watching, and thinking about, and putting into practice.

I don’t want to spoil it, as he has a fun, interactive, style of presentation, but if you have any experience in software development you will get his point very quickly, and it might just change some of the opinions you didn’t even know you had.

Bret Victor – Inventing on Principle

I just ran into a fascinating presentation giving a whole new way of thinking about software development, and careers in general. It also includes some eye-opening software demos.

Watching this video felt a lot like one of the better TED talks. Although it is a bit longer than typical TED talks, it is well worth watching all the way through.

Bret Victor – Inventing on Principle on Vimeo

Build pipelines with Jenkins

Continuous Integration is a great idea, and usually pretty simple to implement for simple projects. However, these simple projects don’t really exercise the “integration” aspect of the idea. As the build and test process for a project grows in complexity, it almost always grows in duration, too. Typical enterprise Java projects, for example, might fetch dependencies from Maven repositories, compile several code modules, copy, move and transform various resources, run unit tests, assemble and deploy jar files, start servers and run integration tests, and so on. All of this can take quite a while, even on a fast build server.



(cartoon from the great xkcd)

One big problem with growing build times, is the effect it has on feedback. If a developer has to wait 10, 20, 30 minutes or more for a build cycle to complete before test results are available, it usually leads to one of three outcomes:

  • Every small change requires a concentration-breaking delay to see if it works before moving on to the next change. Development slows to a crawl, management cracks the whip and tries to ban casual web surfing, private email and facebook.
  • Developers give up waiting for the CI results and press on with development anyway. The code base fills with bugs and issues. The CI process becomes largely irrelevant, as builds are almost always broken.
  • Developers hold off from checking in small code changes for fear of having to sit and wait for CI to catch up. As check-in size increases, so does the frequency of code clashes and the difficulty of merging different strands of work. Team culture shifts from collective ownership to silos and hoarding.

What’s needed is a way to get fast feedback, even when a full build takes a long time. Almost every team I have worked with in recent years has tried to achieve this, usually using the open source “Jenkins” (or its fork-parent “Hudson”) build server. So far this has never quite worked.

The main problem seems to be the monolithic nature of a Jenkins build. A build runs to completion (or to a fatal failure), accumulating build data and test results. Data and results are only available at the end. A more useful approach might be if build data and test results were made available as soon as possible, even while further build activity continues. Better still would be a way of adapting the build process to emphasise early feedback, preferring build steps which give feedback to those which are merely useful for further processing. That way a trivial compilation error or test failure in a stand-alone part of the code might give almost immediate feedback.

This is not only useful because of the speed of feedback, but because of the effect it has on development habits. Faster feedback would come from code with less coupling and fewer dependencies – any developer wishing to progress more quickly would be automatically encouraged to write (or refactor towards) small, loosely-coupled, independent, well unit-tested, re-usable code.
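To make the idea of “feedback-first” build ordering concrete, here is a minimal sketch of how the staged approach could look as a Jenkins declarative pipeline. This is an illustration only: the stage names, Maven goals and the `integration` profile are assumptions, not taken from any real project.

```groovy
// Sketch of a feedback-first Jenkins pipeline: cheap, informative stages
// run first, so a trivial compile error or unit-test failure is reported
// within minutes; the slow integration work only runs once they pass.
pipeline {
    agent any
    stages {
        stage('Compile') {
            steps {
                sh 'mvn -B compile'              // fail fast on compile errors
            }
        }
        stage('Unit tests') {
            steps {
                sh 'mvn -B test'                 // quick, isolated tests next
            }
        }
        stage('Package') {
            steps {
                sh 'mvn -B package -DskipTests'  // assemble jars for later stages
            }
        }
        stage('Integration tests') {
            steps {
                // Slowest last: start servers, deploy, run end-to-end checks.
                // The "integration" profile here is a hypothetical example.
                sh 'mvn -B verify -Pintegration'
            }
        }
    }
}
```

Because each stage reports its result as soon as it finishes, a failure in “Compile” or “Unit tests” gives feedback long before the full build would have completed, which is exactly the early-feedback behaviour described above.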

Although I’m tempted to think that this kind of really effective continuous integration would best be based on different build software, there are a lot of people working to improve things with Jenkins. A recent blog post from “Antagonistic Pleiotropy”, Implementing a real build pipeline with Jenkins, looks interesting, but shows just how tricky even a relatively straightforward build pipeline can be to configure.

Has anyone got any better suggestions on how to achieve effective feedback while building complex systems?

TEDcember Day 08 – jaw-dropping poetry

I can honestly say that I was astonished by this. A continual, unrelenting, stream of rhythm and rhyme for over two minutes which weaves in and out of a fantasy scenario of mockingbirds as recording devices while making references to a slew of TED talks from the same conference.

Of all the talks I have watched so far, this is the only one I want to watch again, straight away. I can’t offer higher praise than that.