Literature and inspiration

I am currently in the process of putting together a draft literature review as part of my PhD research. Initially this is intended for my primary supervisor, as some indication of whether I am progressing in a reasonable direction, and at a reasonable pace. Once it has achieved acceptable quality and quantity, the next step is to include it as part of my “first year report” (a.k.a. “annual progress review” or “probationary review”), and, all being well, as part of my final dissertation. This may seem a lot of weight to be carrying at a point just a few months into a potentially six-year part-time research project, but arguably there is no better time to get started than the present.

Since my previous post on literature searching I have upped my game, and now have several hundred potentially useful papers, journal articles and peer-reviewed conference proceedings, and a rough idea of how I think I might structure the literature review. Just to make sure I am on the right track, however, I have also been reading up on the concepts and processes of literature reviews. This has left me in more or less equal amounts enlightened and confused. For example, I have been working my way through “Succeeding with your Literature Review – a handbook for students” by Paul Oliver (OUP, 2012).

This book, notwithstanding the claims of the back cover, has a definite bias toward literature reviews in the social sciences and humanities, with only a cursory nod toward harder science or engineering. However, there have been several times when reading this book has provided a minor epiphany. The first such was in illuminating ideas of how to structure a literature review, particularly in pointing out that, despite the name, a “literature review” is not just a review of some literature, but a context and positioning statement for some research. Even more than that, it can serve as a narrative which takes the reader on a journey from a broad map of existing knowledge to fertile ground for a specific research question.

The latest gem came today, as I read, perhaps for the third or fourth time, an exposition on methodology. Up to this point, the terminology and assumptions of an unfamiliar field had left me unsure whether this had any significance for me. Repeated mention of words such as autoethnography and epistemology left me cold. But finally the realisation crept up on me that discussion of methodology should indeed be a key part of my literature review and that, more significantly, I had done very little literature searching in that area and thus have hardly anything to cite or quote.

My research area is largely about comparing implementations (and the process of creating and selecting such implementations) of software intended to serve a broadly similar purpose. Yet in my rush to find writing about this specific kind of software I have neglected to look for literature on software comparison in general. Without a theoretical grounding in academic practice in this kind of study, I run the distinct risk of wasting time and effort on ill-considered, flawed, or unusable research. My hope now is to be able to find sources to help me describe and reference existing methodological approaches to this kind of problem, and start to build a compelling argument for my intended methodology within my literature review.

Searching Literature for Technical Key Texts

A literature review is a key part of postgraduate research. To start with I’m attempting a broad literature search to try and find anything I can which sheds light on my topic area. In particular I’m trying to locate some “key texts” which align fairly closely with my planned research area, and could help inform my attempts at narrowing down my search. Despite the power of search engines, databases and indexes, this is not as easy as it might seem, particularly when the terminology of the domain is not always consistent.

For example, my topic area is templated generation of text. In my small corner of software development that seems quite specific and it’s easy, though naive, to assume that all that is required is typing this phrase into a search box to turn up everything I need. As anyone who has tried this will tell you, this is very rarely the case.

I typed this very phrase into the search box at University of Suffolk Library (login may be required), and got a lot of results. It wasn’t until result 58 (Templated Search over Relational Databases by Zouzias, Anastasios; Vlachos, Michail; Hristidis, Vagelis) that anything was even vaguely related to computer science. The preceding 57 were mostly from biology and/or chemistry, with perhaps a smattering of maths. At result 79 there’s something which looks at least worth reading the abstract (Patent Issued for Serializing a Templated Markup Language Representation of Test Artifacts, Computer Weekly News, 12/2013) but the link from the search results takes me to a set of entries in a publication database with no mention of templates. Result 98 (IXIR: A statistical information distillation system by Levit, Michael; Hakkani-Tür, Dilek; Tur, Gokhan, …) seems possibly interesting but turns out to be a system which uses templated queries, a different technology altogether.

It’s not until result 119 that I get anything even peripherally related to my topic area. In this case a book (Professional ASP.NET MVC 4, by Galloway, Jon; Allen, K. Scott; Haack, Phil, …) which describes a web software development technology, some parts of which have aspects of templating. Crucially, though, this book is a commercial publication, not subject to academic peer review, and so not the best choice for a key reference. Result 132 has an updated version of the same book. Result 166 has another (older, this time) version of the same book. Result 191 yet another version. By result 198 even the microbiology is running thin, and we get a “business” book (Lead Generation For Dummies by Rothman, Dayna). With nothing useful in the first two hundred results it’s clear that this is not a productive search. Sure I could keep going, but tenacity on its own is not enough.

There are several directions to go from here, including:

  • Use the advanced selection features of the database search and restrict results to scholarly and peer-reviewed computer science publications
  • Re-think the search terms to try and find more precise and less ambiguous terminology
  • Give up on searching and instead concentrate on following the bibliographic citation tree of articles and researchers

The first option initially seems reasonable, and gives a more appropriate set of results, but still considerably more misses than hits. The first result is IXIR: A statistical information distillation system by Levit, Michael; Hakkani-Tür, Dilek; Tur, Gokhan, … again, the second is about algorithmic program generation, and the third looks like it might even be useful (Statically safe program generation with SafeGen by Huang, Shan Shan; Zook, David; Smaragdakis, Yannis). As it turns out it’s certainly not a key text, but at least the approach described does make some small use of textual templating to generate code fragments. Result 6 is effectively the same work, but from a different source. Result 23 (Extracting Web Data Using Instance-Based Learning by Zhai, Yanhong; Liu, Bing) is the first hint of something I will see much more of later, the inverse of my topic. It seems that trying to analyse and remove boilerplate text, in this case from web pages, leaving just the “important” parts, is (or at least was in 2007) an important issue in data extraction and indexing. This paper is unconcerned with how the templated pages were generated, however. Nothing else of interest appears in the remaining nine results.

At this point I have become convinced that it is my search terms which hold the key. And I also have a sneaking suspicion that I will need several different searches to uncover papers on the use of templates in different contexts. The only even marginally useful documents I have found so far relate to web pages, so I decide to focus on this area in an attempt to discover some more effective terminology.

A search for web template language begins to turn up useful works. Bracketed by a pair of “inverse” papers, result two (TAL—Template Authoring Language by Soares Neto, Carlos de Salles; Soares, Luiz Fernando Gomes; de Souza, Clarisse Sieckenius) at last introduces a paper which is actually about templating. Best of all, it’s not a template language I have ever heard of. Despite a publication date of 2012 this seems an oddly old-fashioned approach, but it meets my inclusion criteria so it’s one for the big bibliography. Result 8 (Framework testing of web applications using TTCN-3 by Stepien, Bernard; Peyton, Liam; Xiong, Pulei) initially looks interesting, but turns out to be another form of inverse: template-like patterns used to match variable data during testing. I do take note of the possibly useful keyword “framework” for later searches, though. Result 17 (Advanced authoring of paper-digital systems: Introducing templates and variable content elements for interactive paper publishing by Signer, Beat; Norrie, Moira C; Weibel, Nadir; …) is interesting as it contains (among other things) an approach to using templated text in a broader range of documents. So that one is in, too. Looks like “authoring” could be a useful keyword, as well. Result 29 (A lightweight framework for authoring XML multimedia content on the web by Vanoirbeek, Christine; Quint, Vincent; Sire, Stéphane; …) reinforces the importance of these keywords, as it is of interest and contains both “framework” and “authoring”. Result 37 (EDITEC – a graphical editor for hypermedia composite templates by Damasceno, Jean Ribeiro; dos Santos, Joel André Ferreira; Muchaluat-Saade, Débora Christina) is somewhat similar to the TAL paper, above, but this has the benefit of a few possibly more interesting looking references in its bibliography.

So far, then, I have one document with some potentially useful references and a slowly growing list of useful keywords: “template”, “authoring”, “framework” and “web”. Looking at that list makes me wonder what other keywords might be useful. If “framework” is there, then maybe “language” and “system” could also be useful, for example. And perhaps I should also be thinking of words which might be common in the body of articles as well as the titles, such as “placeholder”, “boilerplate”, “static”, “dynamic” or “replace”. Words such as “text” or “document” are too vague and conflict with the meta-domain of publishing, thus disproportionately likely to appear in papers and articles of any kind. Perhaps a better approach might be to focus on a few specific document format types in which templating is common, such as “HTML” or “XML”, with the hope of finding references to the more generic texts from specific documents.

With a growing search term list it’s important to be methodical and make sure that no potentially fertile combination is accidentally missed, in the excitement of following leads. It’s tempting to just click away on anything which seems interesting, but that rapidly loses context. Another, and almost as tempting approach is to use the power of a web browser to open each lead in a separate tab, only closing the tabs when the associated document has either been recorded or rejected. I tried this and it seemed very effective to start with but it soon became unworkable, for a variety of reasons.
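One way to stay methodical is to enumerate the keyword combinations up front and tick them off as each is tried. The following is only a minimal sketch in Python: the function name `search_phrases` and the decision to generate pairs are my own assumptions for illustration, not a feature of any library search tool.

```python
from itertools import combinations

# Keywords collected so far from the more useful search results.
KEYWORDS = ["template", "authoring", "framework", "web"]


def search_phrases(keywords, size=2):
    """Return every unordered combination of `size` keywords as one phrase."""
    return [" ".join(combo) for combo in combinations(keywords, size)]


if __name__ == "__main__":
    # Print a checklist of phrases to work through, one per line.
    for phrase in search_phrases(KEYWORDS):
        print(phrase)
```

Each phrase can then be pasted into the scratchpad with a note of when it was searched and what, if anything, it turned up.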

The first group of problems with this approach is fairly simple: unlike hyperlinks in a web page, references and document index results usually require several steps to resolve to a readable text. Index entries often link to an abstract or a summary page, the actual articles may be behind a paywall, and even where I have access via the university or via a subscription to a professional body it’s still extra steps. Some document stores don’t play well with the web, and try to open documents in iframes, pop-ups, or specific named tabs; some stores present an unusable MIME type for the document, forcing the browser to download rather than display the text; and so on. Implicit in this is that some documents will result in several tabs. If there were no limit, this would be only a minor annoyance, but even the best browsers have a hidden problem. As the number of tabs grows, the room on the label of each tab decreases, in turn both hiding the document title (leading to a lot more tab-swapping to find anything) and (arguably worse) making it increasingly probable that a close button will be hit by mistake, throwing away what might be a vital resource. All of this means that managing tabs, and making sure that everything important is visible and nothing vanishes before processing, becomes increasingly tricky, even without the next problem.

The second type of problem is one of time and place. Attempting to manage a document search using multiple browser tabs feels like pearl diving. Take the biggest breath you can and head into the depths, keep looking while your lungs burn and hope that you can find something valuable before you have to come up for air and start all over again. In this metaphor “coming up for air” is any kind of break in the flow of searching which loses context. This interruption might be as commonplace as a phone call or a toilet break, but could also be something more serious such as a browser or computer shutdown. Who using a laptop has not felt the mounting stress of a low-battery warning? As a part-time student, this is my biggest problem. I rarely have the luxury of a long block of time to devote to a single activity, so I have to split tasks into smaller, more achievable, chunks. If you look carefully you may even be able to spot the several occasions where I left and came back to working on this article. I also work on a variety of machines. I have a laptop for travel, desktops at home and in my office and a growing stable of available machines at universities, co-working spaces and so on.

Any technique I use for more than the most superficial of queries has to be persistent across distractions, crashes and accidental clicks as well as easily transportable between different computers, which rules out browser tabs and anything transient such as an open text editor. For now I am using a combination of tools. As a “scratchpad” I use a Google Document and I copy and paste anything which might bear further investigation: a URL for a search page or a document link, text of a citation, possible search terms and so on. Once I have found a document which looks worth more detailed study, I add it to Mendeley which maintains a synchronised library between my various machines, with a web interface for when I am using a shared device. This works as far as capturing documents for study, but I have yet to master Mendeley’s tools for tagging, grouping, reviewing, annotating and citing of documents. Mendeley also offers a way to “follow” other writers and researchers, to be notified of what they find or produce. In time this could prove to be very valuable, but I’m not using it much yet.

As of this post, I have used the above techniques to search for several combinations of the above keywords, and eventually found some useful papers. For example a search for “Web framework” turned up Server-centric Web frameworks: an overview by Vosloo, Iwan; Kourie, Derrick G, ACM Computing Surveys, 06/2008, Volume 40, Issue 2 at position 23. This is the most useful article so far; in among a general survey of web frameworks, one section attempts to map a taxonomy of templated approaches. I can see immediate ways in which this will be useful in my own research. A search for the terms boilerplate template results in a lot of legal and sociological texts, but at position 15 gives Is the Browser the Side for Templating? by Garcia-Izquierdo, F. J; Izquierdo, R, IEEE Internet Computing, 2012, Volume 16, Issue 1. Searching for “template processor” brings up a lot of articles about templates for processor design, but at position 7 we find XRound: A reversible template language and its application in model-based security analysis by Chivers, Howard; Paige, Richard F, Information and Software Technology, 2009, Volume 51, Issue 5. XTemplate is at position 10, TAL is at position 14, and there is nothing else of note until position 74 with the peripherally useful Advanced authoring of paper-digital systems: Introducing templates and variable content elements for interactive paper publishing by Signer, Beat; Norrie, Moira C; Weibel, Nadir; …

There’s obviously a lot more searching to do, as well as following up on bibliographical entries, looking for other work by the authors of useful texts, and working through the indexes of likely-looking journals, but the conclusion is that even with pretty specific technical terminology, it takes work and patience to track down appropriate academic resources.

Old stuff still has value

I was just surprised, and somewhat delighted, by a LinkedIn connection request. Not that receiving connection requests from LinkedIn is in itself an unusual thing – I get several a week from various sources. This one was unusual because the sender made reference to liking a Java Ranch forum post I made over sixteen years ago.

I re-read the post, and could see that several people had bumped into it, and found it useful, over the intervening years, which was certainly very pleasing. It brought to mind the thought that, even in the fast-paced world of computer programming, there is value to history and theory. Not everything which is old is useless, and some concepts exist in the abstract, separate from particular implementations, yet useful for understanding those implementations.

Introduction to my PhD Research

It’s the nature of research, and particularly of doctoral research, that approaches change and details become clearer only over time. With that in mind this is an introduction to the topic area of my research as I see it now, right at the start of the process, borrowing heavily from the proposal document I submitted to the university as part of my application.

For now, my working title is “A comparative analysis of template languages for text generation“. This is a surprisingly broad area, and I’m sure it will be narrowed down to a more precise research question as things progress.

Almost every use for computers has a need to produce textual output. Not just fixed, hand-crafted text but also such text combined with variable content. As an example, most business software comes with some sort of “mail merge” facility for the “small print” on an invoice; a form letter with a bit more individuality than “Dear customer”, and junk email offering supposedly unbeatable personalised offers.

The most common way of producing these kinds of documents uses a templating technique. A master document containing blocks of fixed text and special symbolic tokens (sometimes known as “placeholders”) is processed through a software system which combines the supplied text with selected data records to produce a set of similar, but individualised, documents. In each resulting document the placeholders have been replaced by appropriate values from the data.
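To make the idea concrete, here is a minimal sketch of a mail merge using Python’s standard `string.Template` class. The master text, placeholder names and data records are all invented for illustration; real systems differ widely in syntax and features.

```python
from string import Template

# Master document: fixed text plus $placeholders (names are illustrative).
MASTER = Template("Dear $name,\nYour invoice for $amount is due on $due.\n")

# Hypothetical data records, one per recipient.
RECORDS = [
    {"name": "Ms Smith", "amount": "£120.00", "due": "1 March"},
    {"name": "Mr Jones", "amount": "£75.50", "due": "8 March"},
]


def merge(master, records):
    """Produce one individualised document per data record."""
    return [master.substitute(record) for record in records]


if __name__ == "__main__":
    for document in merge(MASTER, RECORDS):
        print(document)
```

Note that `substitute` raises a `KeyError` if a record lacks a value for a placeholder; whether to fail, skip or leave the token in place is one of the many small decisions on which implementations differ.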

Templating is not limited to personalised mail, however. Large swathes of the visible pages of the web are produced in this way. This very blog, for example! The content of the post you are reading exists in a database, along with the list of posts for the archive section and so on. The structure of the page, where to put the headers and widgets, and any other “boilerplate” are in a single template used for every blog post. Placeholders in the page templates show where the specific blog text, title and so on should be inserted.

Templated text generation is also common in less visible internet traffic. Emails and other textual messages, logging, diagnostic output, code generation and many data interchange formats and protocols, all benefit from this powerful technique. The separation of fixed common format from the variable parts of the data enables the two to be produced independently, often at different times, by different teams or departments.

All these uses of templating have some common character but beyond this superficial similarity lie potentially thousands of differing implementations. For example, there are:

  • templates used on servers and in a web browser
  • some which use a single master document, while others can select from alternatives or combine many document fragments
  • some which merge single values, flat records, structured data or fetch values from remote systems as required
  • some which are tied to a specific document format, language or character set, and some which can produce arbitrary data output
  • some which concentrate on replacing single tokens, while others contain programming constructs such as loops and decisions
  • some template engines which stand alone, while others can only be used within a larger framework
  • some contain all their own processing tools, and some which hand off placeholder values to a separate programming language
  • template languages with a formal syntax, and those comprised of ad-hoc or extensible components
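The difference between the token-replacing style and the style with programming constructs can be sketched in a few lines of Python. This is a toy, not a real engine: the `{{ name }}` and `{% for … %}` syntax is loosely borrowed from common web template languages, the function names are hypothetical, and nested loops are deliberately not supported.

```python
import re

# {{ name }} marks a single value; {% for x in xs %}...{% endfor %} marks a loop.
TOKEN = re.compile(r"\{\{\s*(\w+)\s*\}\}")
LOOP = re.compile(r"\{% for (\w+) in (\w+) %\}(.*?)\{% endfor %\}", re.DOTALL)


def replace_tokens(template, data):
    """Simplest style: swap each {{ name }} token for a value from data."""
    return TOKEN.sub(lambda match: str(data[match.group(1)]), template)


def render(template, data):
    """Richer style: expand (non-nested) loops first, then replace tokens."""

    def expand(match):
        var, seq, body = match.groups()
        # Render the loop body once per item, binding the loop variable.
        return "".join(replace_tokens(body, {**data, var: item}) for item in data[seq])

    return replace_tokens(LOOP.sub(expand, template), data)


if __name__ == "__main__":
    page = "Posts by {{ author }}: {% for title in titles %}[{{ title }}] {% endfor %}"
    print(render(page, {"author": "Ann", "titles": ["First", "Second"]}))
```

Even this toy shows how quickly design choices appear: what happens to white space inside the loop body, whether inserted values are scanned again for tokens, and what to do when a name is missing from the data.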

As well as these operational characteristics there is also a wide range of implementation quirks. There are template systems for just about every programming language, and some languages have many to choose from. Even within such groups there are implementations with wildly differing performance and resource usage, as well as (for example) the treatment of line breaks and other “white space”.

In my many years in the software industry, I have used a lot of template systems, and even written a few, but it has become increasingly apparent that this field is fragmented and divisive. Developers of every new programming language, server, tool and framework feel compelled to enter the fray with another slightly different template system, yet the same naive approaches and uninformed choices continue to be created and promoted as if they are something new.

Even choosing a template solution for a project has become a significant issue. Comprehensive comparative information about the many differing implementations can be hard to come by. The Wikipedia page on the subject lists over one hundred template engines but is far from complete. Software documentation for such tools, where present at all, tends to focus on the use of a single implementation, usually ignoring or sidelining missing capabilities, and avoiding direct comparison with alternatives from other providers.

This is all compounded by the lack of a theoretical foundation on which to base comparison and inform discussion. My aim is to produce such a foundation.

Lecture Review: “The study of American literature” by Dr Owen Robinson

University of Suffolk holds an annual series of free evening lectures. The first since I started at the university was on the topic of American Literature – not at all germane to my particular research, but potentially interesting nonetheless. Much more appropriate, however, for Kat, who is aiming to take Creative Writing (which includes, of course, the study of Literature) at University next year. So the two of us went along to see what it would be like.

There was an online booking procedure, but it turned out that this was mainly for gathering numbers and issuing temporary swipe cards; the chosen auditorium was considerably less than half full. Being a student, I already had a card which would grant me access, but the temporary one was potentially useful for Kat. On climbing the stairs to the second floor we were presented with a mezzanine foyer slowly filling with retired people. By the start of the presentation, I estimated that at least 95% of the audience were over 65; as far as I could tell, Kat was the second youngest in the room. Second only because someone had brought a babe in arms!

On entering the lecture hall we chose seats somewhere near the centre, and entered into discussion of the images the lecturer had placed on the screen. Two similar flagpoles, each with a star-spangled banner waving in the breeze – one clearly visible against a blue sky, the other almost indistinguishable behind a tangle of branches. These images were later referred to as indicating the way that, despite some claims, there is no single clear “American Literature” but instead a complex tangle of cultures all of which contribute some aspect of the American literary experience.

The lecturer Dr Owen Robinson entered the hall dressed in, if not a costume then certainly inspired by, 19th century attire. John Bull hat, frock coat and weskit, pocket watch and all. Despite this affectation he seemed a young man very grounded in modern political views as well as his subject of historical American writing. The talk began with a disclaimer that he would find himself unable to refrain from mentioning politics, in particular the recent election of Donald Trump as president, and his actions since taking office. This was a theme which cropped up several times during the lecture, but usually in a way which reflected on the core material.

The lecture itself fell into two parts. The first part dealt with the author Frederick Douglass, and in particular his work Narrative of the Life of Frederick Douglass, an American Slave, written by Himself. I have not studied much American history or literature, and so was not aware of this author, nor of the genre of Slave Narratives which it exemplifies. Dr Robinson took us on a tour of the significance of Douglass and his writing, and used it to highlight the lack of President Trump’s knowledge of his own country’s history. The second part ranged a little wider, illustrating cartographically the development of regions of the United States over time, culminating in a study of some writings by Ralph Waldo Emerson and Walt Whitman and their influence on American notions of culture and identity. As a final point, Dr Robinson showed a video clip of Jimi Hendrix playing his own guitar version of The Star Spangled Banner. Perhaps this was a self-indulgent ploy by a guitar-loving lecturer, but he did make the point that even though the piece was an instrumental, the words and the usage of the National Anthem would have been known to virtually all of his audience. Like so much American media, this performance existed within a shared cultural consciousness, and its political and musical aspects could not be separated from the feeling of being American so strongly evoked by the song.

Overall the themes of the talk wove the notion of writing and speaking as an act of creation, not just of the words themselves, but of people and cultures. Douglass wrote himself into fame and authority, despite starting from the insignificance of a slave. Emerson and Whitman were instrumental in defining the written culture of post-colonial America and in particular its independent and re-inventive nature. Hendrix took an overwhelming love for country and used it to underscore political commentary.

Following the main lecture there was a brief question and comment session, in which I raised the point that, despite Dr Robinson’s disdain, there is a sense in which President Trump exemplifies the same American characteristics pointed out in the historic writings – he disregards tradition and history, and uses all his speaking and writing to create the reality he desires. Trump truly is the kind of leader which could only arise in America. When the questions and comments were over we all adjourned to the foyer for drinks and canapés before making our way home.

PhD finally under way

On Wednesday I made another significant step in my academic progress. I took the afternoon off work to go over to the University of Suffolk and sort a bunch of annoying tasks:

  • Collect a NUS Extra card, for student discounts
  • Chase up and eventually obtain a university ID card for access to private areas
  • Discuss my application process with someone from the university postgraduate office
  • And my first official supervisor meeting!

The most important part of the day was definitely the supervisor meeting. I had met and spoken with my supervisor several times before, but this was the first official meeting as part of the PhD program. I had come with a progress report and a slew of questions; he had come with the formal requirements of recording the meeting. Most of my enquiries received sensible responses, and those that did not were taken as actions. At my request (and as suggested in Phillips and Pugh (2015)) we also agreed a task for me to complete for the next meeting. I am to gather a data set of my reading to date including: search terms, categorisation, proportions of usability, quantitative numbers, trends, categories, areas of interest, a timeline and so on. I’m not sure yet exactly what form this will take, but I get the strong feeling that the point of the exercise is the exercise. Learning to note, count and categorise my search process as well as the things I find is obviously a useful research skill, and one that’s easy to overlook.
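As a first guess at what such a data set might look like, here is a minimal sketch in Python. The field names and the `hit_rate` measure are my own assumptions about what will be worth recording, and the numbers are placeholders rather than my real search data.

```python
from dataclasses import dataclass


@dataclass
class SearchRecord:
    terms: str      # what was typed into the search box
    reviewed: int   # how many results were looked at
    useful: int     # how many were worth recording

    @property
    def hit_rate(self) -> float:
        """Proportion of reviewed results that were useful (0.0 if none reviewed)."""
        return self.useful / self.reviewed if self.reviewed else 0.0


# Illustrative numbers only, not real search data.
LOG = [
    SearchRecord("example query one", reviewed=200, useful=0),
    SearchRecord("example query two", reviewed=40, useful=4),
]


def most_productive(log):
    """Return the record with the highest proportion of useful results."""
    return max(log, key=lambda record: record.hit_rate)


if __name__ == "__main__":
    for record in LOG:
        print(f"{record.terms}: {record.hit_rate:.0%} useful")
```

Categories, dates and areas of interest could be added as further fields once it becomes clearer which of them actually reveal trends.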

Now it really feels as if the huge and weighty PhD ship is starting to move.

A New Challenge

This blog has been pretty quiet for a long while. I just don’t seem to have found the time or motivation to post here, even though I still have plenty of rants about software and software development. With that in mind I thought I’d have a go at breathing a bit more life into it, and the perfect opportunity has now arisen.

It has been a long held aim of mine to complete a PhD. Since my first degree, I have slowly worked my way through a slew of ‘A’ levels, a Postgraduate Diploma (PgDip), a Postgraduate Certificate in Education (PGCE) and eventually a Master of Science (MSc), all while continuing in full-time work. With the creation of a new university in my home town of Ipswich, the time seemed ripe to apply. I began asking around in early 2016, and approached the university in a more concrete way in the summer. By the time I got to meet with a potential supervisor it was the 20th of October, but at least it seemed progress was being made. Then I received an email explaining that if I wished to apply for a January 2017 start, I had to get a completed application and proposal in by the 31st of the same month. 11 days to fill in several application forms and put together a compelling, fully researched and referenced proposal. It was a huge rush, but with the assistance of the postgraduate administrator at the time, and a lot of help from my potential supervisor in reviewing my proposal, I did manage to get everything submitted on time. A short while later I attended an interview to discuss my proposal, which seemed to go well. Then began a period of waiting.

In one email or another I had been informed that there would likely be a decision ‘early January’, so I waited through November and December, then sent an email timidly asking if there was any progress a week or so into January. A few tidbits of information came my way, but still no decision. I was asked to keep 25th January free for an induction day at UOS, and when we got to the day before I asked what I should do, and was recommended to come along, even though things were still not entirely sorted out. I took the day off work and went along, to find that, unlike the other applicants, I could not actually be enrolled or be issued an ID card. For the next few days I continued to chase the postgraduate admissions people until eventually I did receive a letter from the university and an email with a student number, and began the actual enrolment process on the first day of February 2017. As of this writing I have a university ID and a password which works with some (but not all) university systems but still no ID card, so I can’t actually get in to any of the buildings, but it’s getting closer…

All of which leads to the point of this story. Over the next five to six years I plan to fill my time with researching, reading, writing, speaking and generally living my PhD topic, and with any luck I will blog my thoughts and progress here. Not that this blog will be completely taken over by study, though. I will still write here about other fields of software development, and it seems entirely likely to me that if I’m in the habit of writing, I’m more likely to write here generally.

Everything I have heard and read reckons a PhD is a significant challenge, requiring continued resilience, enthusiasm and determination, so I can’t know what the future holds. But it certainly feels like a grand adventure!

Functional Testing, BDD and domain-specific languages

I love Test Driven Development (TDD). If you look back through the posts on this blog that soon becomes apparent. I’m pretty comfortable with using TDD techniques at all levels of a solution, from the tiniest code snippet to multiply-redundant collaborating systems. Of course, the difficulty of actually coding the tests in a test-driven design can vary widely based on the context and complexity of the interaction being tested, and everyone has a threshold at which they decide that the effort is not worth it.

The big difference between testing small code snippets and large systems is the combinatorial growth of test cases. To fully test at a higher (or perhaps more appropriately, outer) layer would require an effectively infinite number of tests. By a happy coincidence, though, if you have good test coverage at lower/inner layers of the code, you can rely on them to do their job, so you only need to test the additional behaviour at each layer. Even with this simplifying assumption the problem does not fully go away. Very often the nature of outer layers implies a broader range of inputs and outputs, and a greater dependence on internal state from previous actions. It can still seem to need a ridiculous number of tests to properly cover a complex application. And worst of all, these tests are often boring. Slogging through acres of largely similar sequences, differing only in small details, can be a real turn off for developers, so once again, it can feel that the effort is not worth it.

If fully covering the outer layers of a system, with tests for every combination of data, every sequence of actions, and every broken configuration is prohibitively expensive, time-consuming and downright boring, then it makes sense to be smart about it, and prioritise those tests which have the highest value. Value in this sense is a loose term encompassing cost of failures, importance of features, and scale of use. A scenario which has a high business value if successful and a high cost of failure, and which is used often by large numbers of people, would be an obvious choice for an outer layer test. Something of little value which nobody cares about and which is hardly ever used would be much further down the list.

And this leads neatly to the concept of automated “functional testing”. Functional tests are outer layer tests which exercise these important, valuable interactions with the system as a whole. Arguably there is a qualitative difference between unit tests of internal components and functional tests of outer layers. Internal components have an internal function and relate mostly to other internal components, and their interactions are designed by developers for code purposes. This makes it relatively easy for developers to decide what to test, so specifying tests in similar language to the implementation code is a straightforward and effective way to get the tests written. Outer layers and whole applications have an external function, and business drivers for what they do and how they do it. This can be less amenable to describing tests in the language used by programmers. Add to this the potentially tedious nature of these external, functional, tests and it’s easy to see why they sometimes get overlooked, despite their potentially high business value.

Many attempts have been made over the years to come up with ways to get users and business experts involved in writing and maintaining external functional test cases. My friend, some-time colleague and agile expert Steve Cresswell has recently blogged about a comparison of Behaviour-Driven Design (BDD) tools. It’s an interesting article, but I can’t help thinking that there is another important dimension to these tools which also needs to be unpicked.

Along this dimension, testing tools range from “raw program code” (with no assistance from a framework); through “library-style” frameworks which just add some extra features to a programming language using its existing extension mechanisms; “internal DSL” (Domain Specific Language) frameworks which use language meta-programming features to re-work the language into something more suitable for expressing business test cases; “external languages” which are parsed (and typically interpreted, but compilation is also an option) by some specialist software; through to “non-textual” tools where the test specification is stored and managed in another way, for example by recording activity, or entering details in a spreadsheet.

In my experience, most “TDD” tools (JUnit and the like) sit comfortably in the “library-style” group. Test case specification is done in a general-purpose programming language, with additions to help with common test activities such as assertions and collecting test statistics. Likewise, most “BDD” tools (Cucumber and the like) are largely in the “external language” group. This is slightly complicated by the need to drop “down” to a general-purpose language for the details of particular actions and assertions.

Aside from the “raw program code” option which, by definition, has no frameworks, the great bulk of test tools occupy the “library style” and “external language” groups, with a small but significant number of tools in the “non-textual” group. I find it somewhat surprising how few there seem to be in the “internal DSL” category, especially given how popular internal DSL approaches are for other domains such as web applications. There are some, of course, (coulda, for example) which claim to be internal DSLs for BDD-style testing, but there is still a more subtle issue with these and with many external languages.

The biggest problem I have with most BDD test frameworks is related to the concept of “comments” in programming languages. Once upon a time, when I was new to programming, it was widely assumed that adding comments to code was very important. I recall several courses which stressed this so much that a significant portion of the mark was dependent on comments. Student (and by implication, junior developer) code would sometimes contain more comments than executable code, just to be on the safe side. Over the years it seems that the prevailing wisdom has changed. Sure, there are still books and courses which emphasise comments, but many experienced developers try to avoid the need for comments wherever possible, using techniques such as extracting blocks of code to named methods and grouping free-floating variables into semantically meaningful data structures, as well as brute-force approaches such as just deleting comments.
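
As a small illustration of the "extract to a named method" technique, here is a minimal Ruby sketch. The method names and the order-total scenario are invented for this example; the point is only that the second version says the same thing as the comment, but in code which is actually executed and can be tested.

```ruby
# Before: a comment explains what the block of code is for.
def order_total_with_comment(prices, tax_rate)
  # add up the prices and apply tax
  sum = 0
  prices.each { |price| sum += price }
  sum + sum * tax_rate
end

# After: the explanation lives in method names instead of a comment.
def subtotal(prices)
  prices.sum
end

def with_tax(amount, tax_rate)
  amount + amount * tax_rate
end

def order_total(prices, tax_rate)
  with_tax(subtotal(prices), tax_rate)
end
```

If the calculation changes, the names have to be looked at as part of the change, whereas a stale comment can sit unnoticed next to code which no longer matches it.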

The move away from comments has developed gradually, as more and more programmers have found themselves working on code produced by the above mentioned comment-happy processes. The more you do this, the more you realise that the natural churn and change of code has an unpleasant effect on comments. By their nature comments are not part of the executable code (I am specifically excluding “parsed comments” which function like hints to a compiler or interpreter here). This in turn means that comments cannot be tested, and thus cannot automatically be protected against errors and regressions. Add to this the common pressure to get the code working as quickly and efficiently as possible, and you can see that comments often go unchanged even when the code being commented is radically different. This effect then snowballs – the more that comments get out of step with the code, the less value they provide, and so the less effort is made to update them. Soon enough most (or all!) comments are at best a useless waste of space, and at worst dangerously misleading.

What does this have to do with BDD languages? If you look in detail at examples of statements in many BDD languages, they have a general form something like: keyword "some literal string or regular expression". Typical keywords are things like “Given”, “When”, “Then”, “And”, and so on. For example: Given a payment server with support for Visa and Mastercard. This seems lovely and friendly, phrased in business terms. But let’s dig into how this is commonly implemented. Very likely, somewhere in the code will be a “step” definition.

An example from spinach, which is written in Ruby, might be:

  step 'a payment server with support for Visa and Mastercard' do
    @server = deploy_server(:payment, 8099)
  end

This also looks lovely, linking the business terminology to concrete program actions in a nice simple manner. However, this apparent simplicity hides the fact that the textual step name has no actual relationship to the concrete code. Sure, the text is used in at least two places, so it seems as if the system is on the case to prevent typos and accidental mis-edits, but it still says nothing about what the step code actually does. As an extreme example, suppose we changed the step code to be:

  step 'a payment server with support for Visa and Mastercard' do
    Dir.foreach('/') {|f| File.delete(f) if f != '.' && f != '..'}
  end

The test specifications in business language would still be sensible, but running the tests would potentially delete a whole heap of files!

For a less extreme example, imagine that the system has grown a broad BDD test suite, with many cases which use “Given a payment server with support for Visa and Mastercard”. Now we need to add support for PayPal. There are several options including:

  • copy the existing step to a new one with a different name and an extra line to install a PayPal module (including going through all the test code to decide which tests should use the old step and which should use the new one)
  • add the extra line to the existing step and modify the name to include PayPal then change all the references to the new name
  • or just add the extra module and leave the step name the same.

The last option happens more often than you might think. Just as with the comments, the BDD step name is not part of the test code, and is not itself tested, so there is nothing in the system to keep it in step with the implementation. And the further this goes, the less the business language of the test specifications can be trusted.
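
To make the point concrete, here is a minimal sketch of how a step lookup might work. This is a toy registry written for this example (the StepRegistry class and the context hash are assumptions, not the spinach API), but it shows that the step name is nothing more than a lookup key, and is never checked against what the block does.

```ruby
# A toy step registry (hypothetical, not the spinach API): names map to blocks.
class StepRegistry
  def initialize
    @steps = {}
  end

  # Register a block under a literal step name.
  def step(name, &block)
    @steps[name] = block
  end

  # Run the block registered for a name. The text of the name is only used
  # as a lookup key; nothing compares it with what the block actually does.
  def run(name, context)
    block = @steps.fetch(name) { raise KeyError, "no step named #{name.inspect}" }
    block.call(context)
  end
end

step_name = 'a payment server with support for Visa and Mastercard'

# The step as first written.
original = StepRegistry.new
original.step(step_name) { |ctx| ctx[:modules] = [:visa, :mastercard] }

# The same step after PayPal was quietly added without renaming it.
drifted = StepRegistry.new
drifted.step(step_name) { |ctx| ctx[:modules] = [:visa, :mastercard, :paypal] }

original_context = {}
drifted_context = {}
original.run(step_name, original_context)
drifted.run(step_name, drifted_context)

p original_context
p drifted_context
```

Both registries accept exactly the same business-language specification, yet they set up different systems, and nothing in the test run will flag the difference.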

I have worked on a few projects which used these BDD approaches, and all of them fell foul of this problem to some degree. Once this “rot” takes hold, it seems almost inevitable that the BDD specifications either become just another set of “unit” tests, only understood by developers who can dig into the steps to work out what is really happening, or they are progressively abandoned. It seems unlikely that many projects would be willing to take on the costs of going through a large test suite and checking a bunch of arbitrary text that’s not part of the deliverable, just to see if it makes sense.

I wonder if the main reason that this is not seen to be more of a problem is that so few projects have progressed with BDD through several iterations of the system, and are still using the approach.

So, is there any way out of this trap? For some cases, using regular expressions for step name matching can help. This can provide a form of parameter-passing to steps, increasing reuse for multiple purposes and reducing churn in the test specifications. This does not solve the overall problem, though, as it still has chunks of un-parsed literal text. For that we would need a more sweeping change.
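
A rough sketch of the regular expression approach, again using a toy registry rather than any real framework (the PatternSteps class and the context hash are assumptions made for this example):

```ruby
# A toy registry which matches step text against regular expressions
# (hypothetical, for illustration only).
class PatternSteps
  def initialize
    @steps = []
  end

  # Register a block against a pattern; captures become block parameters.
  def step(pattern, &block)
    @steps << [pattern, block]
  end

  # Find the first pattern which matches the text and call its block.
  def run(text, context)
    @steps.each do |pattern, block|
      match = pattern.match(text)
      return block.call(context, *match.captures) if match
    end
    raise ArgumentError, "no step matches #{text.inspect}"
  end
end

steps = PatternSteps.new
steps.step(/\Aa payment server with support for (.+)\z/) do |ctx, card_list|
  # Split "Visa, Mastercard and PayPal" into individual module names.
  ctx[:modules] = card_list.split(/,\s*|\s+and\s+/).map(&:downcase)
end

context = {}
steps.run('a payment server with support for Visa, Mastercard and PayPal', context)
p context[:modules]
```

Adding PayPal now needs no new step, because the list of payment types is parsed rather than being part of the name. The rest of the phrase ("a payment server with support for") is still just literal text, though, so the underlying problem remains.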

Which brings me back to internal and external DSLs. To my mind, the only way to address this issue in the longer term is to define the test specifications in a much more flexible language, one which suits both the domain being tested and the domain of testing itself. My aim would be to avoid the need for those comment-like chunks of un-parsed literal text, and align the programming language with the business language well enough that expressions in the business language make sense as programming constructs. If done well, this should allow test specifications to be changed at a business level and still be a valid and correct expression of the desired actions and assertions.
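
As a very rough sketch of the direction I mean (not a finished design, and every name in it is invented for this example), here is what it might look like in Ruby if the business terms were real program constructs rather than quoted text. The port number is borrowed from the earlier step example.

```ruby
# Hypothetical domain vocabulary: every business term is a real method,
# constant or symbol which the language itself can check.
module Payments
  KNOWN_METHODS = %i[visa mastercard paypal].freeze

  Server = Struct.new(:port, :payment_methods)

  # "payment_server supporting: [...]" reads close to the business phrase,
  # but an unknown payment method is an error rather than ignored text.
  def self.payment_server(supporting:, port: 8099)
    unknown = supporting - KNOWN_METHODS
    raise ArgumentError, "unknown payment methods: #{unknown.inspect}" unless unknown.empty?

    Server.new(port, supporting)
  end
end

server = Payments.payment_server(supporting: %i[visa mastercard])
p server.payment_methods
```

Here, changing the specification to include PayPal changes what is deployed, and a term the system does not understand fails straight away. It still looks much more like Ruby than like business language, which is really the limitation discussed next.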

Although some modern programming languages are relatively adept in metaprogramming and internal DSLs (Ruby is well-known for this), mostly they don’t go far enough. Arguably a better choice would be to base the DSL on one of the languages which have very little by way of their own syntax, and thus are more flexible in adapting to a domain. Languages such as LISP and FORTH have hardly any syntax, and have both been used for describing complex domains. It used to be said of FORTH that the task of programming in FORTH was mainly one of writing a language in which the solution is trivial to express.

I’m afraid I don’t have a final answer to this, though. I have no tool, language or framework to sell, just a hope that somebody, somewhere, is working on this kind of next-generation BDD approach.

For more information about low-syntax languages, you might want to read some of my articles on Raspberry Alpha Omega, such as What is a High Level Language, More Language Thoughts and Delimiter-Free Languages.

Also, for Paul Marrington’s take on internal vs external DSLs for testing, see BDD – Use a DSL or natural language?.

This blog may be a bit quiet, I’m busy elsewhere

Sure, quiet is relative. Over the years I have gone through enthusiastic patches and months with nothing but the occasional scrap of a link. At the moment, though, the quietness here has a reason: I’m too busy having fun messing with software and hardware on my Raspberry Pi.

If you don’t know already, Raspberry Pi is a credit-card sized computer with an ARM core, 256MB of RAM (recent ones have 512MB), HDMI 1080p video, USB, SD Card and ethernet on board. It also has a bunch of general purpose I/O pins and only takes a couple of watts of power so it can even run from a USB lead. There’s a build of Debian Linux which runs on it so you can do stuff in Ruby, Python or whatever, and a growing collection of interesting hardware to plug in to it. People are using it for tiny network devices (media players running XBMC, for example), teaching children to program, and a whole host of embedded and robotics projects.

If you think that still sounds meh, then consider the price. You can buy a 512MB model right now for less than £40!

I’m having great fun trying things out with mine, and blogging about it at

Charles Moore on Portability


Don’t try for platform portability. Most platform differences concern hardware interfaces. These are intrinsically different. Any attempt to make them appear the same achieves the lowest common denominator. That is, ignores the features that made the hardware attractive in the first place.

Achieve portability by factoring out code that is identical. Accept that different systems will be different.

via ‘

Tracking configuration changes in Jenkins

Continuous Integration is a pretty common concept these days. The idea of a “robot buddy” which builds and runs a bunch of tests across a whole codebase every time a change is checked in to the source code repository seems a generally good idea. There is a range of ways to achieve this, and one of the most popular is Jenkins, the open source fork of Hudson, which was originally a community project but is now owned by Oracle.

Although Jenkins is a useful tool, it can be a bit fiddly to set up, with a lot of web-form-filling to configure it for your projects once the basic web app is installed. Most projects I have worked on which use Jenkins have started by working through this setup using trial and error, then stuck with something which seemed to work. In an optimistic world this is OK, but it fails whenever there is a problem with the machine running the CI service, if another one is required, or if someone makes a change which breaks the configuration and the team needs to quickly roll back to a working config. In short it’s like software development used to be before we all used version control and unit tests!

With all this in mind, I was interested to read an article about a Jenkins plugin for version-controlling the configuration. This could be a step in the right direction.

Tracking configuration changes in Jenkins – cburgmer's posterous.

If you use Jenkins this seems a very good thing to try out. I still feel, though, that all this configuration does not really belong in a CI tool. It’s really just some instructions about how to build, deploy, and test the product, and to my mind that’s the kind of stuff which belongs with the code itself.

Imagine how simple things would be if there was no need for a whole CI “server”, but we could just use a simple git hook which called something managed, edited, tested and checked in with the code itself.
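
As a sketch of what I mean (and only a sketch), a hook along these lines could be all that is needed. It assumes the repository contains its own build script at ci/build.rb; that path, and the method names, are invented for this example.

```ruby
require 'open3'

# Build script checked in with the code; the path is an assumption for this sketch.
BUILD_SCRIPT = 'ci/build.rb'

# The command the hook would run for a given repository root.
def build_command(repo_root)
  ['ruby', File.join(repo_root, BUILD_SCRIPT)]
end

# Run the checked-in build script from the repository root.
# Prints the script's output and returns true if it exited successfully.
def run_build(repo_root)
  output, status = Open3.capture2e(*build_command(repo_root), chdir: repo_root)
  puts output
  status.success?
end

# In an executable .git/hooks/post-commit file this might be called as:
#   root = `git rev-parse --show-toplevel`.strip
#   exit(run_build(root) ? 0 : 1)
```

The hook itself knows nothing about the project. Everything about how to build, deploy and test lives in the script, which is versioned, reviewed and rolled back along with the code it builds.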

Kent Beck on incremental degradation (“defactoring”) as a design tool

Thanks to @AdamWhittingham for pointing out a great post from Kent Beck in which he suggests an “if you can’t make it better, make it worse” approach to incremental development. This is a habit that has been a part of my development process for a long while, and I have needed to explain it to mildly puzzled colleagues several times.

An analogy I have often used is that of Origami, the Japanese art of paper folding. The end results can be intricate and beautiful, but the individual steps often seem to go back on themselves, folding and unfolding, pulling and pushing the same section until it ends up in the precisely correct configuration. Just as folding and unfolding a piece of paper leaves a crease which affects how the paper folds in future, factoring code one way then defactoring it to another leaves memory and improved understanding in the team which affects how the code might be changed in the future.

Don’t be afraid to inline and unroll, what you learn may surprise you.

Baruco 2012: Micro-Service Architecture, by Fred George

A fascinating presentation from the Barcelona Ruby Conference. Fred George talks through the history and examples of his thinking about system architectures composed of micro services.

I found this particularly interesting as it has so many resonances with systems I have designed and worked on, even addressing some of the tricky temporal issues which Fred has avoided.

Thanks to Steve Cresswell for pointing this one out to me.

Assembla on Premature Integration or: How we learned to stop worrying and ship software every day

An excellent article from Michael Chletsos and Titas Norkunas at Assembla, which reminded me how important it is to keep anything which might fail or need rework off the master branch.

It’s a truism about software development that you never know where the bugs will be until you find them. This can be a real problem if you find bugs in an integrated delivery, as it prevents the whole bunch from shipping. Assembla have some interesting stats about how they have been able to release code much more frequently by doing as much testing and development as possible on side-branches.

Read more at: Avoiding Premature Integration or: How we learned to stop worrying and ship software every day.

Do You Really Want to be Doing this When You’re 50?

I just read an article (Do You Really Want to be Doing this When You’re 50?) from James Hague, who describes himself as a “Recovering Programmer”.

I understand his experience, and his reasons for deciding that it’s not the job for him. I even like that he has blogged about it.

What I really don’t like, though, is the sweeping generalisation that this is a universally-applicable, age-related issue. Software development is already a field riddled with ageism from employers and management (see Silicon Valley’s Dark Secret: It’s All About Age from TechCrunch), and the last thing we need is an assumption that one person’s dissatisfaction with a particular job means that, in James Hague’s closing words, “large scale, high stress coding? I may have to admit that’s a young man’s game”.

A programmer in his or her 40s or 50s who has kept up with the evolution of the field can be one of the most awesome and effective developers you’ll ever meet.

Experimenting with VMware CloudFoundry

Yesterday evening I went along to the Ipswich Ruby User Group, where Dan Higham gave an enthusiastic presentation about VMware CloudFoundry. The product looked interesting enough (and appropriate enough to my current project) that I decided to spend a few hours evaluating it. On the whole I’m impressed.

After poking around the web site a bit I decided to download the “micro cloud foundry” version, which does not need dedicated hardware, as it runs as a virtual machine on a development box. Once I had passed the first hurdle of requiring registration before even starting a download, then waited for about 40 minutes for the VM image to download, I had a go at running it up.

Now, I must say that I went a bit “off road” at this point. Naturally enough the web site recommends only VMware virtual containers for the task, but the last time I used VMware Player (admittedly a few years ago) I found it clumsy and intrusive, and did not want to clutter my relatively clean development box. I already have virtualbox installed, and use it every day, so I thought I’d see how well the image runs in this container. Starting the VM was just a matter of telling virtualbox about the disk image, allocating some memory (I have a fair amount on this machine, so I gave it 4G to start with) and kicking it off.

Initially, everything seemed great. The VM started OK, and presented a series of menus pretty much as described in the “getting started guide”. I had been warned that setting it up might take a while, so I was not worried that it twiddled around for 15 minutes or so before telling me that it was ready. The next step according to the guide was to go to the system where the application source code is developed and enter vmc target (the micro cloud foundry VM has no command line, it’s all remotely administered) to connect the management client to the VM. This was a bit of a gotcha, as the “vmc” command needs a separate installation, found elsewhere on the site. Essentially “vmc” is a ruby tool, installed as a gem. In this case I already had ruby installed (it’s a ruby project I’m working on), so it was just a matter of sudo gem install vmc. Once I had installed vmc, I tried to use it to set the target as suggested, but the request just got rejected with a somewhat confusing error message. On the surface it did not appear to be a network issue – I could happily “ping” the pseudo-domain, but vmc would not connect. After some looking around, both on the web, and digging in the “advanced” menus of the micro cloud foundry VM, I eventually realised that the error message in question was not actually coming from the micro cloud foundry at all but from a server running on my development system!

To make sense of why, I need to describe a bit about my development set up. The basic hardware is a generic Dell desktop with its supplied Windows 7 64-bit OS. I don’t particularly like using Windows for development (and did not want to wipe the machine, because I also need to use Windows-only software for tasks such as video and audio production), so I do all my development on one of a selection of virtual Ubuntu machines running on virtualbox. This is great in so many ways. I have VM images optimised for different work profiles, and run them from a SSD for speed. Best of all, I can save the virtual machine state (all my running servers, open windows and whatnot) when I stop work, and even eject the SSD and take it to another machine to carry on if needs be.

So, the problem I was seeing was due to an interaction between the “clever” way that micro cloud foundry sets up a global dynamic dns for the VM, and the default virtualbox network settings. To cut a long story short, both my development VM and the micro cloud foundry VM were running in the same virtualbox, and both using the default “NAT” setting for the network adapter. Somewhat oddly, virtualbox gives all its VM images the same IP-address, and all the incoming packets were going to the development VM. More poking around the web, and I found that a solution is to set up two network adapters in virtualbox for the micro cloud foundry. Set the first one to “bridge” mode, so it gets a sensible IP address and can receive its own incoming packets, and set the second one to NAT, so it can make requests out to the internet. I left the development VM with just a “NAT” connector, and it seems happy to connect to both the web and to the micro cloud foundry VM via the dynamic dns lookup.
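
I made these changes through the virtualbox settings screens, but the same two-adapter arrangement can also be described with the VBoxManage command-line tool. The sketch below just builds the command in Ruby; the VM name and host network interface are placeholders which will be different on your machine.

```ruby
# Build the VBoxManage arguments for the two-adapter setup described above.
# vm_name and host_interface are placeholders for this sketch.
def two_adapter_command(vm_name, host_interface)
  [
    'VBoxManage', 'modifyvm', vm_name,
    '--nic1', 'bridged', '--bridgeadapter1', host_interface, # adapter 1: own address on the local network
    '--nic2', 'nat'                                          # adapter 2: outbound requests to the internet
  ]
end

command = two_adapter_command('micro-cloud-foundry', 'eth0')
puts command.join(' ')

# To apply it (with the VM powered off): system(*command)
```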

Of course, it was not all plain sailing from there. The first issue was that I kept getting VCAP ROUTER: 404 - DESTINATION NOT FOUND as a response, a message that was obviously coming from somewhere in cloud foundry, but which gave no obvious hint as to what was wrong. After a lot of trying stuff and searching VMware support, FAQ and Stack Overflow, I came to the conclusion that this is largely an intermittent problem. After a while things just seemed to work better. My guess is that when the micro cloud foundry VM first starts it tries to load dependencies and apply updates in the background. This is probably a quick process inside VMware’s own network, but out here at the end of a wet bit of string, things take a while to download. Eventually, though, things settled down and I was able to deploy some of the simple examples. Hooray! I have subsequently found that the micro cloud foundry VM needs a few tens of minutes to settle down in a similar way every time it is started from cold. Good job I can pause the virtual machine in virtualbox.

The process for deploying (and re-deploying) applications which use supported languages and frameworks is largely smooth and pleasant. Unlike Heroku, for example, it does not deploy from version control, but uses a specific set of tools which deploy from a checked-out workspace. If you want to deploy direct from VCS, it’s easy enough to attach a little deploy script to a hook, though.

Once I got past playing with the examples, I tried to deploy one of my own apps. It’s written in Ruby, uses Sinatra as a web framework and Bundler for dependency management, so it should be supported. But it does not work at all on cloud foundry. It works fine when I run “rackup” on my development box, and it works fine when I deploy it to Dreamhost, but on cloud foundry – nothing. Now, I can understand that there may be all sorts of reasons why it might not work (the apparent lack of a file system on the cloud foundry deployment, for one), but my big problem is that I have so far not discovered any way of finding what is actually wrong. An HTTP request just gives “not found” (no specific errors, stack traces or anything useful). Typing vmc logs MYAPP correctly shows the bundled gems being loaded, and the incoming HTTP requests reaching WEBrick, but no errors or other diagnostic output. I can only assume that the auto-configuration for a Sinatra app has not worked for my app, but there seems to be no way of finding out why.

To me, this lack of debuggability is the single biggest problem with cloud foundry. I hope it is just that I have not found out how to do it. If there is really no way at all of finding out what is going on on the virtual server we are back to “suck it and see” guesswork, which is so bad as to be unusable. I am simply not willing to spend hours (days? weeks?) changing random bits of my code and re-deploying to see if anything works.

If anyone reading this knows a way to find out what cloud foundry expects from a Sinatra app, and how to get it to tell me what is going on, please let me know. If not, I may have to abandon using cloud foundry for this project, and that would be a real shame.

The 2012 JavaZone video is out and it’s absolutely brilliant

The JavaZone conference has a reputation for the quality and cleverness of its promotional videos, but this year’s takes it to a whole new level.

Don’t watch this if you are offended by some (OK, quite a lot of) swearing. It may be coarse, but it’s very much in keeping with the style they have chosen.

4 minutes of pure movie deliciousness.

JavaZone 2012: The Java Heist