Introduction to my PhD Research

It’s the nature of research, and particularly of doctoral research, that approaches change and details become clearer only over time. With that in mind, this is an introduction to the topic area of my research as I see it now, right at the start of the process, borrowing heavily from the proposal document I submitted to the university as part of my application.

For now, my working title is “A comparative analysis of template languages for text generation”. This is a surprisingly broad area, and I’m sure it will be narrowed down to a more precise research question as things progress.

Almost every use of computers involves producing textual output: not just fixed, hand-crafted text, but also such text combined with variable content. For example, most business software comes with some sort of “mail merge” facility, used for the “small print” on an invoice, a form letter with a bit more individuality than “Dear customer”, or junk email offering supposedly unbeatable personalised offers.

The most common way of producing these kinds of documents uses a templating technique. A master document containing blocks of fixed text and special symbolic tokens (sometimes known as “placeholders”) is processed through a software system which combines the supplied text with selected data records to produce a set of similar, but individualised, documents. In each resulting document the placeholders have been replaced by appropriate values from the data.
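As a rough illustration of the process just described, here is a minimal sketch using Python's standard string.Template class. The master document, the field names (name, amount, due_date) and the data records are all invented for this example; real mail merge tools differ in syntax and capability.

```python
from string import Template

# Hypothetical master document: fixed text plus $-prefixed placeholders.
MASTER = Template(
    "Dear $name,\n"
    "Your invoice for $amount is due on $due_date.\n"
)

# Hypothetical data records, one per output document.
RECORDS = [
    {"name": "Ms Jones", "amount": "120.00 GBP", "due_date": "1 March"},
    {"name": "Mr Patel", "amount": "75.50 GBP", "due_date": "15 March"},
]


def merge(master, records):
    """Return one individualised document per data record."""
    return [master.substitute(record) for record in records]


documents = merge(MASTER, RECORDS)

if __name__ == "__main__":
    for document in documents:
        print(document)
```

Each entry in documents shares the fixed text of the master document but carries the values from its own record. Note that substitute raises a KeyError if a record lacks a value for a placeholder, which is only one of several possible policies for missing data.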

Templating is not limited to personalised mail, however. Large swathes of the visible pages of the web are produced in this way. This very blog, for example! The content of the post you are reading exists in a database, along with the list of posts for the archive section and so on. The structure of the page, where to put the headers and widgets, and any other “boilerplate” are in a single template used for every blog post. Placeholders in the page templates show where the specific blog text, title and so on should be inserted.

Templated text generation is also common in less visible internet traffic. Emails and other textual messages, logging, diagnostic output, code generation and many data interchange formats and protocols, all benefit from this powerful technique. The separation of fixed common format from the variable parts of the data enables the two to be produced independently, often at different times, by different teams or departments.

All these uses of templating have some common character, but beyond this superficial similarity lie potentially thousands of differing implementations. For example, there are:

  • templates used on servers and in a web browser
  • some which use a single master document, while others can select from alternatives or combine many document fragments
  • some which merge single values, flat records or structured data, and others which fetch values from remote systems as required
  • some which are tied to a specific document format, language or character set, and some which can produce arbitrary data output
  • some which concentrate on replacing single tokens, while others contain programming constructs such as loops and decisions
  • some template engines which stand alone, while others can only be used within a larger framework
  • some which contain all their own processing tools, and some which hand off placeholder values to a separate programming language
  • template languages with a formal syntax, and those composed of ad-hoc or extensible components
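To make one of these distinctions concrete, the sketch below produces the same text in two ways: first with a template that only replaces single tokens, leaving the loop to the host program, and then with a template that carries its own loop construct. The {{#each}} syntax and the tiny renderer are invented purely for this illustration and are not taken from any particular template engine.

```python
import re
from string import Template

# Hypothetical order data used by both styles.
ITEMS = [{"item": "Widget", "qty": 2}, {"item": "Gadget", "qty": 1}]

# Style 1: token replacement only. The template cannot repeat itself,
# so the loop lives in the host program.
LINE = Template("- $qty x $item\n")


def render_tokens_only(items):
    return "Order:\n" + "".join(LINE.substitute(row) for row in items)


# Style 2: the template itself contains a loop construct.
# This {{#each}} syntax is made up for the sketch.
LOOP_TEMPLATE = "Order:\n{{#each items}}- {{qty}} x {{item}}\n{{/each}}"

EACH = re.compile(r"\{\{#each (\w+)\}\}(.*?)\{\{/each\}\}", re.DOTALL)
TOKEN = re.compile(r"\{\{(\w+)\}\}")


def fill(text, values):
    """Replace every {{name}} token with the matching value."""
    return TOKEN.sub(lambda m: str(values[m.group(1)]), text)


def render_with_loop(template, data):
    """Expand each {{#each}} block once per row, then fill remaining tokens."""
    def expand(match):
        rows = data[match.group(1)]
        return "".join(fill(match.group(2), row) for row in rows)
    return fill(EACH.sub(expand, template), data)


if __name__ == "__main__":
    print(render_tokens_only(ITEMS))
    print(render_with_loop(LOOP_TEMPLATE, {"items": ITEMS}))
```

Both functions give identical output for this data, but the knowledge of how to repeat a line sits in different places: in the program for the first style, and in the template for the second. This sketch does not handle nested loops, decisions or escaping.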

As well as these operational characteristics there is also a wide range of implementation quirks. There are template systems for just about every programming language, and some languages have many to choose from. Even within such groups there are implementations which differ wildly in performance and resource usage, as well as in details such as the treatment of line breaks and other “white space”.

In my many years in the software industry, I have used a lot of template systems, and even written a few, but it has become increasingly apparent that this field is fragmented and divisive. Developers of every new programming language, server, tool and framework feel compelled to enter the fray with another slightly different template system, yet the same naive approaches and uninformed choices continue to be created and promoted as if they are something new.

Even choosing a template solution for a project has become a significant issue. Comprehensive comparative information about the many differing implementations can be hard to come by. The Wikipedia page on the subject lists over one hundred template engines but is far from complete. Software documentation for such tools, where present at all, tends to focus on the use of a single implementation, usually ignoring or sidelining missing capabilities, and avoiding direct comparison with alternatives from other providers.

This is all compounded by the lack of a theoretical foundation on which to base comparison and inform discussion. My aim is to produce such a foundation.