Software Build Environments and Automation
==========================================

This document considers various aspects of application build
environments and their automation, including the target audiences,
day-to-day use cases, formal requirements, and potential future
directions of development.

Target Audience
===============

Before considering what an application build environment should
entail, it is necessary to understand the target audience, i.e.
the ultimate users of the tool & how they work. The first distinction
that can easily be drawn is between software developers and package
distributors. The former are involved in the creation of original
applications, while the latter are 3rd parties who package and
distribute existing applications.

Package Distributors
--------------------

Package distributors for an application may exist either within the
same organization as the software developers, or as external 3rd
parties working independently on applications obtained from upstream
developers. In both scenarios, however, the needs and mode of working
of this group are fairly similar:

 * Periodically obtain releases of source code from the application's
   upstream development team.

 * Build the application into the binary package format required
   for the target operating system release(s).

 * Build the application binary packages across all supported operating
   system architectures.

 * Over time, add patches to the package to deal with security updates,
   important bug fixes, integration and build issues, or more rarely
   feature enhancements not yet available from upstream developers.

 * Periodically refresh the application from the latest upstream
   release, discarding the patches or re-applying them as required.

 * Perform integration testing of the application with related packages
   in the operating system and across the various architectures.

 * One person is typically the assigned maintainer of packages for
   several applications.

The packaging of any single application is a task that may not occupy
a packager full-time. Major bursts of work when refreshing the package
with the latest upstream sources are interspersed with minor work
patching for security fixes, etc. Thus the frequency with which new
package releases are created and pushed through the build systems is
relatively low.

Software Developers
-------------------

This group ranges from the single developer, through a group of developers
collaborating on an application, to multiple groups of developers
working on a set of inter-related applications. Regardless of the scale of
the project, there are some common facets to the development process and
developers' needs:

 * Maintain multiple concurrent branches of development, covering both
   stable maintenance and unstable code bases.

 * Provide a distributable source code archive of each code base.
 
 * Provide binary packages across a handful of common operating systems, 
   distributions, and architectures. 

 * Rate of code development varies, with a low rate for maintenance and 
   stable branches, and a high rate for unstable branches.

 * All work is actively maintained and tracked through a version control
   repository.

The key differences between the working practices of software developers
and package distributors can be summed up in three points:

 * Size of development team. A single package distributor would be assigned
   sole responsibility for multiple applications, while a software developer
   will typically be just one member of a large team.

 * Rate of change. A package distributor deals with infrequent updates, 
   perhaps months apart for significant releases, and low volume patches, 
   while software developers are typically actively engaged in developing 
   an application throughout their working year.

 * Breadth of support. A package distributor will support a very focused set
   of operating system releases and architectures, while a software developer
   has to be aware that their code should be operational across an arbitrarily
   defined set of platforms, but need not necessarily provide binaries for all
   variations. A consequence of this is that software developers will be
   agnostic to packaging format, while distributors will only need to support
   a single format.


Build environment requirements
==============================

This section of the document examines some of the requirements for software
build environments.

The need for automation
-----------------------

The idea of automation is key to any software build environment, for the
following reasons:

 * Repetition. In any situation where a person has to work through a number
   of discrete steps to complete a task, repetition of the steps will
   ultimately lead to boredom. Once boredom sets in, there is a
   likelihood that concentration will lapse, leading to errors in process.
   The only way to avoid such errors is to remove the need for a person
   to be involved in the process at all, by completely automating it.

 * Consistency. Even when presented with the most precise & detailed set of
   instructions, there is no guarantee that a person will apply the instructions
   in the way the author intended. This may be as a result of ignorance of
   the operation of the tools being used, belief that they know of a better
   way to achieve the same goal, deliberate malice on the part of the worker,
   or simply mistakes made in reading & following the instructions. Again,
   the only way to guarantee that a set of instructions will be consistently
   applied is to automate them.

 * Resources. In supporting a piece of software there may be a need to provide
   builds & tests across a variety of different platforms. The number of
   combinations of operating systems, distribution versions, and hardware
   architectures can grow combinatorially as time goes by. Beyond a handful of
   combinations the resources required to perform manual builds are so large
   as to not be viable on either time or economic grounds. Automation of
   the build processes across all platforms provides the key to solving this
   problem.

 * Analysis. Given an automated process it is possible to bolt on additional
   steps and tasks which would never have been feasible, or even possible, in
   a manual system. Prime examples include running code analysis
   tools for automatic bug detection, instrumenting code to provide test
   suite coverage reports, generating API documentation, providing
   nightly build snapshots to 3rd parties, and many more besides.
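
The combinatorial growth described under "Resources" can be illustrated with
a short sketch. The platform names below are hypothetical, chosen only to
show how quickly the number of builds multiplies; a real matrix would also
drop combinations that do not exist in practice.

```python
from itertools import product

# Hypothetical values, used only to illustrate the growth in build count.
operating_systems = ["linux", "solaris", "windows"]
releases = ["release-1", "release-2", "release-3"]
architectures = ["i386", "x86_64", "ppc", "sparc"]


def build_matrix(systems, versions, arches):
    """Return every (system, version, architecture) combination."""
    return list(product(systems, versions, arches))


matrix = build_matrix(operating_systems, releases, architectures)
print(len(matrix))  # 3 * 3 * 4 = 36 builds per code change
```

Adding a single further architecture to this example raises the count from
36 to 45, which is why manual builds stop being viable so quickly.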

The 'end-to-end' process
------------------------

A core feature of a successful build environment is that it be capable of
providing a completely automated end-to-end process, with no need for manual
interaction at any stage. At the same time, the process should be totally
self-contained, with no part of it relying on artifacts previously built and
set up by a developer. Depending on the requirements of a particular development
team, the process would thus include:

 * Start off by obtaining the 'pristine' source for all applications to
   be built, along with source for any supporting tools developed in house.
   The sources would typically be checked out from a version control system
   (CVS, Subversion, GNU Arch, etc). However, it is also conceivable that
   the source would be located on a distribution site (HTTP/FTP) or in a
   nominated directory on local disk.

 * Determine the order in which all modules are to be built, such that if there
   are build dependencies between modules, each module is built after, and
   against the results of, the modules it depends on. This provides for a
   totally self-contained process.

 * Invoke the build script for each module. The build script would be a single
   command capable of completely building a module and running the basic unit
   tests. To build against the results of earlier modules, the build script
   would make use of a virtual root populated with the files of the modules
   it depends on. Likewise it would install its own files into the virtual
   root, to make them available to later modules' build scripts. The build
   script would also typically generate a number of binary packages, at least
   for the host platform, but potentially across a whole range of platforms
   (a platform being a combination of architecture, operating system &
   distribution release).

 * The final stage would perform a number of post-build tasks. These include
   sending email notifications of status to interested parties; creating web
   status pages summarising the build results and providing access to the logs;
   making the packages available on a distribution site, typically along with
   a YUM or Apt index; handing off generated packages to another system such
   as an automated integration test harness.
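
The step of determining the build order amounts to a topological sort of
the module dependency graph. The following is a minimal sketch using the
``graphlib`` module from the Python standard library (Python 3.9 or later).
The module names are hypothetical, and this is not a description of how any
particular build tool implements the ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical modules: each key maps to the modules it must be built after.
dependencies = {
    "utility-lib": set(),
    "app-server": {"utility-lib"},
    "app-client": {"utility-lib"},
}


def build_order(deps):
    """Return the modules in an order where dependencies come first.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
print(order)  # "utility-lib" is always listed before both applications
```

The relative order of the two applications is not significant, since neither
depends on the other.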

Why the 'end-to-end' process is important
-----------------------------------------

To understand why it is so critically important to provide the complete end-to-end
build process described above, it is worth considering some of the problems that
arise when this methodology is not applied:

 * Developers not using source control. In a situation where developers push
   source code into a build system, rather than having the build system pull
   code in, there are no guarantees that the code used for the build will be
   under version control, nor that the code used is all at a consistent change
   list number. One consequence of this is that it is impossible to reliably
   reproduce an identical build at a future date, for example, to create a bug
   fix release. A build system which always pulls pristine source from
   version control guarantees that all source for the build can be
   accounted for.

 * Building against inconsistent dependencies. In a situation where the build
   system has no concept of a dependency chain, it becomes the responsibility
   of the developer to ensure that the build produced was created against the
   correct versions of the modules it depends on. A project may consist of 3
   modules: a library of utility functions, and two co-operating applications
   using the library. If a change is made to the utility library, it is
   important to ensure that both applications are re-built against this new
   library. By setting up a build system with knowledge of the three modules
   and their inter-dependencies, it is possible to guarantee that whenever a
   change is made to the library, both applications will be reliably re-built
   against the updated code.

 * Identifying breakage in dependent modules. Following on from the previous
   problem of building against the correct dependencies is that of ensuring
   that changes in a base module don't cause unexpected breakage in dependent
   modules. Consider a scenario where one team produces a utility library,
   while another produces applications using the library. If each team had
   independent build systems, only concerned with their own codebase, then an
   accidental API change in the utility library may go unnoticed for many
   weeks or months, until the application team pulls in the new library version.
   With a single build system implementing an end-to-end build of all related
   modules, any changes in the library which break the application will come
   to light in a matter of hours rather than weeks.
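
The three-module example above (one utility library and two applications)
can be used to sketch how a build system with knowledge of the dependency
chain might decide what to re-build after a change. The module names and the
``modules_to_rebuild`` helper are hypothetical, for illustration only.

```python
from collections import deque

# Hypothetical project: a utility library and two applications using it.
depends_on = {
    "utility-lib": [],
    "app-one": ["utility-lib"],
    "app-two": ["utility-lib"],
}


def modules_to_rebuild(changed, depends_on):
    """Return the changed module plus every module that depends on it,
    whether directly or indirectly."""
    # Invert the mapping: for each module, which modules build against it?
    dependents = {}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk outwards from the changed module.
    affected = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for name in dependents.get(current, []):
            if name not in affected:
                affected.add(name)
                queue.append(name)
    return affected


print(sorted(modules_to_rebuild("utility-lib", depends_on)))
print(sorted(modules_to_rebuild("app-one", depends_on)))
```

A change to the library marks all three modules for re-building, while a
change to one application affects only that application.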


Project comprehension
---------------------

The term 'project comprehension' refers to the idea of collecting, organizing,
analysing, and presenting information about the software build process. Quick
and efficient access to information about a project's build process is essential
regardless of the size of the development team. The build environment plays
a major part in project comprehension, from the generation of build logs, through
execution of code analysis tools, to presentation of build output. A handful of
the areas of project comprehension which fall under the build environment are:

 * Summary of state of the build. For each code module, show whether the last
   compilation run was successful or a failure, and also indicate any modules
   which had to be skipped because modules they depend on failed.

 * Build logs. Provide a display of the output from all the build commands,
   enabling analysis of any warning or error messages generated during the
   build cycle.

 * Test logs. Provide a display of the output from all test suites run against
   modules built, highlighting any new failures since the previous run.

 * Code analysis. Output reports from code analysis tools, such as automatic
   bug detection, test coverage instrumentation, style validation tools,
   documentation coverage checks, and copy-and-paste code detection.

 * Historical trends. Reports showing the history of module build failures,
   test reports, bug reports.

 * Notifications. Notification of build failures, test failures, progress of
   the builder through a set of modules.
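
As an illustration of the build state summary, the sketch below classifies
each module as successful, failed, or skipped because something it depends
on was not built. The module names, the ``summarise_build`` helper, and the
``fake_build`` stand-in are all hypothetical; a real harness would invoke
each module's build script instead.

```python
def summarise_build(order, depends_on, build):
    """Return a dict mapping each module to 'success', 'failed' or 'skipped'.

    order: module names, with dependencies listed before their dependents
    depends_on: module name -> list of modules it builds against
    build: callable taking a module name, returning True on success
    """
    status = {}
    for name in order:
        # A module cannot be built if anything it depends on is unavailable.
        if any(status[dep] != "success" for dep in depends_on[name]):
            status[name] = "skipped"
        elif build(name):
            status[name] = "success"
        else:
            status[name] = "failed"
    return status


# Hypothetical project in which the build of "net-lib" breaks.
depends_on = {
    "utility-lib": [],
    "net-lib": [],
    "app-one": ["utility-lib"],
    "app-two": ["utility-lib", "net-lib"],
}
order = ["utility-lib", "net-lib", "app-one", "app-two"]


def fake_build(name):
    # Stand-in for invoking the module's real build script.
    return name != "net-lib"


summary = summarise_build(order, depends_on, fake_build)
for name in order:
    print(f"{name:12} {summary[name]}")
```

Here "app-two" is reported as skipped rather than failed, which keeps the
summary pointing at the module that actually broke.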


Integration
-----------

For an automated build environment to be successfully deployed within a team,
its integration capabilities are a key aspect. Developers are typically very
averse to changing their existing practices, particularly if the suggestion
for change comes from people outside of the immediate development team, be
they managers or 3rd party consultants. Thus it must be easy to integrate
build tools with any set of existing development tools, without requiring the
entire software stack to be changed.

 * Language. Software projects may have components written in multiple languages
   ranging from C, to Perl, to Ruby, to Java, to Cobol, and a whole host of
   others in between. Thus it goes without saying that the build environment
   should not favour one particular language (or class of languages) to the
   detriment of the others.

 * Build tool. Depending on the programming language being used, a variety of
   low level build tools may be in use. The traditional Make tool may be
   used directly, or indirectly through a wrapper layer such as IMake,
   GNU Autotools, or Perl's ExtUtils::MakeMaker & Module::Build. For Java it
   is common to find Ant or Maven in use, while other languages may have
   home grown capabilities, or simply leverage a shell script. The build
   environment harness must be capable of integrating with all of the above.

 * Source control. There are a whole host of systems available for maintaining
   code in version controlled storage. Traditional systems such as CVS, ClearCase,
   SCCS, RCS provide simple file oriented versioning capabilities, while modern
   systems such as Subversion, GNU Arch, BitKeeper and Perforce provide change
   set oriented capabilities. Switching version control systems is a decidedly
   non-trivial task, so the build environment harness must integrate with
   whichever system it finds.

 * Packaging format. The file format in which software builds are distributed
   is heavily dependent on the target platforms of the end users. For Windows,
   a binary InstallShield installer would be used, Linux distributions are
   typically based around the RPM or Debian package formats, while Solaris has
   another format still. To avoid exclusion of any potential user base, the
   post-build output stages should be capable of operating with any format of
   package.
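
One way a harness could stay independent of any particular source control
system is to talk only to a small interface, with one adapter per system.
The sketch below is an assumption about how this might be structured, not a
description of an existing tool: the ``SourceRepository`` interface, the
``LocalDirectory`` adapter, and ``fetch_sources`` are hypothetical names.
Only the simplest case, a nominated directory on local disk, is implemented.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Protocol


class SourceRepository(Protocol):
    """What the harness needs from any source control system (hypothetical)."""

    def export(self, destination: Path) -> None:
        """Place a pristine copy of the source tree at destination."""
        ...


class LocalDirectory:
    """Adapter for sources kept in a nominated directory on local disk."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def export(self, destination: Path) -> None:
        # copytree creates destination, which must not already exist.
        shutil.copytree(self.path, destination)


def fetch_sources(repositories: dict[str, SourceRepository], work_dir: Path) -> list[Path]:
    """Export every module into work_dir without knowing which system holds it."""
    exported = []
    for name, repository in repositories.items():
        target = work_dir / name
        repository.export(target)
        exported.append(target)
    return exported


# Usage with a throwaway source tree standing in for a real module.
source = Path(tempfile.mkdtemp())
(source / "Makefile").write_text("all:\n")
work_dir = Path(tempfile.mkdtemp())
paths = fetch_sources({"demo-module": LocalDirectory(source)}, work_dir)
print(paths[0].name)  # demo-module
```

Adapters for CVS, Subversion, and the like would implement the same
``export`` method, so the rest of the harness would not need to change when
a team uses a different system.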
