Software Build Environments and Automation
==========================================

This document considers various aspects of application build environments & their automation, including the target audiences, day-to-day use cases, formal requirements, and potential future directions of development.

Target Audience
===============

Before considering what an application build environment should entail, it is necessary to understand the target audience, i.e. the ultimate users of the tool & how they work. The first distinction that can easily be drawn is between software developers and package distributors. The former create original applications, while the latter are third parties who package and distribute existing applications.

Package Distributors
--------------------

Package distributors for an application may exist either within the same organization as the software developers, or as external third parties working independently on applications obtained from upstream developers. In both scenarios, however, the needs and mode of working of this group are fairly similar.

* Periodically obtain releases of source code from the application's upstream development team.
* Build the application into the binary package format required for the target operating system release(s).
* Build the application binary packages across all supported operating system architectures.
* Over time, add patches to the package to deal with security updates, important bug fixes, integration and build issues, or, more rarely, feature enhancements not yet available from upstream developers.
* Periodically refresh the application from the latest upstream release, discarding or re-applying the patches as required.
* Perform integration testing of the application with related packages in the operating system and across the various architectures.
* One person is typically the assigned maintainer of packages for several applications.
  The packaging of any single application is a task that may not occupy a packager full-time: major bursts of work when refreshing the package with the latest upstream sources are interspersed with minor work patching for security fixes, etc. Thus the frequency with which new package releases are created and pushed through the build systems is relatively low.

Software Developers
-------------------

This group ranges from the single developer, through a group of developers collaborating on an application, to multiple groups of developers working on a set of inter-related applications. Regardless of the scale of the project, there are some common facets to the development process and developers' needs:

* Maintain multiple concurrent branches of development, covering both stable maintenance and unstable code bases.
* Provide a distributable source code archive of each code base.
* Provide binary packages across a handful of common operating systems, distributions, and architectures.
* Rate of code development varies, with a low rate for maintenance and stable branches, and a high rate for unstable branches.
* All work is actively maintained and tracked through a version control repository.

The key differences between the working practices of software developers and package distributors can be summed up in three points:

* Size of development team. A single package distributor would be assigned sole responsibility for multiple applications, while a software developer will typically be just one member of a large team.
* Rate of change. A package distributor deals with infrequent updates, perhaps months apart for significant releases, and low-volume patches, while software developers are typically actively engaged in developing an application throughout their working year.
* Breadth of support.
  A package distributor will support a very focused set of operating system releases and architectures, while a software developer has to be aware that their code should be operational across an arbitrarily defined set of platforms, but need not necessarily provide binaries for all variations. A consequence of this is that software developers will be agnostic to packaging format, while distributors will only need to support a single format.

Build environment requirements
==============================

This section of the document examines some of the requirements for software build environments.

The need for automation
-----------------------

The idea of automation is key to any software build environment, for the following reasons:

* Repetition. In any situation where a person has to work through a number of discrete steps to complete a task, repetition of the steps will ultimately lead to boredom. Once boredom sets in, there is a likelihood that concentration will lapse, leading to errors in the process. The only way to avoid such errors is to remove the need for a person to be involved in the process at all, by completely automating it.
* Consistency. Even when presented with the most precise & detailed set of instructions, there is no guarantee that a person will apply the instructions in the way the author intended. This may be a result of ignorance of the operation of the tools being used, a belief that they know of a better way to achieve the same goal, deliberate malice on the part of the worker, or simply mistakes made in reading & following the instructions. Again, the only way to guarantee that a set of instructions will be consistently applied is to automate them.
* Resources. In supporting a piece of software there may be a need to provide builds & tests across a variety of different platforms. The number of combinations of operating systems, distribution versions, and hardware architectures can grow combinatorially as time goes by.
  Beyond a handful of combinations, the resources required to perform manual builds are so large as to not be viable on either time or economic grounds. As expected, automation of the build processes across all platforms provides the key to solving this problem.
* Analysis. Given an automated process, it is possible to bolt on additional steps and tasks which would never have been feasible, or even possible, in a manual system. Prime examples include running code analysis tools for automatic bug detection, instrumenting code to provide test suite coverage reports, generating API documentation, providing nightly build snapshots to third parties, and many more besides.

The 'end-to-end' process
------------------------

A core feature of a successful build environment is that it be capable of providing a completely automated end-to-end process, with no need for manual interaction at any stage. At the same time, the process should be totally self-contained, with no part of it relying on artifacts previously built and set up by a developer. Depending on the requirements of a particular development team, the process would thus include:

* Start off by obtaining the 'pristine' source for all applications to be built, along with the source for any supporting tools developed in house. The sources would typically be checked out from a version control system (CVS, Subversion, GNU Arch, etc); however, it is also conceivable that the source would be located on a distribution site (HTTP/FTP) or in a nominated directory on local disk.
* Determine the order in which all modules are to be built, such that if there are build dependencies between modules, a module is built only after the modules it depends on and can build against their results. This provides for a totally self-contained process.
* Invoke the build script for each module. The build script would be a single command capable of completely building a module and running the basic unit tests.
  As noted earlier, the build script would make use of the virtual root populated with the files of the modules it depends on. Likewise, it would install its own files into a virtual root, to make them available to the build scripts of later modules. The build script would also typically generate a number of binary packages, at least for the host platform, but potentially across a whole range of platforms (a platform being a combination of architecture, operating system & distribution release).
* The final stage would perform a number of post-build tasks. These include sending email notifications of status to interested parties; creating web status pages summarising the build results and providing access to the logs; making the packages available on a distribution site, typically along with a YUM or APT index; and handing off generated packages to another system such as an automated integration test harness.

Why the 'end-to-end' process is important
-----------------------------------------

To understand why it is so critically important to provide the complete end-to-end build process described above, it is worth considering some of the problems that arise when this methodology is not applied:

* Developers not using source control. In a situation where developers push source code into a build system, rather than having the build system pull code in, there are no guarantees that the code used for the build will be under version control, nor that the code used is all at a consistent change list number. One consequence of this is that it is impossible to reliably reproduce an identical build at a future date, for example to create a bug fix release. A build system which always pulls pristine source from version control guarantees that all source for the build can be accounted for.
* Building against inconsistent dependencies.
  In a situation where the build system has no concept of a dependency chain, it becomes the responsibility of the developer to ensure that the build produced was created against the correct versions of the modules it depends on. A project may consist of three modules: a library of utility functions, and two co-operating applications using the library. If a change is made to the utility library, it is important to ensure that both applications are re-built against this new library. By setting up a build system with knowledge of the three modules and their inter-dependencies, it is possible to guarantee that whenever a change is made to the library, both applications will be reliably re-built against the updated code.
* Identifying breakage in dependent modules. Following on from the previous problem of building against the correct dependencies is that of ensuring that changes in a base module don't cause unexpected breakage in the modules that depend on it. Consider a scenario where one team produces a utility library, while another produces applications using the library. If each team had an independent build system, only concerned with its own codebase, then an accidental API change in the utility library may go unnoticed for many weeks or months, until the application team pulls in the new library version. With a single build system implementing an end-to-end build of all related modules, any changes in the library which break the application will come to light in a matter of hours rather than weeks.

Project comprehension
---------------------

The term 'project comprehension' refers to the idea of collecting, organizing, analysing, and presenting information about the software build process. Quick and efficient access to information about a project's build process is essential regardless of the size of the development team.
The build environment plays a major part in project comprehension, from the generation of build logs, through execution of code analysis tools, to presentation of build output. A handful of the areas of project comprehension which fall under the build environment are:

* Summary of state of the build. For each code module, show whether the last compilation run was a success or a failure, and also indicate any modules which had to be skipped because modules they depend on failed.
* Build logs. Provide a display of the output from all the build commands, enabling analysis of any warning or error messages generated during the build cycle.
* Test logs. Provide a display of the output from all test suites run against the modules built, highlighting any new failures since the previous run.
* Code analysis. Output reports from code analysis tools, such as automatic bug detection, test coverage instrumentation, style validation, documentation coverage checks, and copy-and-paste code detection.
* Historical trends. Reports showing the history of module build failures, test reports, and bug reports.
* Notifications. Notification of build failures, test failures, and the progress of the builder through a set of modules.

Integration
-----------

For an automated build environment to be successfully deployed within a team, its integration capabilities are a key aspect. Developers are typically very averse to changing their existing practices, particularly if the suggestion for change comes from people outside of the immediate development team, be they managers or third-party consultants. Thus it must be easy to integrate build tools with any set of existing development tools, without requiring the entire software stack to be changed.

* Language. Software projects may have components written in multiple languages ranging from C, to Perl, to Ruby, to Java, to COBOL, and a whole host of others in between.
  Thus it goes without saying that the build environment should not favour one particular language (or class of languages) to the detriment of the others.
* Build tool. Depending on the programming language being used, a variety of low-level build tools may be in use. The traditional Make tool may be used directly, or indirectly through a wrapper layer such as Imake, GNU Autotools, or Perl's ExtUtils::MakeMaker & Module::Build. For Java it is common to find Ant or Maven in use, while other languages may have home-grown capabilities, or simply leverage a shell script. The build environment harness must be capable of integrating with all of the above.
* Source control. There are a whole host of systems available for maintaining code in version-controlled storage. Traditional systems such as CVS, ClearCase, SCCS, and RCS provide simple file-oriented versioning capabilities, while modern systems such as Subversion, GNU Arch, BitKeeper and Perforce provide change-set-oriented capabilities. Switching version control systems is a decidedly non-trivial task, so the build environment harness must integrate with whichever system it finds in use.
* Packaging format. The file format in which software builds are distributed is heavily dependent on the target platforms of the end users. For Windows, a binary InstallShield installer would be used; Linux distributions are typically based around the RPM or Debian package formats; while Solaris has another format still. To avoid exclusion of any potential user base, the post-build output stages should be capable of operating with any format of package.

[1] cf. Peopleware
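
As an illustration of the module ordering step of the 'end-to-end' process, and of the reporting of skipped modules, the following is a minimal sketch in Python. It is not taken from any particular build tool: the module names, the `build_order` and `run_builds` functions, and the caller-supplied `build` callback are all hypothetical, and the ordering is delegated to the standard library's `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical project: each module maps to the modules it builds against.
DEPENDENCIES = {
    "utility-library": [],
    "app-one": ["utility-library"],
    "app-two": ["utility-library"],
}


def build_order(dependencies):
    """Return module names so that every module follows its dependencies."""
    # TopologicalSorter takes {node: predecessors} and raises CycleError
    # if the dependency graph contains a cycle.
    return list(TopologicalSorter(dependencies).static_order())


def run_builds(dependencies, build):
    """Build modules in dependency order and return a status per module.

    `build` is a caller-supplied function that takes a module name and
    returns True on success. A module is skipped, not built, when any
    module it depends on failed or was itself skipped.
    """
    status = {}
    for module in build_order(dependencies):
        deps = dependencies.get(module, ())
        if any(status[dep] != "success" for dep in deps):
            status[module] = "skipped"
        elif build(module):
            status[module] = "success"
        else:
            status[module] = "failed"
    return status
```

With this sketch, a failing build of the utility library leaves both applications marked as skipped rather than built against stale code, which is the information the build summary would need to present.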