A New Chore for Sisyphus

Chapter 2 - Porcine aerodynamics


By DaveAtFraud - Posted on 05 July 2010

RFC 1925-3: With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead.

There are many motivations an organization might have for embarking on a death march development effort. Regardless of the specific motivation for any particular effort, there seems to be an opinion among both management and developers that, if the project succeeds, it will have been worth the cost. This belief, born of necessity, seems to be based on multiple misconceptions. First and foremost among these misconceptions is the belief that a death march will yield an acceptable product faster than applying a defined methodology, whether predictive or agile. In addition, there is the belief that the cost of such an effort consists only of the immediate costs. These immediate costs include overtime or promised bonuses, possibly providing on-site meals or other amenities for the development team, some “burn-out” for those involved in the effort and, at worst, a recognition that the “piper will need to be paid” in the form of staff turnover, postponed vacations and undertime for the development team [1].

The personnel costs of a death march development effort are only the tip of a proverbial iceberg. The practices and the implicit processes that are inherent in a death march exact a cost both on the product that is the immediate result of such an effort and on the basis or foundation that the product is intended to provide for future versions or variants. These practices and processes are driven almost exclusively by lack of sufficient time as reflected in the proposed schedule. That the hoped for schedule will not be achieved is a sad irony given the near-term and, especially, the long-term consequences of a highly compressed schedule.

Both the project's immediate probability of success [2] and the long-term consequences are driven by the extent to which the schedule has been compressed. These consequences are strictly due to attempting to force developers to produce a product on an unrealistic or impossible schedule and are expressed directly in the practices of the development group. At some point, developer decisions as to how to accomplish certain activities end up being driven not by what the developer knows are the correct software engineering practices but, instead, by what can be done in the time available. Obviously, these are not the practices that the group would consciously adopt unless under extreme duress. The short-term thinking and extreme focus on completing the problem at hand that are inherent in these practices yield a low-quality, hard-to-maintain, insecure, unreliable, unstable and inextensible product.

Don't apply Parkinson's Law to engineering

One of the excuses given for compressing a development effort schedule is Parkinson's Law [3], which is often stated as “work expands to fill the time available.” Even if this were true for software development efforts [4], there is nothing that states that the converse is true. Despite all evidence to the contrary, management clings to the belief that, by severely limiting the time available for an effort, the work to be accomplished can be evenly scaled down to meet the time available. Since it will be obvious if some functionality is not completed, the only way a given level of functionality can be created within the constrained available time is by degrading the quality of the final product. This lower quality initially shows up as the number of bugs found during product test, but the depth of the rot goes far beyond the superficial bugs that can be found by end-item testing.

Arbitrarily compressing a development schedule typically results in a fairly uniform distribution of superficial bugs. This only indicates that the development team was evenly affected by the compressed schedule. But not all development activities are equally important. Management sees these superficial problems as they are found during product testing and typically chalks up the schedule delay implicit in finding and fixing them as the only cost of the compressed schedule. Unfortunately, design errors embedded in the project's core logic and conceptual misunderstandings between developers result in systemic problems that cannot be easily found by testing. Such deeper, more fundamental errors will typically only show up during customer use or when the team attempts to build the next revision of the product. The expediencies of the moment unconsciously adopted by the development team frequently come back to haunt them when accepted software engineering practices are ignored.

The litany of (software development) sins

Looking back at the project described in Chapter 1, consider the following common practices of a development team driven, as one developer succinctly phrased it, by the tyranny of the urgent. The list of practices is ordered by the time frame in which each practice appeared. Thus, the analysis starts with practices that are associated with activities that typically occur at the beginning of such a project. Since no phase of the project is immune to short-term thinking, the analysis continues through those activities and practices that will only be seen as heroic efforts are expended in an attempt to finally ship and then maintain the project.

Obviously, not every practice will occur on every project that has a highly compressed schedule. Not all projects require all of the practices that are associated with a predictive software development methodology. Projects that are amenable to an agile development methodology may require no more than identifying a design pattern, collecting user stories, creating test plans and then writing the code. Unfortunately, projects that aren't amenable to an agile development methodology exist and the development effort described in Chapter 1 was one of them (more on this subject in Chapter 6). The development team made some noises regarding following a traditional, predictive methodology so this is the sieve used in examining this effort and how and why it failed.

Besides the nature of the project, other factors may result in these practices not showing up on a particular project. Outside factors may shorten a reasonable schedule part way through a project or relax the schedule constraints for an initially highly compressed project. Also, different developers will resist these ill practices to a greater or lesser degree depending on both their individual personalities and how burnt out they are when such a choice must be made.

In this analysis, the near-term consequences are those that affect development of the current product or project. These practices apparently allow a near-term milestone to be met while putting off and making more difficult the inevitable work that must be completed to actually solve the problem at hand. The long-term consequences eventually become apparent when users attempt to use the project or the development team attempts to build the next version of the project on this unstable foundation. In this analysis, more detail is provided for the consequences of the early practices since their effects are deeper, more insidious and least likely to be found through end-item testing.

Management demands to immediately start coding

Novice programmers are notorious for wanting to immediately start coding before doing any analysis of the problem or coming up with a design or approach to implementing a solution to the problem. When project managers demand that the development staff immediately work overtime and start coding, it is a clear indication that management thinks that software development consists strictly of writing code. This practice isn’t even appropriate for organizations using agile development methodologies, where the collection of user stories and development of test cases should precede writing any significant code. Early coding tends to produce code that implements bits and pieces of the desired functionality. The code created by such efforts does not address larger program design issues such as identification of common components, dependencies between the pieces of code being developed and areas of the project needing more detailed analysis of the requirements. This approach tends to produce the software equivalent of a pile of bricks, mortar and other pieces that bear no resemblance to the fireplace and chimney the customer originally requested.

Early coding of well-defined, easily identified utilities is an excellent practice for accelerating almost any development effort. It is critical that any such effort meet both of the criteria (well defined and easily identified) and the development effort be approached as developing self-contained utilities that must ensure correct execution and predictable behavior under all circumstances. Unfortunately, for schedule constrained projects, the identification and specification of such utilities takes time as does their development due to the requirement for robustness. Attempts to accelerate development of application level features before the functional requirements are defined will frequently be futile and, at best, a waste of development resources as the developers attempt to implement what is not yet fully understood. At worst, the inapplicability of the code developed will haunt both the current and future development efforts.

Near-term, the result is that the overall program tends to lack coherency and consistency. This occurs when two or more programmers each develop their own solution to what should have been identified as a common function. This makes testing and documenting the program difficult since similar functions will often be implemented in very different ways. Often, early code is extended to new cases later in the development cycle by the simple expedient of creating a new copy from the old and then altering the new copy as needed (see cut-and-paste sub-classing, below).

The long-term result of these practices makes using the program cumbersome since the user is forced to learn multiple ways of performing what is, to the user, the same or a similar action. Also, since these bits and pieces of a solution are developed in a contextual vacuum, the developer may or may not make valid assumptions as to error checking requirements or valid input ranges. Incorrect assumptions for such validation logic have a way of surfacing unpleasantly in later versions of the product when additional features allow the user to perform new actions that exercise this code in new and novel (to the developer) ways.

Lack of functional requirements

A functional decomposition of a proposed project and the resulting functional requirements provide the developers and the testers with the unambiguous definition of what is to be developed. The functional decomposition allows the system to be understood at the software architecture level such that program functions with strong interdependence can be identified and the development approach can take this interdependence into account. Testable, functional requirements also form a sort of contract between the system specification author and the development organization such that, if the development team builds what is described in the functional requirements, the system is assumed to be fully implemented. Without functional requirements that document what is to be built at a testable level of detail, the development continues until someone decides that a line has to be drawn as to when the program is done. Further, without functional requirements, each developer ends up defining their own interpretation of the functional requirements without the coordinating insight that is provided by a functional decomposition. This will typically result in a product “only an engineer could love” as the developers focus on technical features at the expense of usability and functional coherency.

By forgoing the creation of functional requirements, a project does not have the benefit of this level of insight into the system at a time when the design approach and resulting implementation can easily be changed. Although it may be possible to make the system work, the lack of a coherent solution to any complex or interdependent portions of the problem will show up as:

  • broken design abstractions,
  • lack of user interface consistency,
  • ad-hoc splices and patches to make disjointed functionality somehow work together, and
  • performance issues as the burden of frequently needing to transform the data for the “next” piece of functionality creates an unnecessary load on system resources.

It should also be noted that these broken design abstractions, splices and patches provide fertile ground for additional bugs and security flaws to proliferate.

The near-term effect is that each developer tests his or her code against a unique, imaginary set of functional requirements, such that the resulting product lacks coherency and consistency from the end user’s perspective. Not developing functional requirements also means that there may be thorny technical issues that functional analysis would have revealed but which will only be stumbled upon by the development team as the project progresses. Such issues include computationally complex algorithms or strong data coupling that underlie the high level functionality described in the system requirements document. Such technical issues may not be discovered until the developers attempt to write the implementing code or, more likely, when the integration team attempts to create a working whole from the disparate pieces. Long-term, the quirks of each piece become ingrained in the product such that there is little or no hope of ever achieving any level of consistency in the product or a common “look and feel” that eases user acceptance and training. The poorly thought through internal interfaces between program components become a minefield, constraining future development to the extent they are understood while providing a source of latent bugs for future releases to the extent that they aren't.

Development organizations will sometimes attempt to use the system level specification in lieu of functional specifications. This seemingly plausible expedient doesn't work for a variety of reasons. In addition to the near-term effects described above, substituting the system specification for appropriately detailed functional specifications results in a conflict between those who enforce the system vision (generally the author of the system specification, customer facing personnel and the quality assurance group) and the development organization. Due to schedule pressure, the development organization will attempt to narrowly interpret the system specification as a functional specification to minimize the amount of code to be developed. The specification author and user facing personnel will attempt to point out that functional details have no business in a system level specification. A typical example is a requirement that the user be able to create something (e.g., a document or a database query) with no corresponding “system level” requirement that allows the user to delete or otherwise manage whatever was created. The author(s) of the system requirements are hounded by the developers to provide details that have no business in a system level specification only to have any answer that expands the amount of code to be developed be denounced as “feature creep.” At the same time, the QA and documentation teams attempt to create functional test plans and documentation for the emerging product only to trip over the emerging inconsistencies as they either attempt to create test cases or explain how the disjointed system is to be used.

Lack of preliminary design

Lack of a preliminary or functional design generally means that there is no valid basis for estimating the effort required or the schedule duration. Near-term consequences are a system without a coherent design that accretes functionality. Required but not obvious functionality is added where it is convenient for the development team; not where it would make sense to the end user. Intermediate-term results show up as the development process continues long beyond when it was hoped (I would not use the word “planned” in this circumstance) the development would be completed. Long-term results show up in later versions as the lack of initial design means that the overall functionality was not thought through. This results in a succession of patches, fixes and functionality splices that are applied until the product exhibits some semblance of external acceptability. Unfortunately, this can only be achieved through a series of unmaintainable quick fixes, patches in the worst sense of the word and jury-rigged splices and lash-ups.

Lack of detailed design

A detailed design allows validation and verification of what will be developed prior to coding. The near-term result of the lack of a detailed design is that interface details will be worked out only as the code on each side of the interface is completed and errors and disconnects are discovered. If the errors don’t show up during the build process (e.g., as incompatible variable types or data structures), the integration and test team or the end users must find them. The fix-ups and patches required by these errors in order to “make it work” lead directly to the long-term consequences of skipping the detailed design phase: a brittle, fragile system.

The system ends up being brittle because these fixes and patches splice one developer’s abstraction to another’s in ways that typically violate both abstractions and tie both abstractions to the patch. It ends up being fragile because the system will eventually need to be changed, and all three pieces (the patch plus each developer’s core functionality) are now easily broken.

Lack of common abstract [5] objects

The implementation tends to focus on the manipulation of raw data rather than abstract objects understandable by the user or, for that matter, other developers. This is a throwback to the “old days” of software development before the concept of using abstract data types became common, and it results in bugs that indicate that internal values are exposed rather than an abstract interface implemented. It takes some time to define an appropriate, abstract interface to a particular functionality, but the reward is functionality that works only the way the developer intended. The benefit is that the developers' tests can fully validate the implemented capability and there is significantly less risk that some unexpected inputs will result in abnormal execution (also known as a bug).

It should be noted that neither the use of an object oriented programming language nor the use of object oriented design terminology by the development team is any guarantee that a good, object oriented implementation is being created. Such a design takes time, while using object oriented development terms for an implementation that is bereft of common abstractions makes everyone involved feel like modern development techniques are being used with no schedule impact (see also cut-and-paste sub-classing, below).

Exposing internal or unabstracted values results in small changes to one part of the program causing unexpected problems in another. For example, an initial implementation might use the value 1 for “on” and 0 for “off” while a later version substitutes the actual strings; a low-level piece of code that relies on 0 being “off” when the program is initialized will break when it encounters the string “off”. The near-term result is frequent, widespread code breakage during testing and integration as the raw data will not be well behaved and each piece of code that accesses the raw data will have to be armored to protect it from any badly behaved data. Also common is the case where a flawed abstraction gets implemented but doesn't support the uses required by all pieces of the program. Rather than extend the original abstraction with the attendant breakage of code already written (both the function itself and any code that made use of it must be modified), the development organization frequently finds it more expedient to apply local patches to copies of the abstraction (see “cut-and-paste” sub-classing, below) that rely on the underlying implementation. The long-term result is a difficult-to-modify product since a change to one section of code or a data definition is not confined to a single well defined area of the code but, instead, frequently creates unexpected side effects. The result is, again, to increase the difficulty of both maintenance and developing future versions.
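The on/off example can be sketched in Python. This is only an illustration under stated assumptions, not code from the project described in Chapter 1: the names PowerState, parse_power_state and is_off are hypothetical, and the sketch assumes that raw values enter the program through a single boundary function so that a change of encoding is confined to one place.

```python
from enum import Enum


class PowerState(Enum):
    """Abstract on/off value; callers never depend on the raw encoding."""
    OFF = 0
    ON = 1


# Hypothetical table of every raw encoding the program has ever used:
# the 1/0 integers of an early version and the strings of a later one.
_RAW_ENCODINGS = {
    0: PowerState.OFF,
    1: PowerState.ON,
    "off": PowerState.OFF,
    "on": PowerState.ON,
}


def parse_power_state(raw):
    """Convert a raw value into a PowerState, rejecting anything unknown."""
    if isinstance(raw, PowerState):
        return raw
    # Normalize strings so "Off" and " off " are treated alike.
    key = raw.strip().lower() if isinstance(raw, str) else raw
    try:
        return _RAW_ENCODINGS[key]
    except (KeyError, TypeError):
        raise ValueError(f"unrecognized power state: {raw!r}") from None


def is_off(state: PowerState) -> bool:
    """Low-level check written against the abstraction, not against 0."""
    return state is PowerState.OFF
```

With this arrangement, is_off(parse_power_state(0)) and is_off(parse_power_state("off")) give the same answer, and a later change of encoding touches only the table rather than every piece of code that asks whether something is off.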

Continual or frequent “refactoring”

Refactoring is the agile terminology for re-writes. Refactoring is recognition that a previous approach was incorrect or not sufficiently extensible and must be replaced. Especially under the XP development methodology there are two guiding principles that make such refactorings inevitable: “keep it simple” (only implement the functionality actually required) and “you aren't gonna need it” (YAGNI or don't implement functionality that “might” be needed in a future release). Thus, the need for refactorings is expected as a result of only implementing functionality that is actually required in a given iteration.

While some “refactoring” is to be expected, repeated, non-converging refactorings of the same functionality often indicate that the functionality isn’t amenable to being developed rapidly and needs to be addressed using predictive methodologies, up to and including performing a functional analysis of what is being attempted. For non-agile development methodologies, experienced software developers generally do not consider re-writes to be a good thing since they have a tendency to replace one set of bugs with another. Attempting to hide multiple re-writes behind the agile terminology of refactoring means that there is an underlying thorny problem that the developer is not capable of surmounting without coordination with other contributors or, possibly, the system requirements author. There is a strong likelihood that a functional analysis will be required to determine the full functionality required.

The near-term impact shows up as continued instability and lack of correct functionality in the product. This may be confined to the area subject to repeated refactoring but can impact the entire project. The long-term impact could be compared to an infected sore that won’t heal. Eventually, surgery will be required or the infection will spread throughout the project since repeated, non-converging refactorings are usually caused by complex dependencies among project components. Management should be aware of the use of agile development nomenclature within an organization that isn’t officially following an agile development methodology. Calling a re-write a refactoring doesn’t change the fact that the developer is starting over.

Cut-and-paste sub-classing

This is the practice of replicating code to avoid the schedule impact of designing a generalized solution. The near-term result is a proliferation of more or less identical bugs as each copy of the code includes any design errors and frequently some of the coding errors of the original. This practice exacts a further cost during integration and test as every element of the system must be thoroughly tested since the test team has no expectation that common functionality has been implemented with common code. The test team ends up writing numerous bug reports that attempt to get common functionality (e.g., go to next page of report) to behave the same in different locations of the program. The different copies do not behave the same because the code underlying each occurrence has been independently tweaked to reflect the local functionality or as different developers have fixed errors in the original in different ways.

The near-term result is the development group ends up repeatedly fixing the “same” bug as each copy of the code manifests bugs that were in the original code that was copied. A reusable module would allow some level of confidence to be gained by testing one occurrence thoroughly and then testing only the unique features and nominal usage at each occurrence. If the code that was replicated affects the user interface, the effort to ensure that there is a common look and feel to the interface is also multiplied by the number of times each function was copied. A corresponding change must be made to all occurrences of the code but the change to each occurrence must be carefully considered to ensure that no local functionality is impacted.

The long-term consequence is the need to maintain each individual copy of the code as changes are requested. The effort required to build future versions is multiplied by the number of times that code was replicated. Any future attempt to provide a consistent user experience by unifying these multiple copies will run afoul of the unique characteristics of each copy.
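As a contrast to the replicated code described above, here is a minimal sketch of the “go to next page of report” behavior implemented once and reused. All names (next_page, PagedReport, SalesReport, AuditReport) and the page sizes are hypothetical, and the sketch assumes pages are numbered from 1.

```python
def next_page(current_page: int, total_items: int, page_size: int) -> int:
    """Return the page after current_page, never moving past the last page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if current_page < 1 or total_items < 0:
        raise ValueError("current_page must be >= 1 and total_items >= 0")
    # Ceiling division; an empty report still has one (empty) page.
    last_page = max(1, -(-total_items // page_size))
    return min(current_page + 1, last_page)


class PagedReport:
    """Common paging behavior, written and tested in one place."""
    page_size = 10

    def __init__(self, rows):
        self.rows = list(rows)
        self.page = 1

    def go_to_next_page(self) -> int:
        self.page = next_page(self.page, len(self.rows), self.page_size)
        return self.page


class SalesReport(PagedReport):
    # Only the local difference is stated; the paging logic is inherited.
    page_size = 25


class AuditReport(PagedReport):
    page_size = 50
```

A fix to next_page reaches every report at once, and the test team can test paging thoroughly in one place and then check only the local differences, here the page size, at each occurrence.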

Unrealistic unit testing

Each developer creates a test environment that matches their individual vision of the system environment such that unit tests get run but with little regard for how the developer’s individual piece will play with the rest of the system. The near-term result shows up during integration and testing as the pieces of software from individual developers are added into the system and then don’t “fit.” This is usually accompanied by the developer pointing to their successful unit test results that purportedly show that what they coded is correct. Longer-term, modifications to the system will assume that the unit tests matched the behavior of the individual piece to the behavior of the system when, in fact, the test team only verified that the system behaves as expected, possibly in spite of an individual developer’s contribution.

Automatically running unit tests on the combined code base is no guarantee that this problem has been addressed. Developers frequently set input values to those values they expect. Likewise, they test the output values for those they expect. These may not be what their compatriots, who coded either the upstream or downstream functionality, provide or expect. A realistic test would use the preconditions established by actually executing the upstream components and would check that the post-conditions allow the downstream component to execute properly. This turns the automated execution of the unit test into what is essentially an integration test. A valid, non-trivial integration test requires an appropriate test design that considers the applicable functional requirements. Unfortunately, such a test is unlikely to be created by a developer for a unit test even under a rigorous, formal development methodology. The likelihood that such a test will be included in a developer's test cases during a death march is between slim and none.
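The difference between the two kinds of test can be sketched as follows. The components parse_order (upstream) and order_total (downstream), the record layout and the input format are all hypothetical, chosen only to make the point concrete.

```python
from decimal import Decimal


def parse_order(line: str) -> dict:
    """Upstream component: turn 'sku,quantity,unit_price' into a record."""
    sku, quantity, unit_price = (field.strip() for field in line.split(","))
    return {
        "sku": sku,
        "quantity": int(quantity),
        "unit_price": Decimal(unit_price),
    }


def order_total(records) -> Decimal:
    """Downstream component: sum quantity * unit_price over all records."""
    return sum((r["quantity"] * r["unit_price"] for r in records), Decimal("0"))


def test_total_isolated():
    # Inputs are whatever the downstream developer expected to receive.
    records = [{"sku": "A1", "quantity": 2, "unit_price": Decimal("1.50")}]
    assert order_total(records) == Decimal("3.00")


def test_total_with_upstream_output():
    # Preconditions come from actually running the upstream component.
    lines = ["A1, 2, 1.50", "B7, 1, 0.25"]
    records = [parse_order(line) for line in lines]
    assert order_total(records) == Decimal("3.25")
```

Both tests pass here because the two components agree. If the upstream developer had, for instance, delivered prices as strings or in cents, only the second test would notice, which is why it behaves more like an integration test than a unit test.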

Hiding the problem

This is a direct result of lack of abstraction. The need to hide problems arises when the raw data is so ill conditioned that some parts of the project cannot be stabilized and, thus, must simply be hidden from the user. For the project described in Chapter 1, this was accomplished by moving some functionality to a background process that could be restarted without the user being aware of the action.

The near-term result of hiding problems is continued product instability since the now hidden problem will continue to occur and create unanticipated side effects. The long-term consequence is that a festering malignancy of instability remains in the product. Ultimately, the underlying cause of the problem must be found and addressed. It is also likely that the data that manifested the problem will be of interest to the user. If this is the case, the user will not be pleased to find out that valid product data was hidden to cover up a product flaw.

Lack of user interface design/consistency

If the development team does not take the time to actually design how the user will interact with the system, new functionality will be accreted onto the user interface with little or no thought as to how the end user will utilize the new functionality. This is also known as “stove-piped” or “silo” development since each piece is developed with little or no coordination between the developers.

Near-term, the consequences are additional work for both the testers and the documentation folks since they must test and document each individual nuance of the system. Long-term, the consequence is an increased load on the customer support personnel since customers must endure the same lengthy learning curve as the testers and the technical publications people. Also, users will gravitate away from the product rather than attempt to learn and remember arcane differences across such a user interface.

Extreme focus on only meeting the minimal interpretation of the requirements

The development team will only develop the 10 to 20 percent of the code that implements the most common execution paths at the expense of ignoring boundary values and outlying cases. The near-term consequences of this are fairly minimal other than the “scope creep” fights that occur as the testers, documentation folks, or the system specification author recognize that needed functionality has not been implemented. Longer-term, customers will find that they can’t utilize the system without this functionality and will impose a load on customer support looking for ways to work around the missing functionality. This isn’t to say that initial development of a capability should be put off until every oddball case has been addressed. A balance needs to be struck and the effort needs to be made to ensure that the system implemented allows the user to perform their nominal activities (e.g., a “create” action has at least a corresponding “delete” and, preferably, a “modify”).

Attempts to test quality into the code

This practice is the typical management response to the long, difficult integration effort brought about by the previously described practices. The hoped for implementation schedule was not met by the development organization and the realization eventually dawns on management that testing will not be completed in the now highly compressed remnant of the schedule. This realization first manifests itself as a call for more testers but not a longer period of testing.

Due to the quantity of deep, latent errors that the above practices embed in the product, only an unacceptably longer integration period has even the slightest hope of finding more than the most obvious problems. Management then attempts to apply the same solution to the test effort as was applied to development: throw people and CPU cycles at it. This approach won't work, but it is the only route available since restarting the development effort is out of the question. This additional testing will find the bulk of the superficial problems as well as a few of the deeper design issues. Although the deeper problems show up only as particular test cases that fail, the response from the development organization is frequently that it will take a separate development effort or a complete re-write to fix the problem.

Near-term, a large amount of time-limited, superficial testing is focused on the user interface and nominal capabilities while almost nothing is done to test for design level instabilities and inconsistencies in the system. This additional testing only marginally improves the quality of the product by finding the most obvious problems. Test cases that could surface the deeper problems either require longer duration tests that might detect low frequency and low probability bugs or a functional analysis level insight into the required system in order to create these test cases. Time for such long duration tests is, of course, the one precious commodity that can't be spared. Longer-term, this approach results in an excessive number of customer reported issues as customers find the inherent inconsistencies, broken abstractions and, especially, the unanticipated data values and settings that cause the system to fail.

Attempting to test quality into the code at the end of the development cycle is the equivalent of a manufacturer relying on end-item inspection for product quality. This approach has been discredited in the “low-technology world” of manufacturing since the work of Malcolm Baldrige in the 1970s. It sadly hangs on in the supposedly “high-technology world” of software development. The shoddy quality of many software products continues to demonstrate why this approach to ensuring product quality was rejected by manufacturers of tangible goods years ago.

Attempts to automate testing of functionality still being developed

This practice first shows up when hastily assembled components are repeatedly found to be “dead on arrival” by the testers. While automated testing can provide significant efficiencies for testing both stable components and components needing repetitive testing, it is completely inappropriate for initial testing of new development. Developing test automation for new development requires applying the same thought process that the developer should have applied in developing the component. This consists of identifying the things that can go wrong and then crafting test cases that exercise the system to ensure these cases are correctly handled. Although an independent tester may be able to come up with these tests, the developer is in an even better position to do so. Further, since the developer’s unit tests should be “white box6”, the developer will have insight into whether the final result is both correct and arrived at correctly7. Both of these are important, and only the developer doing white box, unit-level testing is in a position to confirm the latter. A valid development process will confirm that components are correct before they leave the development organization by ensuring that developer tests accurately reflect the anticipated “real” environment.
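As a minimal sketch of why checking only the final answer is not enough, consider the digit-cancelling anecdote from the footnotes turned into code. The function divide_by_cancelling and both helper checks are hypothetical, invented here purely for illustration. The point is that a single nominal test passes even though the logic is wrong, and only someone who knows how the answer is produced would think to probe the cases where the shortcut breaks down.

```python
def divide_by_cancelling(numerator: int, denominator: int) -> float:
    """Hypothetical flawed routine: 'divides' by striking out a shared digit."""
    n_digits, d_digits = str(numerator), str(denominator)
    for digit in n_digits:
        if digit != "0" and digit in d_digits:
            n_rest = n_digits.replace(digit, "", 1)
            d_rest = d_digits.replace(digit, "", 1)
            # Only use the shortcut when something is left on both sides.
            if n_rest and d_rest and int(d_rest) != 0:
                return int(n_rest) / int(d_rest)
    # No shared digit to cancel: fall back to real division.
    return numerator / denominator


def nominal_test_passes() -> bool:
    # The single "happy path" case a rushed, answer-only test might check.
    return divide_by_cancelling(64, 16) == 4.0


def find_counterexamples(limit: int = 100) -> list[tuple[int, int]]:
    # Compare the shortcut against true division for all two-digit pairs.
    return [
        (n, d)
        for n in range(10, limit)
        for d in range(10, limit)
        if divide_by_cancelling(n, d) != n / d
    ]
```

Here nominal_test_passes() returns True, yet find_counterexamples() includes pairs such as (48, 24), where cancelling the fours gives 4 instead of 2. A test suite that only replays the nominal case would report success for the wrong reason.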

Near-term fall-out consists of diverting effort away from needed integration testing that may actually provide useful insight into the quality of the product. If such automated testing works at all, it generally results in an undeserved optimism that the product is in better shape than it really is. This will occur because the automated tests are not able to fully exercise the system in a meaningful manner and, instead, end up traversing the same nominal conditions that the developer implemented and tested. Long-term, this defers to later testing the discovery of bugs that could and should have been found earlier. If the development organization is lucky, these bugs will be found during integration and test and will result only in delays to the product release. If the development organization is not lucky, customers will find these bugs after the release.

Automated testing of new development is only appropriate for verifying a minimal level of functionality in the software product. This is testing that really shouldn’t be necessary if the organization has a valid development process. Automated testing to ensure that changes have not broken an existing capability is a cost-effective approach to regression testing. Likewise, automated testing of repetitive actions or a large range of input values is a valid mechanism for ensuring that a system behaves correctly for all possible input values. Such testing is only possible after the code to be tested is stable.
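The kind of automated regression check described above might look like the following sketch. Everything in it is an assumption made for illustration: the payroll functions gross_pay_v1 and gross_pay_v2, the forty-hour overtime rule, and the input ranges are all hypothetical. It presumes a stable, previously verified implementation exists to serve as the baseline, which is exactly the condition that new development does not satisfy.

```python
from typing import Callable

PayFunction = Callable[[float, float], float]


def gross_pay_v1(hours: float, rate: float) -> float:
    """Hypothetical stable baseline: time-and-a-half beyond forty hours."""
    overtime = max(hours - 40.0, 0.0)
    return round((hours - overtime) * rate + overtime * rate * 1.5, 2)


def gross_pay_v2(hours: float, rate: float) -> float:
    """Hypothetical 'cleaned up' version that introduces a bug."""
    regular = min(hours, 40.0)
    overtime = hours - 40.0  # bug: goes negative below forty hours
    return round(regular * rate + overtime * rate * 1.5, 2)


def sweep_inputs() -> list[tuple[float, float]]:
    # A broad range of hours and a few pay rates, not just the nominal case.
    return [(float(h), r) for h in range(0, 81) for r in (15.0, 20.0, 32.5)]


def regression_failures(
    baseline: PayFunction,
    candidate: PayFunction,
    inputs: list[tuple[float, float]],
) -> list[tuple[float, float]]:
    # Every input where the changed code disagrees with the stable baseline.
    return [args for args in inputs if baseline(*args) != candidate(*args)]
```

A spot check at forty-five hours agrees between the two versions, so a single nominal test would miss the change in behavior. Calling regression_failures(gross_pay_v1, gross_pay_v2, sweep_inputs()) reports every case below forty hours, which is the value of sweeping a range of inputs once the code under test is stable.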

Proliferation of patch releases

As the scheduled release date slips rapidly into the past, upper management will increasingly apply pressure to release something. This pressure will increase significantly when testing indicates that the product nominally functions as required. As noted in many of the practices previously described, the near-term consequences of these practices include lack of stability and robustness as well as performance issues. Using system-level tests to find the bugs that cause instabilities, or the “corner cases” that cause the system to fail, is nearly impossible for any reasonably complex system because of the number of precondition combinations that could contribute to such a failure. Similarly, if the system is not properly designed to begin with, performance issues can be aggravated by a host of variables including specific data cases, system settings, and the execution environment. Attempting to test quality into the product by finding such issues during system test is a long and laborious process that will frequently be dismissed by management as the pressure to release something grows.
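To get a rough feel for why exhaustive system-level testing of precondition combinations is impractical, the arithmetic can be sketched as follows. The figures are illustrative assumptions only (twenty independent on/off settings and one second per test run), not measurements from any real project, and the function names are hypothetical.

```python
from math import prod


def combination_count(values_per_precondition: list[int]) -> int:
    # Each precondition multiplies the number of distinct starting states.
    return prod(values_per_precondition)


def days_to_run_all(count: int, seconds_per_test: float) -> float:
    # Wall-clock days to run every combination once, back to back.
    return count * seconds_per_test / 86_400


# Assumed example: twenty independent on/off settings, one second per run.
total = combination_count([2] * 20)
days = days_to_run_all(total, 1.0)
```

Under these assumptions there are 2^20 = 1,048,576 combinations, which is roughly twelve days of uninterrupted testing for a single pass. Real preconditions are rarely limited to two values each, so the count grows far faster than any compressed schedule can absorb.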

At this point, the near-term and long-term results are the same since the product has been released. Releasing before achieving a stable and robust system that can be shown to meet performance requirements results in the need to continually release patches as customers discover that the product was not sufficiently tested. This need to continually produce patches will cause the development team to fall behind on the initial work for the next major release and gives customers the impression (probably correctly) that what was billed as a finished product is really only a partially tested stab at the desired result. This “robbing Peter to pay Paul” diversion of effort also tends to bring out the worst practices described above, even in the development of minor patches. If the next release is on the same sort of abbreviated schedule (not a bad assumption since the schedule slips caused by the above practices will compress the time available for building the next release), there is all too often pressure to “just fix it” with a quick, “good enough” patch. Development of the “real fix” is pushed out into the next release. The “good enough” patch usually isn't good enough since the next release will also experience schedule slips unless the development team manages to avoid the practices described above. This rarely happens, and the short-term patch will have to continue to function for much longer than was assumed when it was created.

Judgment Day

Unfortunately, these practices are embraced because their consequences may or may not occur at some point in the future, while “doing it right” takes more time now and there isn't time for that in the schedule. Finally and most damning, consider that none of the problems that result from these practices are likely to be discovered and fixed until the final stages of system integration or, even worse, after the product has been inflicted on the customer. It is only after the minor bugs and user interface issues are beginning to be resolved that it becomes patently obvious that the instabilities and inconsistencies in the system aren't due to these easily fixed, superficial bugs. Instead, the continuing instabilities and inconsistencies are due to deep-rooted, systemic design flaws that can only be corrected by re-writing major portions of the code.

Given such practices, at best a death march development effort can yield a brittle, fragile result that meets the project requirements in name only. At worst, the program never accomplishes the stated goals for the project. If such a program is released, it is never possible to find and fix all of the bugs and design errors to the point that the product performs acceptably or can provide a basis for a new revision that doesn't have all of the same problems. If the organization survives the development effort, at some point a re-write is required.

Perhaps even worse than the death march is the variation of the death march that I call the constant crisis. This is a continuous death march with the same development practices used over a longer period. The common paradigm of both processes is the absolute fixation on short-term thinking with regard to all problems since there is never enough time to “do it right.” Many of the practices described here also arise from what is called the “code and fix” development methodology even when there isn’t extreme schedule pressure. However, it takes the schedule pressure of a death march to make them commonplace enough in the development organization that these practices effectively define the organization’s software development methodology. Appendix A provides a consolidated list of the practices described above, the resulting problems and the near-term and long-term consequences for the project.

The short-term thinking of the death march and the constant crisis should not be confused with the near-term focus of agile development techniques. Agile development techniques, when properly executed, provide an iterative mechanism for reworking sub-optimal decisions or partial implementations. That is, there is a recognition that the near-term focus of these techniques will frequently result in the need to refactor elements of the design during later iterations. Agile development methodologies and the problems they can be applied to are discussed in greater detail in Chapter 6.

Purgatory

The death march focuses only on the current iteration with a “head in the sand” approach that recognizes nothing outside of the current effort, and the “methodology” has no mechanism for addressing the inevitable flaws that result. The constant crisis behaves similarly but with a callous disregard for the long-term consequences of the accumulated short-term thinking. Lip service is often given to the need to “fix it right” but the resources and, more importantly, the schedule required are never available.

Both the death march and the constant crisis yield a sub-optimal, unstable product. Quite simply, these software development techniques do not work. A software development manager who willingly attempts either approach as a development methodology should be fired on the spot for gross incompetence. The probability of either method yielding a viable result is comparable to that of breaking the window on an airliner after the wings fall off and flapping your arms in an effort to keep it airborne. Regardless of the heroic effort attempted, both the near-term result and the long-term consequences are a foregone conclusion.

Pushing the individual developers to work longer hours to keep the team size down often fails for the same reason: the individual developers can only focus on so many tasks at any one time. Sadly for the victims, er, developers, this approach can work briefly but at the cost of extreme burnout of the development team. There is still a boundary beyond which the individuals cannot be pushed since each developer can only have so much on his or her plate at any one time. Further, the number of such tasks that each developer can accomplish declines as the team members suffer from fatigue and/or staff turnover, which effectively increases the size of the team8. Agile methodologies recognize this and call for a forty-hour work week with only occasional overtime. Agile methodologies consider more than two consecutive weeks of overtime indicative of a broken development process.

Temptation an exhortation at a time

The practices described above arise even though the members of the development team “know better.” The driver is simply the lack of time to do it right. As the schedule pressure for a given development effort increases, the rationale for deciding to allow or excuse any one such practice increases. Increase the pressure again, and more decisions will be made to do what is expedient in the near-term. With sufficient management pressure over a long enough period of time, the consistency of these decisions rises to the level of implementing a defined development process and the resulting product shows the consequences of being developed using such a process. This development “process” means that the current project will be buggy, brittle, unstable and inconsistent, and that the ability of the team to both fix the visible problems and prepare the next release will be severely compromised.

Attempts to shorten a development project schedule beyond a certain point will only succeed when pigs can fly. Severely shortening the development schedule will rarely succeed. Worse, forcing such attempts will have significant, continuing, deep-rooted, deleterious impacts on the overall product. These impacts are predictable and not just a matter of bad luck, lack of technical skills or some other easily ameliorated factor. They are most visible with a death march since there are no methodological or managerial constraints to even partially correct these problems. Other attempts to accelerate the software development process are also limited in their effectiveness depending on characteristics of the project being attempted.


1See Peopleware by Tom DeMarco and Timothy Lister. DeMarco and Lister define undertime as the below average productivity experienced by software development groups after a period of excessive overtime as the staff catches back up on their lives outside of work. This lower productivity does not include vacation time or other paid time off which further reduces the capability of the organization.

2Frederick Brooks, The Mythical Man-Month, Boston, MA: Addison-Wesley, 1995.

3Parkinson's Law and Other Studies in Administration by C. Northcote Parkinson. New York: Ballantine Books, 1979.

4See Peopleware by Tom DeMarco and Timothy Lister (New York: Dorset House Publishing, 1999) for an excellent debunking of applying Parkinson's Law to software development efforts.

5At the lowest level, computers deal with ones and zeros. Everything else, starting with characters and numbers and continuing up to books, music files, personnel databases, radar tracking displays, and so on, is an abstraction created by a program as written by the developer(s). The software to be developed in some way relates to the “real world” by providing functionality such as maintaining personnel files and generating a payroll. In the case of a payroll program, the ultimate users of the system (the employees) expect a paycheck and not a core dump.

6Both of these are important. I once ran into a beginning math student who, when asked to justify his answer of 64 divided by 16 is equal to four, said, “I just cancelled the sixes.” Programs can also give an apparently correct answer for the wrong reason.

7This effect is discussed extensively in both Yourdon's Death March and DeMarco and Lister's Peopleware.


This work is copyrighted by David G. Miller, and is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.