Do Not Push to Red Builds

July 8, 2017

[agile] [testing] [devops]

There's a pattern to disasters: when you examine the post-mortems, it's not just one failure, but a cascade of multiple failures that overwhelm the system's safety features. Pushing to a red build risks piling one failure upon another.

One of our goals is to maintain an environment where we can deploy from master at any time. The logical conclusion of this goal is continuous deployment, but we’ll save that for another day. For now, let’s examine testing and why pushing to a red build is an anti-pattern.

A red build is when we push to our source repository, which kicks off a continuous integration run, the “build,” which fails for one reason or another. In a perfect world, we would be able to build everything at our desktop, but as our projects grow, we break up our builds into smaller, isolated pieces to keep things fast.

Red-Green-Refactor is a fundamental tenet of our development methodology. It provides us with constant feedback. It warns us when things go wrong. Green tests give us a place of safety from which we can explore without fear. Like all things, it is a fractal concept. A broken build is a red unit test, writ large. Without a green build, we cannot deploy. Without a green build, we cannot know if the next change we make is “good.” When we push a build to the repository, we are signalling to our team that we’re making progress toward our goal, but a red build says the opposite. When you push to a red build (unless you’re pushing the fix, of course), we can’t know whether your new code is making things better or worse.

Cowboy Bobbi pushes a seemingly innocuous change to master and goes to lunch. The build goes red. Agile Alli, not noticing the red build, merges her branch to master as well (tests were green on the branch). Now the build is still red, but some of Alli’s tests are going red, too. Alli starts scratching her head and wonders how things could go so terribly wrong. Meanwhile, there’s a problem on production, and DevOps Danni creates a production hot-fix and deploys that.

Stormy Skies, http://archive.boston.com/bigpicture/2010/07/stormy_skies.html

When the build goes red, your first priority is to fix it. All other tasks are secondary. This gets you back on track.
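The rule is small enough to write down as code. Here is a minimal sketch of it in Python; `BuildStatus` and `may_push` are hypothetical names made up for this post, not part of any CI tool, and a real guard would have to ask your CI server for the current status of master before calling it.

```python
from enum import Enum


class BuildStatus(Enum):
    """Hypothetical two-state view of the latest build on master."""
    GREEN = "green"
    RED = "red"


def may_push(status: BuildStatus, is_fix: bool = False) -> bool:
    """Return True when pushing to master is acceptable under the rule.

    A green build accepts any push. A red build accepts only a push
    that is meant to repair the build.
    """
    if status is BuildStatus.GREEN:
        return True
    # The build is red: everything except the fix waits.
    return is_fix


if __name__ == "__main__":
    # Print the full decision table for the two inputs.
    for status in BuildStatus:
        for is_fix in (False, True):
            verdict = "push" if may_push(status, is_fix) else "hold"
            print(f"build={status.value:5} fix={is_fix!s:5} -> {verdict}")
```

Something like this could sit in a pre-push hook or a merge bot, but the point is the decision table, not the tooling: on red, only the fix goes in.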

In theory there is no difference between theory and practice. In practice there is. You will, at some time or another, have to use some judgement. An intermittently failing build is a code smell, but you might not be able to fix it. You’ll have to decide whether the value of getting your change in outweighs the risk of pushing on top of a failure you can’t explain.

References

How Complex Systems Fail