Fear is an emotion that slows teams down. It makes us more cautious. It makes us over-invest in prevention. It makes us less willing to trust, to communicate openly, and – most painfully – to take risks. It is the dominant reason I see teams fall back on “best practices,” which may not be effective but are at least reassuring. Unfortunately, these practices generally increase the batch size of our work, which magnifies the consequences of failure and therefore leads to more fear. Reducing fear is a heuristic we can use to judge process improvements: anything that reduces fear is likely to speed up the fundamental feedback loop.
The interesting thing about fear is that reducing it requires two contradictory impulses. First, we can reduce fear by mitigating the consequences of failure. If we construct areas where experimentation is less costly, we can feel safer and therefore try new things. Second, and seemingly in opposition, we can reduce fear by engaging in the feared activity more often. By pushing the envelope, we challenge our assumptions about the consequences and get better at what we fear at the same time. Thus, it is sometimes a good idea to reduce fear by slowing down, and sometimes a good idea to reduce fear by speeding up.
To illustrate this point, I want to excerpt a large part of a recent blog post by Owen Rogers, who organized my trip to Vancouver. I spent some time with his company before the conference, and we discussed ways to get started with continuous deployment, including my experience introducing it at my startup, IMVU. He summarized that conversation well, so rather than retread that material, I’ll quote it here:
One thing that I was surprised to learn was that IMVU started out with continuous deployment. They were deploying to production with every commit before they had an automated build server or extensive automated test coverage in place. Intuitively this seemed completely backwards to me – surely it would be better to start with CI, build up the test coverage until it reached an acceptable level and then work on deploying continuously. In retrospect and with a better understanding of their context, their approach makes perfect sense. Moreover, approaching the problem from the direction that I had intuitively is a recipe for never reaching a point where continuous deployment is feasible…
At most organizations – especially those with mature products – high quality is the starting point. It is assumed that users will not tolerate any decrease in quality. Users should only see new functionality once it is ready, fully implemented and thoroughly tested, lest they get a bad impression of the product that could adversely affect the company’s brand. They would rather build the wrong product well than risk this kind of exposure. In this context, the automated test coverage would need to be so good as to render continuous deployment infeasible for most systems. Starting instead from a position where feedback cycle time is the priority and allowing quality to ratchet up as the product matures provides a more natural lead in to continuous deployment.
Returning to the topic at hand, I think this example illustrates the tension required to reduce fear. To do continuous deployment at IMVU, we had to handle fear in two ways:
- Reduce consequences – by emphasizing the small number of customers we had, we were able to convince ourselves that exposing them to a half-baked product was not very risky. Although it was painful, we focused our attention on the even bigger risks we were mitigating: the risk that nobody would use our product, the risk that customers wouldn’t pay for virtual goods, and the risk that we’d spend years of our lives building something that didn’t matter – again.
- Fear early, fear often – by actually doing continuous deployment before we were really “ready” for it, we got used to the real benefits and consequences of acting at that pace. On the negative side, we got a visceral feel for the kinds of changes that could really harm customers, like commits that take the whole site down. But on the plus side, we got to see just how powerful it is to be able to ship changes to the product at any hour of the day, to get rapid feedback on new ideas, and to not have to wait for the next “release train” to put your ideas into action. On the whole, it made it easier for us to decide to invest in preventive maintenance (i.e., the Cluster Immune System) rather than just slow down and accept a larger batch size.
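The Cluster Immune System is only named here, not described. As a rough illustration of the kind of preventive check it stands for, here is a minimal Python sketch, assuming such a system compares a handful of health metrics after a deploy against their pre-deploy baseline and flags the deploy for rollback when any of them worsens by more than a tolerance. The `MetricCheck` type, the metric names, and the 10% tolerance are all hypothetical; this is not IMVU’s actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricCheck:
    """A health metric sampled before and after a deploy (hypothetical example type)."""
    name: str
    baseline: float  # value observed before the deploy
    current: float  # value observed after the deploy
    higher_is_better: bool = True


def regressed(check: MetricCheck, tolerance: float) -> bool:
    """True if the metric worsened by more than `tolerance`, a fraction of the baseline."""
    if check.higher_is_better:
        # e.g. signups: flag if the value fell below baseline * (1 - tolerance)
        return check.current < check.baseline * (1 - tolerance)
    # e.g. error rate: flag if the value rose above baseline * (1 + tolerance)
    return check.current > check.baseline * (1 + tolerance)


def regressed_metrics(checks: list[MetricCheck], tolerance: float = 0.10) -> list[str]:
    """Names of regressed metrics; a non-empty result means the deploy should be reverted."""
    return [c.name for c in checks if regressed(c, tolerance)]


if __name__ == "__main__":
    checks = [
        MetricCheck("signups_per_minute", baseline=40.0, current=38.5),
        MetricCheck("error_rate", baseline=0.01, current=0.05, higher_is_better=False),
    ]
    print(regressed_metrics(checks))  # ['error_rate']
```

Mixing a business metric (signups) with a technical one (error rate) is a deliberate choice in this sketch: a commit can hurt customers without crashing anything, and a guard that only watches for outages would miss that.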
Making this fear-reduction strategy work required more than just the core team getting used to continuous deployment. We eventually discovered (via five whys) that we also had to acculturate each new employee to a fearless way of thinking. This was especially challenging for people we hired from larger companies. To get them over that hurdle, we once again turned to the “reduce consequences” and “face your fears” duality.
When a new engineer started at IMVU, I had a simple rule: they had to ship code to production on their first day. It wasn’t an absolute rule; if it had to be the second day, that was OK. But if it slipped to the third day, I started to worry. Generally, we’d let them pick their own bug to fix, or, if necessary, assign them something small. As we got better at this, we learned that the smaller the bug, the better. Either way, it had to be a real bug, and it had to be fixed live, in production. For some, this was an absolutely terrifying experience. “What if I take the site down?!” was a common refrain. I tried to make sure we always gave the same answer: “If you manage to take the site down, that’s our fault for making it too easy. Either way, we’ll learn something interesting.”
Because this was such a big cultural change for most new employees, we didn’t leave them to sink or swim on their own. We always assigned them a “code mentor” from the ranks of the more established engineers. The idea was that these two people would operate as a unit, with the mentor’s job performance during this period evaluated by the performance of the new person. As we continued to find bugs in production caused by new engineers who weren’t properly trained, we’d do root cause analysis and keep making proportional investments in improving the process. As a result, we developed a pretty decent curriculum for each mentor to follow, ensuring that the new employee got up to speed on the most important topics quickly.
These two practices worked well together. For one thing, they required us to keep our developer sandbox setup procedure simple and automated. Anyone who had served as a code mentor would instinctively be bothered if someone else made a change to the sandbox environment that required special manual setup. Such changes inevitably waste a lot of time, since we generally build many more developer sandboxes than we realize. Most importantly, we immediately thrust our new employees into a mindset of reduced fear. We had them imagine the riskiest thing they could possibly do – pushing code to production too soon – and then do it.
Here’s the key point. I won’t pretend that this worked smoothly every time. Some engineers, especially in the early days, did indeed take the site down on their first day. And that was not a lot of fun. But it still turned out OK. We didn’t have that many customers, after all. And continuous deployment meant we could react fast and fix the problem quickly. Most importantly, new employees realized that they weren’t going to be fired for making a mistake. We’d immediately involve them in the postmortem analysis, and in a lot of cases it was the newcomers themselves (with the help of their mentors) who would build the prophylactic systems required to prevent the next new person from tripping over that same issue.
Fear slows teams of all sizes down. Even if you have a large team, could you create a sandboxed environment where anyone can make changes that affect a small number of customers? Even as we grew the team at IMVU, we always maintained a rule that anyone could run a split test without additional approvals as long as the total number of customers affected stayed below a critical threshold. Could you create a separate release process for small or low-risk commits, so that work that happens in small batches is released faster? My prediction in such a situation is that, over time, an increasing proportion of your commits will become eligible for the fast-track procedure.
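Both policies in the last paragraph boil down to simple, mechanical gates, which is part of what makes them cheap to follow. Here is one hypothetical way to encode them in Python, assuming an absolute customer-count threshold for split tests and a fast track defined by diff size plus a “risky paths” flag. The names, the 1,000-customer threshold, and the 50-line limit are illustrative assumptions, not IMVU’s actual values.

```python
from dataclasses import dataclass

# Hypothetical policy values: the post does not give IMVU's actual threshold.
MAX_UNAPPROVED_CUSTOMERS = 1_000  # split tests reaching fewer customers need no sign-off
FAST_TRACK_MAX_LINES = 50  # diffs at or below this size may use the fast track


@dataclass(frozen=True)
class SplitTest:
    name: str
    exposed_customers: int  # customers who will see the experimental variant


@dataclass(frozen=True)
class Commit:
    lines_changed: int
    touches_risky_paths: bool  # hypothetical flag, e.g. billing code or schema migrations


def needs_approval(test: SplitTest, threshold: int = MAX_UNAPPROVED_CUSTOMERS) -> bool:
    """Anyone may launch a test below the threshold; at or above it, ask first."""
    return test.exposed_customers >= threshold


def release_track(commit: Commit, max_lines: int = FAST_TRACK_MAX_LINES) -> str:
    """Route small, low-risk commits to the fast track and everything else to the standard process."""
    if commit.lines_changed <= max_lines and not commit.touches_risky_paths:
        return "fast"
    return "standard"


def fast_track_share(commits: list[Commit]) -> float:
    """Fraction of commits eligible for the fast track; worth tracking over time."""
    if not commits:
        return 0.0
    return sum(release_track(c) == "fast" for c in commits) / len(commits)


if __name__ == "__main__":
    print(needs_approval(SplitTest("new_checkout_button", exposed_customers=300)))  # False
    history = [Commit(12, False), Commit(400, False), Commit(8, True), Commit(30, False)]
    print(fast_track_share(history))  # 0.5
```

Tracking `fast_track_share` from month to month is one way to test the prediction about fast-track eligibility: if small batches really do become cheaper and safer to ship, that fraction should rise.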