The New Normal: Achievable But Not Simple
We’re ten years into the “devops + microservices” era. As I’ve consulted with companies in transformation, I've seen common patterns in those that succeed.
Adrian Cockcroft, formerly of Netflix, has talked about their development prowess. A listener asked where they found such talented developers. Adrian replied, “We hired them from you, then got out of their way.” Team-scale autonomy is impossible in most companies. Inter-team dependencies and high per-incident risks prevent it.
You can create a world where it is safe to “get out of their way.” It requires effort and creative thinking among developers, IT executives, and business executives. When designing the organization:
Anticipate and Embrace Breakage—Don’t expect everything to run like clockwork. Everything breaks. It's just a question of when. Continuous partial failure is normal. Plan for it and big changes become possible because your team will know how to handle disruption.
Aim for Antifragility—Resilience is not enough. Resilient means that the system resists shocks. Antifragile systems love uncertainty. They get better with rapid change.
Evolutionary Architecture—Allow teams to create and destroy components at any time. Microservices are just one approach to accomplish this. New services “spawn” from ones that already work. Services that gain users survive, while services with few or no users should die. A service that can never be replaced is a failure.
There is No Silver Bullet—Agile development—effective at the team level—does not create antifragility. That is a property of the whole organization. Lean is good, but too much of it will make you fragile. Both are micro-scale tools only.
Minimize Risk by Maximizing Change—Make changes frequently, but make each one small: quick to roll out and quick to revert. Expose each change to a small audience first. If it has the intended effect, roll it out broadly. If not, revert. This means business and technical metrics must be unified. Marketing cannot have one set, development another, and operations a third.
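As a rough illustration of "small audience first, then roll out or revert," here is a minimal Python sketch. Everything in it is an assumption made for the example: the names in_canary and decide, the hash-based bucketing, and the use of a single conversion-style rate as the one metric that business and engineering both watch. A real rollout decision would also want a proper statistical test and more than one metric.

```python
import hashlib


def in_canary(user_id: str, change_id: str, percent: float) -> bool:
    """Deterministically decide whether a user sees a given change.

    Hypothetical helper: hashing change_id together with user_id means the
    same user always gets the same answer for one change, while different
    changes pick different audiences.
    """
    digest = hashlib.sha256(f"{change_id}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # percent=5.0 selects buckets 0..499


def decide(control_rate: float, canary_rate: float, min_lift: float = 0.0) -> str:
    """Compare one shared metric for both groups (assumed to be a rate).

    Returns "roll_out" when the canary group is at least min_lift better
    than the control group, otherwise "revert".
    """
    return "roll_out" if canary_rate - control_rate >= min_lift else "revert"
```

In use, a request handler would call something like in_canary(user_id, "new-checkout", 5.0) to pick the code path, and a scheduled job would feed the observed rates for both groups into decide. Because each change is small, either answer is cheap to act on.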
Disposable Code—Code is a liability, not an asset. It has a carrying cost: your maintenance budget. The more lines of code you have, the more risk you have. Very large code bases generate fear of unintended effects.
The Right Tools—Use tools that encourage greater modularity, loose coupling, and reduced dependencies. They should help you develop general purpose services more easily than specific solutions. Functional languages have advantages here.
Team Scale Autonomy—Eliminate or invert dependencies between systems, components, and teams. Give each team power and responsibility.
Failure Domains and Safety—Minimize the cost of incidents. Reduce the impact of any service failure. Make it quick to detect and correct. Turn hard dependencies into soft ones.
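One way to picture turning a hard dependency into a soft one is a wrapper that answers with a degraded default when the downstream service fails, and stops calling it for a while after repeated failures. The sketch below is only an illustration under stated assumptions: the class name SoftDependency, its parameters, and the thresholds are invented for the example and do not come from any particular library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class SoftDependency:
    """Wrap a call so that its failure degrades the caller instead of breaking it.

    Hypothetical sketch: after max_failures consecutive errors the wrapper
    skips the real call for cooldown_s seconds and serves the fallback.
    """

    def __init__(
        self,
        call: Callable[..., T],
        fallback: Callable[..., T],
        max_failures: int = 3,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._call = call
        self._fallback = fallback
        self._max_failures = max_failures
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._open_until: Optional[float] = None

    def get(self, *args) -> T:
        now = self._clock()
        # While the cooldown is active, do not touch the failing service.
        if self._open_until is not None and now < self._open_until:
            return self._fallback(*args)
        try:
            result = self._call(*args)
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._open_until = now + self._cooldown_s
                self._failures = 0
            return self._fallback(*args)
        # A success clears the failure count and any expired cooldown.
        self._failures = 0
        self._open_until = None
        return result
```

For example, a product page might wrap its recommendations client as SoftDependency(fetch_recommendations, lambda user: []), where fetch_recommendations is a placeholder name. If that service goes down, the page renders without recommendations rather than failing outright, which keeps the cost of the incident small.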
Data Leverage—Focus on the data on the wire. Use languages and frameworks that expose it, instead of locking it inside domain objects and DTOs.
Tempo—Shorten the decision loop. At the micro level, use team scale autonomy and antifragile architecture. At the macro level, refactor your organization for the next war instead of the last one.
Realizing the New Normal—This new normal is achievable but not simple. You must incorporate all these ingredients, all the time. Start small. Make small steps in each area. Iterate within a shared vision. Don't get discouraged: we all overestimate change in the short term and underestimate it in the long term.
This culminates in an antifragile infrastructure for autonomous high-velocity teams, an organization that is attentive and responsive to changes in the market, and a company that operates with greater flexibility and adaptability, and at a higher tempo.