The work in progress (WIP) concept originated in the Lean manufacturing movement. In both a financial and an operational sense, WIP is considered a form of waste: it is work that has been done and paid for but is not yet providing any benefit to the customer or the company.
WIP is an even more insidious problem in software development. On top of the financial and operational complications, it imposes an extra tax on a software engineering team in the form of increased complexity and cognitive load.
WIP in Manufacturing
Often, the ideal process for both manufacturing and software is ‘single-piece flow’. This means that once work begins, it continues without interruption until value is delivered to a customer. With this approach, WIP is reduced to its theoretical minimum.
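To make the idea concrete, here is a toy simulation, a sketch under simplifying assumptions rather than a model of any real production line. Two hypothetical stations each take one time unit per item, and items move from the first station to the second either one at a time (single-piece flow) or only once their whole batch is finished. The function name and parameters are illustrative only.

```python
def completion_times(n_items: int, batch_size: int) -> list[int]:
    """Toy two-station line; each station takes 1 time unit per item.

    Items move from station 1 to station 2 only when their whole batch is
    done at station 1. batch_size=1 models single-piece flow.
    """
    # Station 1 works continuously: item i leaves it at time i + 1.
    finished_station1 = [i + 1 for i in range(n_items)]

    station2_free_at = 0
    delivered = []
    for i in range(n_items):
        # Index of the last item in this item's batch.
        last_in_batch = min((i // batch_size + 1) * batch_size, n_items) - 1
        released = finished_station1[last_in_batch]
        # Station 2 starts once the item is released and the station is free.
        start = max(released, station2_free_at)
        station2_free_at = start + 1
        delivered.append(station2_free_at)
    return delivered


single = completion_times(5, batch_size=1)
batched = completion_times(5, batch_size=5)
print(single)   # [2, 3, 4, 5, 6]
print(batched)  # [6, 7, 8, 9, 10]
print(sum(single) / len(single), sum(batched) / len(batched))  # 4.0 8.0
```

In this toy line, single-piece flow delivers the first item at t=2 instead of t=6 and halves the average delivery time, because no item sits finished at station 1 waiting for the rest of its batch.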
Consider what happens in the absence of single-piece flow. Some value-adding work has been done on a product, but then it has to stop for a while before the product is finished. Maybe required materials are not present and the product must be stored until they arrive. Perhaps the product needs to be moved to another location before value-adding work can begin again. The time and expense involved in moving the product is a form of waste.
Partially finished items need to be tracked, moved, and stored until they can be completed and delivered to customers. This extra effort provides no value to the customer or the company and is therefore waste.
WIP in Software Development
The issues with partially finished work are similar in software development. In this context, WIP is source code or other resources that have been created but are not yet in production, where they would deliver value to users.
So WIP might exist as code on an engineer’s laptop that has not been committed to a repository, code that has been committed but is not yet in a DevOps pipeline, or code in a DevOps pipeline waiting to be deployed. In every case it is waste: time and money have been spent on something that is not yet producing value for the customer.
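A minimal sketch of this definition, assuming a hypothetical set of stage names that mirror the three places listed above: anything that has not yet reached production counts as WIP. The change names are made up for illustration.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stages a change passes through on its way to users."""
    LOCAL = auto()          # on an engineer's laptop, not committed
    COMMITTED = auto()      # committed to a repository, not yet in a pipeline
    IN_PIPELINE = auto()    # in a DevOps pipeline, waiting to be deployed
    IN_PRODUCTION = auto()  # deployed and delivering value to users


def wip_count(changes: dict[str, Stage]) -> int:
    """Count every change that has not yet reached production."""
    return sum(1 for stage in changes.values() if stage is not Stage.IN_PRODUCTION)


changes = {
    "login-fix": Stage.LOCAL,
    "search-api": Stage.COMMITTED,
    "billing-v2": Stage.IN_PIPELINE,
    "dark-mode": Stage.IN_PRODUCTION,
}
print(wip_count(changes))  # 3
```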
A software development team must spend precious cognitive bandwidth managing WIP. This takes effort and reduces the amount of time and energy that can be spent delivering features that provide value to the customer. In software development, as in manufacturing, it is very important to keep WIP at a minimum.
Cognitive load is a concept introduced in 1988 by psychologist John Sweller. It has recently gained wider popularity through Matthew Skelton and Manuel Pais’s book Team Topologies: Organizing Business and Technology Teams for Fast Flow.
In software engineering, there are three main ways cognitive load, or mental capacity, can be spent:
- extraneous load: needless tasks that provide limited value, such as manual deployments or paperwork;
- intrinsic load: well-known foundational elements of the current task, like knowledge of a particular programming language or commonly used libraries; or
- germane load: higher-level ‘value add’ thinking about architectural concerns or code-level tactical decisions like API design.
A team’s capacity to handle cognitive load is very limited, so great care must be taken to ensure that most of it is spent on higher-level ‘value add’ thinking.
High levels of WIP mean engineers are forced to work on more than one item at a time, often a great many items at once. Paying attention to all these tasks increases the cognitive load on both individuals and the overall team, and it increases the complexity of the work the team must handle. Excessive WIP can cause team performance and morale to deteriorate badly.
On the other hand, low WIP means each engineer can focus on just a few things (ideally, one) at a time. In the ideal state, an engineer begins a task with everything needed to complete it, so they can work uninterrupted until the task is done. There may be a short pause while the code passes review; then it enters a build pipeline and again continues without interruption until it is released into production.
This seamless and constant flow of code reduces cognitive load on individuals and the overall team. Limiting WIP shortens the lead time for new features, reduces context switching, and allows engineers to concentrate resources and attention on the most important tasks. When engineers focus on only one thing at a time, they are more productive and generally more satisfied with their work. They are also more likely to stay with you.
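One standard way to see why limiting WIP shortens lead time is Little’s Law from queueing theory, which the original text does not cite: in a stable system, average lead time equals average WIP divided by average throughput. A minimal sketch, with a hypothetical function name and made-up team numbers:

```python
def average_lead_time(avg_wip: float, throughput_per_week: float) -> float:
    """Little's Law: average lead time = average WIP / average throughput.

    Assumes a stable system whose arrival and completion rates match over the
    long run; the result is in weeks when throughput is in items per week.
    """
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / throughput_per_week


# Hypothetical team that finishes 5 stories per week.
print(average_lead_time(avg_wip=20, throughput_per_week=5))  # 4.0
print(average_lead_time(avg_wip=5, throughput_per_week=5))   # 1.0
```

With the same throughput, cutting average WIP from 20 items to 5 cuts the average lead time from four weeks to one. The law describes long-run averages, so it says nothing about any single story.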
What happens if automated testing and deployment of code changes are not in place? Then the code has to be stored somewhere (often in long-running release branches) while it waits for approval. The team needs to keep track of which features are in which branch and be aware of any interactions between branches. In practice, long-running branches often interact in many ways, which makes testing difficult. The context switching this approach involves imposes a high cognitive load on a team and greatly reduces its capacity to provide value to the customer and the company.
The Wrong Branching Model Can Increase WIP
Gitflow vs Trunk-based Development
Many software engineering teams use long-running release branches, sometimes multiple release branches. Releases are infrequent and the idea is to work on multiple features at once, scheduling them for multiple release dates.
There may be valid business reasons for this approach, and if so, not much can be done. However, long-running release branches are usually in place simply because a team does not know how to deliver smaller, more frequent releases.
By definition, each long-running release branch contains WIP. Engineers have to keep track of the features contained in each release branch while also developing new ones. This structure creates distraction and requires frequent context switching. It can be very error-prone.
Using the right branching strategy is critical to minimising WIP. Branching models like Gitflow keep several branches alive at once: long-lived master and develop branches, plus feature, release, and hotfix branches, with work accumulating on them until it is scheduled into a future release. (See software engineer Vincent Driessen’s original post on the topic, “A successful Git branching model”.)
The cognitive load on a team using Gitflow tends to be high. They have to remember what each branch is for, what features are included in each branch, and how code in each branch interacts with other branches. There are many checks as code winds its way to production.
Gitflow is the right tool for certain use cases, but the price of all the protection it offers is very large, infrequent releases and large amounts of WIP.
To reduce WIP (and increase deployment frequency while decreasing deployment size), take a look at trunk-based development. With this practice, developers merge small, frequent updates into a single main branch, the trunk. Trunk-based development has strong support in the DevOps community because it streamlines merging, reduces WIP, and reduces cognitive load.
How WIP Limits are Used
Many teams use WIP limits to reduce WIP and its associated waste and cognitive load. By focusing the team on an ever-smaller number of tasks (the theoretical ideal being one), WIP limits aim to reduce rework and duplicated effort, eliminate the context switching that comes with juggling multiple tasks, cut down on unnecessary meetings, improve communication, and eliminate handoffs between teams or parts of teams.
What is a WIP Limit?
A WIP limit simply caps the number of unfinished tasks an engineer or team can work on at the same time. The optimal WIP limit is one, meaning an engineer works on only one task until it is done. Once the task is complete, the engineer switches to the next task. The goal is to minimise (and ultimately prevent) multitasking and the context switching that happens each time an engineer stops working on one task and picks up another.
In many cases, the rule is that once a WIP limit is reached, engineers may only take on non-coding work such as code reviews or training.
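A minimal sketch of a personal WIP limit combined with the rule just described. The Engineer class, its method names, and the example tasks are hypothetical; in practice a team would enforce this on its board or tracker rather than in code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Engineer:
    """Hypothetical engineer working under a personal WIP limit."""
    name: str
    wip_limit: int = 1  # one task at a time, the optimal limit described above
    in_progress: list[str] = field(default_factory=list)

    def can_start(self) -> bool:
        return len(self.in_progress) < self.wip_limit

    def pull_next(self, backlog: list[str], non_coding: list[str]) -> Optional[str]:
        """Start the next backlog item if under the limit; if at the limit
        (or the backlog is empty), suggest a non-coding task instead, or None."""
        if self.can_start() and backlog:
            task = backlog.pop(0)
            self.in_progress.append(task)
            return task
        return non_coding[0] if non_coding else None

    def finish(self, task: str) -> None:
        self.in_progress.remove(task)


backlog = ["story-1", "story-2"]
reviews = ["review PR for story-0"]
ana = Engineer("Ana")
print(ana.pull_next(backlog, reviews))  # story-1
print(ana.pull_next(backlog, reviews))  # review PR for story-0 (limit reached)
ana.finish("story-1")
print(ana.pull_next(backlog, reviews))  # story-2
```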
Reducing WIP to a bare minimum is an important goal in any DevOps or other workflow transformation. However, WIP limits must be used very carefully or they can backfire. Here, we give a quick background on the concept of WIP limits, with guidance on how best to use them.
In an ideal workflow, an engineer would work on one item until it is completed and passed off to an automated CI/CD pipeline that tests and deploys the code. The engineer would then begin work on the next item without delay.
Using WIP Limits for a Scrum Team
The illustration below shows a simple and hypothetical development team scrum board. Stories in the left column have not been started, the middle column contains stories in progress, and the column on the right is where completed stories are placed.
WIP limits apply to items in the “In Progress” column, with the goal of reducing the number of tasks being worked on simultaneously. Sometimes WIP limits apply to the entire team, but they can also apply to individuals. In either case, the objective is to reduce the amount of WIP and all the negative effects discussed above.
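The board described above can be sketched with both kinds of limit at once: a team limit on the whole “In Progress” column and a personal limit per engineer. The Board class, story names, and engineer names are hypothetical.

```python
from collections import Counter


class Board:
    """Hypothetical three-column scrum board with WIP limits on 'In Progress'.

    team_limit caps the whole column; person_limit caps each engineer.
    """

    def __init__(self, team_limit: int, person_limit: int) -> None:
        self.team_limit = team_limit
        self.person_limit = person_limit
        self.columns: dict[str, list[str]] = {"To Do": [], "In Progress": [], "Done": []}
        self.owner: dict[str, str] = {}  # story -> engineer working on it

    def add(self, story: str) -> None:
        self.columns["To Do"].append(story)

    def start(self, story: str, engineer: str) -> bool:
        """Move a story to In Progress unless a team or personal limit is reached."""
        if len(self.columns["In Progress"]) >= self.team_limit:
            return False
        if Counter(self.owner.values())[engineer] >= self.person_limit:
            return False
        self.columns["To Do"].remove(story)
        self.columns["In Progress"].append(story)
        self.owner[story] = engineer
        return True

    def finish(self, story: str) -> None:
        self.columns["In Progress"].remove(story)
        self.columns["Done"].append(story)
        del self.owner[story]


board = Board(team_limit=2, person_limit=1)
for story in ["A", "B", "C"]:
    board.add(story)
print(board.start("A", "ana"))  # True
print(board.start("B", "ana"))  # False: ana is at her personal limit
print(board.start("B", "raj"))  # True
print(board.start("C", "lee"))  # False: the team limit of 2 is reached
board.finish("A")
print(board.start("C", "lee"))  # True
```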
Where WIP Limits Work — and Where They Don’t
WIP limits are often triggered when the entire engineering workstream is not set up for optimal flow. The typical symptom of non-optimal flow is when work on a story is partially completed, but then has to stop while the engineer waits for clarification on requirements, waits for approval to proceed further, or has to drop their work to focus on an important production defect.
With a WIP limit of one, the blocked engineer is idle: they can’t work on anything else until the blockage is cleared. Idle engineers are obviously wasteful, and few teams in this situation keep using WIP limits for long, even if the idle time is spent on training or other productive tasks.
An increase in idle engineers often ends up being the fatal flaw in using WIP limits to spur process improvements. Idle engineers are usually only tolerated if:
- A company is willing to let engineers be idle for possibly lengthy periods of time until the workflow is improved to the point where WIP limits are not triggered.
- The engineering team using WIP limits controls the parts of the process that need improvement. If the delays triggering a WIP limit are caused by another department (possibly QA or Operations), the fact that your team has reached its WIP limit may not matter to them. These groups may not be motivated to improve their part of the process just so your team stops hitting its WIP limit.
Few companies will tolerate idle engineers for long. When the head of engineering complains that their engineers are idle due to WIP limits being triggered because testing is not automated, the first question the group VP will ask is “Why can’t they work on something else while they wait?” Be careful how you answer this question.
Arguing that WIP limits are good because they reduce cognitive load and stress might not play well with stakeholders; after all, anyone can reduce cognitive load and stress simply by being idle. The real goal is to accomplish a great deal while keeping cognitive load and stress in check. You don’t want to be on the wrong side of this argument.
Leading a Transformation
If you are leading an agile or DevOps transformation (or the second, third or fourth attempt at a transformation) you need to be aware of organisational history and internal politics so you can manage how your team presents itself across the company. If past transformations have failed or only been partially successful, the organisation may be skeptical. There is likely a long backlog of engineering work that the business would have liked to be completed long ago.
Stakeholders want engineers working to reduce that backlog. When stakeholders learn WIP limits are reducing the amount of time engineers spend coding (even if only temporarily), precious political capital will need to be spent to keep them supporting the transformation. Why incur that expense? You’ll need that political capital later, for example if your team falls behind schedule or a serious defect is released to production. Nothing ever goes as smoothly as you think it will.
We have found it more effective and less politically risky to challenge a team to improve DORA DevOps metrics instead of implementing WIP limits. Focusing on the improvement of deployment frequency along with change failure rate is often a good place to start. If your team is currently deploying once per month, challenge them to deploy every two weeks, then weekly, then at will. And do that without increasing the number of defects found in production.
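A rough sketch of how a team might track the two metrics mentioned above from its own deployment records. The Deployment record, function names, and dates are hypothetical, and the failure flag only approximates DORA’s definition of change failure rate (the share of changes to production that degrade service and need remediation such as a rollback or hotfix).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deployment:
    day: date
    caused_failure: bool  # hypothetical flag: needed a rollback, hotfix or other remediation


def deployments_per_week(deploys: list[Deployment], start: date, end: date) -> float:
    """Average number of deployments per week in the inclusive period start..end."""
    days = (end - start).days + 1
    in_period = [d for d in deploys if start <= d.day <= end]
    return len(in_period) / (days / 7)


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deploys:
        return 0.0
    return sum(d.caused_failure for d in deploys) / len(deploys)


march = [
    Deployment(date(2024, 3, 1), caused_failure=False),
    Deployment(date(2024, 3, 8), caused_failure=True),
    Deployment(date(2024, 3, 15), caused_failure=False),
    Deployment(date(2024, 3, 22), caused_failure=False),
]
print(deployments_per_week(march, date(2024, 3, 1), date(2024, 3, 28)))  # 1.0
print(change_failure_rate(march))  # 0.25
```

Moving from monthly to weekly deployments would show up as deployments_per_week rising from roughly a quarter to one, while change_failure_rate should hold steady or fall.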
From the DORA 2021 Report:
Improving DORA metrics is a concrete task the team can focus on immediately, and it will provide tangible benefits to the company. Tools like value-stream mapping can be used to eliminate waste in any process. Remember, optimised processes minimise WIP. Accomplish just this and you’ll be making a substantial contribution to the good of the company.
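Value-stream mapping is usually done on a whiteboard, but the arithmetic behind it is simple. The sketch below uses hypothetical stage names and hours for a single feature; it totals the lead time, computes flow efficiency (the share of lead time spent on active work), and points to the stage with the longest wait.

```python
def flow_summary(stages: list[tuple[str, float, float]]) -> tuple[float, float, str]:
    """Summarise a value-stream map from (stage, active_hours, waiting_hours) rows.

    Returns the total lead time, the flow efficiency (active time / lead time)
    and the stage with the longest wait, usually the first place to look for waste.
    """
    active = sum(a for _, a, _ in stages)
    waiting = sum(w for _, _, w in stages)
    lead_time = active + waiting
    longest_wait = max(stages, key=lambda row: row[2])[0]
    return lead_time, active / lead_time, longest_wait


stages = [  # hypothetical hours for one feature
    ("code", 16, 4),
    ("review", 2, 22),
    ("manual QA", 8, 72),
    ("deploy approval", 1, 40),
]
lead, efficiency, bottleneck = flow_summary(stages)
print(lead, round(efficiency, 2), bottleneck)  # 165 0.16 manual QA
```

In this made-up example only about 16% of the lead time is active work, and the wait before manual QA stands out as the first candidate for automation.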
The exception to preferring DORA metrics over WIP limits is when your team does not control the entire process. Maybe they are assigned to help another team by working on backlog items and have no control over, or input into, deployment frequency. In this case WIP limits might be helpful, at least as a tool to help other teams understand that they need to improve their processes.
The Canary in the Coal Mine
WIP limits do have one great advantage: a triggered WIP limit is often the first indication that a high-performing engineering team is beginning to backslide. If engineers start hitting WIP limits, your DORA metrics are likely to decline soon and you have a problem to solve. A good approach is to set the WIP limit just above the team’s normal WIP level. Used this way, WIP limits are like the proverbial canary in the coal mine: an early warning system.
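One hypothetical way to set a limit just above normal WIP is to take the team’s recent daily WIP counts and add a margin of one standard deviation. The rule, function names, and numbers below are assumptions for illustration, not a standard formula.

```python
import math
from statistics import mean, pstdev


def canary_limit(daily_wip: list[int], margin: float = 1.0) -> int:
    """Hypothetical rule: mean daily WIP plus `margin` population standard
    deviations, rounded up to a whole number of items."""
    return math.ceil(mean(daily_wip) + margin * pstdev(daily_wip))


def canary_alert(current_wip: int, limit: int) -> bool:
    """True when WIP rises above the limit, an early sign of backsliding."""
    return current_wip > limit


history = [4, 5, 4, 6, 5, 4, 5, 5]  # recent daily WIP counts for one team
limit = canary_limit(history)
print(limit)                   # 6
print(canary_alert(5, limit))  # False
print(canary_alert(7, limit))  # True
```

Because the limit sits just above normal variation, it stays quiet on ordinary days and fires only when WIP starts to creep up.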
To summarise, if you set a WIP limit before flow is optimised, your engineers are bound to spend substantial time on non-engineering tasks. No one likes that: not your engineers, and surely not your business stakeholders.
You can argue the slack time produced by WIP limits will be spent on other important tasks like training. That may be true, but we think this is a bad way to manage important tasks. Your team will never know when the WIP limit will be triggered and when they can spend time training, or when WIP will drop under the WIP limit and they have to get back to engineering work. This seems like a haphazard approach to something like training, which is an important part of your team’s professional development.
Building Software that Thrives on Change
It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change.
Often attributed to Charles Darwin
In Sourced’s experience, engineering organisations can be broken down into three groups:
- Base-level engineering organisations employ many manual processes. Performance declines rapidly under a limited amount of change or load. Their capacity for improvement is small and they can only recover to baseline performance with extreme difficulty.
- Effective engineering organisations produce good quality software while enduring a moderate amount of change or load. If stress exceeds a certain threshold, performance rapidly declines. These organisations use some automated processes while still depending on manual work for important tasks. Their capacity for improvement is modest and they can recover to baseline performance with moderate difficulty.
- Elite engineering organisations grow stronger as more change or load is applied to them, in the same way the human body grows stronger through rigorous exercise. They actively seek to learn from mistakes and incorporate those learnings into future work, often embracing cloud infrastructure and cloud-native architectures. These organisations depend heavily on automation to increase quality while reducing cognitive load on teams and individuals. Their capacity for improvement is great and under stress they can recover to baseline performance with relative ease.
Getting agile and DevOps transformations right is critical to becoming an elite engineering organisation.
Mike is a Managing Principal Consultant at Sourced Group. Over the last 25 years Mike has worked as an engineer, architect, and leader of large engineering organizations.