Measuring and improving the productivity of DevOps teams has long been a major challenge. From purely technical metrics to more business-oriented ones, choosing the right metrics remains a concern, not least because poor choices can quickly become obsolete. As DevOps practices become mainstream, standards are emerging for almost every practice, and DevOps metrics are no exception.
The productivity of a DevOps team is usually understood as its ability to efficiently write and deploy high-quality software that works well and is easy to maintain. From a business perspective, companies want to measure the productivity of their development staff more accurately for a variety of reasons.
Essentially, companies are always looking for actionable metrics that help them identify employee behaviors and ultimately motivate employees to improve performance. At the same time, they want metrics that can justify the investments made.
At first glance, measuring DevOps productivity seems like a simple task. That is, until you try to do it well; then it becomes very difficult. To illustrate this, let's look at some of the traditional metrics many companies use to measure developer productivity, such as lines of code or commit counts.
It is a common misconception that there is a simple and effective way to measure the productivity of a DevOps team with a very good metric. In reality, no single metric can be used to evaluate productivity, as it depends on far too many factors that can vary across industries, organizations, and specific cases. In practice, a weighted combination of several metrics may be used, but even then the metrics and specific weightings will vary from case to case.
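The weighted-combination idea can be made concrete with a short sketch. The metric names, normalization, and weights below are entirely hypothetical; in practice each organization must choose and weight metrics for its own context.

```python
# Hypothetical example: combining several normalized metrics into a single
# weighted score. Metric names, values, and weights are illustrative only.

def weighted_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Each metric is pre-normalized to [0, 1]; weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(metrics[name] * weights[name] for name in weights)

team_metrics = {
    "deployment_frequency": 0.8,  # normalized against an internal target
    "change_failure_rate": 0.9,   # inverted so that higher is better
    "team_satisfaction": 0.6,     # from periodic surveys
}
weights = {
    "deployment_frequency": 0.4,
    "change_failure_rate": 0.4,
    "team_satisfaction": 0.2,
}

print(round(weighted_score(team_metrics, weights), 2))  # 0.8
```

Even a simple aggregate like this only makes sense relative to a team's own history and targets, never as a cross-team league table.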
Also, the claim that productivity is primarily determined by the individual is a myth. Of course, individual performance has great value in its own right, but ultimately the contribution to the team as a group-based activity is crucial. It is well established that project success does not only depend on the independent success of individuals. It also depends on how well the DevOps team works together. In fact, the whole is more than the sum of its parts.
We must differentiate between specific metrics and frameworks. Metrics are nothing more than concepts that we measure; without context, they are meaningless numbers. A framework is a guide to how a series of metrics (not all of which you need to apply) can be established and used in a given context to understand how a team is doing and what actions to take to get where you want to be.
One of the most common mistakes is to take a framework and apply all of its metrics. A framework is simply a summary of ideas, concepts, and relationships; based on your context, environment, and teams, you select a set of metrics that you can interpret to determine whether the team is where it should be.
Next, we will examine three of the most popular frameworks for measuring the productivity of a DevOps team.
DORA is a set of metrics that aims to measure the performance of software engineering teams in agile and DevOps development environments. In short, this approach is data-driven and aims to assess the performance of DevOps teams objectively.
The metrics emerged from the work of DORA (DevOps Research and Assessment), an organization founded in 2014 by academic researchers Nicole Forsgren, Jez Humble, and Gene Kim. In 2018, Google acquired the organization and incorporated its research and metrics into its Cloud platform.
The institution’s goal is to conduct research and evaluations in software development companies to identify the most effective practices and key metrics for evaluating team performance.
The “State of DevOps Report,” a result of DORA’s research, is widely used worldwide and has become a reference in the industry. With more than 30,000 participants since its creation, it offers valuable insights to improve software development and operations processes.
The DORA survey evaluates the performance of engineering teams in software development using four main metrics, divided into speed and stability.
While these metrics are useful, they should not be treated as the main goal. The focus should always be on building quality software – efficiently and usefully for users – in addition to generating value for the business, of course. That said, let’s delve deeper into the metrics that will support the entire journey of building digital products.
Speed Metrics
Deployment Frequency
This measures how often a team deploys code to production. A high deployment frequency can indicate that the team can deliver value to users continuously and quickly, responding to market demands and adjusting the software according to user needs. DORA research classifies Deployment Frequency into three categories.
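Deployment frequency is straightforward to derive from a deployment log. A minimal sketch, with invented timestamps; in practice the data would come from your CI/CD system:

```python
# Sketch: deployment frequency from a log of production deploy dates.
# The dates below are invented for illustration.
from datetime import date

deploys = [
    date(2024, 5, 1), date(2024, 5, 2), date(2024, 5, 2),
    date(2024, 5, 6), date(2024, 5, 9), date(2024, 5, 13),
]

# Observed window, inclusive of both endpoints.
period_days = (max(deploys) - min(deploys)).days + 1
per_week = len(deploys) / period_days * 7
print(f"{per_week:.1f} deploys/week")  # 3.2 deploys/week
```

The absolute number matters less than its trend: a team deploying more often, in smaller batches, is usually reducing risk per change.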
Lead Time for Changes
This measures the time it takes from the moment a code change is made until it is successfully deployed to production. A shorter lead time may indicate that the team can deliver new features or fixes quickly and efficiently, responding quickly to market changes and user needs.
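As a sketch, lead time can be computed per change as the elapsed time from commit to successful production deploy; the timestamps below are invented, and in practice they come from your version control and CI/CD systems:

```python
# Sketch: lead time for changes, from commit to production deploy.
# All timestamps are invented for illustration.
from datetime import datetime
from statistics import median

changes = [  # (commit time, deploy time)
    (datetime(2024, 5, 1, 9, 0),  datetime(2024, 5, 1, 15, 0)),
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 3, 10, 0)),
    (datetime(2024, 5, 6, 8, 0),  datetime(2024, 5, 6, 12, 0)),
]

lead_times_h = [(deploy - commit).total_seconds() / 3600
                for commit, deploy in changes]
print(f"median lead time: {median(lead_times_h):.1f} h")  # 6.0 h
```

The median is often preferred over the mean here, since a single long-running change would otherwise dominate the figure.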
The DORA survey also classifies Lead Time for Changes into three categories.
Stability Metrics
Change Failure Rate
This metric measures the proportion of deployments that result in failures, such as incidents in production or the need to roll back changes. A lower failure rate may indicate that the team can manage risks and maintain software quality while deploying changes, which is essential for a truly efficient squad. The DORA survey classifies Change Failure Rate into three categories.
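The computation itself is a simple ratio; the deployment records below are invented, and what counts as a "failed" change (incident, hotfix, rollback) must be defined consistently in your own tooling:

```python
# Sketch: change failure rate as the share of deployments that caused an
# incident or required a rollback. Records are invented for illustration.
deployments = [
    {"id": 1, "failed": False},
    {"id": 2, "failed": True},   # rolled back after an incident
    {"id": 3, "failed": False},
    {"id": 4, "failed": False},
]

failures = sum(1 for d in deployments if d["failed"])
rate = failures / len(deployments)
print(f"change failure rate: {rate:.0%}")  # 25%
```

The hard part is not the arithmetic but the failure definition: a team that quietly hotfixes without tagging the deploy will report a flattering rate.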
Mean Time To Restore
This metric helps assess the team's ability to handle incidents and failures. It measures the average time it takes to restore service after a failure, indicating how effectively the team can identify, diagnose, and resolve issues.
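A minimal sketch of the calculation, using invented incident timestamps; real data would come from your incident management or monitoring tooling:

```python
# Sketch: mean time to restore (MTTR) from incident detection/restoration
# pairs. All timestamps are invented for illustration.
from datetime import datetime

incidents = [  # (detected, restored)
    (datetime(2024, 5, 2, 14, 0), datetime(2024, 5, 2, 14, 45)),
    (datetime(2024, 5, 9, 3, 30), datetime(2024, 5, 9, 5, 15)),
]

minutes = [(restored - detected).total_seconds() / 60
           for detected, restored in incidents]
mttr = sum(minutes) / len(minutes)
print(f"MTTR: {mttr:.0f} minutes")  # 75 minutes
```

Note that MTTR starts at detection; long undetected outages are a monitoring problem that this metric alone will not surface.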
DORA research classifies Mean Time to Restore into three categories.
In short, DORA Metrics are unique because they are not limited to just delivery speed, but also encompass the operational stability of the software over time. This ensures that the quality of the software is not compromised after initial delivery. Furthermore, it allows the performance of the software to be closely monitored in terms of stability, reliability, and other quality metrics.
As for validity, the metrics make sense, but they do not cover all aspects of productivity. Many aspects are left out, such as developer satisfaction and related indicators. The authors of the original study themselves mentioned in some interviews that these metrics are incomplete and could be expanded. This led to the creation of SPACE.
The SPACE framework is a set of metrics designed to provide a holistic view of software engineering team performance. Unlike traditional approaches that may focus solely on speed or output, the SPACE framework emphasizes multiple dimensions to capture the overall productivity and well-being of engineering teams. Developed by researchers and practitioners in the field, it aims to address the complexities of modern software development environments, especially within agile and DevOps practices.
The main idea is that assessing productivity goes beyond measuring a single dimension, and there are several associated myths. Below, I will briefly discuss some of them:
Myth: Productivity is all about developer activity
Large workloads can result from a variety of factors, such as excessive hours due to inefficient systems or inadequate planning. Activity metrics alone cannot be used to reward or penalize because they lack context. Simple metrics like commits or code reviews can also be error-prone and do not account for activities like pair programming. Additionally, overtime caused by tight deadlines, poor planning, or a detrimental culture can distort any productivity assessment.
Myth: Productivity is only about individual performance
Over-focusing on personal productivity can harm the collective, fostering heroic cultures. Roughly speaking, in such companies, 20% of individuals perform 80% of the work, which is harmful and should be combated, not encouraged by individual metrics.
Myth: A single productivity metric can tell us everything
It’s a misconception that a single, universal metric can evaluate teams across an organization or industry. Productivity encompasses several important dimensions and is heavily influenced by context. Comparing startups to banks, for example, is misleading.
Myth: Productivity measures are only useful for managers
Many developers believe that productivity metrics are useless due to their misuse by leaders. However, these metrics also benefit developers themselves by helping them organize and understand their priorities. Studies show that high productivity is associated with greater job satisfaction and happiness.
Myth: Productivity is just about systems and tools
Tools don’t capture invisible activities like mentoring and knowledge sharing, which are essential to productivity. These invisible activities are just as crucial as the work that is easier to measure.
The framework proposes five dimensions for measurement, avoiding common problems and myths.
S: Satisfaction and Well-Being
This dimension assesses the overall happiness and mental health of team members. It includes factors like job satisfaction, work-life balance, and the perceived stress level within the team. A satisfied and healthy team is more likely to be productive and engaged, leading to better outcomes in the long run.
P: Performance
Performance metrics focus on the engineering team’s outcomes, such as the quality of the software they produce, the speed of delivery, and the overall impact of their work on the business. This dimension is often measured through traditional metrics like code quality, deployment frequency, and lead time for changes. However, the SPACE framework encourages looking beyond these to include measures of how well the software meets user needs and business goals.
A: Activity
This dimension tracks the day-to-day work that engineering teams do, such as coding, reviewing code, and fixing bugs. It includes metrics like the number of commits, pull requests, and code reviews completed. While these metrics are valuable, the SPACE framework emphasizes that activity should not be conflated with productivity. High activity levels do not necessarily mean that a team is productive or that their work is of high quality.
C: Communication and Collaboration
Effective communication and collaboration are critical for the success of engineering teams, especially in agile and DevOps environments. This dimension evaluates how well team members work together, share knowledge, and coordinate their efforts. Metrics in this area might include the number of collaborative tasks completed, the frequency of communication within the team, and the level of cross-functional collaboration.
E: Efficiency and Flow
Efficiency metrics measure how smoothly and effectively work progresses through the development pipeline. This dimension includes metrics like cycle time, which tracks the time it takes for work to move from one stage to the next, and flow efficiency, which measures the proportion of time spent on value-adding activities versus waiting or rework. The goal is to identify bottlenecks and areas where processes can be streamlined to improve overall efficiency.
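Flow efficiency can be illustrated with a toy breakdown of where a work item spends its time. The stages and durations below are invented; real figures would come from your issue tracker or value stream tooling:

```python
# Sketch: flow efficiency as the share of total cycle time spent on
# value-adding work versus waiting. Durations (hours) are invented.
stages = [
    ("coding",          8, "active"),
    ("waiting review", 16, "waiting"),
    ("code review",     2, "active"),
    ("waiting deploy",  6, "waiting"),
    ("deploy",          1, "active"),
]

total = sum(hours for _, hours, _ in stages)  # cycle time
active = sum(hours for _, hours, kind in stages if kind == "active")
print(f"cycle time: {total} h, flow efficiency: {active / total:.0%}")
```

In this toy example two thirds of the cycle time is waiting, which points straight at the review and deploy queues as the bottlenecks to attack.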
The SPACE framework offers a more comprehensive approach to measuring engineering productivity, emphasizing the importance of balancing multiple aspects of team performance. By considering factors like satisfaction, communication, and efficiency alongside traditional performance metrics, organizations can gain a deeper understanding of how their teams are performing and where improvements can be made. This holistic approach ensures that teams are not only productive but also healthy, engaged, and capable of delivering high-quality software that meets user needs and business objectives.
Based on the limitations of SPACE and DORA, the same authors introduced DevEx (Developer Experience) in a study published in January 2024. Its purpose is to establish a method that focuses on developers’ satisfaction with their work. DevEx is not technically anything new; it has always existed but has not been given the importance it deserves.
DevEx captures how developers feel, how they think, and how they value their work. In their original paper, the authors identified 25 factors that affect the developer experience, positively or negatively; examples include interruptions, unrealistic deadlines, friction with tools, lack of clarity in tasks, and code organization. We tend to think that what affects the development experience is found in the tools, but it goes much further. Human factors, such as having clear objectives and feeling psychologically safe in a team, greatly impact performance.
Improving the developer experience not only impacts productivity but also clearly benefits satisfaction, engagement, and retention. Negative factors can operate at different levels: company, team, and individual. DevEx is different for every developer, so context matters: seniority, team and role, past experiences, and so on. The DevEx approach drills down to the level of specific people and company processes.
Feedback Loops
Organizations that optimize their value stream, using value stream mapping to find and reduce waste in value delivery, are demonstrably more efficient. The sooner you surface a mistake, the sooner you can correct course. A fast feedback loop allows developers to complete their work faster with minimal friction. Slow loops cause frustration, pauses, continuous task switching, and dropped tasks that must be picked up again later.
To improve DevEx, these feedback loops must be shortened wherever they occur.
Cognitive Load
An ever-increasing number of tools and technologies increases the cognitive load that developers face. This load covers the amount of mental processing a developer needs to perform a task, such as a very complex task or one that involves learning a new framework or paradigm. It is also affected by how the information needed for a task is presented, from the language used to how it must be interpreted and connected to one’s existing knowledge.
Cognitive load hinders the most important role of a developer today: delivering value. When cognitive load is high as a result of issues such as poorly documented code or systems, developers must spend additional time and effort to complete tasks and avoid errors. The key is to reduce this burden.
Here it is important to dedicate efforts to providing developers with all the necessary documentation, simplicity, and clarity in the tools and processes they perform daily. A dedicated DevEx team has to provide these tools.
Flow State
Flow state is about focus: the mental state one enters when performing an activity while immersed in a feeling of concentration, involvement, and enjoyment. Experiencing that feeling frequently as a developer promotes productivity, innovation, and personal growth. Negative factors related to the first dimension, feedback loops, penalize this state of flow; other relevant aspects are autonomy, clear team and project objectives, and stimulating, challenging tasks. This third dimension is about creating the right conditions for flow, such as limiting interruptions, avoiding task and focus switching, and creating a safe space in the team that encourages taking on challenges.
Metrics trigger behavior in a certain direction, whether you like it or not, and those behaviors end up permeating the culture of the team or company. What you measure reflects what you are going to focus on as a leader. They highlight what is important to you and, therefore, make the teams look in that direction.
The team will react to that metric with behavior: since they know they are being measured, they will act accordingly to meet the metric. If you measure the number of lines of code, the team will produce more lines, which does not mean higher quality. Ultimately, these behaviors will end up permeating the culture of the team and, therefore, the company.
There are several best practices to consider when selecting DevOps metrics.
Used well, DevOps metrics can elevate your team, your product, and your culture; used poorly, they can destroy them. When choosing DevOps metrics, consider the type of leader you want to be for your team. Use metrics to know where to act and how to guide your team in their day-to-day work, but don’t stop there: use them to help the team grow.