Understanding DevOps Metrics: DORA Metrics, SPACE Framework and DevEx

Umair Khurshid

Introduction

Measuring and improving the productivity of DevOps teams has long been a major challenge. From purely technical metrics to more business-oriented ones, choosing the right metrics remains a concern, not least because a poor choice can quickly become obsolete. As DevOps practices become mainstream, standards are emerging for almost every practice, and DevOps metrics are no exception.

Usually, the productivity of a DevOps team is considered a measure of its ability to efficiently write and deploy high-quality software that works well and is easy to maintain. From a business perspective, companies want to more accurately measure the productivity of their development staff for such reasons as:

  • Monitoring progress over time
  • Creating benchmarks
  • Rewarding top performers
  • Determining resource allocation
  • Identifying more productive development processes

Essentially, companies are always looking for actionable metrics that help them identify employee behaviors and ultimately motivate employees to improve performance. At the same time, companies also want metrics that can justify the investments made.

Traditional Metrics for Measuring DevOps Productivity

At first glance, measuring DevOps productivity seems like a simple task. Unless, that is, you want to do it well; in that case, it is very difficult. To illustrate this, let’s look at some of the traditional metrics many companies use to measure developer productivity (a sketch of how cheaply some of these numbers can be collected follows the list). These include, among others:

  • Lines of Code (LoC)
  • Number of commits
  • Pull requests
  • Code reviews
  • Number of builds/tests
  • Number of deployments
  • Predictability of task completion
  • Number of defects
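To see why these numbers are tempting, consider how little effort they take to collect. Below is a minimal Python sketch, assuming it is run inside a local git repository, that counts commits per author over a time window. The ease of gathering such counts is exactly what makes them easy to over-use.

```python
import subprocess
from collections import Counter

def commits_per_author(since: str = "30 days ago") -> Counter:
    """Count commits per author over a time window using `git log`."""
    out = subprocess.run(
        ["git", "log", f"--since={since}", "--pretty=format:%an"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(line for line in out.splitlines() if line)

if __name__ == "__main__":
    for author, count in commits_per_author().most_common():
        print(f"{author}: {count} commits")
```

Note that a developer can raise their count simply by splitting work into smaller commits, which is one reason such metrics reward behavior rather than outcomes.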

Myths and Misunderstandings

It is a common misconception that there is a simple and effective way to measure the productivity of a DevOps team with a very good metric. In reality, no single metric can be used to evaluate productivity, as it depends on far too many factors that can vary across industries, organizations, and specific cases. In practice, a weighted combination of several metrics may be used, but even then the metrics and specific weightings will vary from case to case.

Also, the claim that productivity is primarily determined by the individual is a myth. Of course, individual performance has great value in its own right, but software development is a group activity, and what ultimately matters is the contribution to the team. It is well established that project success does not depend only on the independent success of individuals; it also depends on how well the DevOps team works together. In fact, the whole is more than the sum of its parts.

Metrics vs Frameworks

We must differentiate between specific metrics and frameworks. Metrics are simply the quantities we measure; without context, they are meaningless numbers. A framework is a guide to how a set of metrics (not all of which you need to apply) can be established and used in a given context to understand how a team is doing and what actions will get it to where you want it to be.

One of the most common mistakes is to take a framework and apply all of its metrics. A framework is simply a summary of ideas, concepts, and relationships; based on your context, environment, and teams, you select a set of metrics that you can interpret to determine whether the team is where it should be.

Next, we will examine three of the most popular frameworks for measuring the productivity of a DevOps team.

DORA Metrics

DORA is a set of metrics that aims to measure the performance of software engineering teams in agile and DevOps development environments. In short, the approach is data-driven and aims to assess the performance of DevOps teams objectively.

The metrics emerged from the work of DORA (DevOps Research and Assessment), an organization founded in 2014 by academic researchers Nicole Forsgren, Jez Humble, and Gene Kim. In 2018, Google acquired the organization and incorporated its research and metrics into its Cloud platform.

The institution’s goal is to conduct research and evaluations in software development companies to identify the most effective practices and key metrics for evaluating team performance.

The “State of DevOps Report,” a result of DORA’s research, is widely used worldwide and has become a reference in the industry. With more than 30,000 participants since its creation, it offers valuable insights to improve software development and operations processes.

The DORA survey evaluates the performance of engineering teams in software development using four main metrics, divided into speed and stability.

While these metrics are useful, they should not be treated as the main goal. The focus should always be on building quality software – efficiently and usefully for users – in addition to generating value for the business, of course. That said, let’s delve deeper into the metrics that will support the entire journey of building digital products.

Speed Metrics

Deployment Frequency

This measures how often a team deploys code to production. A high deployment frequency can indicate that the team can deliver value to users continuously and quickly, responding to market demands and adjusting the software according to user needs. DORA research classifies Deployment Frequency into three categories:

  • Low Performance: teams that deploy between once every six months and once per month. These teams may face challenges in the delivery process, such as inadequate integration, lack of automation, or inefficiencies in communication between development and operations.
  • Medium Performance: teams that deploy code between once per month and once per week, which means they are in an intermediate stage and can further improve their processes and practices to increase delivery speed.
  • High Performance: teams that perform deployments daily or even multiple times per day. These teams generally have well-established processes and practices, such as test automation and continuous deployment, which allow them to deliver new features and fixes quickly and efficiently.
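As a rough illustration, here is a minimal Python sketch that buckets a team into the three bands above from a list of production deployment timestamps. The data source and the exact thresholds are assumptions: in practice the timestamps would come from your CI/CD system, and the gap between weekly and daily deployments is folded into the medium band here.

```python
from datetime import datetime, timedelta

def classify_deployment_frequency(deploys: list[datetime],
                                  window_days: int = 90) -> str:
    """Approximate DORA-style band for deployments within a trailing window."""
    if not deploys:
        return "low"
    cutoff = max(deploys) - timedelta(days=window_days)
    recent = [d for d in deploys if d >= cutoff]
    per_week = len(recent) / (window_days / 7)
    if per_week >= 5:      # roughly daily or multiple times per day
        return "high"
    if per_week >= 0.25:   # roughly between once per month and once per week
        return "medium"
    return "low"           # once per month down to once every six months
```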

Lead Time for Changes

This measures the time it takes from the moment a code change is made until it is successfully deployed to production. A shorter lead time may indicate that the team can deliver new features or fixes quickly and efficiently, responding quickly to market changes and user needs.

The DORA survey also classifies Lead Time for Changes into three categories:

  • Low Performance: teams with a lead time between one and six months. These teams may face issues in the delivery process, such as time-consuming manual testing, lack of automation, and inadequate communication between development and operations teams.
  • Medium Performance: teams with lead times between one week and one month, which means they are in an intermediate stage and can benefit from improvements in processes and practices to further accelerate the delivery of changes.
  • High Performance: teams with a lead time of less than one day. These teams generally have well-established processes and practices, such as continuous integration and continuous delivery, which allow them to adapt quickly and effectively to changes.
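A minimal sketch of how this could be computed, assuming each change is represented as a (commit time, deploy time) pair joined from version control and deployment logs. The median is used instead of the mean to dampen outliers, and the thresholds only approximate the bands above.

```python
from datetime import datetime, timedelta
from statistics import median

def lead_time_for_changes(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Median time from commit to successful production deployment."""
    return median(deployed - committed for committed, deployed in changes)

def classify_lead_time(lt: timedelta) -> str:
    if lt < timedelta(days=1):
        return "high"      # less than one day
    if lt <= timedelta(days=30):
        return "medium"    # up to roughly one month
    return "low"           # one to six months
```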

Stability Metrics

Change Failure Rate

This metric measures the proportion of deployments that result in failures, such as incidents in production or the need to roll back changes. A lower failure rate may indicate that the team can manage risks and maintain software quality while deploying changes, which is essential for a truly efficient squad.

The DORA survey classifies Change Failure Rate into three categories:

  • Low Performance: teams with a change failure rate between 46% and 60%. These teams may experience quality and reliability issues in the delivery process, which can result in a higher proportion of failed deployments.
  • Medium Performance: teams with a change failure rate between 15% and 45%. These are in an intermediate stage and can benefit from improvements in processes and practices to further reduce the change failure rate.
  • High Performance: teams with a change failure rate between 0% and 15%. These teams generally have well-established processes and practices, such as automated testing and continuous monitoring, which allow them to manage risks and maintain software quality and reliability.
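Change Failure Rate is a simple proportion; the hard part is defining what counts as a failure. A minimal sketch, assuming each deployment record carries a boolean `failed` flag (for example, it triggered an incident or a rollback); the flag name and record shape are illustrative assumptions.

```python
def change_failure_rate(deployments: list[dict]) -> float:
    """Proportion of deployments that resulted in a failure, from 0.0 to 1.0."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.get("failed", False))
    return failures / len(deployments)

def classify_cfr(rate: float) -> str:
    if rate <= 0.15:
        return "high"      # 0-15%
    if rate <= 0.45:
        return "medium"    # 15-45%
    return "low"           # 46% and above
```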

Mean Time to Restore

This metric helps assess a team’s ability to handle incidents and failures. It measures the average time it takes to restore service after a failure, indicating how effectively the team can identify, diagnose, and resolve issues.

DORA research classifies Mean Time to Restore into three categories:

  • Low Performance: teams with an MTTR between one week and one month. These teams may face challenges in resolving issues and incidents, such as a lack of adequate monitoring and diagnostic tools, or inefficient processes for managing incidents.
  • Medium Performance: teams with an MTTR between one day and one week. These teams are in an intermediate stage and can benefit from improvements in processes and practices to further accelerate service restoration after failures.
  • High Performance: teams with an MTTR of less than one day. They generally have well-established processes and practices, such as continuous monitoring and effective collaboration between development and operations, which allows them to identify and resolve issues quickly.
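Following the definition above, MTTR is an average over incident durations. A minimal sketch, assuming incidents are exported from an incident tracker as (opened, resolved) timestamp pairs:

```python
from datetime import datetime, timedelta

def mean_time_to_restore(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average time from incident opened to service restored.

    Assumes at least one incident; e.g. outages of 2h and 6h yield 4h.
    """
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)
```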

In short, DORA Metrics are unique because they are not limited to just delivery speed, but also encompass the operational stability of the software over time. This ensures that the quality of the software is not compromised after initial delivery. Furthermore, it allows the performance of the software to be closely monitored in terms of stability, reliability, and other quality metrics. 

As for validity, the metrics make sense, but they do not cover all aspects of productivity. Many aspects are left out, such as developer satisfaction and related indicators. The authors of the original study themselves mentioned in some interviews that these metrics are incomplete and could be expanded. This led to the creation of SPACE.

SPACE Framework

The SPACE framework is a set of metrics designed to provide a holistic view of software engineering team performance. Unlike traditional approaches that may focus solely on speed or output, the SPACE framework emphasizes multiple dimensions to capture the overall productivity and well-being of engineering teams. Developed by researchers and practitioners in the field, it aims to address the complexities of modern software development environments, especially within agile and DevOps practices.

The main idea is that assessing productivity goes beyond measuring a single dimension, and there are several associated myths. Below, I will briefly discuss some of them:

Myth: Productivity is only about developer productivity

Large workloads can result from a variety of factors, such as excessive hours caused by inefficient systems or inadequate planning. Activity metrics alone cannot be used to reward or penalize because they lack context. Simple metrics like commits or code reviews are also error-prone and do not account for activities like pair programming. Additionally, overtime driven by tight deadlines, poor planning, or a detrimental culture can distort any productivity assessment.

Myth: Productivity is only about individual performance

Over-focusing on personal productivity can harm the collective, fostering heroic cultures. Roughly speaking, in such companies, 20% of individuals perform 80% of the work, which is harmful and should be combated, not encouraged by individual metrics.

Myth: A single productivity metric can tell us everything

It’s a misconception that a single, universal metric can evaluate teams across an organization or industry. Productivity encompasses several important dimensions and is heavily influenced by context. Comparing startups to banks, for example, is misleading.

Myth: Productivity measures are only useful for managers

Many developers believe that productivity metrics are useless due to their misuse by leaders. However, these metrics also benefit developers themselves by helping them organize and understand their priorities. Studies show that high productivity is associated with greater job satisfaction and happiness.

Myth: Productivity is just about systems and tools

Tools don’t capture invisible activities like mentoring and knowledge sharing that are essential to productivity. These invisible activities are just as crucial as the more common measures.

Five Dimensions of the SPACE Framework

The framework proposes five dimensions for measurement, avoiding common problems and myths.

S: Satisfaction and Well-Being

This dimension assesses the overall happiness and mental health of team members. It includes factors like job satisfaction, work-life balance, and the perceived stress level within the team. A satisfied and healthy team is more likely to be productive and engaged, leading to better outcomes in the long run.

P: Performance

Performance metrics focus on the engineering team’s outcomes, such as the quality of the software they produce, the speed of delivery, and the overall impact of their work on the business. This dimension is often measured through traditional metrics like code quality, deployment frequency, and lead time for changes. However, the SPACE framework encourages looking beyond these to include measures of how well the software meets user needs and business goals.

A: Activity

This dimension tracks the day-to-day work that engineering teams do, such as coding, reviewing code, and fixing bugs. It includes metrics like the number of commits, pull requests, and code reviews completed. While these metrics are valuable, the SPACE framework emphasizes that activity should not be conflated with productivity. High activity levels do not necessarily mean that a team is productive or that their work is of high quality.

C: Communication and Collaboration

Effective communication and collaboration are critical for the success of engineering teams, especially in agile and DevOps environments. This dimension evaluates how well team members work together, share knowledge, and coordinate their efforts. Metrics in this area might include the number of collaborative tasks completed, the frequency of communication within the team, and the level of cross-functional collaboration.

E: Efficiency and Flow

Efficiency metrics measure how smoothly and effectively work progresses through the development pipeline. This dimension includes metrics like cycle time, which tracks the time it takes for work to move from one stage to the next, and flow efficiency, which measures the proportion of time spent on value-adding activities versus waiting or rework. The goal is to identify bottlenecks and areas where processes can be streamlined to improve overall efficiency.
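The two measures named above are straightforward to compute once work items record stage timestamps. A minimal sketch, assuming each item tracks when it started, when it finished, and how much of that elapsed time was active, value-adding work; the field breakdown is an assumption, and real data would come from your issue tracker.

```python
from datetime import datetime, timedelta

def cycle_time(started: datetime, finished: datetime) -> timedelta:
    """Elapsed time from when work started to when it was delivered."""
    return finished - started

def flow_efficiency(active: timedelta, total: timedelta) -> float:
    """Share of the cycle spent on value-adding work, from 0.0 to 1.0."""
    return active / total if total > timedelta() else 0.0

# Example: an item took 10 days end to end, of which 3 were active work.
print(flow_efficiency(timedelta(days=3), timedelta(days=10)))  # 0.3
```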

The SPACE framework offers a more comprehensive approach to measuring engineering productivity, emphasizing the importance of balancing multiple aspects of team performance. By considering factors like satisfaction, communication, and efficiency alongside traditional performance metrics, organizations can gain a deeper understanding of how their teams are performing and where improvements can be made. This holistic approach ensures that teams are not only productive but also healthy, engaged, and capable of delivering high-quality software that meets user needs and business objectives.

DevEx Framework: Measuring and Enhancing Developer Experience

Based on the limitations of SPACE and DORA, the same authors introduced DevEx (Developer Experience) in a study published in January 2024. Its purpose is to establish a method that focuses on developers’ satisfaction with their work. DevEx is not technically anything new; it has always existed but has not been given the importance it deserves.

What is DevEx

DevEx captures how developers feel, how they think, and how they value their work. In their original paper, the authors established 25 factors that affect (positively or negatively) the developer experience, some examples are interruptions, unrealistic deadlines, friction with tools, clarity in tasks, and code organization, among others. We tend to think that what affects the development experience is found in the tools, but it goes much further. Human factors, such as having clear objectives and feeling psychologically safe in a team, greatly impact performance.

Improving the developer experience not only impacts productivity but also clearly benefits satisfaction, engagement, and retention. The aspects that have a negative impact can operate at different levels, from the company down to the team and the individual. DevEx is different for every developer, so it is necessary to know the context: seniority, team and role, past experiences, and so on. The DevEx approach goes down to the level of specific people and company processes.

Feedback Loops

Organizations that optimize their value stream, using value stream mapping to find and reduce waste in delivery, are demonstrably more efficient. Make mistakes sooner so you can correct course sooner. A fast loop allows developers to complete their work faster with minimal friction; slow loops cause frustration, pauses, continuous task switching, and dropped tasks that must be picked up again later.

To improve DevEx, these loops must be shortened in several ways (a sketch of one way to surface slow loops follows this list):

  • Identify delays in tools and in compilation or testing times.
  • Identify process losses using a Lean approach.
  • Identify problems in organizational structures to promote team interactions.
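As one concrete example of the first point, here is a minimal sketch that flags slow CI pipelines from job records. The record fields (`name`, `queued_s`, `run_s`) are assumptions; real data would come from your CI provider’s API, and the ten-minute threshold is arbitrary.

```python
from statistics import mean

def slow_feedback_jobs(jobs: list[dict], threshold_s: float = 600.0) -> list[str]:
    """Names of pipelines whose average queue + run time exceeds threshold_s."""
    by_name: dict[str, list[float]] = {}
    for job in jobs:
        by_name.setdefault(job["name"], []).append(job["queued_s"] + job["run_s"])
    return [name for name, times in by_name.items() if mean(times) > threshold_s]
```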

Cognitive Load

An ever-increasing number of tools and technologies increases the cognitive load that developers face. This load covers the amount of mental processing that a developer needs to do to perform a task, such as very complex tasks or tasks that involve learning a new framework or paradigm. This load is also affected by how the information arrives to perform a task, from the language to the presentation of the information to the need to interpret the information to connect it with one’s knowledge.

Cognitive load hinders the most important role of a developer today: delivering value. When cognitive load is high as a result of issues such as poorly documented code or systems, developers must spend additional time and effort to complete tasks and avoid errors. The key is to reduce this burden through:

  • Clean code
  • Organized documentation and well-documented code

Here it is important to dedicate effort to providing developers with all the necessary documentation, and with simplicity and clarity in the tools and processes they use daily. A dedicated DevEx team has to provide these tools. One crude way to make documentation gaps visible is sketched below.
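A minimal sketch of one such crude, easily gamed proxy: the share of functions and classes in a Python file that carry a docstring. It makes documentation gaps visible but says nothing about documentation quality, so treat it as an illustration rather than a target.

```python
import ast

def docstring_coverage(path: str) -> float:
    """Fraction of function/class definitions in `path` that have a docstring."""
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read())
    defs = [node for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]
    if not defs:
        return 1.0  # nothing to document
    documented = sum(1 for d in defs if ast.get_docstring(d) is not None)
    return documented / len(defs)
```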

Flow State

Flow state is about focus: the mental state one enters when performing an activity while immersed in a feeling of concentration, involvement, and enjoyment. Experiencing that feeling frequently as a developer favors productivity, innovation, and personal growth. The negative factors related to the first dimension, feedback loops, penalize this state of flow; other relevant aspects include autonomy, clear team and project objectives, and stimulating, challenging tasks. This third dimension is about creating the right conditions for flow, such as limiting interruptions, avoiding task and focus switching, and creating a safe space in the team that encourages taking on challenges.
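One simple way to quantify the room a developer’s day leaves for flow is the longest meeting-free stretch. A minimal sketch, assuming working-day bounds and a list of (start, end) meeting intervals pulled from a calendar; the calendar source is an assumption.

```python
from datetime import datetime, timedelta

def longest_focus_block(day_start: datetime, day_end: datetime,
                        meetings: list[tuple[datetime, datetime]]) -> timedelta:
    """Longest uninterrupted, meeting-free stretch in the working day."""
    longest = timedelta()
    cursor = day_start
    for start, end in sorted(meetings):
        if start > cursor:
            longest = max(longest, start - cursor)
        cursor = max(cursor, end)   # handles overlapping meetings
    return max(longest, day_end - cursor)
```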

Which Metrics Framework to Choose?

Metrics trigger behavior in a certain direction, whether you like it or not, and those behaviors end up permeating the culture of the team or company. What you measure reflects what you are going to focus on as a leader. They highlight what is important to you and, therefore, make the teams look in that direction.

The team will react to that metric because they know they are being measured, and they will behave accordingly to meet it. If you measure the number of lines of code, the team will produce more of them, which does not mean higher quality. Ultimately, these behaviors will end up permeating the culture of the team and, therefore, the company.

Following are some of the best practices to consider while selecting DevOps metrics:

  • Don’t focus on just one category. Measure effectiveness and efficiency.
  • Balance metrics to avoid wrong incentives.
  • Be fair about technical complexity; don’t agonize over why one development effort takes longer than another. Each one is a world of its own.
  • The same metric can have many interpretations. Retention, for example, is often used to measure developer satisfaction, but it can reflect much more than that, such as salary bands or the prospect of a career within the company.
  • Start easy, with easy-to-measure metrics that have a big impact. You can focus on the complex ones later.
  • Be transparent with the team; people must know what they are being measured on.
  • Focus on context as metrics without context are useless.

Conclusion

Used well, DevOps metrics can elevate your team, your product, and your culture, but when used poorly, they can also destroy them. When choosing DevOps metrics, consider the type of leader you want to be for your team. Use metrics to know where to act and how to help/guide your team more in their day-to-day work, but don’t stop there, use them to help them grow.
