DORA Metrics: An Infrastructure as Code Perspective

What are DORA Metrics

DORA metrics are a stream management framework that has emerged as a cornerstone for evaluating developer productivity and software delivery performance. By focusing on four key metrics (deployment frequency, lead time for changes, time to restore service, and production change failure rate), the DORA framework provides actionable insights into how engineering teams deliver value and where improvements can drive efficiency and reliability.

Widely accepted as an industry benchmark, these metrics help improve organizational performance and align engineering efforts with business outcomes, ensuring that teams not only deliver faster but also maintain stability.

Critically, the effectiveness of DORA metrics hinges on an often-overlooked factor: the infrastructure supporting the application code. After all, without reliable and resilient infrastructure to run on, even the best application code cannot deliver value to the business.

To accelerate developer productivity and increase application stability, the underlying infrastructure must be self-service, consistent, automated, and resilient. Achieving these qualities requires adopting Infrastructure as Code (IaC) and leveraging a robust IaC management platform.

This post examines how IaC impacts DORA metrics, highlighting the potential to enhance both throughput and stability with the right tools and practices, such as those offered by env0.

DORA Metrics Defined

DORA metrics, developed by the DevOps Research and Assessment group, are widely recognized for measuring software delivery performance. The framework evaluates four key areas grouped under the categories ‘Throughput’ and ‘Stability’:

Throughput Metrics

Change Lead Time: This metric measures the time between committing a code change and deploying it to production. Success in this area depends on seamless CI/CD pipelines, which require reliable infrastructure throughout the testing and deployment lifecycle.
Deployment Frequency: Deployment frequency tracks how often new changes are pushed to production. To increase deployment frequency without sacrificing quality, organizations must automate and standardize infrastructure provisioning, ensuring consistent environments.

Stability Metrics

Change Fail Percentage: Also called change failure rate, this represents the proportion of deployments that fail in production. High failure rates often stem from inconsistent or misconfigured infrastructure, emphasizing the need for standardized IaC practices.
Mean Time to Recovery (MTTR): MTTR measures how quickly teams recover from deployment failures. Effective monitoring, coupled with automated rollback capabilities for both application and infrastructure changes, is critical to reducing recovery time.

These metrics serve as a benchmark for organizations that want to improve their engineering teams' practices and deliver value efficiently.
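To make the four definitions concrete, here is a minimal sketch of how they could be computed from a list of deployment records. The `Deployment` record shape, its field names, and the choice of a median for lead time are assumptions made for illustration; they are not part of any DORA or env0 tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deployments: list[Deployment], window_days: int) -> dict:
    """Compute the two throughput and two stability metrics for a reporting window."""
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        # Throughput
        "deployments_per_day": len(deployments) / window_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        # Stability
        "change_fail_percentage": (
            100 * len(failures) / len(deployments) if deployments else 0.0
        ),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

Feeding the function a month of records and comparing the result with the previous month is one simple way to see whether throughput and stability are moving together.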

Although DORA metrics typically focus on application code, they make an implicit assumption that infrastructure is always available, stable, and ready when needed. In reality, infrastructure plays a significant role in determining the success of application deployments, and its limitations can undermine DORA’s utility as a comprehensive measure of productivity.

For that reason, Infrastructure as Code and its management need to be included in any heuristic assessment of developer productivity.

Common Pitfalls

While DORA metrics provide valuable insights, they are not without limitations, nor are they the only way to measure performance. Other attempts to quantify productivity include lines of code committed, commit frequency, and pull request lead time. Each approach attempts to measure a larger phenomenon through a limited set of data points and is prone to skew and manipulation.

The ultimate goal of any productivity measure is to deliver business value, which DORA does not explicitly measure. The goal of DORA metrics is to correlate developer productivity with positive business outcomes. In much the same way, measuring the stability and resiliency of IaC is an attempt to correlate developer productivity with operational maturity. Despite the flawed nature of DORA metrics, and of all other metrics, organizations continue to rely on them for actionable benchmarks, recognizing their potential to prioritize improvements in engineering teams' performance.

Infrastructure and Application Code Interaction

Infrastructure can be widely defined as the layer of technology that supports applications but is not directly part of the application code.

Infrastructure provides the foundation upon which applications can be deployed. Put simply, infrastructure is the layer that developers don't have to manage. However, it still has a large impact on throughput and stability once a developer's code leaves their local development environment.

Infrastructure and application code are intrinsically linked, with the former providing the foundation for successful deployments and improved deployment frequency. Developers require infrastructure that is scalable, self-service, stable, and secure to meet the demands of the modern software delivery process. Infrastructure as Code (IaC) addresses these requirements by enabling consistency, automation, and scalability across environments.

By codifying infrastructure definitions, IaC allows teams to provision environments that meet specific development and testing needs. This alignment ensures that infrastructure can support both the speed and stability demanded by DORA metrics, ultimately enhancing developer productivity and reducing risks.

IaC Impact on DORA Metrics

How does Infrastructure as Code specifically impact the four key DORA metrics? We can dig into each metric to see how it's defined and how IaC could positively or negatively impact it.

Throughput

Throughput as a category encompasses the speed with which a developer can write, commit, and deploy code successfully to production. It's broken up into the measures of Change Lead Time and Deployment Frequency.

Change Lead Time

Change lead time is expressed as the time it takes for a code commit to make its way into a production deployment. In a typical software development lifecycle, committing code to a repository kicks off a series of actions that are required before code can be promoted and packaged for production. This can include the following steps:

Code commit: when code is pushed to the repository
Code review and approval: the time spent reviewing and approving changes
Continuous integration (CI): the process of building and testing the change
Deployment: the process of packaging and deploying changes to production
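The steps above can be sketched as a series of timestamps, which makes it possible to see where a change spends most of its lead time. The stage names below are hypothetical labels for the four steps, not fields from any particular CI system.

```python
from datetime import datetime, timedelta

# Hypothetical stage names mirroring the steps listed above, in pipeline order.
STAGES = ["committed", "approved", "ci_passed", "deployed"]


def stage_durations(timestamps: dict[str, datetime]) -> dict[str, timedelta]:
    """Return the time spent between each consecutive pair of stages."""
    durations = {}
    for start, end in zip(STAGES, STAGES[1:]):
        durations[f"{start} -> {end}"] = timestamps[end] - timestamps[start]
    return durations


def slowest_stage(timestamps: dict[str, datetime]) -> str:
    """Name the transition that contributes the most to change lead time."""
    durations = stage_durations(timestamps)
    return max(durations, key=durations.get)
```

If the slowest transition turns out to be the one ending in `ci_passed` or `deployed`, that points at the infrastructure-heavy phases discussed next rather than at code review.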

Infrastructure plays a significant part in the CI and deployment phases. In a typical battery of CI tests, the code is deployed and validated in a testing environment.

Automated provisioning ensures that testing environments are ready when needed and configured correctly, eliminating delays and mistakes caused by manual setup. IaC also allows for multiple testing environments to be provisioned, so code changes can be tested in parallel.

During the deployment process, code artifacts typically undergo additional testing in lower environments before being promoted to production. The infrastructure for these lower environments (QA, staging, etc.) should be consistently configured to match each other and production.

Standardized IaC configurations prevent discrepancies between environments, reducing debugging time and improving deployment confidence. Applications deployed to consistent environments are less likely to fail in unexpected ways due to infrastructure configuration issues.
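As a rough illustration of what checking environment consistency can look like, the sketch below compares two environment configurations and reports the settings that differ. The flat dictionary representation, the setting names in the usage note, and the `ignore` parameter for intentional differences are all assumptions for this example.

```python
def config_mismatches(
    reference: dict, candidate: dict, ignore: frozenset = frozenset()
) -> dict:
    """Map each differing key to a (reference_value, candidate_value) pair.

    A key missing from one side is reported with None on that side.
    Keys listed in `ignore` are skipped (for example, intentional sizing differences).
    """
    diffs = {}
    for key in sorted(reference.keys() | candidate.keys()):
        if key in ignore:
            continue
        ref_val, cand_val = reference.get(key), candidate.get(key)
        if ref_val != cand_val:
            diffs[key] = (ref_val, cand_val)
    return diffs
```

Comparing a staging configuration against production with `ignore=frozenset({"instance_count"})` would flag, say, a mismatched runtime version while allowing staging to run fewer instances.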

Deployment Frequency

Deployment frequency is expressed by how often an organization successfully releases changes to production. A key philosophy in DORA metrics is the ability of teams to deliver small, incremental changes often, rather than large and infrequent deployments.

To successfully deploy changes frequently, organizations need to have in place pipelines that review code quality, run tests, and promote code through environments with the goal of continuous deployment. Once again, infrastructure heavily influences the ability of development teams to build a robust pipeline.

Leveraging IaC eliminates manual provisioning steps, enabling faster and more frequent code releases. It also supports automated testing through consistent, IaC-defined environments that can be deployed on demand.

When it comes to enhancing Throughput, IaC can assist by being automated, self-service, consistent, and scalable.

Stability

The second grouping of DORA metrics examines the stability of applications that have been deployed into production. One might think that accelerating throughput would lead to reduced stability, but studies conducted by DORA have found the opposite to be true. Increased throughput tends to lead to greater stability by requiring the introduction of automated pipelines, thorough testing, and streamlined deployment processes.

Change Failure Rate

Change failure rate is a measure of how often production deployments fail. A key to reducing change failures is to catch issues earlier on in the development lifecycle.

IaC ensures that all environments share the same configuration, catching potential issues before they reach production and reducing deployment failures caused by mismatched settings. Best practices like blue/green deployments and phased rollouts, facilitated by IaC, improve system reliability and can catch deployment issues before they cause an outage, directly lowering the change failure rate in production.
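The idea behind a phased rollout can be sketched as a loop that shifts traffic in steps and stops at the first unhealthy step. This is a simplified model: the `error_rate_at` callback stands in for whatever health signal a team actually uses, and the 1% threshold is an arbitrary assumption.

```python
from typing import Callable


def phased_rollout(
    phases: list[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> tuple[bool, int]:
    """Shift traffic to the new version step by step.

    `phases` lists the traffic percentages to try in order.
    Returns (completed, last_healthy_percent). When a phase exceeds the
    error threshold the rollout stops, so later phases are never attempted.
    """
    last_healthy = 0
    for percent in phases:
        if error_rate_at(percent) > max_error_rate:
            return False, last_healthy  # stop here; caller reverts to last healthy step
        last_healthy = percent
    return True, last_healthy
```

A problem that only appears under higher load is caught while most users are still on the previous version, which is what keeps it from becoming a full production failure.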

Mean Time to Recovery (MTTR)

Mean Time to Recovery is a measure of how long it takes on average to recover from a deployment failure. Robust and resilient infrastructure cannot help with bad application code, but it can assist with lowering the time it takes to recover from a failed rollout.

Monitoring tools integrated with IaC platforms can quickly identify failures and trigger automated recovery workflows. If the issue was caused by an infrastructure change, or the deployment required updating the infrastructure configuration, IaC is a vehicle to undo or remediate those issues.

IaC-driven infrastructure management can also include the ability to automatically detect and resolve potential issues before they impact production. This could be as simple as automatically scaling components to deal with unforeseen application load, or as complex as detecting failures in the deployment region and automating a failover to another region.

These examples highlight how IaC directly influences the throughput and stability factors outlined by DORA metrics, underscoring the need for a comprehensive management platform to fully realize its benefits.

IaC Management with env0

Infrastructure as Code can have a tremendous impact on the stability and throughput of your developers, but only if it's managed properly. An IaC management platform simplifies and orchestrates the complexities of managing infrastructure at scale.

Env0 embodies this concept through five key pillars: self-service, governance, automation & orchestration, analytics & monitoring, and cloud asset management. These pillars collectively address the challenges of IaC adoption, ensuring infrastructure meets the needs of modern development teams.

Self-Service: Empowering developers to provision and manage infrastructure on-demand reduces lead times and improves productivity.
Governance: Policy enforcement ensures consistency and compliance, lowering the risk of failures and enabling cost control.
Automation & Orchestration: Streamlined workflows eliminate manual interventions, supporting faster deployments and greater scalability.
Analytics & Monitoring: Comprehensive metrics provide visibility into infrastructure performance, enabling proactive issue resolution and optimization.
Cloud Asset Management: Ensures that all cloud assets are managed through IaC and assists with assessing risk and detecting potential issues like configuration drift.

By addressing the throughput and stability challenges highlighted in DORA metrics, env0 not only improves development workflows but also delivers measurable business value, such as cost reduction and enhanced customer satisfaction. In the next section, we take a close look at each pillar and how it impacts and improves DORA metrics.

To provide some visual context on how these are related to DORA metrics, here is a rough idea of how they impact stability and throughput, and how they come into play depending on the organization's IaC 'maturity'.

For instance, the topic of Governance, which represents the balance point of stability and throughput, impacting both, is something that organizations start to consider mid-way into their IaC journey.

Enhancing DORA Metrics with env0

IaC Automation & Managed Self-Service

IaC Automation encompasses the necessary tooling and platform to automate and orchestrate the deployment and ongoing management of infrastructure through code. env0 supports automation workflows across multi-cloud environments using OpenTofu, Terraform, Pulumi, CloudFormation, Terragrunt, and Kubernetes. Pipelines built with env0 are flexible and customizable to support complex and unique workflows.

The customized deployment pipelines along with reusable templates and shared variables create a golden path developers can leverage when they need to create infrastructure. Developers can control the workflow through pull requests and branch merges, so they don't have to leave their native tools to make use of env0's self-service capabilities.

Throughput: Automated infrastructure updates and self-service capabilities shorten change lead times and enable faster testing cycles. Time-to-live (TTL) features ensure testing environments are available and automatically decommissioned, optimizing resource usage.
Workflows and Parallel Deployments: Orchestrated workflows and support for parallel deployments reduce bottlenecks, enabling faster code promotion to production.
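To illustrate the time-to-live idea in general terms, the sketch below finds environments whose TTL has elapsed so they can be decommissioned. It is a conceptual model only; the `Environment` record and its fields are hypothetical and do not represent env0's implementation or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Environment:
    """A provisioned environment (hypothetical record shape)."""
    name: str
    created_at: datetime
    ttl: Optional[timedelta]  # None means the environment never expires


def expired_environments(envs: list[Environment], now: datetime) -> list[str]:
    """Return the names of environments whose TTL has elapsed as of `now`."""
    return [
        env.name
        for env in envs
        if env.ttl is not None and env.created_at + env.ttl <= now
    ]
```

Running a check like this on a schedule is one way short-lived test environments can be cleaned up without anyone remembering to do it.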

Example: Role-based access managed at a team-level

To learn more about env0's self-service capabilities and how they empower teams with managed IaC workflows, check out this guide: Mastering Managed IaC Self-Service with env0.

Governance

On an IaC management platform, governance encompasses several aspects, including access control, policy enforcement, drift detection, audit logging, and cost control.

The use of delegated access and roles empowers developers to manage their own infrastructure, reducing change lead time and increasing deployment frequency. At the same time, properly applied RBAC prevents unauthorized changes in critical environments, reducing outages caused by misapplied updates.
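A minimal sketch of the RBAC idea, assuming roles are assigned per user and per environment: the role names, the permission sets, and the `is_allowed` helper are all hypothetical and only illustrate the principle.

```python
# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS = {
    "viewer": {"plan"},
    "developer": {"plan", "apply"},
    "admin": {"plan", "apply", "destroy"},
}


def is_allowed(
    assignments: dict[tuple[str, str], str],
    user: str,
    environment: str,
    action: str,
) -> bool:
    """Check whether `user` may perform `action` in `environment`.

    `assignments` maps (user, environment) to a role name.
    Users with no assignment for an environment are denied by default.
    """
    role = assignments.get((user, environment))
    return role is not None and action in ROLE_PERMISSIONS.get(role, set())
```

The same developer can hold a broad role in a development environment and a read-only role in production, which is how self-service and protection of critical environments coexist.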

Enforcing policy helps to ensure compliance with accepted best practices and well-architected designs. This has the impact of reducing production change failure rate, as compliant infrastructure is less likely to fail in well-known ways.
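Policy enforcement can be pictured as a set of rules evaluated against planned resources before anything is deployed. The two rules below (no public storage buckets, every resource has an `owner` tag) and the dictionary layout of a resource are assumptions chosen for illustration; real platforms typically express such rules in a dedicated policy language.

```python
def policy_violations(resources: list[dict]) -> list[str]:
    """Return a human-readable message for every rule a resource breaks."""
    violations = []
    for resource in resources:
        name = resource["name"]
        # Rule 1: storage buckets must not be publicly accessible.
        if resource["type"] == "storage_bucket" and resource.get("public", False):
            violations.append(f"{name}: bucket must not be public")
        # Rule 2: every resource must carry an 'owner' tag.
        if "owner" not in resource.get("tags", {}):
            violations.append(f"{name}: missing 'owner' tag")
    return violations
```

A pipeline would block the deployment when the returned list is non-empty, so a non-compliant change fails in review rather than in production.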

Watch this video to learn how env0 leverages runtime policies to enhance governance, ensure compliance, and enable secure Terraform deployments:

Infrastructure drift is a situation where the actual resources differ from the declared configuration. Drift is often a source of inconsistency between environments, and can lead to failed deployments where production has drifted from what was tested in the lower environments. env0's drift detection helps to surface drift when it occurs and assists in remediating that drift before it causes outages or failed deployments, enhancing the stability metrics.
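Conceptually, drift detection is a comparison between the declared configuration and what actually exists. The sketch below models both as plain dictionaries keyed by resource name, which is an assumption for illustration; it is not how env0 or any specific IaC tool implements drift detection.

```python
def detect_drift(declared: dict[str, dict], actual: dict[str, dict]) -> dict[str, dict]:
    """Compare declared resources with actual ones.

    Returns {resource: {attribute: (declared_value, actual_value)}} for every
    declared attribute whose actual value differs. A declared resource that
    does not exist at all is reported under the "<resource>" key.
    Attributes that exist only on the actual side are not inspected.
    """
    drift = {}
    for name, want in declared.items():
        have = actual.get(name)
        if have is None:
            drift[name] = {"<resource>": ("present", "missing")}
            continue
        changed = {
            attr: (value, have.get(attr))
            for attr, value in want.items()
            if have.get(attr) != value
        }
        if changed:
            drift[name] = changed
    return drift
```

An empty result means the declared attributes match reality; anything else is a candidate for remediation, either by reapplying the code or by updating the code to reflect an intentional change.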

Watch this tutorial to see how env0 enables smart drift detection and auto-remediation:

When an outage occurs, logging is critical to determine the cause and apply a remediation. env0 offers extensive logging and auditing capabilities to reduce mean time to recovery and identify root causes to prevent future outages from occurring.

Although cost control doesn't necessarily improve DORA metrics, it does provide real business value by reducing operational expenditures. This is an area where the DORA metrics fail to paint a complete picture.

Analytics & Monitoring

While auditing and logging do play a critical role in governance, they are also part of the larger analytics and monitoring features in env0.

A key part of the DevOps cycle is providing feedback to developers to help them enhance their code. env0 integrates with popular observability tools like Splunk and Datadog to build a holistic view of your software development lifecycle from first deployment to steady-state operation.

The metrics gathered from testing and production environments can help identify bottlenecks and inform optimization efforts. Solid monitoring and proper analysis can both increase throughput and enhance stability, as long as the right information is available to the development teams. env0 assists in surfacing that information through tools your development team is already utilizing.

Just as important as gathering metrics and logs is monitoring the environment for possible issues and outages. While env0 does not replace a traditional monitoring solution, it does have insight into the status of infrastructure deployment and can alert when deployments fail or drift is detected. Both of these factors can enhance stability by reducing MTTR and change failure rate.

env0 dashboard offers insights into users, deployments, environments, drifts, etc.

Cloud Asset Management

Cloud Compass, env0’s Cloud Asset Management capability, aims to bridge the gap between manual and automated cloud operations. It scans your current cloud accounts to identify infrastructure that is not currently being managed by IaC. Not only does it identify unmanaged resources, but it also allows you to take action by automatically generating IaC and importing those resources into env0.

If you've inherited existing infrastructure from other teams or through a merger, there's a good chance that it hasn't been deployed with infrastructure as code and likely isn't in compliance with your current policies. Cloud Compass can drastically shorten the time it takes to onboard new environments and apply consistent controls, providing all the DORA metric benefits that come from proper IaC Management.

Additionally, IaC tools can only detect drift in the resources under management. If new, out-of-band resources are introduced through Click-Ops or CLI tools, traditional IaC management is unaware of them. These unknown resources can be the cause of outages and failed deployments, slowing down troubleshooting efforts and presenting an incomplete picture of reality. Cloud Compass discovers these unmanaged resources and can assign risk levels to help operations teams prioritize action.
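The gap described here amounts to a set difference: resources that exist in the cloud account but are absent from IaC state. The sketch below shows that idea with a coarse risk ranking. The inventory format, the `RISK_BY_TYPE` mapping, and the risk labels are hypothetical and are not a description of how Cloud Compass works.

```python
# Hypothetical mapping from resource type to a coarse risk level.
RISK_BY_TYPE = {
    "security_group": "high",
    "database": "high",
    "storage_bucket": "medium",
}
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


def unmanaged_resources(
    cloud_inventory: dict[str, str], iac_state: set[str]
) -> list[tuple[str, str]]:
    """Return (resource_id, risk) for resources the cloud has but IaC does not track.

    `cloud_inventory` maps resource ID to resource type; `iac_state` holds the
    IDs under IaC management. Results are sorted with the highest risk first.
    """
    found = [
        (resource_id, RISK_BY_TYPE.get(resource_type, "low"))
        for resource_id, resource_type in cloud_inventory.items()
        if resource_id not in iac_state
    ]
    return sorted(found, key=lambda item: (RISK_ORDER[item[1]], item[0]))
```

Sorting by risk gives an operations team an ordered worklist for importing resources into IaC instead of an undifferentiated inventory dump.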

To learn more about Cloud Compass and how it identifies unmanaged resources, check out this video:

Conclusion

Infrastructure is the backbone of modern application delivery, playing a pivotal role in achieving the continuous improvement goals outlined by DORA metrics. Driving improvement across these four metrics is essential for enhancing engineering performance and is a key component of effective DevOps stream management, helping organizations build high-performing teams.

By automating and managing infrastructure with IaC, organizations can address the challenges of throughput and stability, enabling more frequent, reliable, and efficient deployments.

Platforms like env0 elevate the benefits of IaC by offering tools for self-service, governance, and analytics that optimize infrastructure workflows. For teams aiming to improve developer productivity and deliver measurable business value, env0 serves as an indispensable partner in achieving these objectives.

To learn how env0 can help your team improve DORA metrics, schedule a personalized demo today.
