What is Infrastructure-as-Code?
Infrastructure-as-Code (IaC) is a method of automating the management and provisioning of infrastructure resources. Instead of manually clicking buttons on a web console, IaC enables organizations to describe their system architecture using code, allowing them to store, version, and track changes to their systems and application infrastructure.
The goal is to automate the process of setting up, configuring, deploying, and managing applications. IaC allows you to provision and manage cloud resources in an automated, declarative way. Infrastructure-as-Code is now the de facto standard for new projects, and many organizations are now focused on migrating from legacy architecture to IaC.
Before Infrastructure-as-Code: Pre-IaC Architecture
IaC’s major transformation was that developers could now create a consistent, repeatable workflow, bringing about wider-scale deployments across a range of resources, environments, and locations.
Delving a bit deeper, how did it achieve this? IaC provisions infrastructure and application resources through machine-readable definition files instead of through physical hardware configuration or interactive configuration tools.
Before, infrastructure management was a costly, manual process that hindered scale and availability. There was extreme variability in infrastructure largely due to manual configuration. Manual processes were more error-prone and could not be scaled, much less standardized. Remote access tools slowly entered the market, but system administrators (sysadmins) still had to provision new hardware and resources manually by connecting to remote cloud providers via APIs.
Environment drift: When the infrastructure for the stages of an application's software development process (development, staging, and production environments) falls out of sync. Environment drift, or configuration drift, causes inefficiencies and can be expensive in direct cost and potential user experience impacts. If your app’s development environment varies from the production environment, this can lead to failure in production or bugs, and even prevent recovery in the event of disaster.
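As a rough illustration of how drift between environments can be surfaced, the sketch below compares two environment descriptions setting by setting. The setting names and values are hypothetical, and real tools compare against live cloud APIs rather than in-memory dictionaries.

```python
from typing import Any


def find_drift(reference: dict[str, Any], other: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return every setting whose value differs between two environments.

    Each entry maps the setting name to (reference_value, other_value);
    a setting missing on one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(reference.keys() | other.keys()):
        ref_value = reference.get(key)
        other_value = other.get(key)
        if ref_value != other_value:
            drift[key] = (ref_value, other_value)
    return drift


# Hypothetical settings for two environments of the same application.
staging = {"instance_type": "small", "replicas": 2, "tls": True}
production = {"instance_type": "large", "replicas": 2, "tls": True, "backups": "daily"}

drift = find_drift(production, staging)
for setting, (prod_value, staging_value) in drift.items():
    print(f"{setting}: production={prod_value!r} staging={staging_value!r}")
```

An empty result means the two environments agree on every setting that was compared.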
Automation changed that, reducing the problem of forgotten tasks, automating configuration drift detection, and allowing other features to automatically manage infrastructure problems or remedy issues. Among those revolutionary features were version control systems (VCS), configuration management tools, and orchestration capabilities.
Infrastructure-as-Code Benefits
Now, IaC has made IT more efficient than ever before, solving numerous IT challenges and enabling new capabilities such as:
Recreating environments
It used to be challenging to recreate an identical environment after deployment because the systems it interacted with also had to be updated.
With Infrastructure-as-Code, users can recreate infrastructure from scratch, and on-demand, simply by replaying code. The pipeline uses a prescribed set of parameters for deployment and creates a new environment that is identical in terms of the number of hosts, networks, data centers, clusters, data stores, etc., every time that it runs. The infrastructure code can even be versioned with the product, making it easy for engineers to recreate the infrastructure as it was when a previous version of the product was released.
Minimizing errors
IaC minimizes the need for manual infrastructure management, reducing the risk of human error. Rather than depending on engineers to remember past configurations or respond to failures, everything is in the code, under your source control system.
When changes go to production, the infrastructure code is checked in a code review or in a review by a gatekeeper.
Supporting teamwork and collaboration
Using IaC, engineers don’t have to deal with problems caused by conflicting changes in a shared environment. Infrastructure-as-Code makes it easier to work as a team and to share code with colleagues and other teams, so they can utilize it to set up their own environments. Using a VCS, different teams can each work on a separate piece of the infrastructure, rolling out their changes in a controlled manner.
Reducing cloud expenditure
The shift from bare metal infrastructure investments to the cloud reduced CapEx, and IaC has reduced costs even further by enabling auto-scaling capabilities. With IaC, a software developer writes code and configuration management instructions that trigger actions according to actual needs and accurately reflect the structure of the real operating environment. Infrastructure-as-Code lets you manage your environments easily and automatically deactivate environments you no longer need.
DevOps and Infrastructure-as-Code
DevOps emphasizes automating manual tasks that typically take up a lot of software developers’ and IT operators’ time. IaC is one of the key technical practices that enable DevOps within an organization, by automating the provisioning and management of IT infrastructure. With IaC, developers can self-serve the provisioning of environments, saving time for them and the operations team.
How Infrastructure-as-Code Works
Key Concepts
- GitOps – This involves integrations between your IaC tech stack and the infrastructure itself via your Git repository (on GitHub, GitLab, Bitbucket, etc.). This includes streamlining changes as much as possible, such as embedded PR commands.
- Version Control – This is related to GitOps, where you will want to have a firm grasp on what versions of a framework, module, provider, or code you are using for your current work or for a specific kind of deployment.
- State Management – This refers to the storage and maintenance of your desired state. Some IaC tools do not encrypt state files by default. For example, Terraform does not encrypt (it’s a premium feature in Terraform Cloud) while OpenTofu does.
- Registry – A registry is a marketplace for finding add-ons, integrations, packages, and policies. It often refers to the Terraform Registry.
- Templates – Templates refer to reusable packages of code or files that provision resources in certain configurations. They should be git-based.
- Modules – This is the term for a configuration package, or collection of config files, in Terraform.
- Providers – This is the term for an integration mechanism, akin to an API, between Terraform and a third-party app.
- FinOps – This refers to the automation of cost monitoring, spending projections (cost estimation), and budget notifications/alerts so users can track the expense of their cloud deployments (in IaC and other sectors of DevOps).
- IaC Pipelines – This is an ordered sequence of common or repetitive tasks that is configured to run automatically so as to save teams time with projects.
- IaC Workflows – This refers to the sequence of status changes of infrastructure within a pipeline.
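To make the last two definitions a little more concrete, here is a minimal sketch in which the pipeline is an ordered list of steps and the workflow is the sequence of status changes recorded along the way. The step names and status labels are hypothetical, and each lambda stands in for real work.

```python
from typing import Callable

# A step is a name plus a function that returns True on success.
Step = tuple[str, Callable[[], bool]]


def run_pipeline(steps: list[Step]) -> list[str]:
    """Run steps in order and record the status changes (the workflow).

    The pipeline stops at the first failing step, so later steps never run.
    """
    history = ["queued"]
    for name, action in steps:
        history.append(f"{name}:running")
        if not action():
            history.append(f"{name}:failed")
            return history
        history.append(f"{name}:succeeded")
    history.append("done")
    return history


# Hypothetical step names for an infrastructure change.
steps: list[Step] = [
    ("validate", lambda: True),
    ("plan", lambda: True),
    ("apply", lambda: True),
]
history = run_pipeline(steps)
print(" -> ".join(history))
```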
Declarative vs. Imperative Approach for Infrastructure Configuration
As with other subjects in DevOps, infrastructure has declarative and imperative approaches. Think of it like a means to an end; or rather, the imperative approach defines the means and the declarative approach defines the end.
The imperative approach focuses on the sequence of commands needed to reach the desired state of your application, specifically in this case your infrastructure. In contrast, the declarative approach is becoming more popular thanks to better automation tools, as devs can define the endgame state and a given tool will configure an environment to reach that stated goal.
Chef is the most prominent tool relying on imperative programming for IaC. Some have a mix of imperative and declarative implementations, namely Pulumi, Salt, and Ansible. However, declarative is gaining traction and effectiveness thanks to advances in automation. Declarative IaC tools include OpenTofu, Terraform, AWS CloudFormation, and Puppet.
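The difference is easier to see side by side. In this small sketch (server names are hypothetical), the imperative function lists each step by hand, while the declarative one is given only the end state and works out the steps itself.

```python
# Imperative style: the author spells out each step, in order.
def provision_imperatively(servers: set[str]) -> set[str]:
    servers = set(servers)  # work on a copy
    servers.add("web-1")
    servers.add("web-2")
    servers.discard("legacy-1")
    return servers


# Declarative style: the author states the end result and a generic
# routine works out which steps are needed to get there.
def reconcile(current: set[str], desired: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_create, to_delete) needed to turn current into desired."""
    return desired - current, current - desired


current = {"web-1", "legacy-1"}
desired = {"web-1", "web-2"}

to_create, to_delete = reconcile(current, desired)
print("create:", sorted(to_create), "delete:", sorted(to_delete))
```

Both approaches end in the same place here. The declarative version keeps working when the starting point changes, whereas the imperative steps would need to be rewritten.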
Challenges and Best Practices
Many best practices for IaC overlap with DevOps best practices in general. However, there are caveats specific to maintaining code-based infrastructure.
Idempotency
Yeah, read that word carefully. This refers to being able to reapply code multiple times while getting a consistent result every time. This is as much a principle as it is a requirement to automate infrastructure, and templating will reduce or outright eliminate errors in many use cases. The goal of consistency also relates to testing, making sure that a deployment works in multiple environments and avoids the ‘it works on my machine’ problem.
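A minimal sketch of the idea, assuming infrastructure can be modeled as a dictionary of named resources (the resource names are hypothetical): the first run makes changes, and a second run with the same input finds nothing left to do.

```python
def apply(desired: dict[str, dict], actual: dict[str, dict]) -> list[str]:
    """Make `actual` match `desired` and return the changes that were made.

    Running it again with the same `desired` finds nothing left to do.
    """
    changes = []
    for name, config in desired.items():
        if name not in actual:
            actual[name] = dict(config)
            changes.append(f"create {name}")
        elif actual[name] != config:
            actual[name] = dict(config)
            changes.append(f"update {name}")
    for name in list(actual):
        if name not in desired:
            del actual[name]
            changes.append(f"delete {name}")
    return changes


desired = {"bucket-logs": {"versioning": True}, "vm-api": {"size": "small"}}
actual = {"vm-api": {"size": "large"}, "vm-old": {"size": "small"}}

first_run = apply(desired, actual)
second_run = apply(desired, actual)
print(first_run)
print(second_run)
```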
CI/CD & Testing
Many teams have not instilled continuous integration and continuous deployment into their infrastructure deployments. CI/CD should be standardized in all layers of development and operations, including IaC. Constant changes to infra require testing and full VCS integration.
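One lightweight form of infrastructure testing is a set of checks that runs in CI before anything is deployed. The sketch below is an illustration only: the two rules (unique names, no public buckets) and the resource fields are assumptions, not rules from any particular tool.

```python
def validate(resources: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    seen = set()
    for resource in resources:
        name = resource.get("name", "")
        if not name:
            problems.append("resource without a name")
        elif name in seen:
            problems.append(f"duplicate name: {name}")
        seen.add(name)
        # Hypothetical policy: storage buckets must never be public.
        if resource.get("type") == "bucket" and resource.get("public", False):
            problems.append(f"{name}: public buckets are not allowed")
    return problems


proposed = [
    {"name": "logs", "type": "bucket", "public": True},
    {"name": "logs", "type": "server"},
]
problems = validate(proposed)
for problem in problems:
    print(problem)
```

A CI job could fail the build whenever the returned list is not empty.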
Observability – Logs & Debugging
Depending on the kind of deployment, you should have logging configured across your entire tech stack. Additionally, consider metrics and tracing to monitor every level of your infrastructure. Finally, debugging should be standard protocol with any code changes, especially if you’re changing code within a resource instead of switching out resources.
Immutability (when applicable)
Immutability refers to making deployed resources unchangeable. In such cases, changes mean replacing a resource entirely rather than editing its internal code. This is not always practical, but when it is, it eliminates an area prone to frustrating errors.
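The same principle can be sketched in a few lines, assuming a server is described by a name and an image (both hypothetical here). A change produces a new object that takes the old one's place, and attempts to edit the existing one are refused.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Server:
    name: str
    image: str  # hypothetical machine image identifier


def roll_out(fleet: dict[str, Server], name: str, new_image: str) -> dict[str, Server]:
    """Return a new fleet in which `name` is replaced by a fresh server.

    The old Server object is never edited; it is simply left out.
    """
    replacement = replace(fleet[name], image=new_image)
    return {**fleet, name: replacement}


fleet_v1 = {"api": Server(name="api", image="app-1.0")}
fleet_v2 = roll_out(fleet_v1, "api", "app-1.1")

try:
    fleet_v1["api"].image = "app-1.1"  # in-place edits are rejected
except FrozenInstanceError:
    print("in-place change refused")
```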
Version Control (including environmental parity)
As mentioned with CI/CD, VCS can protect you from influencing the wrong environment or pushing changes that aren’t applicable in some versions of your Infrastructure-as-Code framework. This is even more essential when dealing with multi-framework deployments, which get confusing.
Cost Management/FinOps
Cost management and cost projection/prediction are getting better with newer tooling available to all classes of developers, and the same with IaC FinOps for system architects. Tracking cloud spending gets tricky, especially with the long list of internal features that cloud providers like AWS or Azure offer.
State Management
Storing the state of your IaC framework is fundamental. With many tools moving toward declarative programming, keeping that well-defined state protected is crucial.
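One common way to protect state is to allow only one writer at a time and to number every write. The class below is an in-memory stand-in for a remote state backend, written only to illustrate that idea; its names and behavior are assumptions rather than any tool's actual API.

```python
from typing import Optional


class StateLockedError(Exception):
    """Raised when someone else already holds the state lock."""


class StateStore:
    """In-memory stand-in for a remote state backend (hypothetical)."""

    def __init__(self) -> None:
        self.serial = 0
        self.resources: dict = {}
        self._lock_holder: Optional[str] = None

    def lock(self, who: str) -> None:
        if self._lock_holder is not None and self._lock_holder != who:
            raise StateLockedError(f"state is locked by {self._lock_holder}")
        self._lock_holder = who

    def unlock(self, who: str) -> None:
        if self._lock_holder == who:
            self._lock_holder = None

    def write(self, who: str, resources: dict) -> None:
        """Replace the recorded resources; only the lock holder may write."""
        if self._lock_holder != who:
            raise StateLockedError(f"{who} does not hold the lock")
        self.resources = dict(resources)
        self.serial += 1  # every write produces a new, ordered version


store = StateStore()
store.lock("alice")
store.write("alice", {"vm-api": {"size": "small"}})

try:
    store.lock("bob")  # a second writer has to wait
    bob_got_lock = True
except StateLockedError:
    bob_got_lock = False

store.unlock("alice")
```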
Modularization
Relating to templates and paralleling containers, IaC frameworks like Terraform and OpenTofu rely on modules to organize resources defined by configuration files in the same directory. In the case of Terraform, they will be .tf or .tfjson files. There are three primary reasons behind using a Terraform module: 1) packaging resources together that will be used together in a reusable configuration, 2) sharing standardized configurations across organizations, and 3) don’t-repeat-yourself programming (DRY).
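In general-purpose code, the closest analogy to a module is a function: inputs go in, a standard group of resources comes out. The sketch below uses Python rather than Terraform's own syntax, and the resource types and naming scheme are hypothetical.

```python
def web_service_module(name: str, environment: str, instance_count: int = 1) -> list[dict]:
    """Return the resources that make up one web service.

    Callers supply only the inputs; the internal layout stays in one place.
    """
    prefix = f"{environment}-{name}"
    resources = [{"type": "load_balancer", "name": f"{prefix}-lb"}]
    for index in range(instance_count):
        resources.append({"type": "server", "name": f"{prefix}-{index}"})
    return resources


# The same definition is reused for two environments with different inputs.
staging = web_service_module("shop", "staging")
production = web_service_module("shop", "prod", instance_count=3)
print([resource["name"] for resource in production])
```

This covers all three reasons at once: the resources are packaged together, the definition can be shared, and nothing is repeated between environments.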
Access (Roles and Users)
This is part of the security concerns of an IaC setup. You want to manage and allow access to as many people in your organization as possible, but make sure that levels of access are well-defined in specific roles. This makes RBAC, role-based access control, as essential in IaC as any other sector of DevOps.
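At its core, RBAC is a lookup from user to role to permitted actions. The roles, actions, and user names in this sketch are hypothetical, and real platforms usually add scopes such as projects or environments on top.

```python
# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "viewer": {"view"},
    "developer": {"view", "plan"},
    "admin": {"view", "plan", "apply", "destroy"},
}


def is_allowed(user_roles: dict[str, str], user: str, action: str) -> bool:
    """Return True if the user's role grants the requested action.

    Unknown users and unknown roles get no permissions at all.
    """
    role = user_roles.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())


user_roles = {"dana": "admin", "lee": "developer"}
print(is_allowed(user_roles, "lee", "plan"))
print(is_allowed(user_roles, "lee", "apply"))
```

Denying by default means a typo in a role name locks someone out rather than letting them in.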
Watch out for these IaC Pitfalls...
While IaC has clear advantages, it also presents unique challenges that usually emerge as you scale.
1. Integration with management tools
To harness the full benefits of IaC, it must be integrated into all processes, including CI/CD workflows, notification tools like Slack, security tools, system administration, IT operations teams, and DevOps teams, with well-documented policies and procedures. Without full integration, errors can quickly spread across the system.
2. Longer turnaround
When using IaC, every change has to be coded, tested, and reviewed before it is applied. Changes are more complex and must be planned carefully to avoid significant downtime.
3. Lack of cloud expense oversight
Since IaC deploys infrastructure components automatically, it can be hard to keep track of expenses. Development teams are often unaware of the financial ramifications of their code, and expenses can build up quickly without monitoring tools that are designed for IaC.
That’s why some would explicitly include FinOps in the rubric of IaC. Regardless, it’s an essential part of managing complex infrastructure. For instance, env0 includes cloud cost monitoring and optimization in its feature set.
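A simple cost estimate can be computed from the planned resources before anything is deployed. The hourly prices below are made-up placeholders rather than real cloud pricing, and the sketch is not a description of how env0 or any other product calculates costs.

```python
# Hypothetical hourly prices per resource type, in US dollars.
HOURLY_PRICE = {"server": 0.10, "database": 0.25, "load_balancer": 0.03}
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


def estimate_monthly_cost(resources: list[dict]) -> float:
    """Sum the monthly cost of every resource in a planned deployment."""
    total = 0.0
    for resource in resources:
        price = HOURLY_PRICE[resource["type"]]
        total += price * resource.get("count", 1) * HOURS_PER_MONTH
    return round(total, 2)


def over_budget(resources: list[dict], monthly_budget: float) -> bool:
    """Return True when the estimate exceeds the budget, before deploying."""
    return estimate_monthly_cost(resources) > monthly_budget


planned = [
    {"type": "server", "count": 4},
    {"type": "database"},
    {"type": "load_balancer"},
]
estimate = estimate_monthly_cost(planned)
print(f"estimated monthly cost: ${estimate:.2f}")
```

A check like `over_budget(planned, 400)` could then be used to raise a budget alert or block a deployment.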
IaC Toolchain Sprawl
One of the primary benefits of adopting Infrastructure-as-Code is consistency, which is hard to achieve if teams across your organization are using different IaC tools and approaches. In many cases, implementing IaC requires a cultural shift in addition to the technical one to ensure success. The advantages far outweigh any overhead associated with implementing and managing IaC.
We’ll try to make some sense of that tool sprawl with the following section, covering the major frameworks and associated platforms in the world of infra.
Infrastructure-as-Code Frameworks
IaC’s major tools are frameworks that incorporate multiple functions into a single platform. The list below starts with those assets and then continues with IaC tools that are popular for one or multiple functions within IaC tech stacks. The following Venn diagram shows what kind of features go into a complete IaC framework, but note its complex structure that shows some tools can cover much of what you need for a deployment, but not everything.
Terraform & OpenTofu
Terraform is an IaC tool created and maintained by HashiCorp; it is currently the most widely used Infrastructure-as-Code tool in the industry. It is widely credited with establishing common best practices, arguably including the use of declarative programming.
In Summer 2023, Terraform moved away from Open Source licensing. In response, several companies (including env0) collaborated to create an open-source alternative known as OpenTofu. OpenTofu is currently managed by the Linux Foundation. Its initial release, v1.6.alpha, seeks to be a drop-in replacement for the Terraform version of the same number.
Terragrunt
Terragrunt is a thin wrapper for Terraform that provides additional tools for deploying hooks, managing dependencies, remote states and multiple environments, as well as keeping your Terraform configuration files DRY (Don't Repeat Yourself). Terragrunt is open-source and a popular choice for Terraform users looking for ways to keep their codebase efficient, clean and well-organized.
AWS CloudFormation
CloudFormation is the AWS service for IaC. It uses JSON or YAML to define resources. Its added advantage is that it works seamlessly with other AWS tools. On the flip side, its main disadvantage is that it only handles AWS infrastructure resources. Additionally, it limits templates to only 500 resources apiece, arbitrarily still keeps some processes manual, and has confusing documentation.
Pulumi
Pulumi is an open-source IaC framework that uses common programming languages to configure and provision resources rather than a domain-specific language like HCL. That also allows it to take advantage of inherent features of languages like Python, JavaScript, C#, and Go among others, as well as various implementations of those languages like TypeScript, Node.js, .NET, etc.
Like Terraform and OpenTofu, Pulumi supports the major cloud providers: AWS, Azure, and GCP. It also features its own state management and language hosting, plus a command-line interface (CLI).
Crossplane
Crossplane is an open-source IaC framework managed by the Cloud Native Computing Foundation (CNCF) with a specific focus on managing Kubernetes infrastructure. It keeps application and infrastructure configuration in the same control plane (Kubernetes application layer), and uses other common k8s tools like Helm or Kustomize to launch IaC templates.
Atlantis
Atlantis is a GitOps-focused tool that often acts as an add-on to basic IaC frameworks. It automates Terraform actions through commands embedded in pull request (PR) comments, so teams can work from within their VCS. It relies on webhooks sent by the VCS rather than by Terraform itself, aiming to get more done in Terraform by working through comments and PRs from GitHub, GitLab, and other version control systems.
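To give a feel for what a command embedded in a PR comment involves, the sketch below parses comments of the form "atlantis plan -d modules/vpc". It is not Atlantis's actual implementation: it handles only the `plan` and `apply` subcommands and the `-d` directory flag, and the default directory of "." is an assumption made for this example.

```python
from typing import Optional


def parse_pr_comment(comment: str) -> Optional[dict]:
    """Parse a pull request comment such as "atlantis plan -d modules/vpc".

    Returns None for comments that are not commands. Only the `plan` and
    `apply` subcommands and the `-d` (directory) flag are handled here.
    """
    parts = comment.split()
    if len(parts) < 2 or parts[0] != "atlantis":
        return None
    action = parts[1]
    if action not in {"plan", "apply"}:
        return None
    directory = "."  # assumed default for this sketch
    args = parts[2:]
    if "-d" in args:
        index = args.index("-d")
        if index + 1 >= len(args):
            return None  # flag given without a value
        directory = args[index + 1]
    return {"action": action, "directory": directory}


command = parse_pr_comment("atlantis plan -d modules/vpc")
print(command)
```

Ordinary review comments fall through and return None, so only explicit commands would trigger an infrastructure action.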
CI/CD & Configuration Tools Used for IaC
Ansible
Ansible is an open-source automation tool that functions mainly as a configuration manager and orchestration tool, and it is often run as part of CI/CD pipelines. It is often compared with Jenkins, though the two tools can also function together in certain environments. In addition, Ansible integrates with Terraform. It is written in Python and works from the command line/terminal.
Argo CD
Argo CD is an open-source continuous delivery tool focused on Kubernetes that uses declarative programming. It monitors activity in Kubernetes clusters and compares infrastructure there to the version stored in a specified git repository. It will resolve any differences between the two versions to maintain the desired state. Argo CD is commonly used in conjunction with IaC tools for managing and orchestrating applications alongside infrastructure.
Jenkins
Jenkins is mainly an open-source continuous integration tool. It automates testing, packaging, building, and deployment. It is more broadly considered a CI/CD tool, as it also handles continuous delivery. It supports several VCSs from the most popular to more niche options: GitHub, GitLab, Bitbucket, Git, Mercurial, Subversion, etc. Many developers use Jenkins to deploy infrastructure components, but it has limitations relative to fully IaC-dedicated frameworks. It can run multiple jobs through multi-threading.
CircleCI
CircleCI is, despite the limiting name, a full CI/CD tool for automating builds, testing, and deployments. Through its integration with a VCS, any change in a repository triggers a CircleCI run, and jobs can run simultaneously through parallelism/parallel processing (in contrast to Jenkins’ multi-threaded approach).
SaltStack
SaltStack, also known as the Salt Project or simply Salt, mainly serves as an orchestration and configuration tool. It has an emphasis on automating repetitive tasks. It uses the push method to make changes to code.
Chef
Chef is usually defined as a configuration management tool, which automates writing, testing, and deploying code. It can also be defined broadly as an infrastructure-as-code framework and automation platform. Its DSL is based on Ruby. To draw an analogy with Terraform’s modules, Chef’s “cookbooks” package together multiple “recipes,” i.e. config files that cover which resources to manage and in what order to execute them. As mentioned above, Chef relies mainly on imperative programming. Its client-server architecture supports popular operating systems like Ubuntu and Windows.
Puppet
Puppet is a configuration management tool for automating code; it is often directly compared with Chef. It can also be defined broadly as an IaC framework with uses for orchestration, CI/CD, and monitoring. It mainly supports declarative programming. It supports several Linux distributions (Ubuntu, Debian, etc.) in addition to other operating systems (macOS, Windows). It relies more on the pull method to make changes.
Infrastructure Management at Scale with env0
env0 is a self-service automation platform and management layer that sits above an IaC framework. It provides a simplified user interface for administering environment templates, controlling access roles, managing variables, defining policies, overseeing FinOps mechanisms, setting parameters for different developer environments (including ephemeral), and more.
All in all, env0’s product reflects what the company sees as best practices for Infrastructure-as-Code, and therefore offers a suite of services:
Infrastructure Automation
env0 extends the creation of pipelines and workflows to Infrastructure-as-Code, using what are now established best practices in other segments of DevOps. env0 integrates with tools from different parts of the IaC tech stack – version control systems, configuration managers, orchestration tools, and CI/CD platforms – to create a consistent workflow with persistent changes pushed/pulled to your infrastructure.
Self-Service & Visibility
The emphasis on self-service leads to an emphasis on ‘granular RBAC’, where admins can add numerous specifications to custom roles in order to extend secure access across an entire organization as widely as possible. Utilizing Policy-as-Code and integrations with tools like OPA or Checkov, you can be confident that the right people have the right amount of access and let teams function independently to push/pull their changes to code.
With that, teams do not have to wait for someone else’s okay to be productive. Organization members can achieve that by using ephemeral environments (with time-to-live settings) to test new features, automated scheduling, and configurable templates.
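The time-to-live idea can be sketched as a periodic check that lists which environments have outlived their TTL. The field names and environment names are hypothetical, and this is an illustration of the concept rather than env0's implementation.

```python
from datetime import datetime, timedelta, timezone


def expired_environments(environments: list[dict], now: datetime) -> list[str]:
    """Return the names of environments whose time-to-live has run out."""
    expired = []
    for env in environments:
        if env["created_at"] + env["ttl"] <= now:
            expired.append(env["name"])
    return expired


# A fixed "now" keeps the example deterministic.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
environments = [
    {"name": "pr-101", "created_at": now - timedelta(hours=30), "ttl": timedelta(hours=24)},
    {"name": "pr-102", "created_at": now - timedelta(hours=2), "ttl": timedelta(hours=24)},
]
to_destroy = expired_environments(environments, now)
print(to_destroy)
```

A scheduled job could run a check like this and tear down whatever it returns, so test environments do not linger and accumulate cost.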
Additional features like dashboarding and audit logs, plus available integrations with several major observability platforms, give admins even more data to adjust those policies in the long-term.
Covering All Frameworks
env0 is framework-agnostic. In other words, env0 covers Terraform, Pulumi, CloudFormation, Terragrunt, and others. While some companies (HashiCorp, AWS) provide a premium service on top of their IaC frameworks, they often encourage vendor lock-in and cover their own frameworks at the expense of others.
Fair Pricing, FinOps Built-in
env0 encourages scale by using deployment-based pricing. However, other services such as Terraform Cloud price by RUM – or resources under management. RUM guarantees a higher bill for companies month to month, as teams are always adding more complex code and configuration changes.
Deployment pricing provides flexibility to team managers to customize their environments in such a way to be smart with their cloud spending. env0 encourages this further with its slew of FinOps features like cost management, budget notifications, and project-based calculations. Those analyses inform future policies to limit or increase budgets for users, teams, specific resources, or particular deployments.