Recently, I had a chance to present a lightning talk at OpenTofu Day during KubeCon 2024. In my session, I explored two distinct approaches to integrating OpenTofu into deployment pipelines: approve-before-merge using Atlantis and the traditional continuous deployment (CD) strategy where changes are applied post-merge.
In my talk I covered the workflows, benefits, and challenges of each method, examining how they handle collaboration, ensure compliance, and manage deployment risks. The goal was to help attendees choose the right strategy for their teams and balance speed and safety in their IaC deployments.
Here’s the video recording of my session. I hope you enjoy it!
Transcript
Hello, everyone. So today we're going to talk about The Merge conflict-conflict. As it states, do we apply before merge or after merge? My name is Asaf Blubshtein, and I’m a Solution Architect at env0. I’ve been working with env0 for almost exactly a year now, starting at the last KubeCon in Chicago.
One of the things we notice when implementing Infrastructure as Code (IaC) solutions for customers with OpenTofu is that, as part of the pipeline, there’s usually some form of validation and checks in place. About 90% of the time, this includes steps like format, validate, and often running plans.
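As a rough illustration of the checks mentioned above, here is a minimal Python sketch that runs format, validate, and plan in order and stops at the first failure. The `tofu` subcommands and flags are standard OpenTofu CLI options; the `run_checks` function, the `CHECK_COMMANDS` list, and the `tfplan` file name are my own placeholders, and the sketch assumes the working directory has already been initialized with `tofu init`.

```python
import subprocess
from typing import List

# Assumed order of checks; "tfplan" is a placeholder output file name.
CHECK_COMMANDS: List[List[str]] = [
    ["tofu", "fmt", "-check", "-recursive"],
    ["tofu", "validate"],
    ["tofu", "plan", "-input=false", "-out=tfplan"],
]


def run_checks(workdir: str, runner=subprocess.run) -> bool:
    """Run each check in order and stop at the first one that fails."""
    for cmd in CHECK_COMMANDS:
        result = runner(cmd, cwd=workdir)
        if result.returncode != 0:
            return False
    return True
```

The `runner` parameter exists only so the function can be exercised without a real OpenTofu installation; in a pipeline you would leave it at its default.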
A frequent question I get is, “Okay, I ran the plan, and now I want to run apply. When do I run it? Should it be after merging the PR, or during the PR itself?” Let’s review both approaches and weigh their pros and cons.
Apply after merge is an approach familiar from application and software lifecycles. You have your main branch, create a feature branch, make changes, commit, run the plan, and then create a PR to merge into the main branch. After reviews, ideally with another set of eyes, the PR is merged, and you run Terraform or OpenTofu apply.
The advantages include fitting seamlessly into a GitOps workflow. It aligns with how application developers operate and ensures the main branch is always the source of truth. However, there’s a major challenge: the plan might succeed, but the apply could fail. This is something we see a lot, and it depends heavily on the providers you use, because every provider implements its own plan logic and its own set of checks.
That means a successful plan doesn’t necessarily guarantee a successful apply. Because of this, we can actually break our main branch and disrupt production. Adding to that, rollbacks can be very difficult. Let’s take an example: Jane creates a feature branch, makes updates, and creates a PR. John approves the PR, and then it’s merged after a successful plan and review. But the apply fails. And a lot of times, it doesn’t fail at the beginning—it fails halfway through. So we end up in this limbo state where half the resources are deployed, half are destroyed, some aren’t working, and production is impacted.
Whether we decide to roll back, troubleshoot, or fix it, we can’t simply revert to a previous commit, because that goes against the GitOps structure. We need to go through the entire process again: create a feature branch, make updates, and merge it again. If it’s the end of the day, John might have gone home, so we need to find another approver. During this entire time, our applications and infrastructure are impacted.
Now let’s look at apply before merge. The initial process is very similar: you create a feature branch, commit, and run plans. But this time, the review happens ideally before creating the PR. From the PR itself, we trigger the apply. Only after the apply is successful do we merge into the main branch.
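The difference between the two orderings can be sketched with a toy model. This is only an illustration under simplifying assumptions: `Repo`, `apply_after_merge`, and `apply_before_merge` are hypothetical names, a "change" is just a string, and a failed apply is treated as all-or-nothing, which ignores the half-applied state described earlier.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Repo:
    main: List[str] = field(default_factory=list)      # changes merged into main
    deployed: List[str] = field(default_factory=list)  # changes actually applied


def apply_after_merge(repo: Repo, change: str, apply: Callable[[str], bool]) -> bool:
    """Merge first, then apply. A failed apply leaves main ahead of reality."""
    repo.main.append(change)
    ok = apply(change)
    if ok:
        repo.deployed.append(change)
    return ok


def apply_before_merge(repo: Repo, change: str, apply: Callable[[str], bool]) -> bool:
    """Apply from the PR, and merge only if the apply succeeded."""
    ok = apply(change)
    if ok:
        repo.deployed.append(change)
        repo.main.append(change)
    return ok
```

With an apply function that always fails, `apply_after_merge` leaves the change in `main` but not in `deployed`, while `apply_before_merge` leaves both lists untouched.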
The main benefit here is that the main branch is always green, meaning it is always in a valid state. If a rollback is needed, you can apply directly from the main branch without having to create any PRs or fixes. This approach also allows for faster iteration and troubleshooting.
If the apply fails, it can be fixed within the PR itself, avoiding unnecessary commits or PRs that might clutter the main branch. It’s easier to track the flow of infrastructure deployment this way.
The cons are that this approach doesn’t adhere to the traditional GitOps flow. For IaC developers with an application development background, this can feel counterintuitive—like, “What are you doing? This isn’t how it’s supposed to work.”
Additionally, if you don’t apply immediately, the PR can drift from the main branch as other changes are merged. State and plan management also need to adapt to this new process to ensure one PR doesn’t overwrite or revert changes from another PR if it isn’t properly rebased onto the main branch.
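One way to picture the guard this implies is a pre-apply check that refuses to run when the PR is behind main. The sketch below is hypothetical: `PullRequest`, `apply_blockers`, and the idea of comparing a stored base commit against the head of main are my own simplifications, not the behavior of any particular tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PullRequest:
    number: int
    base_sha: str   # commit on main that the branch was last rebased onto
    approved: bool


def apply_blockers(pr: PullRequest, main_head_sha: str) -> List[str]:
    """Return the reasons an apply from this PR should not run yet."""
    blockers: List[str] = []
    if not pr.approved:
        blockers.append("PR is not approved")
    if pr.base_sha != main_head_sha:
        blockers.append("branch is behind main; rebase and re-run the plan")
    return blockers
```

An empty list means the apply may proceed. In a real pipeline the staleness test would more likely ask git whether the head of main is an ancestor of the PR branch, rather than compare two stored SHAs.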
Let’s take another example. Jane and John both create feature branches at the same time. John’s change is simpler, so he creates his PR earlier. One way to avoid contention is to use a deployment queue.
A solution like Atlantis, which was one of the first to implement this in 2017, does it using locking. When John creates a PR, the state is locked until the PR is merged. Once the apply is successful and merged, the state is unlocked and available for others.
However, this creates a challenge: because Jane’s change is more complex, she has to wait until John finishes before she can continue. In the meantime, someone else could jump ahead, leaving Jane waiting idly.
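To make the waiting problem concrete, here is a minimal sketch of PR-scoped locking. It is a toy model of the idea described above, not Atlantis's actual implementation; `StateLocks`, the project name, and the PR numbers are all made up for illustration.

```python
from typing import Dict


class StateLocks:
    """One lock per project, held by a PR until that PR releases it."""

    def __init__(self) -> None:
        self._holders: Dict[str, int] = {}

    def acquire(self, project: str, pr: int) -> bool:
        # The first PR to ask gets the lock; everyone else is refused.
        holder = self._holders.setdefault(project, pr)
        return holder == pr

    def release(self, project: str, pr: int) -> None:
        # Only the holding PR can release, e.g. when it is merged.
        if self._holders.get(project) == pr:
            del self._holders[project]


locks = StateLocks()
john, jane, someone_else = 1, 2, 3

john_got_it = locks.acquire("network", john)             # John locks the state
jane_got_it = locks.acquire("network", jane)             # Jane is refused
locks.release("network", john)                           # John merges
other_got_it = locks.acquire("network", someone_else)    # a third PR jumps ahead
jane_retry = locks.acquire("network", jane)              # Jane is still waiting
```

Because the lock has no notion of order, whoever asks first after a release wins. A first-in, first-out deployment queue would instead remember that Jane asked before the third PR.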
So which option is better? Well, it depends. Apply after merge works well if it matches your current pipeline, especially if the same developers are deploying both applications and infrastructure. It also makes sense if managing large state files is a challenge with apply-before-merge workflows or if you trust the providers you’re using. On the other hand, apply before merge is ideal for avoiding broken main branches and simplifying troubleshooting, particularly if you often face failed applies that disrupt production.
In application development, concepts like merge queues or merge trains in GitLab allow grouped merges and tests to simplify rollbacks.
Could these be used for IaC? Not yet. Application and IaC development are fundamentally different. Application development typically happens in isolated containers, not directly in production, making rollbacks easier.
The holy grail would be combining merge queue concepts with IaC-specific rollback and plan comparison capabilities, ensuring no overlap or conflicts between PRs. This is something we hope to see in the future.
Thank you for listening! Please fill out the survey and let me know if you have any questions.