As you may have seen, we launched the public beta of env0 last month. In the run-up to the launch, our dev team had to make sure our infrastructure was ready for ongoing public use. One piece of that work was creating a maintenance mode for both our application and our public API.
Why do you need a maintenance mode for something like env0, which is built to be highly available? Well, whether it’s human error (as in the case of AWS S3), a DDoS attack (like with GitHub), or just a major upgrade (like Zapier), even the most highly available applications from the most experienced providers sometimes need to be taken offline for a short period of time. When that happens, you want to make sure your users understand what is going on and still have a good experience.
Here’s how our team put together our maintenance mode using Terraform, AWS, and GitHub Pages (including all the code at the end!).
Architecture
We host all of our infrastructure on AWS, with a clear separation between Application and API:
- The React frontend application is hosted on S3, with CloudFront as a CDN.
- The backend services are mostly serverless, using AWS Lambda with API Gateway, which manages our public API.
- DNS is managed by Route 53.
Since we have a separation between the frontend application and our API, we need to have a maintenance mode for each of them, especially as we also have a public API that is used by our customers for integration in their CI/CD pipelines and other tools. Don’t forget this if you have multiple ways in which people access your services!
Requirements
As with any project, the first step is to lay out requirements and constraints. In this case, we came up with the following list of what the solution should do:
- Be implemented with Infrastructure as Code (IaC) as all of our stack is based on IaC
- Switch back and forth as fast as possible between normal and maintenance mode
- Switch automatically between normal and maintenance mode
- Be hosted by separate providers, where possible, from the current infrastructure in order to provide maximum redundancy
- Have an internal backdoor to allow our team to deploy a fix and test it before switching back
- Not override the current configuration with new deployments
- Communicate to an active user that we are in maintenance mode
Implementing a Solution
After investigating, we came to a few conclusions:
- We wanted a simple place to store the maintenance mode HTML, so we went with GitHub Pages.
- The switch itself will be done through DNS, pointing to GitHub Pages when we are in maintenance mode.
- We will create a new DNS record and a CloudFront distribution that will be our backdoor, and it will always be up and running.
- We will use our current push mechanism to notify users who are already in the app that we are in maintenance mode.
Based on that, we implemented the following for each piece of the system. Each section below describes what we’re doing, along with a link to a gist with the actual Terraform code.
CloudFront
Looking at the CloudFront distribution code, you can see that we are creating two distributions: one for the actual application and one for the backdoor:
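The gist itself is not reproduced here. As a rough sketch of the idea, and not our exact code, the two distributions could be declared from a single resource with `for_each`. Every variable name below is hypothetical, and the sketch assumes the frontend bucket is readable by CloudFront and that an ACM certificate covering both hostnames already exists:

```hcl
# Illustrative sketch only; all names here are assumptions.
variable "app_domain" {
  type        = string
  description = "Public hostname of the application (hypothetical)"
}

variable "backdoor_domain" {
  type        = string
  description = "Internal hostname that always reaches the app (hypothetical)"
}

variable "bucket_regional_domain_name" {
  type        = string
  description = "Regional domain name of the S3 bucket holding the frontend"
}

variable "acm_certificate_arn" {
  type        = string
  description = "ACM certificate covering both hostnames (must be in us-east-1)"
}

# AWS-managed cache policy, looked up by name.
data "aws_cloudfront_cache_policy" "caching_optimized" {
  name = "Managed-CachingOptimized"
}

# One distribution per hostname: the real app and the backdoor.
resource "aws_cloudfront_distribution" "site" {
  for_each = {
    app      = var.app_domain
    backdoor = var.backdoor_domain
  }

  enabled             = true
  default_root_object = "index.html"
  aliases             = [each.value]

  origin {
    domain_name = var.bucket_regional_domain_name
    origin_id   = "s3-frontend"
  }

  default_cache_behavior {
    target_origin_id       = "s3-frontend"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    cache_policy_id        = data.aws_cloudfront_cache_policy.caching_optimized.id
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn = var.acm_certificate_arn
    ssl_support_method  = "sni-only"
  }
}
```

Both distributions serve the same bucket, so the backdoor always shows whatever is currently deployed, regardless of where the public hostname points.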
Route 53
Also in AWS, we want to ensure that Route 53 is pointing in the right direction. When we are not in maintenance mode, the application’s CNAME should point to the CloudFront distribution; when we are in maintenance mode, it should point to the GitHub Pages site. In either case, the backdoor record should point to its CloudFront distribution.
Additionally, we set the TTL to 60 seconds so it won’t take too long to move back and forth between maintenance mode and regular operation:
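A minimal sketch of that switch, assuming a boolean variable drives the record target. The variable names are hypothetical, and the CloudFront domain names are passed in as plain strings only to keep the sketch standalone; in a single module they would normally come from the distributions’ `domain_name` attribute:

```hcl
# Illustrative sketch only; all names here are assumptions.
variable "maintenance_mode" {
  type    = bool
  default = false
}

variable "zone_id" {
  type        = string
  description = "ID of an existing Route 53 hosted zone"
}

variable "app_domain" {
  type = string
}

variable "backdoor_domain" {
  type = string
}

variable "app_cloudfront_domain" {
  type        = string
  description = "Domain name of the application's CloudFront distribution"
}

variable "backdoor_cloudfront_domain" {
  type        = string
  description = "Domain name of the backdoor CloudFront distribution"
}

variable "github_pages_domain" {
  type        = string
  description = "GitHub Pages hostname serving the maintenance page"
}

# Public record: flips between CloudFront and GitHub Pages.
resource "aws_route53_record" "app" {
  zone_id = var.zone_id
  name    = var.app_domain
  type    = "CNAME"
  ttl     = 60 # short TTL so the switch propagates quickly
  records = [var.maintenance_mode ? var.github_pages_domain : var.app_cloudfront_domain]
}

# Backdoor record: always points at CloudFront.
resource "aws_route53_record" "backdoor" {
  zone_id = var.zone_id
  name    = var.backdoor_domain
  type    = "CNAME"
  ttl     = 60
  records = [var.backdoor_cloudfront_domain]
}
```

A CNAME record like this only works for a subdomain, not for the zone apex.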
⚠️ Note that this Terraform code does not create the Route 53 hosted zone or the SSL certificates; you need to set those up as appropriate for your own environment ⚠️
Git Repo
Next, we need a Git repo containing the HTML files for the new maintenance mode site. We wrote some simple Terraform code that creates the repo and adds every file in the “maintenance_mode_website” folder to it, which in our case is just the maintenance mode HTML file:
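One way this could look with the GitHub Terraform provider is sketched below. The repository name is a placeholder, provider authentication is assumed to be configured elsewhere, and `file()` only handles UTF-8 text, so this sketch would not work for binary assets such as images:

```hcl
# Illustrative sketch only; resource and variable names are assumptions.
terraform {
  required_providers {
    github = {
      source = "integrations/github"
    }
  }
}

variable "repo_name" {
  type    = string
  default = "maintenance-mode-site" # hypothetical repository name
}

resource "github_repository" "maintenance" {
  name       = var.repo_name
  visibility = "public"
  auto_init  = true # creates an initial commit so files can be added
}

# Upload every file found in the local maintenance_mode_website folder.
resource "github_repository_file" "site" {
  for_each = fileset("${path.module}/maintenance_mode_website", "**")

  repository          = github_repository.maintenance.name
  file                = each.value
  content             = file("${path.module}/maintenance_mode_website/${each.value}")
  commit_message      = "Update maintenance page"
  overwrite_on_create = true
}
```

Because the files are driven by `fileset`, adding or changing a file in the folder shows up as a plan change on the next deploy.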
GitHub Pages
The last part is the trickiest to implement in Terraform, because GitHub Pages configuration is not actually part of the GitHub Terraform provider, which means we can’t configure it in the Terraform code itself. However, because GitHub offers an API for configuring GitHub Pages, it can still be done programmatically. In our case, we use the env0 custom flows feature to trigger those API calls once the deploy is finished:
Deployment
Now that our system is all configured, all we have to do is set the maintenance mode Terraform variable to true or false and deploy the environment (in our case via the env0 UI).
Conclusion
The complete template source code can be found in this GitHub repo, which includes all the Terraform code, scripts, our env0.yml, and the maintenance page HTML file. We hope you find this useful, or that it gives you ideas for more ways to use Terraform in your deployment workflows. Our next blog post will give you a sneak preview of how we are creating a maintenance mode for our API using Terraform.
About env0
env0 lets your team manage their own environments in AWS, Azure and Google, governed by your policies and with complete visibility & cost management. You can learn more about env0 here and you can also try it out yourself. Feel free to drop us your thoughts below!