Join us as we sit down with Sergey Korolev of Rakuten Viber to learn how his team automates and manages Infrastructure as Code, and how mature IaC practices improve developer experience, reduce technical debt, and streamline operations at scale.
Discover practical strategies and best practices for:
- Automatically detecting drift from manual IaC changes
- Resolving deployment tracking issues
- Setting up managed developer self-service for IaC
- Integrating IaC with GitOps workflows
- Optimizing IaC-related costs
- Unifying IaC-related processes
and more!
-----
Transcript
Andrew
Welcome, everyone, and thank you for attending today's webinar, “Using env0 to Pave the Path to Infrastructure as Code Maturity.” Today I have Sergey Korolev with me from Viber. Sergey, would you like to introduce yourself?
Sergey
A few words about myself: I've been in the area of DevOps and managing teams that are working in this field of DevOps for the last ten years. Besides that I like traveling, I like snowboarding and doing martial arts in my free time. Thank you.
Andrew
Nice. My name is Andrew. I'm the Director of Sales Engineering here at env0, and today I'll be helping lead this conversation. We do have the chat open, so please feel free to ask your questions in there, and we'll address them during the Q&A portion of this conversation.
Before we begin, a little quick overview of what we'll be talking about today. We'll be reviewing some challenges with Viber’s previous CI/CD process, how they managed Infrastructure as Code, the cultural shifts related to that, and what that really meant in terms of managing resources and deployments. And at what point did the previous process come to a breaking point? So what was that tipping point for change? And just before we begin this conversation, let's quickly review env0.
env0 gives you the ability to fast-track your Infrastructure as Code maturity. What we're seeing in the market is that people often start with no Infrastructure as Code, just ClickOps and manual deployments, which is slow, and then they start scripting and adding some Infrastructure as Code. But that creates cost overruns and security challenges. At env0, we're trying to optimize both productivity and governance through four main pillars: automation, cost controls, managed self-service, and governance.
We'll be addressing some of these features that Viber has been able to take advantage of and talk more about how we are helping Viber move along this maturity curve. So on a high level, what was their biggest game changer?
Sergey
When we started with env0, I guess the first thing that gave us the most value was drift detection. It's probably also the first topic we are going to be discussing today. Basically just understanding where we stand in terms of what our Terraform states look like. And I will be describing exactly where we started from and where we are today. Drift detection is the thing that actually helped us a lot, especially in the first few phases of onboarding.
Andrew
Awesome. So let's just jump right into it then. In terms of drift detection, let's go into before and after. What did you see?
Sergey
So before onboarding to env0… I guess a lot of the participants in this chat know the feeling when you try to apply some Terraform and then you are like, oh, this was changed two years ago, I don't know who did that, and I can't find him now or learn why he made the change he did. That is detecting drift, as the term goes, a long time after the actual apply or the actual change happened. You basically only see it whenever you run a plan or apply and check the list of changes.
The way we worked, we would check it only at that point, when you needed to make a change. You make the change. Oh no, the state is not what I was expecting it to be. And now I need to fix the previous change that somebody else made.
And of course, there was no process to regularly check whether the states, or the actual resources, had changed. Once we onboarded to env0, we detected that basically 60% of all our states, of all our Terraform so-called environments in env0, were drifted.
And just to let you know, we've been using Terraform, at least for the scanned environments, for about two years. We have between 7 and 10 committers, I think. So plenty of hands touch the code, and not just the code. It could be either the code or the actual resources, in AWS in our case.
Now, after integrating the environments and code into env0, you can see in the screenshot that today we have more than 900 environments. Most of them are not drifted. Of course, drift happens due to manual changes by people, or due to some automation that might be running and changing resources. In AWS, with ECS for example, there are things that are changed by the service itself, and you need to align the Infrastructure as Code with that.
We do these scans every other day. We run a plan on all of our environments every other day and detect whether the Terraform that represents those environments matches the actual resources that we have in our cloud environment, and you can see it easily in this dashboard. It's very convenient, something I do every day, and as a manager I'm bugging my engineers like “you need to fix that, why does it drift?”
Eventually we reduced it to zero. As time goes on, you can see that the drift goes back and forth, but we try to eliminate it as much as possible. Some drifts are eliminated more easily than others. And as we go on with the talk, I will also describe how this, in my opinion, is leading to the cultural shift in our team.
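The every-other-day scan described above can be sketched roughly as follows. This is a minimal illustration, not env0's implementation: it assumes each environment is a folder with an initialized Terraform working directory, and the function names (`run_plan`, `classify`, `scan`, `drift_percentage`) are made up for the example. It relies on the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 when there is nothing to change, 2 when the plan contains changes, and 1 on error.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, Iterable


def run_plan(folder: Path) -> int:
    """Run a read-only plan in `folder` and return Terraform's exit code."""
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-lock=false"],
        cwd=folder,
        capture_output=True,
        text=True,
    )
    return proc.returncode


def classify(exit_code: int) -> str:
    """Map the -detailed-exitcode result to a status.

    0 = plan succeeded with no changes, 2 = plan succeeded with changes,
    anything else (normally 1) = the plan itself failed.
    """
    if exit_code == 0:
        return "in_sync"
    if exit_code == 2:
        return "drifted"
    return "error"


def scan(
    folders: Iterable[Path],
    runner: Callable[[Path], int] = run_plan,
) -> Dict[str, str]:
    """Plan every environment folder and record its status."""
    return {str(folder): classify(runner(folder)) for folder in folders}


def drift_percentage(statuses: Dict[str, str]) -> float:
    """Share of scanned environments whose plan showed changes."""
    if not statuses:
        return 0.0
    drifted = sum(1 for status in statuses.values() if status == "drifted")
    return 100.0 * drifted / len(statuses)
```

A non-empty plan only counts as drift here because the scan runs against code that has already been applied; a plan on an unmerged branch would show changes for ordinary reasons. The `runner` parameter is injectable so the logic can be exercised without Terraform installed.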
Andrew
Awesome. Let's move on to the next topic then: Environment Discovery. Let me quickly inform the audience about what Environment Discovery is at env0. It's basically the ability to scan for code changes within your Git repo and automatically onboard them into env0. Another way to describe it is self-service through code.
So as your developers create, as you can see in the image, a new Terraform stack in a folder, env0 essentially detects that through the PR process, generates the plan, and then deploys that resource once it's been approved.
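As an illustration of the idea (not env0's actual discovery logic), the sketch below takes the list of files changed in a pull request and reports which folders contain Terraform code but are not yet tracked as environments. The function names and the example folder layout are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set


def stack_folders(changed_files: Iterable[str]) -> Set[str]:
    """Return the folders that contain at least one changed .tf file."""
    return {
        str(PurePosixPath(path).parent)
        for path in changed_files
        if path.endswith(".tf")
    }


def new_environments(changed_files: Iterable[str], known: Set[str]) -> List[str]:
    """Folders with Terraform changes that are not yet tracked as environments."""
    return sorted(stack_folders(changed_files) - known)


if __name__ == "__main__":
    changed = [
        "integration/eks/main.tf",
        "integration/ecs/main.tf",
        "integration/ecs/variables.tf",
        "README.md",
    ]
    # Only integration/ecs is new; integration/eks is already an environment.
    print(new_environments(changed, known={"integration/eks"}))
```

Treating one folder as one environment mirrors the structure described in the conversation; a real setup would more likely match folders against a configured path pattern.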
Sergey
When we started, I think about a year and a half ago, Environment Discovery was not as advanced as it is now, so whoever is joining now will have it much easier. The cool thing about it applies to any Terraform I'm adding. Let's say I have this kind of folder structure: I have an integration environment, and I have some infrastructure in it. Let's say I have EKS and some integration cluster.
So basically the integration cluster is an environment. And now I want to, for example, add another cluster, or add a new service that I was not managing before, for example ECS. All I need to do, once Environment Discovery is configured in env0, is add the folder, right? And the same GitOps flow will trigger env0 to add an environment according to the new folder structure.
And this is something that we benefit a lot from at this point, in my view, and we benefit the most from it because it's just straightforward. You add your Terraform and it’s being run somewhere else, not on your computer, for example. Something I will also probably be talking about in the next few slides.
I guess that's as far as it goes in terms of the Environment Discovery itself. Another thing I would add is that once you discover the environment, all the newly discovered environments go through the same process. If you need linting, governance checks, or tests to apply to your Terraform, it will be consistently applied to all Terraform environments.
Previously, it wasn't like that. People would run Terraform on their machines, and each team member might or might not follow best practices. For example, everyone should run lint, but I didn't have the tool to enforce that. I couldn't ensure that everyone was running lint regardless of where they were running Terraform from.
I think we can move on to the next topic.
Andrew
Okay. Let's talk about GitOps then.
Sergey
GitOps is something that is very close to my heart. We've been using ArgoCD for the past three years or so, and we find it very convenient to just make changes in Git and have the change roll out according to whatever strategy you choose. For example, if you open a merge request, let's say in Terraform, you would get a plan. If you merge this merge request or pull request, you will actually run an apply. This is basically the general idea here, and we tried to implement it ourselves.
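The merge-request flow described above (plan on open, apply on merge) can be sketched as a small dispatch function. This is an illustrative assumption about how such a hook could be wired, not a description of any particular tool: the `GitEvent` type, the event names, and the `main` deploy branch are all hypothetical. The Terraform flags used (`-input=false`, `-auto-approve`) are standard CLI options.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GitEvent:
    kind: str           # "merge_request_opened", "merge_request_updated" or "merged"
    target_branch: str  # branch the change is aimed at


def terraform_action(event: GitEvent, deploy_branch: str = "main") -> Optional[List[str]]:
    """Return the Terraform command a Git event should trigger, if any."""
    if event.target_branch != deploy_branch:
        return None  # changes aimed at other branches do not touch infrastructure
    if event.kind in ("merge_request_opened", "merge_request_updated"):
        # Reviewers see the plan before anything is changed.
        return ["terraform", "plan", "-input=false"]
    if event.kind == "merged":
        # The review already happened in the merge request, so apply unattended.
        return ["terraform", "apply", "-input=false", "-auto-approve"]
    return None


if __name__ == "__main__":
    print(terraform_action(GitEvent("merge_request_opened", "main")))
    print(terraform_action(GitEvent("merged", "main")))
```

Because the only path to an apply is a merge, every infrastructure change leaves a commit and a reviewer behind it, which is the property the local-run workflow lacked.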
So we implemented the kind of testing mechanism where we would run a plan and then apply to our environments prior to onboarding env0. And it was a hassle to implement the solution itself, it was a hassle to maintain it and actually make sure that people are not just skipping the pipeline itself and just submitting the changes.
Also, as I mentioned, we did not have the actual plan and apply of the actual infrastructure as part of the pipeline. We reviewed Atlantis. I tried to recall why exactly it fell short, but couldn't find the exact things that made it not work for us. I think it was something involving us using GitLab Enterprise and Atlantis not supporting it at the time.
And also, it's basically a CI tool that you need to orchestrate more things around to make it actually work. You need to implement the whole thing in your CI/CD pipelines to make it work as you would expect. That was at the time we tested it, about a year and a half or two years ago; maybe it's better now.
Another thing is that, as I mentioned, everybody could just run Terraform as they wished. And as a manager, I would not see these changes. From personal experience, one of the things that can happen is that you run the Terraform locally and go on to another task without pushing the commit.
And so then you get drift in your Terraform, because the code was never submitted to Git. Now you need to understand who made the change and why he made it. And eventually you discover that somebody just forgot to push the commit.
Andrew
There's a question from the audience about ArgoCD: is env0 replacing ArgoCD?
Are you still running ArgoCD on the side? Does env0 interact with ArgoCD, and in which way?
Sergey
For us it was pretty straightforward that in our use case we will probably not use env0 for deploying Kubernetes just because we already had that and we had this expertise and were totally fine with ArgoCD, but it's complementary.
One of the cool things you can do, and this is what we are doing, is deploy the whole ArgoCD infrastructure itself into Kubernetes via env0. You need to deploy that, so to say, the ArgoCD operator, and you do it via env0 with Terraform. Similarly, you can deploy basically any Helm chart, either via env0's capabilities to deploy Helm charts or via Terraform providers that deploy Helm charts; both ways work.
But we chose to stay with ArgoCD for deploying to Kubernetes. We really like this ecosystem, and the two remain complementary systems to one another.
Andrew
Right. Thank you. Let's talk about the next topic: Cost Management. How does env0 help you with managing costs?
Sergey
So if you look at the right screen here, you will probably see that one of our environments is pretty costly, right? I reviewed it because I saw it in env0 discovery and found that this is basically a CloudFront distribution that is getting a lot of requests, a lot of traffic. And this is why it costs as much as it does.
As for cost management, when starting out I didn't think of it as a thing I might be so interested in, because we had so many other FinOps tools that we grab information from: you have anomaly detection, you have Cost Explorer, you might have some third party that does that for you, and you have all these kinds of things for managing your costs.
And it depends on your company: you might have dedicated FinOps engineers, you might have developers doing the FinOps, and you might have your software engineers doing FinOps. So it depends on the company’s strategy regarding FinOps in general. And I think from the perspective of a DevOps engineer, you're not going to open Cost Explorer for every commit or every change. You're doing infrastructure, and you will probably review your environments and your Terraform code more often than you review your costs, especially if FinOps is a different group, a different team, or a different person in your organization. So putting it in your face is something that helps us extend our FinOps practices, our FinOps options, and our visibility. It does not necessarily replace those.
But there is another thing regarding that, we can see exactly which resources are costing what amount of money. I said that an environment represents a Terraform folder, for example, so you know exactly which resources are in this Terraform, following this Terraform state. And you can see exactly how much they cost, which is really cool.
And it is relatively simple to configure. I think Terratag is env0’s open-source tool, and it's incorporated in env0 itself. What it does is add the needed tags, for your environment ID and project ID, to the resources defined in your Terraform. Then you configure your Cost Explorer and your billing so that those tags are tracked, and you can see by these tags, per env0 environment for example, how much a specific resource costs you. All of that is an addition that you basically just enable. It's relatively easy to configure. I think it took me an hour or so to do the whole configuration.
So it's a feature that I like a lot. I would not say again that it changed the whole FinOps practices in our company, but it for sure led me to some interesting environments where I immediately saw that something was wrong. This is a tool that I use each and every day.
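The tag-based attribution described above boils down to grouping billing line items by the environment tag. The sketch below shows that grouping on made-up data; the tag key `env0_environment_id`, the `cost_usd` field, and the shape of the line items are assumptions for illustration, not an export format of any billing tool.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping

# Assumed tag key; check the tags your own tooling actually writes.
ENV_TAG = "env0_environment_id"


def cost_by_environment(line_items: Iterable[Mapping[str, Any]]) -> Dict[str, float]:
    """Sum cost per environment tag; untagged resources are grouped together."""
    totals: Dict[str, float] = defaultdict(float)
    for item in line_items:
        env_id = item.get("tags", {}).get(ENV_TAG, "untagged")
        totals[env_id] += float(item["cost_usd"])
    return dict(totals)


if __name__ == "__main__":
    items = [
        {"resource": "cloudfront-dist", "cost_usd": 10.0, "tags": {ENV_TAG: "env-a"}},
        {"resource": "s3-bucket", "cost_usd": 2.5, "tags": {ENV_TAG: "env-a"}},
        {"resource": "lambda-fn", "cost_usd": 1.0, "tags": {ENV_TAG: "env-b"}},
        {"resource": "legacy-vm", "cost_usd": 4.0, "tags": {}},
    ]
    print(cost_by_environment(items))
```

The "untagged" bucket is worth watching on its own: it represents spend on resources that are not attributed to any Terraform environment.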
Andrew
So with all these different topics I’d love to better understand, how did it affect your company culture?
Sergey
I wouldn't say company, because, and in the next few slides you'll see we are going to talk about our future plans and how we see env0 as part of them, today we're mostly using env0 as a tool for our DevOps engineers. Culturally, I think that things such as auditing and governing our code became more of an everyday practice, something that we can introduce to Infrastructure as Code more easily.
And for Viber it's very important in terms of auditing and governance. Let's take a different area as an example: commits and changes in Git. There we have governance, we need audits, and we have, for example, the four-eyes principle in practice, where somebody must check our code and approve it.
The same goes for Infrastructure as Code. With this in mind we also implement the governance and auditing of everything via env0 as well. And another thing is that in the culture of our team, something that changed is the state of mind of not breaking states and keeping the broken states as low as possible and reducing the tech debt.
And I think when you're looking at the long run, you want to keep your tech debt as low as possible, for the team and for the people to come in the future. You don't want to get to a place where you find things breaking all the time, or broken things that have been there for years. And I guess many of the participants today know exactly what I'm talking about. Of course, it did not eliminate everything. But I think the right mindset will eventually get us to this point.
Andrew
So yeah, cultural change is definitely hard; it takes the right people and processes in place to help move it along, and tools can only do so much. We'd love to talk next about your expectations and your experience onboarding with env0.
There are naturally expectations. And I think this will be really interesting for people who are just starting to look at env0 or considering it. So let's look at these five different parts here. When you're talking about transferring pipelines, what was your expectation and what was the reality?
Sergey
I would expect onboarding all of our environments, especially talking about a Terraform-like liability that consists of all those kinds of Terraform and different structures and different ways that were applied.
I expected it to be less effort; it took me, I think, about two weeks to do it end to end. Today we are using the auto-discovery feature to continuously onboard new environments automatically, so now it doesn't take any extra effort.
Andrew
What about visibility? You mentioned here that you were concerned about how you actually see the resources in env0 and policies and things like that?
Sergey
Yeah, so I didn't have the chance to introduce it here, but in general, what you can see besides the dashboard is all the environments, their statuses, and whether they are drifted or not. If you click on any of the environments, for example, you will see an audit of all the deployments that have been done.
And you can see who did it, why they did it, when, and all the phases of the lifecycle of a specific deployment: let's say it starts by initializing Terraform, then planning and applying, with some steps in between, which can be the tagging that I told you about with Terratag. It can be linting as a step. You can add custom scripts to it, or OPA as a plugin. So again, from a manager’s perspective, it's so convenient for me to understand what my team is working on. And as an engineer, you can go back and review: okay, which changes were made to the resources that I own? That’s also something that we value a lot.
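To make the deployment lifecycle and its audit trail concrete, here is a minimal sketch of a runner that executes named steps in order (init, lint, policy check, plan, apply) and records who triggered the run, why, when, and how each step ended. It is an illustration under stated assumptions, not env0's data model: `DeploymentRecord`, `run_deployment`, and the step names are invented for the example, and each step is a plain callable that returns `True` on success.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], bool]]  # (step name, action returning True on success)


@dataclass
class DeploymentRecord:
    environment: str
    triggered_by: str
    reason: str
    started_at: str
    steps: List[Tuple[str, str]] = field(default_factory=list)  # (name, status)

    @property
    def succeeded(self) -> bool:
        """True only if at least one step ran and every step passed."""
        return bool(self.steps) and all(status == "passed" for _, status in self.steps)


def run_deployment(
    environment: str,
    triggered_by: str,
    reason: str,
    steps: Sequence[Step],
) -> DeploymentRecord:
    """Run steps in order, stop at the first failure, and keep a full record."""
    record = DeploymentRecord(
        environment=environment,
        triggered_by=triggered_by,
        reason=reason,
        started_at=datetime.now(timezone.utc).isoformat(),
    )
    failed = False
    for name, action in steps:
        if failed:
            # Later steps are recorded as skipped so the audit shows the full pipeline.
            record.steps.append((name, "skipped"))
            continue
        ok = action()
        record.steps.append((name, "passed" if ok else "failed"))
        failed = not ok
    return record
```

Because every environment goes through the same list of steps, a failed lint or policy check blocks the apply for everyone in the same way, which is the consistency argument made earlier in the conversation.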
Andrew
Definitely. A lot of our customers and prospective clients are concerned about auditability and being able to track exactly what happened, especially when you're coming from a manual deployment process or even just simple pipelines, because in general, CI pipelines are harder to go through as opposed to what you have here in env0, you can click on the infrastructure and see exactly what those changes were.
I love this. There's this third expectation: that everyone would love env0. What was your actual experience onboarding with env0, and what was people's reaction to using it?
Sergey
So as I said, you can run all those kinds of things, such as linting and OPA and other scripts. You have more steps than a plan would take locally. Locally, you can initialize once, you have all the files you need downloaded from your backend, and you can rerun as much as you want without initializing again. The plan will probably run faster, as will the apply, and everything will run relatively faster on a local machine because you have all the files locally all the time.
People were hanging onto that. They would tell me, “it was fast for me, I don't see the reason why I need to run all this additional stuff. What does it give me?” And the answer is consistency. If you have policies in the company, regulatory policies that you must follow, then you must follow those policies.
It's not a matter of question. It's a matter of you have to do it. And people who are running things locally without applying those policies on their runs, eventually, hopefully not, but it can lead to issues. And I don't say that planning it in a managed environment and going through all the policies and all the tests and whatever will result in zero errors and no issues and no mistakes from people.
But at least you will have an audit, at least you will understand what happened and you hopefully will find out about the issue faster. And that's what I tend to believe. But yeah, people kept holding to this philosophy of running things locally. Now, again, it's not false at this point as well, but I see that people are very careful about when to do it and why.
Andrew
Let's get to this next topic about drifts.
This is an interesting one because it kind of turns into a double edged sword. Can you tell us more about your experience?
Sergey
We started off with about 60% drifted environments. There is a specific engineer on my team who is very dedicated to eliminating drifted environments; he's been doing it for quite a while now, for the past year or so. He's especially into those specific cases which are really hard to deal with, some changes that were made years ago. First things first, handling drifted environments is a pain in the ass. And as long as you keep this drift, you will have a harder time in the future. This is exactly why I tend to encourage everybody to treat it as fast as possible.
But more than that, we also started thinking about how to modularize our environments and our deployments better. It depends on the use case and the scenario, but there are use cases where you say, I want to just try the Terraform as plainly as possible, and you do so, but after some time you realize that this request is coming back to you again and again and again. So now you need to understand what the request is and how you can modularize it better. This is actually not a simple task. You need to understand exactly what your customers, or in our case the engineers, need, and what the recurring request is that requires a specific solution.
We have some use cases. We use the env0 templates, which is basically like collateral for modules, and we use that to deploy some resources in the cloud which are reusable. A simple example: we have a team that uses Lambda functions a lot. So we basically have this module which represents their needs for the Lambda function and its environment, and we redeploy it as a module using the env0 templates, and we find it to be very time-saving eventually. But initially we would have been like, how are they able to do it?
Andrew
Because it wasn't modularized yet. There's a question from the audience about where the drift is coming from. Earlier you mentioned that 60% of the environments were drifted. Was it because people were doing local applies, or what else was causing that drift?
Sergey
Yeah, that's a simple scenario from the perspective of a DevOps engineer. Our DevOps engineers have full admin access to our environments, managing the entire cloud environment end-to-end. So, let's say we have an issue with a team in production and need to make changes.
The simplest scenario would be changing the number of instances. If there's a production issue, you might change these things manually in the AWS console, and you'll likely need to document it. The drift will indicate that you had ten instances, changed it to twenty, and now you need to revert it.
This is the drift. While this is a specific small change, it can be related to any service and any change. Urgent changes often cause drifts, as they require immediate action.
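The instance-count example can be written down as a comparison between what the code declares and what the cloud actually reports. The sketch below is a simplified illustration with hypothetical attribute names; real drift detection compares Terraform state and provider data rather than plain dictionaries.

```python
from typing import Any, Dict, Tuple


def detect_drift(
    declared: Dict[str, Any],
    actual: Dict[str, Any],
) -> Dict[str, Tuple[Any, Any]]:
    """Return {attribute: (declared value, actual value)} for every mismatch."""
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(declared) | set(actual)):
        want = declared.get(key)  # None if the attribute is not in the code
        have = actual.get(key)    # None if the attribute is missing in the cloud
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    declared = {"instance_count": 10, "instance_type": "m5.large"}
    actual = {"instance_count": 20, "instance_type": "m5.large"}  # scaled by hand during an incident
    print(detect_drift(declared, actual))
```

Once the mismatch is visible there are two ways to close it: re-apply the code so the resource goes back to ten instances, or update the code to twenty so it matches what production now needs. Either way the change ends up recorded in Git.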
Andrew
This is the last expectation here, and one we get asked about a lot, especially since CI/CD needs so much tooling: what scope does env0 have? So what can you tell us about this expectation? Does env0 replace everything?
Sergey
No, not really. I described it in the part where we talked about GitOps. You can also see the pipeline itself. Basically env0 is integrated into all Git pipelines, so you can see the pipeline itself showing you the whole plan and apply, and all the steps that are happening in env0.
So technically you can avoid going into the env0 system and just use your Git pipeline as-is, for example. You can also use the API and all of the Terraform modules of the env0 API to integrate it with other tools that you use in your company, for example an IDP, and this is something that we plan for the future.
Andrew
So that's a good transition to talk about future plans. So you've been using env0 for a while now. Tell us about your journey and your future journey?
Sergey
A few things are important to me, and one is the governance aspect. I've talked about policy enforcement quite a lot today. Tools like OPA are something we are looking to incorporate. We're also expanding our developer portal for self-service via env0 for cloud resources. This answers a question from the audience about integrating env0 for developers. In our use case, we do not expose env0 directly to the developers; it operates in the background.
We are waiting for a few features that we hope will come soon. Among them are Infrastructure as Code coverage control and drift blame. So I can blame the engineers for being… No, I'm kidding. Drift blame is a term used to describe identifying who made changes in the console. For example, if someone changed the number of minimum instances from 10 to 20, I would want to know who did that.
So this is something that we all are eager to get as a feature in the future. And the last point here is about the maturity journey. We are not there yet. We’re using the GitOps approach and we have env0 managing and governing our environments. But we are still not at the self-service level. We still have work to do in componentization. You need to understand what you want to componentize and which features you want to introduce as self-service. So we are working on those.
Andrew
Thank you for sharing your experience. Let's open up for Q&A. There's a question about the separation from ArgoCD. How do you separate resources that need to be automatically deployed by env0 from others that need to be deployed by ArgoCD?
Sergey
At this point we don't have a solution just yet. The way we are looking at it is that we will probably use the IDP together with ArgoCD's detection of new applications, right? So basically env0 will detect Terraform and ArgoCD will detect the application. Basically the same approach on both ends. So you will need to add the Terraform code and the Helm chart or manifest.
And each of the systems will separately deploy the resources that are needed. So for us, the approach is to use an IDP, a developer portal. A developer can trigger the process, and there are two different stages, each deploying either the Helm chart or the cloud resources.
Andrew
I'll put a blog post in the chat that shows env0 deploying this. As you mentioned, you deploy your Kubernetes cluster, but you need to bootstrap ArgoCD onto your company's cluster, so using a Helm chart, or Terraform deploying a Helm chart, you can deploy ArgoCD’s operator into your Kubernetes cluster. Now, in order to configure ArgoCD to essentially listen for new applications, you can deploy the Application YAML, which is ArgoCD’s CRD for managing applications.
You can use env0 to manage this application YAML as well. If you want to deploy Kubernetes manifests directly into your ArgoCD cluster, env0 will continue to listen for any Infrastructure as Code changes, such as updates to Terraform. All of this is managed in GitOps with ArgoCD. Because you've configured ArgoCD to listen, it will automatically respond to any image changes or manifest changes, managing its own processes accordingly.
So essentially GitOps on both sides, listening to their own hooks within Git.
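For readers who have not seen one, the Application resource mentioned above looks roughly like the manifest built below. The sketch generates it as JSON, which `kubectl apply -f` accepts just as it accepts YAML. The field layout follows the Argo CD `Application` custom resource (`argoproj.io/v1alpha1`); the application name, repository URL, path, and namespaces are hypothetical, and the `argocd` namespace is only the conventional install location.

```python
import json
from typing import Any, Dict


def argocd_application(
    name: str,
    repo_url: str,
    path: str,
    dest_namespace: str,
    revision: str = "HEAD",
) -> Dict[str, Any]:
    """Build an Argo CD Application manifest that syncs `path` from `repo_url`."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Assumes Argo CD is installed in the conventional "argocd" namespace.
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "path": path, "targetRevision": revision},
            "destination": {
                # In-cluster API server address, i.e. deploy to the cluster Argo CD runs in.
                "server": "https://kubernetes.default.svc",
                "namespace": dest_namespace,
            },
            # Automated sync: Argo CD applies Git changes without a manual trigger.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    manifest = argocd_application(
        name="example-service",                                  # hypothetical
        repo_url="https://git.example.com/platform/apps.git",    # hypothetical
        path="services/example-service",
        dest_namespace="example",
    )
    print(json.dumps(manifest, indent=2))
```

Managing this manifest through Terraform is what lets the infrastructure side bootstrap Argo CD, after which Argo CD watches the application repository on its own.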
Sergey
Yeah, just one more thing on this topic: you have ApplicationSets in ArgoCD, and you also have an approach called app of apps, one application that holds many applications. For us, at least at this point, we have the app of apps deployed via Terraform, and it manages all the other applications. That might also be useful there.
Andrew
There is a question here: have you used env0’s Environment Workflow? You answer the question; I’ll describe Workflow first. For our audience, Workflow in env0 is essentially an orchestrator of multiple templates; it allows you to deploy a multilayer infrastructure. Say you have a three-tier infrastructure: network, compute, and services.
You can keep each of these layers as separate Terraform templates or Terraform resources, and env0 will orchestrate them and make sure they deploy in the right order. So that's the high-level view of what env0 Workflow is. The question is, are you using it? And if so, what has your experience been?
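Deploying layers "in the right order" is a dependency-ordering problem. As a generic illustration (not how env0 Workflow is implemented), the sketch below uses Python's standard-library `graphlib.TopologicalSorter` to order the three tiers from the example so that each layer is deployed only after the layers it depends on.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def deployment_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return layers ordered so that every dependency comes before its dependents.

    `depends_on` maps each layer to the set of layers it needs first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    layers = {
        "network": set(),          # no prerequisites
        "compute": {"network"},    # needs the network layer
        "services": {"compute"},   # needs the compute layer
    }
    print(deployment_order(layers))
```

Tearing the stack down would walk the same order in reverse, so services go first and the network last.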
Sergey
Yeah, so I would say no. We've evaluated some use cases, but eventually we ended up not using it at this point.
Andrew
Okay, let's look at another question here. I'm interested in order of magnitude, how many workspaces, how many of these are templated and auto discovered and how long do they actually live? So how many env0 environments or Terraform Workspaces, are you managing with env0? Roughly, yeah.
Sergey
So as I presented before in the dashboard in one of the slides, it’s about 900 of those. Most of them are environments that point to, and were discovered in, a GitHub repository that holds most of our Terraform code, and basically each folder eventually is an environment. Most of those are not templated.
But we have some use cases that are templates; I gave one example, the Lambda functions. We also have a lot of static websites that we deploy using templates. And I think the other case where we are considering templates is the self-service that we want to introduce. Eventually those will also be templated environments.
Andrew
Here's a question: we are planning to force all development to be deployed from a central code catalog that's maintained by a single infra team. Is this something you at env0 are seeing more of, or are teams still able to develop Infrastructure as Code themselves? I guess I will answer that. As you heard from Sergey's perspective, right now they are developing Infrastructure as Code themselves and slowly introducing templates to other team members.
At env0, working with dozens and dozens of customers, we see a broad spectrum. Some are completely siloed and command and control, I would say. So basically the central team is managing all of the resources and deploying all the resources, and then some are having a catalog approach. They create essentially a set of resources that their teams can onboard onto and then let the dev teams choose.
And then some are mixed in the sense that they have subject matter experts within each of the dev teams. So those people would like to write Infrastructure as Code, but then you also have the people on the other end of the spectrum who don't know any Infrastructure as Code and need a catalog in order to get access to resources.
So in env0, with templates, you can essentially create these quote-unquote modules that give access to resources to the developers who aren't familiar with the infrastructure. At the same time, you can publish those modules in env0's module registry for the developers who are familiar with infrastructure and want to compose their own resources. For them, you can use what we described earlier with Environment Discovery and give them the ability to deploy their own Infrastructure as Code through a PR process as well.
So we're definitely seeing the spectrum and it really kind of depends on the kind of people you have within your organization and the type of processes that you want to put in place. And env0 can really help cover both sides of that. And I hope that helps give you a sense of capabilities within env0. Maybe we'll take one more question from the audience.
For the questions we didn't get to cover, which are a bit more about env0 in general, maybe we'll segue into closing statements. So thank you again, Sergey, for taking this time to meet with us. Any last words about your experience with env0?
Sergey
I guess that a motto in life for me is taking things slow. And this is a process. I think we have been in it for the last year and a half, and it will take more time.
Andrew
Absolutely, let's keep on making steps towards climbing that mountain. So thank you for taking your time. I want to end on this one note: since you're here and you're probably interested in seeing how you can get started with env0, we have this concept of a one-day proof of value, so essentially we can help you get an understanding of how env0 can build value for you.
It's an easy four-step process. We create an organization, add credentials, connect your Git repo, and start deploying resources in env0. And what we hope to achieve, and can achieve, within this one-day POC is simply to show you the setup process, show you the cost estimation, start tracking cost, and set up drift detection.
We've done this with a few of our customers already and immediately added value and we’ll show you what we can do. We'll start with a one-hour pre-onboarding session to get you started. Then, on that day, we'll go through the entire checklist and get you fully onboarded. We'll set up Slack or Teams support for easy communication and give you a 30-day trial of env0 so you can truly start experiencing the ROI.
After the 30-day trial, we expect you to reduce your cloud costs by up to 45%, increase your deployment speeds, and decrease your time to merge. And again, all we need to do is create an organization, set up some cloud credentials, connect to whatever version control system you're using, and deploy some Terraform or other Infrastructure as Code that you're using. If you're not ready for a one-day POC just yet and you want to learn more, you can book a demo through our website, env0.com/demo-request, or sign up for a free trial on env0.com, and check out env0.com for more docs, blogs, and resources.
I want to thank everyone for attending today. If you have any more questions, please feel free to reach out and we'll be happy to address those on a future call or through chat. All right, so that concludes our webinar for today. Thank you again, Sergey, for joining us, and we hope to hear from everyone soon.
Cheers!
At env0, working with dozens and dozens of customers, we see a broad spectrum. Some are completely siloed and command and control, I would say. So basically the central team is managing all of the resources and deploying all the resources, and then some are having a catalog approach. They create essentially a set of resources that their teams can onboard onto and then let the dev teams choose.
And then some are mixed in the sense that they have subject matter experts within each of the dev teams. So those people would like to write Infrastructure as Code, but then you also have the people on the other end of the spectrum who don't know any Infrastructure as Code and need a catalog in order to get access to resources.
So in env0, with templates, you can essentially create these quote-unquote modules that give access to resources to developers who aren't familiar with the infrastructure. At the same time, you can publish those modules in env0's module registry for the developers who are familiar with infrastructure and want to compose their own resources. And you can use what we described earlier with environment discovery to give them the ability to get access and deploy their own Infrastructure as Code through a PR process as well.
So we're definitely seeing the spectrum and it really kind of depends on the kind of people you have within your organization and the type of processes that you want to put in place. And env0 can really help cover both sides of that. And I hope that helps give you a sense of capabilities within env0. Maybe we'll take one more question from the audience.
For the questions we didn't get to cover, which are a bit more about env0 in general, maybe we'll segue into closing statements. So thank you again, Sergey, for taking this time to meet with us. Any last words about your experience with env0?
Sergey
I guess that a motto in life for me is taking things slow. And this is a process. I think we have been in it for the last one and a half years, and it will take more time.
Andrew
Absolutely, let's keep on making steps towards climbing that mountain. So thank you for taking your time. I want to end on this one note: since you're here and you're probably interested in seeing how you can get started with env0, we have this concept of a one-day proof of value, so essentially we can help you get an understanding of how env0 can build value for you.
It's an easy four-step process. We create an organization, add credentials, connect your Git repo, and start deploying resources in env0. And what we hope to achieve, and can achieve, within this one-day POC is simply to show you the setup process, show you the cost estimation, start tracking cost, and set up drift detection.
We've done this with a few of our customers already and immediately added value and we’ll show you what we can do. We'll start with a one-hour pre-onboarding session to get you started. Then, on that day, we'll go through the entire checklist and get you fully onboarded. We'll set up Slack or Teams support for easy communication and give you a 30-day trial of env0 so you can truly start experiencing the ROI.
After the 30-day trial, we expect you to reduce your cloud costs by up to 45%, increase your deployment speeds, and decrease your time to merge. And again, all we need to do is create an organization, set up some cloud credentials, connect to whatever version control system you're using, and deploy some Terraform or other Infrastructure as Code that you're using. If you're not ready for a one-day POC just yet and you want to learn more, you can book a demo through our website, env0.com/demo-request, or sign up for a free trial on env0.com, and check out env0.com for more docs, blogs, and resources.
I want to thank everyone for attending today. If you have any more questions, please feel free to reach out and we'll be happy to address those on a future call or through chat. All right, so that concludes our webinar for today. Thank you again, Sergey, for joining us, and we hope to hear from everyone soon.
Cheers!