Join us for an exclusive webinar featuring a panel of experts from Western Union and Vega Cloud as we explore the key intersection of Infrastructure as Code (IaC) management and FinOps.
Check out the video below to learn about:
- The relationship between FinOps and IaC management
- The crucial role of FinOps in modern cloud environments
- Practical ways to optimize cloud costs using FinOps and IaC tools
and more!
-----
Transcript
Luis de Silva (env0)
Hello, everyone, and welcome. Excited to have you with us today for the webinar on Mastering Cloud Efficiency - Scaling with FinOps and Infrastructure as Code. I'm Luis de Silva, Sales Leader here at env0, and the reason why we started this is because we understand that organizational power can fuel innovation, agility and growth.
Organizations are constantly challenged to scale their infrastructure efficiently while controlling unpredictable costs. On one side, teams are tasked with deploying resources quickly, managing complex environments, and ensuring operational consistency. On the other, finance teams struggle with cost visibility, forecasting, and validating the ROI of cloud investments. Infrastructure as Code makes the first part possible by automating and standardizing infrastructure management, allowing organizations to achieve agility and consistency at scale.
Meanwhile, FinOps brings financial governance to the forefront, ensuring that every cloud dollar is spent wisely, with a focus on accountability and cost optimization across teams.
However, scaling cloud infrastructure brings its own set of challenges—rising complexity, cost overruns, and the need for stronger collaboration between technical and financial teams. This is where the integration of IaC and FinOps becomes essential, providing a framework for both operational efficiency and financial sustainability as you grow.
In this session, we'll dive into how these two critical practices, FinOps and IaC, can transform your cloud operations, maximizing efficiency and cost savings. We will have a question-and-answer segment at the end of the session so you can interact with our panelists. We will also provide a brief introduction to env0, our platform for streamlining and managing infrastructure at scale, and to Vega Cloud, the cloud management platform that helps your organization optimize its cloud infrastructure through visibility and cost management.
I'm honored to introduce Troy, VP of Cloud Strategy, Architecture and Engineering at Western Union, as our special guest today. Troy will share his journey of successfully implementing these practices at Western Union, including the challenges he faced and how he overcame them. I'm excited to hand it over to Troy.
Troy E. Lillehoff (Western Union)
Thanks, Luis, and thanks, everyone, for joining. As Luis said, my name's Troy Lillehoff, and I'm the VP of IT at Western Union, with a large focus on cloud strategy and architecture. A little background on Western Union: we're the global leader in cross-border money transfer and remittance. We're in 200-plus countries and territories today, with about 500,000 locations in our retail portfolio and about 150 million customers.
Outside of our digital website and applications, we also have a banking and wallet platform in select regions, and we offer other services like funds exchange, bill payments, and a prepaid card business as well.
So think about the need to serve global customers at that level, the infrastructure it takes to maintain them, and doing all of that in a highly regulated ecosystem, with a lot of employees and a lot of applications: over 400 that we maintain, with over 2,000 developers across those teams.
A lot of the big industry words come into play: compliance, risk, security, consistency, delivering on time and on budget. We'll focus on a couple of these challenges today. And these are challenges faced not only by us but across the industry, especially by companies moving to the cloud, where you can spin things up so fast.
And as you know, the cloud is an opex model, so those things cost money. So you're always finding ways to shift security and compliance as far left as possible, because it just makes sense to get in front of these things before you provision them; otherwise you're stuck behind the 8 ball in a continuous-compliance loop.
The essence of DevSecOps and policy as code is being able to integrate these checks before you actually deploy the infrastructure. It also aligns with the business side of the house. TCO is so important when you're running in the cloud: knowing exactly what your application is, how much it should cost based on how much revenue it generates, and making sure it doesn't keep accumulating cost, or at least that you understand that cost. That means bringing in good FinOps practices and understanding those costs ahead of the game, before anything goes into the environment.
It's also important to bring the finance teams into the journey, because they're the ones accounting for what those costs are in the environment, and they should be part of that relationship before you deploy a new application or infrastructure. It also matters because you can budget within the previous fiscal year for what's going into the next year. For us, the right granularity is the business unit, account, workload, or product level, and that sometimes changes.
So having the ability to account for things at a different level really comes down to your tagging strategy and your metadata strategy at the end of the day, so that you can account for costs at that granular, per-workload level. That's what drives the needs of an environment like ours.
This comes down to, you know, a lot of products that are out there today; probably many of you can think of them and own them. But it's really about having a single pane of glass, not only from a visibility standpoint but also as a central control plane from a delivery standpoint. Higher-performing teams and organizations, especially startups, don't have the traditional plan-build-run model; instead of a lot of siloed engineering teams, you have more DevOps-type teams.
So not only are they owning the application and delivery of the code, but they're also down at the level of managing the infrastructure and the deployments as well. I know from experience that developers typically don't have a cost hat on, and they don't have a security hat on. So it all comes back to compliance and standardization. If you can shift those things left, and you feel comfortable handing an infrastructure deployment over to a developer knowing those guardrails are up front, then you can have the comfort level, as an organization, to really let these application and DevOps teams run as they should.
We talked a little bit about the granularity we needed to get out of a tool. It just wasn't there, especially with the native tools: ARM templates, CloudFormation templates, and even certain aspects of Terraform Enterprise. We needed more granularity than we could get out of those tools, plus enhanced logging and observability. We are highly regulated, so we actually need to produce reports of what gets deployed in our infrastructure.
It makes sense for that to live in a single location so we can present those reports; for us it's a daily report, because we want to know what got deployed and how it got deployed in the last 24 hours. And then it comes down to a GitOps methodology. My cloud platform team is really at the point where the requirement for me is: automate as much as you can, and push as much as possible to the teams who should really be owning and deploying it.
That means abstracting tedious processes like going into the AWS pricing calculator, and instead generating real-time costs for application changes or modifications, then running those against a soft budget or a digital budget, which we partner on from a FinOps perspective (they'll get into that in a bit). We want to make sure the changes developers are making are in compliance, because they're using our team's standardized modules with governance constraints around them, and that they're staying within the budget that's been allocated or forecasted for them within the year. It really creates this chain from the business side: what is being asked for, what is supposed to be produced, and then automating that all the way through the deployment workflow in a governance model, from both a security and a costing perspective.

We also have some on-prem environments: smaller data centers around the globe. Like a lot of companies, this could be due to regulatory issues, data residency laws, things like that, or we just need a physical location, or a cloud provider may not have a region in that particular place.
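To make the soft-budget gate described above concrete, here is a minimal sketch of the kind of check that could sit between cost estimation and deployment. The function name, thresholds, and outcomes are illustrative assumptions, not env0's or Western Union's actual logic:

```python
def evaluate_against_soft_budget(estimated_monthly_delta, budget_remaining,
                                 warn_threshold=0.8):
    """Decide whether a planned change fits within the team's soft budget.

    estimated_monthly_delta: projected change in monthly spend (USD).
    budget_remaining: what's left of the allocated/forecasted budget (USD).
    Returns "approve", "warn" (flag for a reviewer), or "reject"
    (go back and update the business case first).
    """
    if estimated_monthly_delta <= 0:
        return "approve"          # change holds or reduces cost
    if estimated_monthly_delta > budget_remaining:
        return "reject"           # would blow through the remaining budget
    if estimated_monthly_delta > warn_threshold * budget_remaining:
        return "warn"             # consumes a large share of what's left
    return "approve"
```

In practice a check like this would run automatically in the deployment pipeline, with the estimate coming from the cost estimation step and the budget from the FinOps tool.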
So knowing we have to have this footprint, once you know what good looks like, you want to bring those practices back to what's on-prem today. We take this pane of glass and this toolset and apply it to things like firewall changes, routing changes, things like that; pretty much all of it as code.
And then we're upskilling some of those on-prem teams, like the security and network teams, who really get the benefit of having a lot of the complexity abstracted away: we still own the modules, but we give them the automation and the pipelines, and we make sure we're controlling granular access so that they can only deploy the things they need to.
So that was really the need we saw, and this is what we got out of it: being able to run Infrastructure as Code partnered with the business, both from a costing perspective and from a compliance perspective, because we're able to integrate our infrastructure pipelines through env0 with OPA.
So now we can partner with the security teams, and instead of them having to bring on so many technical resources, we just have them defining policy. When you're running a bank in the EU, you have GDPR, you have FedRAMP, you have any number of control frameworks like COBIT and ISO, really everything under the sun.
Now you can just build those as policy as code and make sure the ones that apply to your infrastructure are doing their checks before the actual deployment happens and the infrastructure actually gets built. Then you take the new cost that workload would generate and run it against the soft budget, to say either "yes, you're good to go," or "no, you need to go back to the well, update your business case, and get approval for the changes you want to make."
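As a rough illustration of what "policy as code" means here, the sketch below encodes a few invented example controls (data residency, mandatory tags, no public exposure) as plain Python checks; a real env0/OPA setup would express these as Rego policies evaluated before the deployment step:

```python
def policy_violations(planned_resource):
    """Return a list of policy violations for a planned resource.

    The rules below are hypothetical examples of the kind of control an
    OPA policy might encode; an empty list means the deployment may proceed.
    """
    violations = []
    if planned_resource.get("region") not in {"eu-west-1", "eu-central-1"}:
        violations.append("data-residency: must deploy to an EU region")
    if not planned_resource.get("tags", {}).get("cost-center"):
        violations.append("finops: cost-center tag is required")
    if planned_resource.get("publicly_accessible", False):
        violations.append("security: public exposure is not allowed")
    return violations
```

Running checks like these against the plan, before anything is provisioned, is what lets security teams "define policy" instead of reviewing infrastructure by hand.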
And really, at the end of the day, what we're all trying to do is free up time for our smart people so they're not bogged down in a lot of the operational work they've traditionally done. Now that you've shifted those responsibilities to the development teams in an easily governed way, we're starting to do things more like NoOps and auto-remediation: more of the operational side of the house without having to staff up a large operational team. And because you're tied into a lot of pieces of data (not only this toolset but also application logs and things like that), you're able to query for, say, how many instances are in a particular account. It all depends on which APIs you're able to connect to and what information you can generate. But once you have that data, you can run it through a certified large language model and actually ask questions from an operational standpoint.
And now that you have that data, you can actually do something with it: a further level of automation where you can take some of these findings and correct them, auto-remediate them, report on them, and build that kind of continuous improvement. A lot of this is outside what this toolset alone can do, but it exemplifies how we can start moving into an innovation mindset, because I've taken a lot of that time and shifted it to where it needs to be, in a pretty mature model.
So I know we can get into a lot more on the business side of the house, and I'll leave that for the next speakers. Definitely save your questions for the end; if you have them, I'd like to answer them. Thanks for giving me the time today. I'm going to hand it over to Andrew.
Andrew Way (env0)
Hello, everyone. My name's Andrew; I'm the Director of Sales Engineering. Thanks, Troy, for running us through your experience at Western Union. Before we begin, I want to quickly share what env0 is. env0 is an Infrastructure as Code management tool: we orchestrate all of your infrastructure-related deployments. Whether it's Terraform, CloudFormation, Pulumi, Kubernetes, or even Ansible, env0 helps manage it, layering on governance as well as some features I'll share in a minute.
To start the conversation about Infrastructure as Code and FinOps, I want to share this slide from the FinOps Foundation. Essentially, it shows priorities broken down by cloud spend. And here you can see an overwhelming consistency: regardless of how much money organizations are spending, they all want to reduce waste from unused resources.
You also see trends around getting accurate forecasting of spend, as well as organizational adoption of FinOps, which transforms into FinOps governance and policy at scale for the larger organizations with greater cloud spend. The reason I highlight these is that at env0 we can help you track these things, and in a very proactive manner.
So let's go into detail about how we do that, starting with cost estimation. One important part of shifting this whole process left, and giving your developers more understanding of how much money they're going to spend, is adding cost estimation; it also serves as a way of tracking your potential cloud costs. Right now we're using a third-party tool for this, and we can easily integrate this process with others, like Vega and Vega's competitors.
The idea here is that we can build a pipeline, and as you're about to deploy new infrastructure, whether it's a new project or just developers spinning up dev/test scenarios, we can start tracking how much money they're going to spend and use that to determine whether or not something should move forward. Here you can see a screenshot where, in a PR process, we're surfacing the cost estimation details so that you can decide whether or not to approve that PR.
The next feature is cost monitoring. This goes to the question someone just asked: how do we track how much money you're spending in the cloud? env0 uses a tool we built called Terratag, which is actually open source. What we do is tag every resource before it gets deployed; that way, we can then query the billing APIs for how much money you're actually spending.
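As a rough illustration of why per-resource tagging matters, here is a sketch (with made-up line items, not env0's actual pipeline) of attributing billed cost to environments by a tag like the ones Terratag applies:

```python
from collections import defaultdict

def cost_by_tag(line_items, tag_key="environment"):
    """Sum billed cost per tag value, so consumption-based spend
    (Lambda invocations, S3 requests, egress) can be attributed to
    the environment that created each resource."""
    totals = defaultdict(float)
    for item in line_items:
        bucket = item.get("tags", {}).get(tag_key, "untagged")
        totals[bucket] += item["cost"]
    return dict(totals)

# Hypothetical billing line items for illustration.
items = [
    {"cost": 12.5, "tags": {"environment": "dev"}},
    {"cost": 40.0, "tags": {"environment": "prod"}},
    {"cost": 3.5},  # an untagged, ClickOps-created resource
]
```

Anything that lands in the "untagged" bucket is exactly the spend that escapes attribution, which is why tagging happens before deployment rather than after.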
Later, we'll talk about how we can make this even more accurate and address some of the shortcomings of querying the AWS or Azure billing APIs directly. But the idea is that now you can start associating resources with how much money you're actually spending, especially resources that are consumption-based. For VMs, you can easily estimate: using your enterprise price book, you can say, okay, my VMs will be running 24/7 for the next month, so I know how much that's going to cost. But things like Lambda, S3, and network traffic depend on your consumption, right? env0 can help track those costs by taking that into account, which brings us to the next feature: cost budgeting.
In env0, we give you the ability to create budget limits and set notifications for when you're about to exceed them. This really lets you put governance in place within the Infrastructure as Code deployment process, because this is where your infrastructure is getting created. Rather than reacting after the fact, we can take advantage of this process and act ahead of time, tracking your cost as you're doing your deployments.
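A minimal sketch of the notification side of budget limits might look like this; the 50/80/100 percent thresholds are an assumed example, not env0's fixed behavior:

```python
def crossed_thresholds(spend_to_date, budget, thresholds=(0.5, 0.8, 1.0)):
    """Return which notification thresholds the current spend has
    crossed, e.g. to warn a team at 50%, 80%, and 100% of its limit."""
    return [t for t in thresholds if spend_to_date >= t * budget]
```

Each newly crossed threshold would trigger a notification to the owning team, long before the bill arrives.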
Along with that, there are future integrations we're excited to be exploring with partners like Vega. How do we drive dynamic budget limits? How do we use financial policies defined in the FinOps tool as part of the approval workflow? How do we support chargeback models, make them as fine-grained as possible through integrations with these third-party tools, and align these deployments with your cloud cost strategies?
One last thing I want to share is a really exciting new beta feature called Cloud Compass. What it currently does is let you track your cloud resources and determine how each was created: whether through ClickOps (that is, manually in the cloud console), through the API, or through IaC.
Through this process, you can get a sense of how many of the resources in your cloud account were created through Infrastructure as Code and whether you need to start migrating the rest into IaC. In a future release, we can then start tracking how much money is being spent through ClickOps and the cloud API, and how much of it you really need to start bringing in and tracking through IaC.
I hope that was interesting. I want to pass it over to my colleague Zak and have him share how Vega can help with this process as well.
Zak Brown (Vega Cloud)
Thank you so much, Andrew. Folks, my name is Zak Brown. I'm coming to you live from my hotel room in São Paulo; to my Brazilian friends on the call, bom dia. I'm the RVP of Solutions and Innovation at Vega Cloud.
And really what that means is that the coolest part of my job is working with really smart folks like Andrew and Troy. I get to partner with these really innovative companies like env0, and I'm excited to share with you some of the ways that we're really thinking about the future of both FinOps and IaC, especially at scale.
Now, we see a ton of questions in the chat, and we're super excited to answer all of those in detail; we'll have a Q&A session right after this, so we'll make sure we give detailed answers. But what I want to do first is circle back to this FinOps priorities slide that you just saw Andrew go over.
In this section, we're going to talk about a couple of these same areas, but with a slightly different perspective. We're going to focus a little more on real-time costing as it relates to those areas, still within the realm of integrating your IaC and FinOps practices.
We'll cover three main points, with some really interesting use cases and integrations for each. The first is reducing waste and unused resources. What I want you all to keep in mind, and this is most of FinOps 101, is that starting to save money by reducing resources that are underperforming, over-provisioned, or idle can actually be relatively simple.
That's standard information you can get from the cloud providers themselves; with any sort of FinOps tool, even certain free tools, you'll get insight into all of this and how to optimize it. But what we want to do with Vega and env0 is look to the future and start asking harder questions.
How do you continue optimization after the low-hanging fruit is gone? How do you build optimization earlier into your DevOps processes? The first thing we'll talk about is what Andrew mentioned a few minutes ago: this idea of instant detection of how resources are instantiated.
Understanding which resources are created through your IaC, through ClickOps, or through the API gives you an instant picture of exactly where to target across a large organization. By integrating with a FinOps platform such as Vega, we attach the real-time costs to those non-IaC resources. So if you're a large company whose goal is to manage all of your infrastructure with IaC, it becomes really apparent exactly which teams, which application teams, and which developers are spinning up resources when they should be following the standard processes your teams have set.
And now we can assign those exact costs to them.
I saw a question in the chat about these costs: cash cost or amortized cost? It doesn't matter; it can be both. A good FinOps tool will make sure that the accurate costing information you're tying to your IaC practices factors in things like your relevant discounts: your EDP, your private pricing agreements, all of that as well. So this instant detection of how resources are instantiated, with env0 partnered with Vega costing, gives you a really nice picture of where to start targeting waste. The next idea comes from something Andrew mentioned a few minutes ago as well, and that's drift.
Coming from more of a FinOps background, I hadn't heard enough about drift, and it's such an exciting topic. If you're using IaC across all of your environments, you have standardized templates and configurations for your deployments, and oftentimes you'll integrate with something like GitHub to pull specific code for those configurations.
Well, what happens when those get out of date? What happens when one team is working with Oracle Cloud and has to use OpenTofu, while another team is on AWS using Terraform? Oftentimes those configurations and the deployed infrastructure can get out of sync, which is what we call drift.
Now, drift represents an area for cost savings, and it represents a security risk. With a tool like env0, you instantly see all of your drift across your entire environment, whether that's cloud or on-prem. So it becomes a really interesting picture of which areas to target. And when you run this through our integrations with something like Vega Cloud, you get the real-time cost of all of your drifted resources.
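Conceptually, putting a price on drift just means comparing the declared IaC state with what's actually live and summing the spend attached to the differences. The sketch below is an invented simplification (real drift detection compares full plans, not flat dictionaries):

```python
def drift_report(declared_state, live_state, monthly_cost):
    """List resources whose live configuration differs from the IaC
    state (including resources missing from IaC entirely), and total
    the monthly spend attached to them."""
    drifted = sorted(rid for rid, cfg in live_state.items()
                     if declared_state.get(rid) != cfg)
    return drifted, sum(monthly_cost.get(rid, 0.0) for rid in drifted)
```

The drifted list tells you where to look; the cost total tells you which drift to look at first.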
By providing these insights into your non-IaC resources and your drifted resources, you now have lenses for optimizing in new ways that we really don't see being talked about in other solutions. It's not just a way to increase optimization; it helps with security risk. And when you're using env0 across all of your environments, you really get to the idea of FinOps not just being about saving money, but about making money: enabling all of your infrastructure to be more nimble and more agile, so any savings you create from this nonstandard waste reduction can be passed across the rest of the organization to fund innovation.
There are some really interesting use cases here that we're super excited about. Now let's go to the next slide and take some of those ideas about accurate forecasting and budgeting a couple of steps further. You just heard Andrew talk about how using tools like Infracost in your DevOps process can get you cost estimation in your IaC tool.
What we're seeing with organizations like Troy's at Western Union is that when you have a FinOps tool with a budget assigned at the team level, you can actually query against that budget. You can see how much budget remains for the month or the week for your dev teams, and you can turn that into an approval process for your deployments.
Any application team requesting resources and doing deployments will instantly understand how much budget they have left, whether they can proceed, and whether they'll get approval for a given deployment. Then, as soon as they deploy and those cloud resources are instantiated, you see the real-time costs in Vega Cloud, in your FinOps tool.
Now you can measure those real-time costs against the budget you already knew about from shifting cost estimation left, and you get budget variance information, which is so helpful. It means your forecasting is more accurate. It means that if you over-provisioned with these configurations, you can adjust them and create a better suite of potential deployments for your DevOps processes.
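The budget variance calculation itself is simple; what makes it useful is having both numbers: the shifted-left estimate and the real-time billed cost. A minimal sketch:

```python
def budget_variance(estimated_cost, actual_cost):
    """Compare the pre-deployment estimate with the real-time billed
    cost. A positive variance means the deployment cost more than
    estimated; the percentage feeds back into forecast accuracy."""
    variance = actual_cost - estimated_cost
    pct = round(variance / estimated_cost * 100, 1) if estimated_cost else None
    return variance, pct
```

Feeding these variances back into the estimation step is the positive feedback loop Zak describes next.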
It's such a positive feedback loop. But this takes us to the second part. In your FinOps tool, regardless of your cloud spend or your organization's size, you need accurate costing. And given everything we've talked about today, that might even sound like a step backwards.
You might be saying, "Zak, that seems like the very first step of all of this." Well, here's the truth: only recently have we started hearing organizations talk about billing errors. A billing error is what happens when one of the cloud infrastructure providers, an Amazon, an Azure, a GCP, overbills you; they don't do it maliciously, they do it accidentally. There are many reasons billing errors can happen. Sometimes when discounts are stacked, a certain discount doesn't get applied in a certain area. Sometimes it's the billing APIs the cloud providers use: in certain regions those billing APIs are not monitored the same way as in others, so when they go down, and you can imagine how many line items are being billed per second, it happens quite frequently.
Sometimes you simply get incorrect costing. So what you need is a FinOps tool that can understand what you should be charged by the cloud service provider versus what you actually are charged.
In a recent case, Uber presented at FinOps X 2024 on finding that 6.7% of their spend, and you can imagine Uber has a lot of cloud spend, was billed incorrectly. So if your FinOps tool is able to validate and reconcile billing errors, you increase your forecast accuracy by several percentage points. Between these two key areas, better budgeting at a pre-deployment level and the ability to validate and reconcile billing errors, you're increasing your forecast accuracy. And that gives you another positive externality: when you're making commitments like your RIs and your savings plans, you can commit more accurately because of that more accurate forecasting.
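The reconciliation idea can be sketched as recomputing what each line item should have cost from usage and the contracted (discounted) rate, then flagging deviations. The field names and tolerance here are assumptions for illustration, not any vendor's actual schema:

```python
def find_billing_errors(line_items, tolerance=0.01):
    """Flag line items whose billed amount deviates from usage times
    the contracted (discounted) rate by more than a small tolerance
    allowed for rounding. Returns (line item id, overcharge) pairs."""
    errors = []
    for item in line_items:
        expected = item["usage"] * item["rate"] * (1 - item.get("discount", 0.0))
        if abs(item["billed"] - expected) > tolerance * max(expected, 0.01):
            errors.append((item["id"], round(item["billed"] - expected, 2)))
    return errors
```

Run across millions of line items, even a small per-item deviation adds up to the kind of multi-percent discrepancy Zak describes.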
All of this is to say that the IaC-plus-FinOps integration we're doing helps in so many different ways and aligns with so many of those key FinOps priorities that we just don't see enough folks talking about.
Now, finally, our third point: potential efficiency gains at scale. When we talk about the future of FinOps and how we get to that NoOps kind of state, the answer is that whenever you're looking at remediation, automating menial tasks, or whatever else you're trying to scale, you need to factor in how your organization will be using GenAI. I know there's a whole host of challenges and risks that come with this.
We won't get into too much detail there, but we wanted to answer a couple of questions and bring up a couple of ideas for how we're seeing this innovation move forward. Troy mentioned that looking at all of these different data sets together can become really difficult to manage in one place. Your FinOps tool typically has cost and billing information, but then you have an APM tool, like a Datadog or a New Relic.
You have application logs, you have code coming from GitHub or through your Terraform templates, and relating all of these in one place can be incredibly challenging for a human, but significantly simpler for the AI we've been working on. So one of the cases we see is integrating with something like env0 and being able to ask an in-house chat bot:
"Hey, I'm noticing a spike in my cloud spend. Can you pull up my recent deployments and show me what could be causing this?" This starts to draw on some of that resource instantiation information you were getting from the Cloud Compass feature, and now you get really interesting answers: okay, this spike was from this team. They deployed this configuration, it was approved by so-and-so, it was estimated to cost this, and it actually cost 10% more, and here's why. So there are some really interesting future integrations when you leverage the power of these tools together. That's why we at Vega Cloud are so excited to be partnering with env0, and so excited to be speaking with folks like Troy at Western Union.
The future of FinOps & IaC, especially at scale for these large organizations, is just so remarkably powerful, and we love where everything is headed. So thank you very much, and we'll now move into the Q&A section.
Andrew Way (env0)
Okay. There's one question from the audience that's specific to env0, so I'll answer this one: why are env0's billing alarms better than AWS's?
One way to think about it is that env0's billing notifications are built around projects. env0 has the concept of a project, which is a group of resources. So rather than setting a billing alarm on a specific resource or at the account level, you can set it on a very specific group of resources associated with a project or an application team. That makes it much more relevant and context-aware than a blanket alarm at the account level.
Troy E. Lillehoff (Western Union)
Yeah, and beyond alarms, because we've been through this whole transition: AWS does not want you to save money. I know firsthand, over years and years of going back and forth on things that didn't look right, that they provide these tools because they have to, just like all the other providers.
But it's basically a bare-bones, almost beta-level version. You don't get a lot of the capabilities that are built on top of that core report. Essentially, what everybody is using is that million-line document, the Cost and Usage Report (the CUR), covering every spend item down to the granular level of every time it bills, and then what you can do with that data.
You also don't get the efficiencies. Say, for an RDS resize: you may see from a CPU standpoint that you should resize, but the recommendation isn't taking other metrics like IO and memory into account. So if you resized some of these things, you could actually cause an outage, because you may need that sizing. It's not looking at all the additional granular metrics that some of the more mature tools in that realm do. Especially for some of the services where you'd think you'd need to look at additional metrics, they just don't have them out of the box or cloud native. So hopefully that gives you some more information.
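That multi-metric rightsizing point can be sketched as follows; the thresholds and metric names are illustrative assumptions, not anyone's actual recommendation engine:

```python
# A downsize is only safe when CPU, memory, AND IO are all underutilized.
# A CPU-only check, as described above, can shrink a memory- or IO-bound
# database and cause an outage.

def safe_to_downsize(utilization, threshold=0.4):
    """utilization: dict of 0.0-1.0 average utilization for cpu, memory, io."""
    return all(utilization[metric] < threshold for metric in ("cpu", "memory", "io"))

# CPU looks idle, but the instance is IO-bound: do not downsize.
print(safe_to_downsize({"cpu": 0.10, "memory": 0.30, "io": 0.85}))  # False
print(safe_to_downsize({"cpu": 0.10, "memory": 0.30, "io": 0.20}))  # True
```

The design point is the `all(...)`: one healthy-looking metric must never be enough to trigger a resize.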
Andrew Way (env0)
I think there's a question about egress costs.
Troy E. Lillehoff (Western Union)
So egress costs, as we know, are not specific to a particular workload, and you can't tag them, right? So you can't really attribute them to a specific workload. Those are some of the resources that, at least in our organization, sit under the shared services bucket, if you will. If you want to quantify those things, you can't do it natively in the cloud, so you'd need to look at source and destination through other means, like throughput. You know, Zak, I don't know how you guys accumulate that, because that's not something we look at. But there's a whole slew of resources which are considered shared, whether that's network-type things, because it's one-to-many for everybody. And if you need to get down to that granular level, it's the same issue as a shared database on prem, right? I have ten applications sitting on a database, ten different tables and schemas: how do I offset that cost, or bill it back to the business unit, when it's all sitting on shared hardware?
The key, I guess, would be finding a way to measure source and destination per workload, quantifying the amount of data it's moving against the respective cost in that location, whether you're looking at a Transit Gateway cost, a Direct Connect cost, whatever that is, and then doing some math to get the percentage of that particular traffic over some period of time, say monthly, and charging that back to a specific cost center in the business unit. But I don't know of any native ways to do that, and we've just had to do some of these things ourselves. Typically, though, those are not huge costs compared to the workloads themselves and the problem we're really trying to solve at the end of the day, so a lot of organizations just think of them as shared service costs.
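The chargeback math just described, measuring each workload's share of traffic and applying it to a shared network cost, reduces to a proportional split; the workload names and numbers below are made up:

```python
def allocate_shared_cost(total_cost, bytes_by_workload):
    """Split a shared cost (e.g. a month's Transit Gateway bill) across
    workloads in proportion to their measured traffic."""
    total_bytes = sum(bytes_by_workload.values())
    return {
        workload: round(total_cost * b / total_bytes, 2)
        for workload, b in bytes_by_workload.items()
    }

# A $1,200 shared network bill split by observed per-workload bytes:
shares = allocate_shared_cost(1200.0, {"payments": 600e9, "reporting": 200e9})
print(shares)  # {'payments': 900.0, 'reporting': 300.0}
```

The hard part in practice is the measurement itself (flow logs, throughput counters), not this arithmetic.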
Same with any other network types, like Internet or maybe provisioned circuits on prem. Typically those go in a single cost center.
Andrew Way (env0)
How is your software better than AWS, Azure, and GCP? Maybe we can put it in the context of billing and FinOps. And why is that?
Zak Brown (Vega Cloud)
I would be very happy to take that one, Andrew. When you're thinking, especially from a financial perspective, about shared services or shared costing, the one tip I could offer, if you want to call it that: there are a million different ways you can choose to share this information out, but it matters how you put it in front of your users.
The most important thing is that they understand how the cost is being shared to them. So if it's being shared by some sort of usage or utilization data, show them that data: you consumed X amount of compute from this one Kubernetes node, and that's why you're getting this cost. Just make sure that is in front of them.
So as you're scaling up your FinOps practice and more and more of your end users can see all of these costs, they know exactly what they're being charged for, right? They know why they're getting these shared costs. Okay, let's go ahead. Now we can answer this question of how our software is better than AWS, Azure, and GCP.
So yeah, Andrew, I do think we want to ratchet this question down a little bit. Obviously, AWS, Azure, and GCP are phenomenal cloud service providers, and we will never come close to beating them at that. But what we find for most of these organizations is that what the providers really want you to do is commit as much spend as possible, as soon as you possibly can, for as much time as possible.
That makes their lives easier. So whenever you have software coming from those cloud providers, that is how it's being targeted at those end-user customers, right? So if you're managing something like a FinOps practice, and you know that AWS wants you to spend as much as possible in AWS, but you probably also have some Azure spend and some GCP spend, then just from the standpoint of having a multi-cloud FinOps solution with accurate costing and billing-error protection, you get the full visibility that the cloud providers aren't incentivized for you to have.
So even just that first layer of visibility becomes really powerful, before we even get to something like optimization. At an optimization level, the cloud service providers typically have a series of optimization recommendations that they provide, but they tend to be more entry level, and they tend not to be customized for your organization.
So if you have specific application performance requirements, if you need CPU headroom when you're resizing an instance, whatever it is: if you have special requirements, and most organizations do (they typically vary by team), that's not going to be built into any of those cost recommendations. And you want to make sure those are factors you're considering at a FinOps level.
So that was a long-winded answer, and I want to make sure we get to some other stuff too. But thank you for that one.
Andrew Way (env0)
Troy, hoping maybe you could help cover this question here: as organizations transition to focus on IaC, what strategies can you implement to manage and optimize cloud resource costs during this transformation?
Troy E. Lillehoff (Western Union)
Yeah, and I'm assuming it's more of a transformation from on prem to the cloud. I think the biggest thing is the way that you go into the cloud: making sure that you understand the cost, the OpEx cost, and knowing that it's a different way to account for things, which is important to the company.
And there's enough data out there from companies who have either moved out of the cloud or had to stand something up at a quick pace. I would say FinOps is still a very reactive discipline, unfortunately, where most people only incorporate it at a high level once they know they have a problem. A lot of the time that's because developers are typically the ones taking the company to the cloud, and they're not doing it as a journey, cross-functionally, with a lot of pieces of the organization.
Obviously, if there's an enterprise architecture function, make sure they're leading the way on technical standardization. Incorporate your security or cyber teams, and try to build in policy as code, or at least understand what your current on-prem policies and regulations are and how those really translate, or should translate, into the cloud. Don't build the cloud as just another data center; find the best way to do it so that the teams who are really trying to get the benefits of the cloud are able to do that without being slowed down.
From a cost perspective, this is really part of your cloud center of excellence or some similar discipline, where you want to bring in finance teams and accounting teams so that they understand the new billing model. That way, when they start seeing the bills, they know from the organization's perspective what those are. And then, obviously, tie that to at least a version one of a tagging strategy that's based on your organization.
So like I mentioned, whether that's at the business unit level or the application level. Some of the important tags in our version one: if you use something like a cost center, say from an Oracle perspective, or however you're doing it from an accounting standpoint, translate that into cost metadata you can tie to the application. An app name and an app owner or support contact are always good, so that when you have operational issues you know who that team is (hopefully it's a team and not a single person). And then there's a whole slew of tags you can use for automation, for things like different backup policies, whether it's lower environments or higher environments.
We do things like power scheduling as well. We have a huge number of developers in a center in India, so when they go home for the day or the weekend, we can actually shut their lower environments down and have them spun back up when they get back into the office, because nobody is deploying or developing anyway.
So you can essentially save almost half of your costs in your development environments if you can do things like that. And then it's really about getting a good model: making sure you have a cloud operating model and a strategy, bringing everybody along for the journey, letting them all have input, and doing it cross-functionally.
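The power-scheduling idea above is essentially a calendar check plus a stop/start call; the window below (08:00 to 20:00 local time, weekdays only) is an assumed schedule for illustration, not Western Union's actual one:

```python
from datetime import datetime

# Decide whether a tagged environment should be running right now.
# Production always runs; lower environments only during business hours.

def should_be_running(env_tier, now):
    if env_tier == "prod":
        return True
    is_weekday = now.weekday() < 5   # Mon=0 .. Fri=4
    in_hours = 8 <= now.hour < 20    # assumed 08:00-20:00 local window
    return is_weekday and in_hours

print(should_be_running("dev", datetime(2024, 6, 3, 12, 0)))   # Monday noon: True
print(should_be_running("dev", datetime(2024, 6, 1, 12, 0)))   # Saturday: False
print(should_be_running("prod", datetime(2024, 6, 1, 3, 0)))   # prod always: True
```

With that window, dev environments run 60 of 168 hours a week, which is where savings on the order of half or more of dev-environment compute cost come from.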
We also run, in my organization, an architecture review board. This is really where the product teams who are innovating determine what we're going to do for new sources of revenue and what new products to build; it's where that mesh between product and tech comes together. And within that board, which is really enterprise architecture, cloud architecture, infosec, cybersecurity and ops, and finance, you have a lot of the roles I just mentioned, to make sure that you're vetting these things out.
And then the last thing is: don't forget the data teams. Say you don't have a robust data structure, strategy, and architecture, or an understanding of how much data you really need to store, as far as log data, application data, things like that, and you don't have good policies and lifecycles for it.
Those numbers and those costs are going to accumulate really quickly if they're not coupled with pruning that data, governing that data, and putting that data into cold storage, for example Glacier in AWS, via a lifecycle policy. Those are things you won't see right away; they're the things you're going to see a year from now, when all of a sudden the database or S3 bucket goes from 500 gigs to ten terabytes, right?
And then you're trying to tackle those things on the back end, figuring out why they're an issue. If you don't have a tool to help you recognize when these things come up and you're just depending on an alert, one that only trips once spend crosses a monthly threshold, you're really setting yourself up for a number of months of losses until you actually hit that alert.
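As a concrete example of the lifecycle policies mentioned above, here's a sketch that builds an S3 lifecycle rule in the dict shape boto3's `put_bucket_lifecycle_configuration` expects; the prefix and day counts are illustrative assumptions:

```python
def archive_rule(prefix, to_glacier_days=90, expire_days=365):
    """Transition objects under a prefix to Glacier, then expire them,
    so log and application data doesn't silently grow forever."""
    return {
        "ID": f"archive-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": to_glacier_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_days},
    }

rule = archive_rule("app-logs/")
print(rule["Transitions"][0])  # {'Days': 90, 'StorageClass': 'GLACIER'}
# Applied (sketch) with:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration={"Rules": [rule]})
```

Defining rules like this in IaC alongside the bucket is exactly the "do the right things ahead of time" point: the archival behavior ships with the resource instead of being bolted on after the bill spikes.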
So it's really about trying to do the right things ahead of time, based on what the industry has already seen, so that not only the technical teams, security teams, and finance teams feel comfortable about that journey, but also the CIO who's trying to get the company to the cloud, and making sure that you have a robust strategy as well.
There's one more question that came up regarding shared cloud resources; I think we already talked about some of the shared costs like circuits and network-type costs.
Most FinOps tools have a way now, as long as you have a similar way of tagging within a cluster; some actually do it down to the pod level. So as long as you're doing metadata tagging similar to how you're tagging your AWS resources, most of those FinOps tools will pull that out, give it to you at that granular level, and tie it to cloud spend. I would say just make sure that you understand the strategy and how you're deploying containerized workloads.
We've seen operational problems when you're mixing too many applications into the same cluster. There are benefits to that, obviously, because you're not maintaining so many pieces of EC2 or whatever infrastructure it's built on. But then you've got to think of the downstream operational effects: when you're patching, when you need to rehydrate images based on vulnerabilities, things like that, you may have to schedule with a lot of different teams because you're touching the same piece of environment or infrastructure. So hopefully that helps.
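Pod-level allocation of the kind described usually comes down to splitting a node's cost by each pod's resource requests and rolling the shares up by a metadata label; a minimal sketch (the label key, CPU figures, and cost are assumptions):

```python
def allocate_node_cost(node_hourly_cost, pods, label="team"):
    """Split one node's hourly cost across pods by CPU request (millicores),
    then roll the shares up by a pod label for showback."""
    total_cpu = sum(p["cpu_m"] for p in pods)
    costs = {}
    for p in pods:
        share = node_hourly_cost * p["cpu_m"] / total_cpu
        costs[p["labels"][label]] = costs.get(p["labels"][label], 0.0) + share
    return {k: round(v, 4) for k, v in costs.items()}

pods = [
    {"cpu_m": 500, "labels": {"team": "payments"}},
    {"cpu_m": 250, "labels": {"team": "payments"}},
    {"cpu_m": 250, "labels": {"team": "reporting"}},
]
team_costs = allocate_node_cost(0.40, pods)
print(team_costs)  # {'payments': 0.3, 'reporting': 0.1}
```

Real tools refine this with actual usage versus requests and with memory weighting, but the tag-driven rollup is the core mechanism.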
Luis de Silva (env0)
Thank you, Troy. Again, team, we covered a lot today. The takeaway is that implementing infrastructure as code is crucial for scalability and consistency in your cloud environments, and that adopting a FinOps strategy ensures your cloud investment is strategically aligned with your business goals and builds financial accountability into your operation. Combining both practices will give your organization the agility to innovate while maintaining control over costs and resources.
As next steps, we encourage you to schedule a demo with env0, to help you streamline infrastructure as code management at scale, or with Vega Cloud. You have the links on your screen today. Reach out to our team with any questions; we will be more than happy to answer them all, and even to do a quick demo of either. Thank you again to Troy from Western Union, Zak from Vega, and Andrew from env0
for sharing their experience today. And thank you, all of you, for your participation. This webinar has been recorded; you will receive an email with the recording and the presentation.
Thank you!
I'm honored to present Troy, the VP of Cloud Strategy, Architecture and Engineering from Western Union, as our special guest today. Troy will share his journey of successfully implementing these practices at Western Union, including the challenges he faced and how he overcame them. I'm excited to hand it over to Troy.
Troy E. Lillehoff (Western Union)
Thanks, Luis, and thanks for joining, everyone. Like Luis said, my name's Troy Lillehoff, and I'm the VP of IT at Western Union, with a large focus on cloud strategy and architecture. A little bit of background about Western Union: we're the global leader in cross-border money transfer and remittance. We're in 200-plus countries and territories today, with about 500,000 locations in our retail portfolio and about 150 million customers, outside of our digital website and applications.
We also have a banking and wallet platform, mostly in select regions, as well as some other services like foreign exchange, bill payments, and a prepaid card business.
So really, when you think about the ability and need to serve global customers at that level, and the infrastructure it takes to maintain that, in a very highly regulated ecosystem, with a lot of employees and a lot of applications (over 400 that we maintain, with over 2,000 developers across a lot of these teams),
a lot of the big industry words come into play: compliance, risk, security, consistency, delivery on time and on budget. We'll focus on a couple of these things and some of these challenges today. And really, these are challenges faced not only by us but across the industry, especially by companies going to the cloud, who are always looking for ways to keep control, because you can spin things up so fast.
And as you know, it's an OpEx model, so those things do cost money. So it's always about finding ways to shift security and compliance as far left as possible, because it just makes sense to get in front of these things before you provision them; otherwise you're stuck behind the eight ball in a continuous-compliance loop.
That's really the essence of DevSecOps and policy as code: being able to integrate these checks before you actually deploy the infrastructure, and aligning closely with the business side of the house. TCO is so important, especially when you're running in the cloud: knowing exactly what your application is, how much it should be costing based on how much revenue it's generating, and then making sure that it doesn't continuously add cost, or at least that you understand when it does. Bring in good FinOps practices so that you understand those costs ahead of the game, before they go into the environment.
It's also important to bring the finance teams into the journey, because they're actually the ones accumulating what those costs are in the environment, and they should be part of that relationship before a new application or infrastructure is deployed. It matters, too, because you can then budget, within the previous fiscal year, for what's going into the next year. As for granularity, from our perspective it's more the business unit, account, workload, or product level, and that sometimes changes.
So having the ability to change those dimensions, or account for things at a different level, really comes back to your tagging strategy and your metadata strategy at the end of the day, so that you can account for costs at that granular, per-workload level. That's really what goes into the needs of an environment like that.
So really this comes down to having a single pane of glass, and there are a lot of products out there today, probably many of you can think of them and own them, not only from a visibility standpoint but also as a central control plane from a delivery standpoint. More and more, in higher-performing teams and organizations, and especially startups, you don't have the traditional plan-build-run model with a lot of separate engineering teams; you have more DevOps-type teams.
So not only are they owning the application and the delivery of the code, they're also down at the level of managing the infrastructure and the deployments as well. I know from experience that developers typically don't have a cost hat on, and they don't have a security hat on. So it all comes back to compliance and standardization: if you can shift those things left, and you feel comfortable handing an infrastructure deployment over to a developer knowing these checks happen up front, now you have the comfort level, as an organization, to really let these application and DevOps teams run as they should.
We talked a little bit about granularity, and there were things we needed from a tool that just weren't there, especially with the native tools like ARM templates, CloudFormation templates, and even certain aspects of Terraform Enterprise. There was more granularity that we needed, and we could get it out of this tool, along with enhanced logging and observability. We are highly regulated, so we actually do need to produce reports of what gets deployed in our infrastructure.
It makes sense for that to live in a single location so we can present those reports. For us it's even a daily report, because we want to know what got deployed, and how, in the last 24 hours. And then it really comes down to this GitOps model and methodology. One of my teams is the cloud platform team, and the requirement for me is really: automate as much as you can, and get as much as possible into the hands of the teams who really should be owning and deploying it.
It's about abstracting the tedious process of having to go into the AWS calculator and things like that: being able to generate real-time costs for application changes or modifications, and then running that against a soft budget, which we partner on from a FinOps perspective (and they'll get into that in a little bit). That way, when developers make changes, we know they're in compliance, because they're using our team's modules that we've standardized and put governance constraints around, and also that they're staying within the budget that's been allocated or forecasted for them within the year. It really creates this chain from the business side, from what is being asked for
and what's really supposed to be produced, automated all the way through the deployment workflow, in a governance model from a security and costing perspective. We also have some on-prem environments: smaller data centers around the globe, like a lot of companies. That could be due to regulatory issues or data residency laws, or we just need a physical location, or a cloud provider may not have a region in that particular place.
So knowing that we have to have this footprint, once you know what good looks like, you want to bring those practices back to what's on prem today. So we take a lot of this single pane of glass and this toolset and apply it to things like firewall changes, routing changes, and so on, pretty much all as code.
And then we're upskilling more of those on-prem teams, like the security and network teams, and they really get the benefit of having a lot of the complexity abstracted away, because we still own the modules. We just give them the automation and the pipelines, and make sure we're controlling granular access so that they can only deploy the things they need to.
So that was really our need and what we were looking for, and this is really what we got out of it: the ability to run infrastructure as code, partnered with the business from a costing perspective and from a compliance perspective, because we're able to integrate our infrastructure pipelines through env0 with OPA.
So now we can partner with the security teams, and instead of them having to bring on so many technical resources, we just have them defining policy. And when you think about running a bank in the EU, you have GDPR, you have FedRAMP, you have any number of control frameworks like COBIT and ISO, really everything under the sun.
Now you can just build those in as policy as code, and make sure the ones that apply to your infrastructure run their checks before the actual deployment happens and that infrastructure gets built. Then you take the new cost that workload would generate and run it against the soft budget to say: yes, you're good to go, or no, you need to go back to the well, update your business case, and get approval for the changes you want to make.
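The gate just described, policy checks plus a soft-budget comparison before anything is built, can be sketched as a single pipeline decision step; the function name, statuses, and thresholds here are hypothetical:

```python
def gate_deployment(policy_violations, current_spend, estimated_delta, soft_budget):
    """Run after plan but before apply: fail closed on policy violations,
    and route soft-budget overruns to a human approval step."""
    if policy_violations:
        return "rejected", policy_violations
    if current_spend + estimated_delta > soft_budget:
        return "needs-approval", ["projected spend exceeds soft budget"]
    return "approved", []

print(gate_deployment([], 9000.0, 500.0, 10000.0))            # ('approved', [])
print(gate_deployment([], 9800.0, 500.0, 10000.0)[0])         # needs-approval
print(gate_deployment(["s3 bucket is public"], 0, 0, 1)[0])   # rejected
```

The key design choice is that a policy failure is a hard stop, while a soft-budget breach is an escalation: the deployment can still proceed once the business case is approved.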
And really, at the end of the day, what we're all trying to do is free up the time of our smart people so they're not getting bogged down in a lot of the operational things they've traditionally done. Now that you've shifted those to the development teams in an easily governed way, we're starting to do things more like NoOps and actual auto-remediation, more things on the operational side of the house, without having to staff up a large operational team. And because you're tied into a lot of pieces of data, not only this tool but also application logs and things like that, you're able to query for things like how many instances are in a particular account. It all depends on what APIs you're able to connect to and what information you can generate. But now that you have that data, you can run it through a certified large language model and actually ask questions from an operational standpoint.
And then, now that you have that data, you can actually do something with it. That's a further step in automation: being able to take some of these things and correct them, auto-remediate them, report on them, and get that kind of continuous improvement. Now, a lot of this goes beyond what this toolset alone can do, but it exemplifies how we can start moving into an innovation mindset, because I've taken a lot of that time and shifted it to where it needs to be, in a pretty mature model.
So I know this can get into a lot of things on the business side of the house, and I'll leave that for the next folks to go through. Definitely save your questions for the end, because if you have them, I'd like to answer them. Thanks for giving me the time today. I'm going to hand it over to Andrew.
Andrew Way (env0)
Hello, everyone. My name's Andrew, and I'm the Director of Sales Engineering. Thanks, Troy, for running us through your experience at Western Union. Before we begin, I want to quickly share what env0 is. env0 is an infrastructure as code management tool: we orchestrate all of your infrastructure-related deployments. So whether it's Terraform, CloudFormation, Pulumi, Kubernetes even, or Ansible, env0 helps manage that, layering on governance as well as some features I'll share in a minute.
To start the conversation about infrastructure as code and FinOps, I want to share with you this slide from the FinOps Foundation. Essentially, it shows priorities based on cloud spend, and here you can see an overwhelming consistency: regardless of how much money organizations are spending, they all want to reduce waste from unused resources.
You also see some trends around getting accurate forecasting of spend, as well as organizational adoption of FinOps, which transforms into FinOps governance and policy at scale for the larger organizations with greater cloud spend. The reason I highlighted these is because at env0 we can help you track these things, and in a very proactive manner.
So let's go into detail about how we do that, starting with cost estimation. One important part of shifting this whole process left, giving your developers more understanding of how much money they're going to spend, and using that as a method of tracking your potential cloud costs, is adding cost estimation. Right now we're using a third-party tool for this, and we can easily integrate the process with others, like Vega and Vega's competitors.
So the idea here is that we can build a pipeline, and as you're about to deploy new infrastructure, whether it's a new project or just developers spinning up dev/test scenarios, we can start tracking how much money they're going to spend and use that to determine whether or not something should move forward. Here you can see a screenshot where, in a PR process, we're giving you the cost estimation details so that you can know whether or not you should approve that PR.
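A PR-time cost gate of the kind shown in that screenshot can be reduced to a small check; the 10% threshold and the comment format below are invented for illustration:

```python
def pr_cost_comment(monthly_before, monthly_after, max_increase_pct=10.0):
    """Render the comment a pipeline might post on a pull request
    once the plan has been priced."""
    delta_pct = ((monthly_after - monthly_before) / monthly_before * 100
                 if monthly_before else float("inf"))
    verdict = "OK to merge" if delta_pct <= max_increase_pct else "needs FinOps review"
    return (f"Estimated monthly cost: ${monthly_before:,.2f} -> "
            f"${monthly_after:,.2f} ({delta_pct:+.1f}%): {verdict}")

print(pr_cost_comment(2000.0, 2100.0))  # +5.0%: OK to merge
print(pr_cost_comment(2000.0, 2600.0))  # +30.0%: needs FinOps review
```

Putting the number in the PR itself is the shift-left move: the cost conversation happens at review time, not on next month's bill.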
The next feature we'll talk about is cost monitoring. This goes to the question that was just asked: how do we track how much money you're spending in the cloud? env0 uses our own tool called TerraTag, which is actually open source. What we do is tag every resource before it gets deployed, and that way we can then query the billing APIs for how much money you're actually spending.
Later, we'll talk about how we can make this even more accurate and address some of the shortcomings of querying the AWS or Azure billing APIs directly. But the idea here is that now you can start associating resources with how much money you're actually spending, especially resources that are consumption based. Naturally, for VMs you can easily estimate: use your enterprise price book and say, okay, my VMs will be standing up 24/7 for the next month, I know how much that's going to cost. Sure. But things like Lambdas, S3, and network traffic depend on your consumption, right? So env0 can also help track those costs by taking that into account. That brings us to the next feature, cost budgeting.
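The tag-before-deploy idea behind TerraTag can be illustrated in miniature: inject tracking tags into every planned resource so that billing line items can later be grouped by them. The tag keys and resource shapes here are made up, not TerraTag's actual output:

```python
def tag_resources(planned_resources, tracking_tags):
    """Merge tracking tags into each resource before deployment,
    preserving any tags the user already set."""
    return [
        {**res, "tags": {**res.get("tags", {}), **tracking_tags}}
        for res in planned_resources
    ]

plan = [
    {"type": "aws_s3_bucket", "name": "logs", "tags": {"owner": "platform"}},
    {"type": "aws_lambda_function", "name": "ingest"},
]
tagged = tag_resources(plan, {"deployed-by": "pipeline", "project": "data-platform"})
print(tagged[0]["tags"])
# {'owner': 'platform', 'deployed-by': 'pipeline', 'project': 'data-platform'}
```

Once every resource carries the same tracking tags, a billing-API query grouped by those tags yields per-project actual spend, consumption-based services included.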
In env0, we give you the ability to create budget limits and set notifications for when you're about to exceed them. This really gives you the ability to put governance in place within the infrastructure as code deployment process, because this is where your infrastructure is getting created. Rather than doing it behind the scenes and after the fact, we can take advantage of this process, do it ahead of time, and track your costs as you're doing your deployments.
Along with that, we can talk about future integrations that we're excited to be exploring with our partners like Vega. How do we drive dynamic budget limits? How do we use financial policies defined in the FinOps tool as part of the approval workflow? How do we support chargeback models, making them as fine grained as possible, with integrations with these third-party tools, and then also align these deployments with your cloud cost strategies?
One last thing I want to share with you is a really exciting new beta feature called Cloud Compass. What it currently does is allow you to track your cloud resources and determine how each was created: whether through ClickOps, meaning in the cloud console, through the API, or through IaC.
So through this process, you can now get a sense of how many of the resources in my cloud account were created through infrastructure as code, and whether I need to start migrating the rest into IaC. And in a future release, we can start tracking how much money is being spent through ClickOps and the cloud API, and how much of it I really need to bring under infrastructure as code and track there.
So I hope that was interesting. I want to pass it over to my colleague Zak and have him share how Vega can help with this process as well.
Zak Brown (Vega Cloud)
Thank you so much, Andrew. So, folks, my name is Zak Brown. I'm coming to you live from my hotel room in Sao Paulo, so to my Brazilian friends on the phone: bom dia. I'm the RVP of Solutions and Innovation at Vega Cloud.
And really what that means is that the coolest part of my job is working with really smart folks like Andrew and Troy. I get to partner with these really innovative companies like env0, and I'm excited to share with you some of the ways that we're really thinking about the future of both FinOps and IaC, especially at scale.
Now we see a ton of those questions in the chat. We're super excited to answer all of those in detail. We'll have a Q&A session just right after this. So we'll make sure that we give detailed answers for all of this information as well. But what I want to do is actually circle back to this relevant FinOps priorities slide that you just saw Andrew go over.
What we're going to do moving forward in this section is talk about a couple of these same areas, but with a slightly different perspective now. So we're going to start to focus a little bit more on real time costing as it relates to some of these different areas still within that realm of integrating your IaC and FinOps practices.
So we'll cover three main points here, with some really interesting use cases and integrations for each. The first will be reducing waste and unused resources. And what I want you all to think about here is that most of FinOps 101, how to start saving money by reducing resources that are underperforming, over-provisioned, or idle, can actually be relatively simple.
So this is standard information you can get from all of the cloud providers themselves. If you have any sort of FinOps tool, even certain free tools, you'll get information into all of this and how to optimize all of these things. But what we want to do here with Vega and env0 is look to the future and start to ask some of these questions.
How do you continue optimization after things like the low hanging fruit are gone, or how do you build in optimization earlier into some of your DevOps processes? So the first thing that we'll be talking about is actually what Andrew was just mentioning a few minutes ago, and it's this idea of instant detection of instantiated resources.
By understanding which resources are created through your IaC, through ClickOps, or through the API, you get an instant picture of exactly where to target across a large organization. Now, by integrating with a FinOps platform such as Vega, we'll pull in the real-time costs associated with those non-IaC resources. So if you're a large company and your goal is to use IaC to manage all of your infrastructure, it becomes really apparent exactly which application teams and which developers are spinning up resources outside the standard processes that your teams are setting.
And now we can assign those exact costs to them.
I know I saw a question in the chat about these costs. Cash cost or amortized cost, it doesn't matter; it can be both. And a good FinOps tool will make sure that the costing information you're tying to your IaC practices is accurate: it factors in things like your relevant discounts, your EDP, private pricing agreements, all of that as well. So this instant detection of how resources are instantiated with env0, paired with Vega's costing, gives you a really nice picture of where to start targeting waste. Now the next idea here comes from something Andrew mentioned a few minutes ago as well, and that's drift.
Coming from a more FinOps background, I hadn't heard much about drift, and it's such an exciting topic. If you're using IaC across all of your environments, you have standardized templates and deployment configurations, and oftentimes you'll integrate with something like GitHub to pull specific code for those configurations.
Well, what happens when those are out of date? What happens when one team is working with Oracle Cloud and has to use OpenTofu, while another team is on AWS using Terraform? Oftentimes the deployed infrastructure gets out of sync with those configurations, which is what we call drift.
Now drift represents an area for cost savings. It represents an area for security risk. With a tool like env0, you instantly see all of your drift across your entire environment, whether that's cloud or on-prem as well. So it becomes this really interesting picture of specific areas to target. And when you run this through the integrations we have with something like Vega Cloud, you then get the real-time cost of all of your drifted resources.
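Pricing drift comes down to joining two data sets: the drifted resource IDs an IaC tool reports, and per-resource cost data from a FinOps platform. A minimal sketch, with hypothetical data shapes:

```python
# Hypothetical sketch: join drift-detection output (resource IDs an IaC tool
# reports as drifted) with per-resource monthly costs from a FinOps platform.

def cost_of_drift(drifted_ids, costs_by_resource):
    """Total monthly cost of the drifted resources."""
    return sum(costs_by_resource.get(rid, 0.0) for rid in drifted_ids)

drifted = ["i-0abc", "sg-123"]                              # from drift detection
costs = {"i-0abc": 212.40, "sg-123": 0.0, "i-0def": 98.10}  # $/month per resource
print(cost_of_drift(drifted, costs))  # 212.4
```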
So by providing you these insights into your non-IaC resources and your drifted resources, you now have different lenses to start optimizing in new ways that we really don't see being talked about in other solutions. It's not just a way to increase optimization; it helps with security risks. And when you're using env0 across all of your environments, you're really getting to the idea of FinOps not being about saving money, but about making money, by enabling all of your infrastructure to be more nimble and more agile. Any savings you create from this nonstandard waste reduction can be passed across the rest of the organization to fund innovation.
So there are some really interesting use cases here that we're super excited about. Now let's go to the next slide: we want to take some of those ideas about accurate forecasting and budgeting a couple of steps further. You just heard Andrew talking about how using tools like Infracost in your DevOps process can get you cost estimation in your IaC tool.
Well, what we're seeing with some organizations like Troy at Western Union is that when you have a FinOps tool with a budget assigned at a team level, you can actually query against that budget. You can see how much budget remains for the month or the week that your dev teams have, and you can turn that into an approval process with your deployments.
So any of those application teams that need to request resources and do the deployments will instantly understand how much budget they have left and whether this deployment can be approved. Then as soon as they do those deployments, you've instantiated these cloud resources, and you're now seeing the real-time costs in Vega Cloud, in your FinOps tool.
And now you can measure those real-time costs against the budget that you already knew about from shifting cost estimation left, and you get budget variance information, which is so helpful. It means your forecasting is more accurate. It means that if you over-provisioned with these configurations, you can adjust them and create a better suite of potential deployments for your DevOps processes.
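The variance calculation itself is simple arithmetic: compare the shift-left estimate with the actual billed cost. A small sketch:

```python
def budget_variance(estimated: float, actual: float):
    """Absolute and percentage variance of actual spend vs. the shift-left estimate."""
    delta = actual - estimated
    pct = (delta / estimated) * 100 if estimated else float("nan")
    return delta, pct

# Estimated $1,000 at deployment time; the FinOps tool later reports $1,100:
delta, pct = budget_variance(estimated=1000.0, actual=1100.0)
print(f"over by ${delta:.2f} ({pct:.1f}%)")  # over by $100.00 (10.0%)
```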
It's such a positive feedback loop. It's really interesting, but this takes us to the second part here. Now, in your FinOps tool, what you're going to need, regardless of your cloud spend size or your organization size, is accurate costing. And given everything that we've talked about today, that might even sound like a step backwards.
You might be saying, "Zak, that seems like the very first step of all of this." Well, here's the truth: only recently have we started hearing organizations talk about billing errors. A billing error is what happens when one of those cloud infrastructure providers, an Amazon, Azure, or GCP, accidentally over-bills; they don't do it maliciously. Now, there are many reasons billing errors can happen. Sometimes when discounts are stacked, a certain discount doesn't get applied in a certain area. Sometimes it deals with the actual billing APIs those cloud providers use; in certain regions, those billing APIs are not monitored the same way as in other regions. So if they go down, and you can imagine how much billing is being processed per second, it happens quite frequently.
Sometimes you get incorrect costing. So what you need is a FinOps tool that can actually understand what you should be charged by the cloud service provider versus what you are actually being charged.
In a recent case, at FinOps X 2024, Uber presented on finding that 6.7% of their spend, and I'm sure you can all imagine that Uber has a lot of cloud spend, was billed incorrectly. So if your FinOps tool is able to validate and reconcile billing errors, you increase your forecast accuracy by several percentage points. Between these two key areas, better budgeting at a pre-deployment level and being able to validate and reconcile billing errors, you are increasing your forecast accuracy. And this gives you another positive externality: when you're making commitments for things like your RIs and your Savings Plans, you can commit more accurately because of this more accurate forecasting.
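Billing-error reconciliation boils down to computing what you should have been charged (usage times your contracted, discounted rate) and comparing it with the bill. A simplified sketch with illustrative numbers:

```python
def reconcile(usage_qty: float, list_rate: float, discount_pct: float,
              billed: float, tolerance: float = 0.01):
    """Compare what you *should* be charged (usage x discounted rate) with the
    billed amount; flag discrepancies larger than the tolerance fraction."""
    expected = round(usage_qty * list_rate * (1 - discount_pct / 100), 2)
    discrepancy = round(billed - expected, 2)
    flagged = abs(discrepancy) > expected * tolerance
    return expected, discrepancy, flagged

# 10,000 compute-hours at a $0.10 list rate with a 15% negotiated discount,
# but the invoice says $910:
print(reconcile(10_000, 0.10, 15, billed=910.00))  # (850.0, 60.0, True)
```

A real reconciliation would run this per line item of the billing export, with the full discount stack applied.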
All of this is to say: the IaC-plus-FinOps integration that we're doing helps in so many different ways, and it aligns with so many of those key FinOps priorities that we just don't see enough folks talking about.
Now, finally, our third point: we want to talk about some potential efficiency gains at scale. When we're talking about things like the future of FinOps, how do we get to that kind of NoOps state? The answer is that whenever you're looking at remediation, automating menial tasks, whatever it is you're trying to scale up, you need to factor in how your organization will be using GenAI. I know there are a whole host of challenges and risks that come with this.
We won't get into too much detail on that, but we wanted to answer a couple of questions and bring up a couple of ideas for how we're seeing this innovation move forward. Now, Troy mentioned that looking at all of these different data sets together can become really difficult to manage in one place. Your FinOps tool typically has cost and billing information, but then you have an APM tool, like a Datadog or a New Relic.
You have application logs, you have code coming from your GitHub or through your Terraform templates, and relating all of these in one place can be incredibly challenging for a human, but significantly simpler for the AI we've been working on. So one of the cases we see: you integrate with something like env0 and can suddenly query the chatbot.
Hey, I'm noticing a spike in my cloud spend. Can you pull up my recent deployments and show me what could be causing this? Now this starts to relate to some of that resource instantiation information that you were getting from that Cloud Compass feature. And now you'll have really, really interesting information around, okay, this spike was from this team.
They deployed this configuration. It was approved by so-and-so. It was estimated to cost this. It actually cost 10% more. Here's why. So there are some really interesting future integrations when you're leveraging the power of these tools together. And that's why we at Vega Cloud are so excited to be partnering with env0, and so excited to be speaking with folks like Troy at Western Union.
The future of FinOps & IaC, especially at scale for these large organizations, it's just so remarkably powerful and we love where everything is headed. So thank you very much. And we'll now move into the Q&A section.
Andrew Way (env0)
Okay. There's one question from the audience that's specific to env0, so I'll answer this one: why are env0's billing alarms better than AWS's?
One way to think about it is that env0's billing notifications are built around projects. env0 has the concept of a project, which is a group of resources. So rather than setting a billing alarm on a specific resource or at the account level, you can set it on a very specific group of resources associated with a project or an application team. That makes it much more relevant and context-aware than a blanket account-level alarm.
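The difference can be sketched as grouping per-resource spend by project and alarming on each project's own limit, rather than one account-level threshold. The data shapes below are hypothetical, not env0's actual API:

```python
# Hypothetical data shapes: per-resource spend tagged with a project, plus a
# budget limit per project. Alarm per project instead of per account.

def project_alarms(resources, limits):
    """Return the projects whose summed spend exceeds their own limit."""
    spend = {}
    for r in resources:
        spend[r["project"]] = spend.get(r["project"], 0.0) + r["cost"]
    return [p for p, s in spend.items() if s > limits.get(p, float("inf"))]

resources = [
    {"id": "i-1",  "project": "checkout",  "cost": 620.0},
    {"id": "db-1", "project": "checkout",  "cost": 410.0},
    {"id": "i-2",  "project": "analytics", "cost": 300.0},
]
print(project_alarms(resources, {"checkout": 1000.0, "analytics": 500.0}))
# ['checkout']
```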
Troy E. Lillehoff (Western Union)
Yeah, and I would say, beyond alarms, because we've been through that whole transition: AWS does not want you to save money. I know firsthand, over years and years of going back and forth on things on the bill that didn't look right, that they provide this tooling because they have to, just like all the other providers.
But it's basically a bare-bones version. You don't get a lot of the capabilities that are built on top of that core report, the CUR (Cost and Usage Report), which is essentially what everybody's using: that million-line document of every spend, down to the granular level of every time it bills, and then what you can do with that.
You also don't get the efficiencies, say, for something like an RDS resize, right? You may see from a CPU standpoint that you should resize, but the recommendation isn't taking other metrics like IO and memory into account. So if you resized some of these things, you could actually cause an outage, because you may need that sizing. It's not looking at all the additional granular metrics that some of the more mature tools in that realm do. Especially for services where you would think you'd need to look at additional metrics, they just don't have them out of the box, cloud native. So hopefully that gives you some more information.
Andrew Way (env0)
I think there's a question about egress costs.
Troy E. Lillehoff (Western Union)
So, egress costs, as we know, are not specific to a particular workload, and you can't tag them, right? So you can't really accumulate them into a specific workload. In our organization, those sit under the shared-services bucket, if you will. If you want to quantify those things, you can't do it natively in the cloud; you would need to look at source and destination through other means, like throughput. Zak, I don't know how you guys accumulate that, because it's not something we look at. But there's a whole slew of resources that are considered shared, particularly network-type things, because it's one-to-many for everybody. And if you need to get down to that granular level, it's the same issue as a shared database on-prem, right? I have ten applications sitting on a database, with ten different tables and schemas; how do I offset that cost, or bill it back to the business unit, when it's all sitting on shared hardware?
The key, I guess, would be finding a way to measure the source and destination per workload, quantifying the amount of data it's using for the respective cost in that location, whether that's a transit gateway cost or a Direct Connect cost, and then doing some math to get the percentage of that particular traffic over some period of time, say monthly, and charging that back to a specific cost center in the business unit. But I don't know of any native ways to do that, and we've had to build some of these things ourselves. Typically those are not huge costs compared to what the workloads are and the problem we're really trying to solve at the end of the day; a lot of organizations just think of them as shared-service costs.
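The pro-rata math Troy describes, splitting a shared network cost by each workload's measured share of traffic, looks like this (figures are illustrative):

```python
def allocate_shared_cost(total_cost, bytes_by_team):
    """Split a shared network cost (e.g. a transit gateway) pro rata by traffic."""
    total = sum(bytes_by_team.values())
    return {team: total_cost * b / total for team, b in bytes_by_team.items()}

traffic = {"payments": 600e9, "reporting": 300e9, "batch": 100e9}  # bytes/month
print(allocate_shared_cost(1_000.0, traffic))
# {'payments': 600.0, 'reporting': 300.0, 'batch': 100.0}
```

The hard part in practice is the measurement itself (per-workload flow logs or throughput data); the chargeback arithmetic is the easy step.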
Same with other network types, like Internet or provisioned circuits on-prem. Typically those go into a single cost center.
Andrew Way (env0)
Next question: how is your software better than the cloud providers' own tooling? I imagine the answer is in the context of billing and FinOps, but why is that?
Zak Brown (Vega Cloud)
I would be very happy to take that one, Andrew. First, though, one tip from a financial perspective on shared services and shared costing. There are a million different ways you can choose to share these costs out, but when you're putting this information in front of your users, the most important thing is that they understand how the cost is being shared to them. If it's being shared based on some sort of usage or utilization data, show them that data: you consumed X amount of compute on this one Kubernetes node, and that's why you're getting this cost. Just make sure that's in front of them.
So as you're scaling up your FinOps practice and more and more of your end users can see all of these costs, they know exactly what they're being charged for and why they're getting these shared costs. Okay, now we can answer this question of how our software is better than AWS, Azure, and GCP.
So yeah, Andrew, I do think we want to ratchet this question down a little bit. Obviously, AWS, Azure, and GCP are phenomenal cloud service providers, and we will never come close to beating them at that. But what we find for most organizations is that what the providers really want you to do is commit as much spend as possible, as soon as you possibly can, for as much time as possible.
That makes their lives easier. So whenever you have software coming from those cloud providers, that is how it is being targeted at those end-user customers, right? If you're managing something like a FinOps practice, you know that AWS wants you to spend as much as possible in AWS, but you probably also have some Azure spend and some GCP spend. Just from the standpoint of having a multi-cloud FinOps solution with accurate costing and billing-error protection, you get the full visibility that the cloud providers aren't incentivized for you to have.
So even just that first layer of visibility becomes really powerful, before we even get to something like optimization. At the optimization level, the cloud service providers typically have a series of recommendations they provide, but they tend to be more entry-level and not customized for your organization.
So if you have specific application performance requirements, if you need CPU headroom when you're resizing an instance, whatever it is; if you have special requirements, and most organizations do, and they typically vary by team, that's not going to be built into any of those cost recommendations. You want to make sure those factors are considered at a FinOps level.
That was a long-winded answer, and I want to make sure we get to some other questions too. But thank you for that one.
Andrew Way (env0)
Troy, hoping maybe you could cover this question: as organizations transition to focus on IaC, what strategies can you implement to manage and optimize cloud resource costs during this transformation?
Troy E. Lillehoff (Western Union)
Yeah, and I'm assuming this is more about a transformation from on-prem to the cloud. I think the biggest thing is the way that you go into the cloud: making sure you understand the CapEx and OpEx costs, and knowing that it's a different way to account for things, is important to the company.
And there's plenty of data on companies who have either moved back out of the cloud or had to stand something up at a quick pace. I would say FinOps is still a very reactive discipline, unfortunately; most people only incorporate it at a high level once they know they have a problem. A lot of the time that's because developers are typically the ones taking them to the cloud, and they're not doing it as a cross-functional journey with a lot of pieces of the organization. Obviously, if there's an enterprise architecture team, make sure they're leading the way on technical standardization. Incorporate your security or cyber teams, and try to build in policy as code, or at least understand your current on-prem policies and regulations and how they should translate into the cloud. Don't build the cloud as just another data center; find the best way to do it so the teams who are really trying to get the benefits of the cloud can do that without being slowed down.
From a cost perspective, this is really part of your FinOps or similar discipline, where you want to bring in the finance and accounting teams so that they understand the new billing model. When they start seeing the bills from the organization's perspective, they know what those are. And then that's obviously tied to at least a version one of a tagging strategy based on your organization.
So like I mentioned, whether that's at the business unit level or the application level. Some of the important tags in our version one: a cost center, if you use something like that from an accounting perspective, say in Oracle, translated into metadata you can tie to the application; the application name; the app owner; and a support contact is always good, so that when you have operational issues, you know who that team is and hopefully can reach a person. And then there's a whole slew of tags you can use for automation, for things like different backup policies, whether it's lower environments or higher environments.
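A version-one tagging strategy like this is often enforced with a simple required-tags check. A minimal sketch; the tag set is illustrative, not Western Union's actual schema:

```python
# Illustrative version-one tag set; substitute your organization's own keys.
REQUIRED_TAGS = {"cost-center", "app-name", "app-owner", "support-contact"}

def missing_tags(resource_tags):
    """Return the required tag keys a resource is missing, sorted."""
    return sorted(REQUIRED_TAGS - set(resource_tags))

print(missing_tags({"cost-center": "CC-1042", "app-name": "ledger"}))
# ['app-owner', 'support-contact']
```

A check like this can run as a policy gate in the IaC pipeline, so untagged resources never get deployed in the first place.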
We do things like power scheduling as well. We have a huge number of developers in a center in India, so when they go home for the day or the weekend, we can actually shut their lower environments down and have them spun back up when they get back into the office, because nobody is deploying or developing anyway.
So you can essentially save almost half of your costs in your development environments if you do things like that. And then it's really about getting a good model: making sure you have a cloud operating model and a strategy, and bringing everybody along for the journey, letting them all have input and doing it cross-functionally.
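The savings from power scheduling are easy to estimate: compare an always-on month with the scheduled on-hours. A rough sketch; the exact fraction depends on how aggressive the schedule is:

```python
def off_hours_savings(hourly_rate: float, on_hours_per_week: float):
    """Monthly savings from powering a dev environment off outside working hours."""
    hours_per_month = 730                    # average hours in a month
    weeks_per_month = hours_per_month / 168  # 168 hours in a week
    on_hours = on_hours_per_week * weeks_per_month
    saved = hourly_rate * (hours_per_month - on_hours)
    return round(saved, 2), round(1 - on_hours / hours_per_month, 3)

# A $0.50/hr environment running 60 hrs/week instead of 24/7:
print(off_hours_savings(0.50, 60))  # (234.64, 0.643) -- about 64% saved
```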
We also run an Architecture Review Board in my organization. This is really where the product teams who are innovating determine what we're going to do for new sources of revenue and new products; it's where that mesh between product and tech comes together. And then within the board, which is really enterprise architecture, cloud architecture, information security risk, cybersecurity, ops, and finance, a lot of those roles I just mentioned, you make sure you're vetting these things out.
And then the last thing is: don't forget the data teams. Maybe you don't have a robust data strategy and architecture, or an understanding of how much data you really need to store, as far as log data, application data, things like that, and you don't have good policies and lifecycles for them.
Those numbers and those costs are going to accumulate really quickly if they're not coupled with cleansing that data, governing that data, and putting cold data into cold storage, for example Glacier in AWS via a lifecycle policy. Those are things you won't see right away; they're the things you'll see a year from now, when all of a sudden a database or an S3 bucket goes from 500 gigs to ten terabytes, right?
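The cost growth Troy warns about is easy to quantify with rough storage arithmetic. The rates below are illustrative only; check current AWS pricing before relying on them:

```python
# Illustrative storage rates; check current AWS pricing before relying on them.
S3_STANDARD = 0.023  # $/GB-month
GLACIER = 0.004      # $/GB-month (Glacier Flexible Retrieval, roughly)

def monthly_storage_cost(gb: float, rate: float) -> float:
    return round(gb * rate, 2)

ten_tb = 10 * 1024  # GB
print(monthly_storage_cost(ten_tb, S3_STANDARD))  # 235.52
print(monthly_storage_cost(ten_tb, GLACIER))      # 40.96
```

At these example rates, a lifecycle policy moving that cold 10 TB to Glacier cuts the monthly storage bill by roughly 80%.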
And then you're trying to tackle those things after the fact, while figuring out why they're an issue. And if you don't have a tool to help you recognize when these things come up, and you're just depending on an alert that only sees that threshold over a month, you're really setting yourself up for a number of months of losses until you actually hit that alert.
So it's really about trying to do the right things ahead of time, based on what the industry has already seen, to make sure that not only the technical, security, and finance teams feel comfortable about that journey, but also the CIO who's trying to get the company to the cloud, and making sure you have a robust strategy as well.
There's one more question that came up regarding shared cloud resources. I think we talked about some of the shared costs, like circuits and network-type costs.
Most FinOps tools have a way to handle this now, as long as you have a consistent way of tagging; within a cluster, some actually go down to the pod level now. So as long as you're doing metadata tagging similar to your AWS resource tagging, most FinOps tools will pull that out and give you that granular view of your cloud spend. I would also say: make sure you understand the strategy for how you're deploying containerized workloads.
We've seen operational problems when you mix too many applications into the same cluster. There are benefits to that, obviously, because you're not maintaining so many pieces of EC2 or whatever the underlying infrastructure is. But then you have to think of the downstream operational effects: when you're patching changes, you need to rehydrate images based on vulnerabilities and things like that, and potentially schedule with a lot of different teams because you're touching the same piece of infrastructure. So hopefully that helps.
Luis de Silva (env0)
Thank you, Troy. Again, team, we covered a lot today. The takeaway: implementing infrastructure as code is crucial for scalability and consistency in your cloud environments, and adopting a FinOps strategy ensures your cloud investment is strategically aligned with your business goals and brings financial accountability into the organization. Combining both practices will give your organization the agility to innovate while maintaining control over costs and resources.
As next steps, we encourage you to schedule a demo with env0, to help you streamline infrastructure as code management at scale, or with Vega Cloud; you have the links on your screen today. Reach out to our team with any questions and we will be more than happy to answer them all, or even do a quick demo of either platform. Thank you again to Troy from Western Union, Zak from Vega, and Andrew from env0 for sharing their experience today. And thank you, all of you, for your participation. This webinar has been recorded, and you will receive an email with the recording and the presentation.
Thank you!