In this blog post, we’ll walk you through setting up your first monitoring and observability stack on AWS, deployed with env0, so you can start gathering data about your systems.
When you’re deploying any software environment or infrastructure, whether it’s containerized or not, you must think about your application monitoring and observability strategy. Doing so not only saves developers from being woken up at 2 a.m. by an alert to fix an issue, it also lets you truly look into your application’s performance.
In this blog post, you’ll learn about what monitoring and observability are, one of the most popular stacks, and how to deploy it at scale into AWS using env0.
What are monitoring and observability?
Monitoring and observability are often referred to as if they’re the same thing, but they have two very different purposes for developers and engineering teams.
This topic in itself can be an entire book, so I’ve tried to keep this section brief, but with enough information to provide context on the difference between their function and technologies.
Monitoring
Monitoring is all about observing data in real time or tracking its history. For example, if you’ve ever walked into a Network Operations Center (NOC) or walked over to the IT space in your organization, you may have seen some big screens with data statistics and dashboards.
Monitoring collects telemetry data that gives real-time visibility into application performance. These tools collect data on how each component is behaving, such as CPU, memory, and bandwidth, along with system performance at a high level. Developers often use them to detect performance issues and shorten incident response times.
Observability
What makes observability important is that it deals with the "unknown unknowns," providing visibility into the entire application, and allowing developers to synthesize that raw data and form actionable insights to drive business outcomes.
Observability addresses the "three pillars" of data, then goes beyond to provide value to customers:
- Traces: Follow requests end to end through the application, which helps track its availability and health
- Metrics: Collect time series data, which is used to manage, optimize, and predict application and system performance, both good and bad
- Logs: Store events written by apps, users, and systems. As all engineers know, logs are a lifesaver when troubleshooting
Beyond raw data
In short, with monitoring you gather real-time data about each system, component, service, event, and piece of infrastructure. But despite the growing body of data under our control, in any complex system, blind spots will always remain.
Observability goes beyond what's monitored and under control. The telemetry data collected (from logs, metrics, and traces) provides context that leads to insights that drive business value for customers. Observability == inference.
What are Prometheus and Grafana?
Now that you know the theory behind collecting data and drawing inferences about your systems, you may be wondering: What tools and platforms are available to help developers do both for their services?
There's a greater variety than users might expect. A few notable ones are:
- Datadog
- AppDynamics
- AWS CloudWatch
- Azure Monitor
- … and a whole host of other amazing cloud native tools and open source solutions
In the cloud native, Kubernetes, and containerization world, a lot of users gravitate towards one specific stack: Prometheus and Grafana.
Although this stack can be used to monitor workloads outside of Kubernetes clusters, both tools have a ton of compatibility and support for containerized applications at scale. They’re both open source, which gives developers a ton of control. It also means you don’t have to pay for the software itself, which is a great value for users who otherwise wouldn’t be able to access this kind of data.
Prometheus and Grafana are enterprise ready in a lot of cases. This stack isn’t just for pre-production environments. Many large organizations deploy these technologies at scale in their cloud and have battle-tested them to capture data for clusters ranging from 5-500.
If you want a little enterprise support behind it, a lot of the major cloud providers also offer managed Prometheus and Grafana services (for example, Amazon Managed Grafana is one of the available managed, scaled, done-for-you AWS services).
So, which tool collects what data?
Prometheus is all about collecting metrics. It scrapes the raw data from a specific HTTP endpoint. In Kubernetes, a workload can expose a `/metrics` endpoint, and Prometheus then retrieves metrics from that endpoint.
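To make the scraping model a little more concrete, here is a minimal shell sketch of what a scrape returns and how one value could be pulled out of it. The metric names and numbers in the sample payload are invented for illustration, and the helper names (sample_metrics, metric_total) are hypothetical. Only the general shape of the payload follows the Prometheus text exposition format: comment lines starting with #, then one "name{labels} value" sample per line.

```shell
# Print a made-up payload in the Prometheus text exposition format.
# (Assumption: real endpoints return many more metrics than this.)
sample_metrics() {
  cat <<'EOF'
# HELP http_requests_total Total HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 52428800
EOF
}

# Sum every sample of one metric name read from stdin, across all label sets.
# Assumes samples carry no trailing timestamp, so the value is the last field.
metric_total() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP/TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # drop the {label="..."} part
      if (metric == name) total += $NF
    }
    END { print total + 0 }
  '
}

sample_metrics | metric_total http_requests_total
```

Against a live target, the same helper could read a real scrape instead of the sample, for example `curl -s http://localhost:9090/metrics | metric_total prometheus_http_requests_total`, assuming a Prometheus server is reachable on its default port.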
Grafana is used to view the data in UI-friendly and human-readable dashboards. Although you can definitely view the data in Prometheus, and it’s very readable there, it’s much easier to interpret and act upon in Grafana dashboards.
The code
So far you’ve read a lot from a theoretical perspective of what monitoring does, what observability does, and a few different software tools that you can test to help with these use cases in production. Now, it’s time to dive into the hands-on piece and deploy our own tools and infrastructure into the cloud using AWS services.
Prerequisites
To follow along with the hands-on portion of this blog post, you will need the following:
- An env0 account (you can sign up for a trial here)
- A version control system/source code repo like GitHub to store the code in for deployment purposes (demo repo)
- Access to an AWS account to deploy into the cloud (you can sign up for a free trial/tier here)
- Credentials configured in your env0 environment for AWS
- An Amazon Elastic Kubernetes Service (Amazon EKS) cluster running in the cloud. If you don’t have one, you can learn how to deploy an Amazon EKS cluster with env0 here
This section will demonstrate the infrastructure configuration that will be used to deploy Prometheus and Grafana services into an Amazon EKS cluster and start gathering our own observability data.
Because you’re installing your own Prometheus and Grafana tools in the cloud, there isn’t specific infrastructure code you need to reference for this. Instead, you’ll use an env0 configuration to deploy into AWS. The env0 configuration contains the steps to deploy into the Amazon EKS cluster, one at a time.
The workflow goes as follows:
- Log into and access the Amazon EKS cluster
- Install Helm
- Add the Grafana Helm Chart
- Add the Prometheus Helm Chart
- Create a Service Account in Kubernetes for Grafana
- Install Grafana and Prometheus in the monitoring namespace to start collecting data
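The steps in this workflow could be sketched as plain shell commands along the following lines. This is an illustration under assumptions, not the configuration used in this post: the release names, the grafana service account name, the default region and cluster name, and the Grafana chart values serviceAccount.create and serviceAccount.name are choices made for the example, and Helm is assumed to be installed already. The script is a dry run by default, so it only prints the commands it would execute.

```shell
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"
AWS_DEFAULT_REGION="${AWS_DEFAULT_REGION:-us-east-1}"  # assumption: example region
CLUSTER_NAME="${CLUSTER_NAME:-my-eks-cluster}"         # assumption: placeholder name
NAMESPACE="monitoring"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy_monitoring() {
  # Log into the EKS cluster by writing a kubeconfig entry for it.
  run aws eks update-kubeconfig --region "$AWS_DEFAULT_REGION" --name "$CLUSTER_NAME"

  # Helm is assumed to be on the PATH; this only confirms it responds.
  run helm version --short

  # Add the Grafana and Prometheus chart repositories.
  run helm repo add grafana https://grafana.github.io/helm-charts
  run helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  run helm repo update

  # Create the namespace and a service account for Grafana.
  # Note: both commands fail if the resource already exists.
  run kubectl create namespace "$NAMESPACE"
  run kubectl create serviceaccount grafana --namespace "$NAMESPACE"

  # Install both charts into the monitoring namespace.
  run helm install prometheus prometheus-community/prometheus --namespace "$NAMESPACE"
  run helm install grafana grafana/grafana --namespace "$NAMESPACE" \
    --set serviceAccount.create=false --set serviceAccount.name=grafana
}

deploy_monitoring
```

Keep in mind that this sketch is a standalone script rather than an env0 configuration file; it only shows the kind of commands each step boils down to.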
Ensure you add the env0 configuration for this workflow to the root directory of the Git repo you’re using to deploy and name it [.code]env0.yaml[.code].
Collecting Telemetry Data by Deploying Prometheus and Grafana with env0
With the [.code]env0.yaml[.code] configuration, you can now prepare env0 to deploy Prometheus and Grafana into Amazon EKS.
First, create a new environment.
For the VCS environment, choose to run Kubernetes so that the system can easily scale.
Select the repo and branch where the [.code]env0.yaml[.code] file exists. You can leave the Kubernetes folder blank as you’re not deploying a Kubernetes manifest.
Ensure that the [.code]AWS_DEFAULT_REGION[.code] and [.code]CLUSTER_NAME[.code] variables match your EKS cluster.
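If you want to sanity-check those two variables from a terminal before deploying, a small guard like the one below is one way to do it. The helper name require_vars is hypothetical, and the check only confirms the variables are set; whether they match a real cluster is left to the AWS CLI call shown in the comment.

```shell
# Return non-zero (and name the culprit) if any listed variable is unset or empty.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"   # look up the variable whose name is in $name
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      return 1
    fi
  done
}

if require_vars AWS_DEFAULT_REGION CLUSTER_NAME; then
  echo "variables are set"
  # With valid values, this AWS CLI call prints ACTIVE for a healthy cluster:
  #   aws eks describe-cluster --name "$CLUSTER_NAME" \
  #     --region "$AWS_DEFAULT_REGION" --query 'cluster.status' --output text
else
  echo "set AWS_DEFAULT_REGION and CLUSTER_NAME first"
fi
```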
Once complete, you’ll see in the env0 dashboards that your cloud deployment has started.
You’ll get a prompt to approve the Prometheus and Grafana deployments.
Once the deployment completes, you’ll see the Prometheus and Grafana resources running on your Amazon Elastic Kubernetes Service cluster.
Congratulations! You're now gathering data from the services you've deployed to the cloud in AWS!
This is part four of a four-part series. Keep reading to learn more!