Embracing Infrastructure as Code is a key step in your journey to cloud-native operations. It also opens the door to approaching other aspects of your operations as code, and a great example of this is using code to define and evaluate policy.
This post will introduce you to the concepts behind policy as code, and how to use Open Policy Agent (OPA) to implement policy as code with your existing Terraform configurations.
What is policy as code and why is it important?
Many organizations struggle to address compliance regulations, governance policies, and security requirements in a consistent way across all of their application and infrastructure deployments. Ideally, they would like to perform policy checks before their applications or infrastructure are deployed and automatically enforce policies based on the results of those checks.
We should probably start at the most fundamental aspect of policy as code, that being the policy itself. What do we mean when referring to a policy? The easiest definition I've seen for a policy is a set of rules that govern how a system or organization should behave.
Now, that's a pretty broad definition, but I think the key here is that a policy contains rules and those rules express how we expect a system to be configured or behave. Whether we apply that policy to our systems through code or manually, the end result is an evaluation that tells us whether or not our system is compliant with the policy.
So, what is policy as code then?
It's simply the expression of those rules through code, along with software that can parse the rules and determine compliance. The code can be in a general-purpose programming language, like Python, or expressed through a domain-specific language, as is the case with Open Policy Agent (OPA) through Rego.
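To make the idea concrete, here is a minimal sketch of a policy written in a general-purpose language. The function name, the resource shape, and the tag names are all assumptions made up for illustration; they are not taken from the example repository.

```python
def check_required_tags(resource, required_tags):
    """Return one violation message per missing tag; an empty list means compliant."""
    tags = resource.get("tags") or {}
    missing = sorted(set(required_tags) - set(tags))
    return [f"{resource['name']} is missing required tag '{tag}'" for tag in missing]


# Hypothetical resource description, used only for this sketch.
vnet = {"name": "example-vnet", "tags": {"environment": "dev"}}

violations = check_required_tags(vnet, ["environment", "owner"])
print(violations)
```

The rule ("every resource carries these tags") lives in code, and the evaluation is just a function call that reports whether the resource complies.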
If you're looking to implement policy as code, you'll need to do a few things:
- Discover what your policies should enforce
- Meet with organization stakeholders to establish standards
- Include Security, Compliance, Operations, and Development
- Document policies and receive stakeholder buy-in
- Express policies as code and apply to infrastructure
For those last two tasks, you can apply Open Policy Agent (OPA) to your infrastructure as code. OPA was purpose-built to evaluate policies based on input data. That input data can come from the infrastructure as code you've used to deploy resources and applications to your environment.
What is Open Policy Agent?
Open Policy Agent (OPA) is a general-purpose policy engine that can be used to evaluate policies expressed in Rego, using data gathered in JSON format from multiple sources. The results of an evaluation can be used in policy enforcement.
OPA is a CNCF project that was originally developed at Styra. The initial focus was on Kubernetes manifests and workloads to enforce security and address admission controls, but its general nature allows it to be applied to any domain, such as cloud infrastructure.
OPA is packaged as a single binary and it is capable of running in many modes. You can launch an interactive environment running OPA to test policies, you can run it in server mode and make requests via the API, or you can run it as a command-line tool feeding in the rules and data as parameters. Open Policy Agent can also be embedded in your own applications, allowing you to evaluate OPA policies as part of your application logic.
How Open Policy Agent (OPA) Works
The architecture of OPA is relatively simple. As I already mentioned, the core OPA binary is at the center of it all. Whether you choose to run it as a service or simply utilize it as a CLI tool is up to you.
The power of OPA is in its policy language and the virtual document it builds to evaluate rules expressed in Rego. When OPA runs, it utilizes Rego policies combined with JSON data to evaluate rules and determine the response value for each rule based on a query.
When you want to check on the results of a policy, you create a query that will return a decision based on the user input you've provided. The decision can be a simple boolean yes/no, or it can be a more complex structure that includes additional information about the decision.
Behind the scenes, OPA is generating and maintaining a virtual document that has your user input, data, and rules all combined. This document is what is used to evaluate the rules and then generate the decisions.
The virtual document is available for you to query, allowing you to see the results of the evaluation. This is a great way to debug your policies and ensure they are working as expected.
Benefits of Open Policy Agent (OPA)
Using Open Policy Agent enables you to store your policies as code and apply them programmatically to your planned or deployed infrastructure. Because it is a general-purpose tool and not beholden to a specific technology, you can use OPA to write and test policies against infrastructure changes from Terraform, Kubernetes, Docker, and more.
Using a tool that is agnostic to cloud provider and vendor provides a significant amount of flexibility and allows you to integrate OPA across multiple platforms, leveraging a common foundation of knowledge and patterns. Kind of like using Terraform to manage your cloud-native infrastructure, you can use OPA across one or many cloud providers.
OPA can be easily integrated into your CI/CD pipeline and the results generated by OPA can be used to enforce policies and determine whether or not to proceed with a deployment. Since the decision can hold additional context, you can create more nuanced actions based on the results.
As an example policy, you could allow a deployment to proceed if the only issue is a missing tag, but block the deployment if there are any security issues. OPA can also be used to inspect existing infrastructure deployments on a cloud provider and report on compliance. Remember, Terraform state is JSON too! This is a great way to get a handle on your infrastructure deployments and identify areas that need to be addressed.
Writing an OPA Policy
That's enough background. Let's see how you can get started with OPA and Terraform. The first thing you'll need to do is install OPA. You can find the installation instructions for your operating system on the OPA website.
Next, you're going to need some input to evaluate and rules to apply to the input. For this example, we're going to use a simple Terraform configuration that creates an Azure Virtual Network with two subnets. The resources in our Terraform file are shown below.
The full version of this example, along with the policies we will be using, is available on GitHub. Also included is an advanced version of the policies broken up into distinct packages and files. The advanced version is not covered in this post, but it is available for you to explore.
Evaluating Terraform Plans
From this simple configuration, we need to produce JSON data to send to OPA for evaluation. We can run [.code]terraform plan[.code] to generate the proposed resource changes and convert the plan to JSON using the following commands:
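If you would rather script this step, the sketch below wraps the two standard Terraform commands, [.code]terraform plan -out=<file>[.code] and [.code]terraform show -json <file>[.code], in Python. The file names [.code]tfplan[.code] and [.code]tfplan.json[.code] and both function names are assumptions chosen for illustration.

```python
import subprocess


def plan_json_commands(plan_file="tfplan"):
    """Build the two commands: save a binary plan, then render that plan as JSON."""
    return [
        ["terraform", "plan", f"-out={plan_file}"],
        ["terraform", "show", "-json", plan_file],
    ]


def write_plan_json(json_file="tfplan.json", plan_file="tfplan"):
    """Run both commands in the current directory and save the JSON plan to disk."""
    plan_cmd, show_cmd = plan_json_commands(plan_file)
    subprocess.run(plan_cmd, check=True)
    result = subprocess.run(show_cmd, check=True, capture_output=True, text=True)
    with open(json_file, "w", encoding="utf-8") as handle:
        handle.write(result.stdout)


# Calling write_plan_json() requires an initialized Terraform working directory,
# so this sketch only prints the commands it would run.
for command in plan_json_commands():
    print(" ".join(command))
```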
The resulting JSON file is just over 400 lines long. That's a lot of information! Included in the execution plan are the following sections:
- [.code]configuration[.code] - The rendered Terraform code configuration used to generate the plan.
- [.code]variables[.code] - The values of any variables used in the configuration.
- [.code]planned_values[.code] - The values of the resource types that will be created or modified.
- [.code]resource_changes[.code] - All proposed changes that will be applied to the target environment.
You can inspect any of those keys and attributes using OPA, but the most relevant sections for most folks will be [.code]planned_values[.code] or [.code]resource_changes[.code]. Let's use the contents of the [.code]resource_changes[.code] to inspect what changes are being made to the environment.
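Before writing any policy, it can help to look at [.code]resource_changes[.code] yourself. The sketch below parses a heavily trimmed, hypothetical stand-in for the plan JSON (the resource addresses and tags are invented, and a real plan carries many more fields) and summarizes each proposed change.

```python
import json

# Trimmed, hypothetical stand-in for the JSON produced from a Terraform plan.
PLAN_JSON = """
{
  "resource_changes": [
    {"address": "azurerm_virtual_network.main",
     "type": "azurerm_virtual_network",
     "change": {"actions": ["create"], "after": {"tags": {"environment": "dev"}}}},
    {"address": "azurerm_subnet.web",
     "type": "azurerm_subnet",
     "change": {"actions": ["create"], "after": {}}}
  ]
}
"""


def summarize_changes(plan):
    """Return (address, type, actions) for every entry in resource_changes."""
    return [
        (rc["address"], rc["type"], rc["change"]["actions"])
        for rc in plan.get("resource_changes", [])
    ]


summary = summarize_changes(json.loads(PLAN_JSON))
for address, resource_type, actions in summary:
    print(f"{address} ({resource_type}): {', '.join(actions)}")
```

Each entry records the resource address, its type, and a list of actions under [.code]change.actions[.code], which is the data the rules in the next section work with.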
Now that we have our data, we can start to write policies in Rego.
Writing Policies in Rego
Rego is a high-level declarative language used to write tests and express policies. It is built around the concept of rules that are evaluated against data. Each set of rules will be part of a package.
The rules themselves will either evaluate to a value or, if there is no value that meets the criteria, the rule will resolve to [.code]false[.code], [.code]undefined[.code], or an empty result based on a query. The following is a simple example policy that will evaluate to true if the value of taco is equal to nacho.
Since that's obviously not true, the rule will evaluate as false. Rego can work with more complex data types, like arrays, objects, and sets.
The beginning of a Rego file will start with the package name followed by any import statements. For instance, let's create a package called [.code]terraform.functions[.code] and import the content of the [.code]resource_changes[.code] key from the JSON file we generated earlier.
We will pass the JSON file created from our terraform plan as input when we run OPA from the command line, so we can use the input keyword to reference the data.
Using the [.code]import[.code] statement allows us to reference [.code]input.resource_changes[.code] simply as [.code]resource_changes[.code] within the rest of the document. Import can also be used to import other packages.
Let's start with a simple rule that will evaluate to true if any resources are being created.
The first line of the rule creates a new array (called a comprehension in Rego) named [.code]creates[.code] that contains all the infrastructure resources that are being created. You could read it like, "give me all the resources in resource_changes where the change action is set to create".
The second line of the rule checks to see if the count of the [.code]creates[.code] array is greater than zero. If the array has elements, it will resolve to true. If the array is empty, it will resolve to [.code]undefined[.code].
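If Rego is new to you, it may help to see the same two steps mirrored in Python. This is only an analogy under stated assumptions: it is not the Rego rule itself, the sample data is invented, and [.code]None[.code] stands in for Rego's [.code]undefined[.code].

```python
def resources_created(resource_changes):
    """Mirror of the two-line rule: collect the creates, then test the count."""
    # Step 1: the comprehension, keeping every resource whose actions include "create".
    creates = [rc for rc in resource_changes if "create" in rc["change"]["actions"]]
    # Step 2: true when anything was collected; None plays the role of "undefined".
    return True if len(creates) > 0 else None


# Hypothetical resource_changes entries, trimmed to the fields the rule reads.
with_creates = [{"change": {"actions": ["create"]}}, {"change": {"actions": ["no-op"]}}]
without_creates = [{"change": {"actions": ["update"]}}]

print(resources_created(with_creates))
print(resources_created(without_creates))
```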
The syntax of Rego can be a little confusing at first, so I highly recommend checking out the Policy Language page in the OPA documentation.
We can test this rule by using the [.code]opa exec[.code] command, which will return the value of a particular rule in the policy based on policy files and input supplied.
The above command will run OPA and evaluate the rules of any [.code].rego[.code] files found in the [.code]policy/[.code] directory against the JSON file we generated earlier and return the value for the [.code]resources_created[.code] rule.
The resulting output is in JSON format and will look something like this.
That's a simple example of a policy in Rego that evaluates the actions proposed by our [.code]terraform plan[.code] output. We can expand the capabilities of our OPA policy by introducing helper rules that you can think of as functions. For instance, you might want to retrieve all Virtual Networks being created and make sure they have the [.code]environment[.code] tag set.
First we can create a rule that retrieves all the resources of a specific resource type and action.
In the function, we are defining [.code]filtered_resources[.code] as any resource that matches the type and action passed to the function. Multiple statements in a rule are joined with an implicit AND operator, which means OPA checks and returns only values that match all of the statements inside the Rego policy.
You can read the function as something like, "Give me a map of resources based on the resource object [.code]resources[.code] where the resource type is equal to [.code]type[.code] and the change action is equal to [.code]action[.code]."
We can then use this function in another rule to retrieve all the Virtual Networks being created and store the value in the vnets_created variable.
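As a rough Python analogy for the filter and its use, consider the sketch below. The function name and sample data are hypothetical, and it returns a list where the Rego version is described as returning a map. The two conditions inside the comprehension are joined with [.code]and[.code], which matches the implicit AND between statements in a Rego rule.

```python
def get_resources_by_type_and_action(resources, resource_type, action):
    """Keep only resources that match BOTH the type and the action."""
    return [
        rc
        for rc in resources
        if rc["type"] == resource_type and action in rc["change"]["actions"]
    ]


# Hypothetical resource_changes entries for illustration.
resource_changes = [
    {"address": "azurerm_virtual_network.main", "type": "azurerm_virtual_network",
     "change": {"actions": ["create"]}},
    {"address": "azurerm_subnet.web", "type": "azurerm_subnet",
     "change": {"actions": ["create"]}},
    {"address": "azurerm_virtual_network.old", "type": "azurerm_virtual_network",
     "change": {"actions": ["delete"]}},
]

vnets_created = get_resources_by_type_and_action(
    resource_changes, "azurerm_virtual_network", "create"
)
print([rc["address"] for rc in vnets_created])
```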
With our Virtual Networks retrieved, we can create another set of functions. One to check if any tags on a resource are missing based on a list of required tags and another that will check a set of resources and only return those missing the required tags.
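The pair of helpers described above could look like this in Python. Treat it as a sketch: the names are invented, and it assumes tags live under [.code]change.after.tags[.code] in each [.code]resource_changes[.code] entry, with missing or null values treated as "no tags".

```python
def missing_tags(resource, required_tags):
    """Return the required tags that are absent from a single resource."""
    after = resource["change"].get("after") or {}
    tags = after.get("tags") or {}
    return [tag for tag in required_tags if tag not in tags]


def resources_missing_tags(resources, required_tags):
    """Return only the resources that lack at least one required tag."""
    return [rc for rc in resources if missing_tags(rc, required_tags)]


# Hypothetical Virtual Network entries for illustration.
vnets = [
    {"address": "azurerm_virtual_network.main",
     "change": {"after": {"tags": {"environment": "dev"}}}},
    {"address": "azurerm_virtual_network.tagged",
     "change": {"after": {"tags": {"environment": "dev", "owner": "ops"}}}},
]

required = ["environment", "owner"]
offenders = resources_missing_tags(vnets, required)
print([rc["address"] for rc in offenders])
print(missing_tags(vnets[0], required))
```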
And finally, for our example, we can define our list of required tags and a rule to check them against the tags attributes in our list of Virtual Networks.
Remember that Rego will only return a result where all the statements in the rule are true. In the deny rule, we first retrieve any Virtual Networks that are missing the required tags and store the result in the variable resources.
Next, we check to see if the resources variable is empty. If it isn't, then we have some Virtual Networks missing tags and so we set msg to have text stating which Virtual Networks are missing tags. If resources is empty, then msg will be set to undefined.
We can repeat the deny rule multiple times in our policy to add more messages to the deny result. Multiple rules with the same name will be combined into a single result using an implicit OR operator. This allows you to build out a policy that will return multiple messages based on the actions being taken by Terraform.
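One way to picture how repeated deny rules combine is as a union of message sets. The Python sketch below is an analogy, not the policy from this post: the second rule ([.code]deny_deletes[.code]), the message text, and the sample data are all hypothetical.

```python
def deny_missing_tags(resource_changes, required_tags):
    """Hypothetical deny rule: one message if any created VNet lacks a required tag."""
    offenders = []
    for rc in resource_changes:
        is_vnet = rc["type"] == "azurerm_virtual_network"
        if not is_vnet or "create" not in rc["change"]["actions"]:
            continue
        tags = (rc["change"].get("after") or {}).get("tags") or {}
        if any(tag not in tags for tag in required_tags):
            offenders.append(rc["address"])
    if not offenders:
        return set()  # like an undefined msg: this rule contributes nothing
    return {"Virtual Networks missing required tags: " + ", ".join(sorted(offenders))}


def deny_deletes(resource_changes):
    """Second hypothetical deny rule, included only to show how results combine."""
    return {
        f"{rc['address']} would be deleted"
        for rc in resource_changes
        if "delete" in rc["change"]["actions"]
    }


def deny(resource_changes, required_tags):
    """Union of every rule's messages, mirroring the implicit OR between rules."""
    return deny_missing_tags(resource_changes, required_tags) | deny_deletes(resource_changes)


changes = [
    {"address": "azurerm_virtual_network.main", "type": "azurerm_virtual_network",
     "change": {"actions": ["create"], "after": {"tags": {"environment": "dev"}}}},
    {"address": "azurerm_subnet.old", "type": "azurerm_subnet",
     "change": {"actions": ["delete"], "after": None}},
]

messages = deny(changes, ["environment", "owner"])
for message in sorted(messages):
    print(message)
```

A rule whose conditions do not all hold simply adds nothing, so the final result contains only the messages from rules that matched.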
To evaluate the policy, we can use the [.code]opa exec[.code] command again.
And since our Virtual Network does not have the owner tag set, this time we'll get back the following output:
What you choose to do with the output of an OPA evaluation is up to you. The presence of a deny result might block the application of the plan. Alternatively, you could create warn results that require approval from a senior member of the team, or info results that allow the apply to continue without any intervention.
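As a sketch of that last step, the function below maps a decision to a pipeline action. The decision shape (a dictionary with [.code]deny[.code], [.code]warn[.code], and [.code]info[.code] lists) and the action names are assumptions for illustration; your policy and tooling define the real structure.

```python
def choose_action(decision):
    """Map an OPA-style decision to a pipeline action, most severe result first."""
    if decision.get("deny"):
        return "block"
    if decision.get("warn"):
        return "needs-approval"
    return "apply"


# Hypothetical decision: no deny messages, one warning, one informational note.
decision = {
    "deny": [],
    "warn": ["Virtual Network address space is larger than usual"],
    "info": ["2 resources will be created"],
}

action = choose_action(decision)
print(action)
```

In a CI/CD job, you might exit with a non-zero status for anything other than [.code]apply[.code] so the pipeline pauses or fails.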
Limitations and Challenges of Open Policy Agent (OPA)
Open Policy Agent is a fantastic tool for implementing Policy as Code, but it does have some challenges to be aware of:
- Challenge: Rego may not be intuitive for some people (myself included!). Solution: Refer to the documentation and use the OPA playground to learn more.
- Challenge: You have to develop the logic to parse OPA results and decisions. Solution: CI/CD and automation tooling, like env0, can provide a workflow to parse a terraform plan against OPA policies and enforce the results.
- Challenge: OPA policies need to be adjusted for each new platform and cloud provider. Solution: Leverage the OPA community and example policies for common providers and cloud solutions.
Conclusion
I hope this introduction and tutorial on using OPA has been helpful. Stay tuned for future articles where we'll step through integrating OPA with env0 and Terraform to make sure your infrastructure deployments are secure and compliant.
In the meantime, if you have any questions or comments, please feel free to reach out to me on Twitter.