In this post I’d like to talk about testing Terraform. I will cover general principles and methodology, and give practical examples of the tools and workflows we actually use for our database solutions.

I wrote before about the great potential of Infrastructure as Code principles for databases. I truly believe the industry will develop in this direction; there is simply no alternative, assuming the predictions of ever-growing data volumes and database instance counts hold true.

Now, if we represent infrastructure in code, and we know that high-quality, reliable code needs to be tested, how do we test Terraform?

Role of Terraform in Infrastructure provisioning

Before we delve into the weeds of testing Terraform, let’s define Terraform’s role in the provisioning process.

I find it natural to split the provisioning process into three layers. The layers differ in the function they perform, the tools they use, and the time scales they operate on. I hope it’s not too confusing; let me explain with an example.

Let’s take some service. It could be a web application or a bastion host, but since we are a database company, let it be a MySQL database. What do we need to fully provision it?

1. Capacity where we run MySQL

It used to be a physical server; in the cloud era it is some cloud resource. For a traditional cluster we would need two EC2 instances, one for a master and one for a replica. We create the EC2 instances for a relatively long time, on the order of days. We re-create the instances only if they die or if we change something significant about them, like the OS version, the kernel, or the like. We use Terraform to create the EC2 instances.

2. Provision OS environment

After the EC2 instances are up and running, we need to configure the system-level environment on them. For example, we need to tune the OS itself, install the necessary packages, install configuration files, and start the necessary services. This kind of provisioning happens more often: to install MySQL packages, Percona Toolkit, backup software, and so on; to apply security patches to libraries; or to install new versions of configs or tools that the next layer of provisioning will use. We do not want to rotate our MySQL fleet just to install a new version of a backup tool, do we? That’s why this is a job for Chef or Puppet.

3. Dynamic provisioning

This is the layer where we configure MySQL replication. This layer assigns roles – whether a server is going to be a master or a replica – and changes the roles when the master fails. We cannot delegate this to Chef; it’s too slow. This layer of provisioning is usually done by an orchestra of different tools and services, for example Orchestrator and ProxySQL. On top of that, we developed the RevDB provisioning tool, which is responsible for cluster locks, updating service discovery, and reconfiguring backups and checksumming. That’s a topic for another post; for now, let’s just say it’s Python software.

Now, the scope of this post is the very first layer of provisioning that is done by Terraform.

Testing Terraform Methodology

There is an excellent talk by Alex Martelli where he explains testing layers in the context of Python. The layered testing principles apply to any language, though.

How does it work for Terraform?

A good IDE helps you with the most basic language syntax checks, and maybe a little bit of linting. This is your first line of defense.

The next layer is linting in the general sense. Useful tools here are terraform validate and tflint.
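
Both are command-line tools, so they are easy to run locally or in CI. Here is a minimal sketch of how we might run them from the module directory (assuming tflint is installed separately; terraform validate needs an initialized working directory, hence the init):

$ terraform init -backend=false
$ terraform validate
$ tflint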

The next layer is unit testing. In theory, a unit test works like this: a unit (a Python function) has access to some computing resources (CPU, memory, disk), performs some actions (multiplies numbers, transforms a string, whatever the function does), and then the unit test compares the expected and the actual result. When the unit test doesn’t have access to a resource (for example, an external API, a database, DNS, etc.), it mocks out the resource. It then checks that the API call was made, and made with the correct parameters; if so, the unit test assumes success. In other words, the unit test delegates testing of the actual API call either to other code or to higher-level tests like integration tests.
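
To make this concrete, here is a minimal sketch of such a unit test in Python; get_owner, the inventory URL, and the returned data are made up purely for illustration:

from unittest import mock

import requests


def get_owner(host):
    """Hypothetical unit under test: asks an external inventory API who owns a host."""
    response = requests.get("https://inventory.example.com/hosts/%s" % host)
    return response.json()["owner"]


def test_get_owner():
    # The external API is mocked out - no real HTTP call is made.
    with mock.patch("requests.get") as mock_get:
        mock_get.return_value.json.return_value = {"owner": "dba-team"}

        # Compare the expected and the actual result.
        assert get_owner("db-01") == "dba-team"

        # Check that the API call was made, and with the correct parameters.
        mock_get.assert_called_once_with("https://inventory.example.com/hosts/db-01")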

Terraform could follow this model, but the fact of the matter is that Terraform does almost no computational work: it mostly creates, modifies, and destroys resources. If Terraform followed this model, a unit test would consist only of mocks. Practically speaking, there would be little help from this kind of test.

So, Terraform tests skip this layer and step into the realm of integration tests.

The Terraform test creates a real resource, then validates it. That’s an important difference from Python testing.

Terraform Integration Tests

Have you ever run terraform plan? That’s almost an integration test: we check our real infrastructure, compare it with the desired state, and see what Terraform believes needs to be done to converge reality with the desired state. Take this trivial configuration and the plan Terraform produces for it:

resource "aws_instance" "db" {
  ami           = "ami-0c43b23f011ba5061"
  instance_type = "t3.nano"
}

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.aws_ami.ubuntu: Refreshing state...

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_instance.db will be created
  + resource "aws_instance" "db" {
      + ami = "ami-0c43b23f011ba5061"
      + arn = (known after apply)
      + associate_public_ip_address = (known after apply)
      + availability_zone = (known after apply)
      + cpu_core_count = (known after apply)
      + cpu_threads_per_core = (known after apply)
      + get_password_data = false
      + host_id = (known after apply)
      + id = (known after apply)
      + instance_state = (known after apply)
      + instance_type = "t3.nano"
      + ipv6_address_count = (known after apply)
      + ipv6_addresses = (known after apply)
      + key_name = (known after apply)
      + network_interface_id = (known after apply)
      + outpost_arn = (known after apply)
      + password_data = (known after apply)
      + placement_group = (known after apply)
      + primary_network_interface_id = (known after apply)
      + private_dns = (known after apply)
      + private_ip = (known after apply)
      + public_dns = (known after apply)
      + public_ip = (known after apply)
      + security_groups = (known after apply)
      + source_dest_check = true
      + subnet_id = (known after apply)
      + tenancy = (known after apply)
      + volume_tags = (known after apply)
      + vpc_security_group_ids = (known after apply)

      + ebs_block_device {
          + delete_on_termination = (known after apply)
          + device_name = (known after apply)
          + encrypted = (known after apply)
          + iops = (known after apply)
          + kms_key_id = (known after apply)
          + snapshot_id = (known after apply)
          + volume_id = (known after apply)
          + volume_size = (known after apply)
          + volume_type = (known after apply)
        }

      + ephemeral_block_device {
          + device_name = (known after apply)
          + no_device = (known after apply)
          + virtual_name = (known after apply)
        }

      + metadata_options {
          + http_endpoint = (known after apply)
          + http_put_response_hop_limit = (known after apply)
          + http_tokens = (known after apply)
        }

      + network_interface {
          + delete_on_termination = (known after apply)
          + device_index = (known after apply)
          + network_interface_id = (known after apply)
        }

      + root_block_device {
          + delete_on_termination = (known after apply)
          + device_name = (known after apply)
          + encrypted = (known after apply)
          + iops = (known after apply)
          + kms_key_id = (known after apply)
          + volume_id = (known after apply)
          + volume_size = (known after apply)
          + volume_type = (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

If you are an experienced Terraform user, you know that terraform plan is useful but not enough.

Why? Because even though terraform plan promises to create an instance, it doesn’t mean it will. You could have specified a wrong AMI or instance type (tflint catches these), you could hit AWS limits, or you could ask for a syntactically correct but conflicting configuration – a million different reasons are possible.

That’s why we need to actually create the instance and validate it. I am going to show you how to write nice integration tests for Terraform.

Module repository vs Live repository

I hope you know what a Terraform module is. In traditional languages like Python, a function is the equivalent of a Terraform module: it’s something that you can reuse.

According to the official Terraform terminology, it doesn’t matter whether the Terraform code actually creates resources or is written to be reused by other Terraform code – it is still called a module. I think that’s confusing and a bad idea. I saw somewhere (probably coming from Gruntwork) a proposal to call a repository that actually creates resources a “live repository” and a repository with a reusable module a “module repository”. This is better terminology; I use it all the time and will be using it in this post, too.

We will test a module repository. Although it’s possible to test live repositories as well, I will focus on testing module repositories.

Writing Terraform module

Enough with the overture; let’s get to the interesting stuff and write a module with a test!

So we will write a very simple module that creates just one instance. Normally you wouldn’t write such a trivial module, but for the sake of simplicity and a good illustration, let it be this simple.

This is the module structure – variables and the main code.

[foo_module]$ tree
.
├── main.tf
└── variables.tf

0 directories, 2 files

variables.tf

variable "ami" {
  description = "AWS image id"
  default     = "ami-0c43b23f011ba5061"
}

variable "instance_type" {
  description = "EC2 instance type"
  default     = "t3.nano"
}

main.tf

resource "aws_instance" "db" {
  ami           = var.ami
  instance_type = var.instance_type
}

Introducing terraform-ci

terraform-ci is a Python package that you can use together with pytest to write Terraform tests.

It was inspired by Gruntwork’s terratest. I was watching Yevgeniy Brikman’s talk where he explained the idea behind terratest, and I thought about two things: 1) terratest is written in Go, while I’m more familiar with Python; 2) terratest does nothing that pytest cannot do.

It’s a great talk, you should watch it!

So I wrote a couple of helper functions, packaged them as terraform-ci, used them with pytest, and never looked back.

Let’s test our module with terraform-ci.

Test live repository

[foo_module]$ tree
.
├── main.tf
├── test_live
│   ├── configuration.tfvars
│   ├── main.tf
│   └── providers.tf
└── variables.tf

Since our module is not live Terraform code, we need to create live code that uses our module.

The live code should define the providers (providers.tf)

provider "aws" {
  region  = "us-east-1"
  version = "~> 2.70"
}

and use our module (main.tf).

module "db" {
  source = "../"
}
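
Because the module’s variables have defaults, the call can be this short. If we wanted different values, we would pass the variables much like function arguments; the values below just repeat the defaults for illustration:

module "db" {
  source        = "../"
  ami           = "ami-0c43b23f011ba5061"
  instance_type = "t3.nano"
}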

Now let’s create the test code.

Be sure to install terraform-ci.

$ pip install terraform-ci
$ tree
.
├── main.tf
├── test_live
│   ├── configuration.tfvars
│   ├── main.tf
│   └── providers.tf
├── tests
│   ├── __init__.py
│   └── test_db.py
└── variables.tf

And here’s the test code.

from terraform_ci import terraform_apply


def test_db():
    with terraform_apply("test_live"):
        pass
What’s going on here?

The test does nothing but create the resources defined in test_live. When the code exits the with block, the resources it created are destroyed.

This way we test only that terraform apply runs successfully – nothing else, no validation logic.

So, let’s run it.
$ pytest tests/test_db.py
================================= test session starts ================
platform darwin -- Python 3.7.4, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /Users/aleks/tmp/foo_module
plugins: timeout-1.4.1, rerunfailures-8.0
collected 1 item

tests/test_db.py .                                               [100%]

============================ 1 passed in 80.60s (0:01:20) =============

Test validation

What about validation, you might ask. A fair question.

And here is the beauty of the suggested approach: you have full flexibility in what you validate and how you validate the test run.

Validation with boto3

For example, you can use boto3 to query AWS and verify that the instance was created with the correct AMI. I’m making this up, but you get the idea.

import boto3
from terraform_ci import terraform_apply


def test_db():
    with terraform_apply("test_live", destroy_after=False):
        client = boto3.client("ec2")
        response = client.describe_instances(
            Filters=[{"Name": "instance-state-name", "Values": ["running"]}],
        )
        assert (
            response["Reservations"][0]["Instances"][0]["ImageId"]
            == "ami-0c43b23f011ba5061"
        )

Validation with Terraform outputs

It is also possible to verify Terraform output variables.

outputs.tf

output "associate_public_ip_address" {
  value = aws_instance.db.associate_public_ip_address
}

Similarly, in test_live/outputs.tf

output "associate_public_ip_address" {
  value = module.db.associate_public_ip_address
}

Then the test will look like this:

from terraform_ci import terraform_apply


def test_db():
    with terraform_apply("test_live", json_output=True) as tf_out:
        assert tf_out["associate_public_ip_address"]["value"] is True

TL;DR

  • To test Terraform code, you need to run integration tests
  • terraform-ci is a tool to test Terraform with pytest

There are many more features and use cases for terraform-ci and testing Terraform in general, but this post is getting too big and I’m getting too tired. So, next time 🙂
