

We have already established that Infrastructure as Code is the right way to manage databases at scale. That means you have to deal with a lot of code, and of different kinds: Terraform modules and live repositories, Chef cookbooks, Python provisioning tools, and more. The code needs to be hosted somewhere, and GitHub is one of the common choices (GitLab and Bitbucket are possible alternatives). But why would you need Terraform to manage GitHub? As with any service that has a web UI, there is a temptation to quickly set everything up from a browser. I will show that there is a better way.

TL;DR

  • to avoid the pitfalls of monorepos
  • to enforce PCI DSS and SOX compliance
  • to use cookie cutters for uniform, well-organized repository configuration
  • to enable CI/CD, documentation, packaging, and security out of the box

Why not Monorepos?

In this article I assume that your organization has many repositories. There are good reasons for that.

A while ago I wrote about monorepos and why to avoid them. In short, I have never seen one work well. My feeling is that people try to copy Google, but without Google’s tooling and resources their monorepos become unmanageable monsters.

For starters, a monorepo is difficult to clone. In a reasonably big company with many different components, the repository inflates to a gigantic size. The history of changes grows so large that simply cloning the repository takes an enormous amount of time and often fails.

A monorepo is hard to work with. It is a separate universe with its own rules, traditions, and workarounds. It is an unknown beast for new employees, who need to read tons of documentation to understand how the repository is structured and how to work with it. It is also company-specific, so newcomers are effectively rookies no matter how experienced they are. All of that increases onboarding time and complexity.

Over time, monorepos develop dark corners: nobody knows why they exist or how to deal with them.

The diverse nature of the hosted software makes it hard to be efficient in a monorepo. CI/CD rules differ between Terraform modules and Python libraries, and so do workflows, security requirements, and configuration.

Monorepos are also rigid when it comes to dependencies. Version conflicts are common, and it is hard to migrate to new dependency versions, Python versions, or OS versions.

Consequently, if monorepos are bad, you have to be able to work efficiently with many small repositories. It is easy to configure linters, unit and integration tests, security checks, packaging, and documentation for one repository. But when there are many of them, you need a tool that does it in a uniform manner. Terraform is such a tool.

My point here is that different pieces of software are different. They have different properties and should live in their own repositories. Instead of forcing a square peg into a round hole, we need to learn how to manage many small repositories. Nobody says, “let’s dump all data into a single database instance, because then user management, schema changes, and failovers will be easy,” right? We run many small replica sets despite the management overhead. Why should software repositories be any different?

PCI DSS and SOX Compliance

In a previous post I promised that Infrastructure as Code helps to solve compliance problems. Now it’s time to keep the promise and show how that works.

User Management

Adding or removing a user who can change infrastructure is covered by many certifications. If users are defined in code, every change has a corresponding commit and pull request, with an author who made the change and a reviewer who approved it. If that is done with Terraform, and Continuous Deployment applies the change without human intervention, the process is automatically compliant with PCI DSS. It makes the process clear and easy, and it rules out situations where a JIRA ticket is created after the fact.

This is how it looks for us.

# configuration.tfvars
org_admins = [
  "terraformrcie",
]

org_members = [
  "tamaskozak",
  "ipodorrcie",
  "akuzminsky",
  "ptrfarkas",
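]

# members.tf
variable "org_members" {
  description = "GitHub usernames that get the member role."
  type        = list(string)
}

variable "org_admins" {
  description = "GitHub usernames that get the admin role."
  type        = list(string)
}

# role is omitted, so the provider default ("member") applies.
resource "github_membership" "member" {
  for_each = toset(var.org_members)
  username = each.value
}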

resource "github_membership" "admin" {
  for_each = toset(var.org_admins)
  username = each.value
  role     = "admin"
}

Even though none of us is a GitHub organization admin (only the terraformrcie account is), we can add a new user: I create a pull request, Istvan approves it, and Travis-CI deploys it.
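Since each user holds exactly one organization role, a username should appear in only one of the two lists. If it appeared in both, two github_membership resources would manage the same membership and keep overriding each other’s role, and Terraform would not flag it because the two resource addresses differ. A minimal pre-merge check can catch that. This is a sketch: the function name is hypothetical, and it assumes the two lists have already been read from configuration.tfvars.

```python
from collections import Counter


def validate_memberships(members, admins):
    """Return a list of problems found in the org_members/org_admins lists.

    Hypothetical pre-merge check. GitHub usernames are case-insensitive and
    each user holds one organization role, so a name may appear only once
    across both lists.
    """
    problems = []
    # Normalize case, because GitHub treats usernames case-insensitively.
    member_names = [name.lower() for name in members]
    admin_names = [name.lower() for name in admins]

    # Exact duplicates are silently collapsed by toset(); case variants
    # would create two resources for one user. Report both.
    for label, names in (("org_members", member_names), ("org_admins", admin_names)):
        for name, count in Counter(names).items():
            if count > 1:
                problems.append(f"{name} is listed {count} times in {label}")

    # A user in both lists would be managed by two github_membership resources.
    for name in sorted(set(member_names) & set(admin_names)):
        problems.append(f"{name} is in both org_members and org_admins")

    return problems


# The lists from configuration.tfvars pass the check.
print(validate_memberships(
    ["tamaskozak", "ipodorrcie", "akuzminsky", "ptrfarkas"],
    ["terraformrcie"],
))
```

Run in CI against the pull request, a non-empty result can fail the build before Terraform ever plans the change.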

Repository Management

GitHub repositories are managed the same way.

# repos.tf
module "terraform-aws-orcherstrator" {
  source        = "./modules/github-repo/"
  name          = "terraform-aws-orcherstrator"
  description   = "Terraform module that creates Orchestrator."
  private       = true
  organization  = var.github_organization
  owner_team_id = github_team.committers.id
  ssh_key_path  = abspath(".env/id_rsa")
  admin_team_id = github_team.admins.id
  repo_kind     = "terraform"
}

Anyone in our organization can create a repository: the change is reviewed and approved, and the repository is configured automatically in a uniform, compliant way.

Branch Protection

When a change is proposed, both you and PCI DSS want it to be reviewed and approved. To enforce that technically, you need to configure branch protection. Then nobody can bypass the rules; with enforce_admins = true, that includes administrators.

resource "github_branch_protection" "default_branch" {
  repository     = github_repository.repo.name
  branch         = var.default_branch
  enforce_admins = true

  required_status_checks {
    strict = true
    contexts = [
      "Travis CI - Branch",
      "Travis CI - Pull Request"
    ]
  }
  required_pull_request_reviews {
    dismiss_stale_reviews = true
  }
}

Here we require:

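  • the pull request is reviewed and approved, and an approval is dismissed when new commits are pushed (dismiss_stale_reviews);
  • the pull request passes all required tests and checks (lint, dependency vulnerability scans, unit tests, and whatever else applies to a given repository type);
  • the status checks are based on the latest revision of the master branch (strict = true), so a branch must be up to date with master before it can be merged.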
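Together, these rules add up to one merge decision. Here is a simplified model of it, not GitHub’s implementation. The function name, the "success" state string, and the minimum of one approval are assumptions for illustration; the two context names come from the branch protection resource above.

```python
# Status check contexts required by the branch protection rule above.
REQUIRED_CONTEXTS = {"Travis CI - Branch", "Travis CI - Pull Request"}


def can_merge(approvals, check_results, up_to_date):
    """Simplified model of the merge gate (illustrative, not GitHub's code).

    approvals     -- approving reviews still valid on the latest commit
                     (dismiss_stale_reviews drops approvals of older commits)
    check_results -- mapping of status check context to its state
    up_to_date    -- True if the branch contains the latest master commit
    Returns a (allowed, reason) tuple.
    """
    # Assumption: at least one approval; the real count is configurable.
    if approvals < 1:
        return False, "the pull request is not approved"
    missing = sorted(c for c in REQUIRED_CONTEXTS if check_results.get(c) != "success")
    if missing:
        return False, "required checks not passing: " + ", ".join(missing)
    # strict = true: checks must have run against the current master.
    if not up_to_date:
        return False, "the branch is behind master"
    return True, "ok"


green = {"Travis CI - Branch": "success", "Travis CI - Pull Request": "success"}
print(can_merge(1, green, up_to_date=False))
```

The order of the checks only affects which reason is reported; a merge needs all three conditions.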

The last one is especially important. Many of you have probably seen a pull request whose unit tests pass, only to fail as soon as it is merged. Recent changes to master break the tests, but you cannot see that until the pull request is rebased.

For a Terraform change, this becomes critical. The Terraform plan must be based on the very latest commit of master. Otherwise, the plan shows inaccurate data: for example, it may suggest that the change will destroy resources that were created after the pull request branch was forked.
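The strict rule boils down to a property of the commit graph: a branch is up to date when the current tip of master is an ancestor of the branch head. A small model with a hand-written history (the commit ids and helper names are made up for illustration):

```python
def is_ancestor(ancestor, commit, parents):
    """Return True if `ancestor` is reachable from `commit` via parent links.

    `parents` maps a commit id to the list of its parent ids (a merge
    commit has two). A commit counts as its own ancestor here.
    """
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def branch_is_up_to_date(branch_head, master_head, parents):
    # strict = true: the branch must already contain the tip of master.
    return is_ancestor(master_head, branch_head, parents)


# A is the fork point; master moved on to B, the branch added C,
# and D merges master back into the branch.
history = {"B": ["A"], "C": ["A"], "D": ["C", "B"]}
print(branch_is_up_to_date("C", "B", history))  # plan from C would not see B
print(branch_is_up_to_date("D", "B", history))  # after the merge it would
```

A plan computed at C knows nothing about resources added in B, which is exactly how a stale plan ends up proposing to destroy them.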

These little details must not be forgotten and should be configured for every repository. How do you make sure that happens? With Terraform.

Using Cookie Cutters

The repositories can host different kinds of code: Terraform modules, Chef recipes, Python libraries. Each kind has its own configuration rules. For example, a Python library needs to publish releases to PyPI, while a Python application needs to be built into an RPM package. A Terraform live repository needs to publish the plan, while a Terraform module needs to run unit tests.

Cookiecutter is a powerful tool to generate a repository from a template.

You may have noticed that I specified a repo_kind argument in the earlier example.

# repos.tf
module "terraform-aws-orcherstrator" {
...
  repo_kind     = "terraform"
}

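The github-repo module uses this argument to initialize the repository: it runs a local-exec provisioner when it creates the repository. Because this is a creation-time provisioner, it runs only once, when the repository is first created, and not on later updates.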

# modules/github-repo/main.tf
resource "github_repository" "repo" {
  name               = var.name
...
  provisioner "local-exec" {
    command = data.template_file.init_repo.rendered
  }
}

data "template_file" "init_repo" {
  template = file("${path.module}/init_repo.sh")
  vars = {
...
    repo_kind = var.repo_kind
  }
}

The provisioner script then uses repo_kind to pick the right cookie cutter.

# modules/github-repo/init_repo.sh
# repo_kind below is filled in by Terraform (template_file) before the script runs.
echo "Generating ${repo_kind} repo"

case "${repo_kind}" in
    "terraform")
        cookiecutter_url="https://github.com/revenants-cie/cookiecutter-terraform.git"
        ;;
    "python")
        cookiecutter_url="https://github.com/audreyr/cookiecutter-pypackage.git"
        ;;
    "empty")
        cookiecutter_url="https://github.com/revenants-cie/cookiecutter-empty.git"
        ;;
    *)
        # Report the error on stderr and fail, so terraform apply fails too.
        echo "Unsupported repo kind ${repo_kind}" >&2
        exit 1
        ;;
esac

The cookie cutter template can be either a publicly available one (like the Python package template in the example above) or a custom one baked for your organization’s needs.
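The same kind-to-template dispatch can also be written in Python, using the cookiecutter() function from Cookiecutter’s cookiecutter.main module instead of a shell script. This is a sketch under assumptions: Cookiecutter is installed, the helper names are hypothetical, and the keys you pass in context must match each template’s cookiecutter.json.

```python
# Template URLs copied from init_repo.sh above.
TEMPLATES = {
    "terraform": "https://github.com/revenants-cie/cookiecutter-terraform.git",
    "python": "https://github.com/audreyr/cookiecutter-pypackage.git",
    "empty": "https://github.com/revenants-cie/cookiecutter-empty.git",
}


def template_for(repo_kind):
    """Return the cookiecutter template URL for a repository kind."""
    try:
        return TEMPLATES[repo_kind]
    except KeyError:
        raise ValueError(f"Unsupported repo kind {repo_kind}") from None


def generate_repo(repo_kind, context, output_dir="."):
    """Render the template into output_dir without interactive prompts.

    `context` overrides the template's defaults (hypothetical keys such as
    {"project_name": "..."}); returns the path of the generated project.
    """
    # Imported lazily so template_for() works without Cookiecutter installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        template_for(repo_kind),
        no_input=True,
        extra_context=context,
        output_dir=output_dir,
    )


print(template_for("terraform"))
```

Keeping the mapping in one table means that adding a new kind of repository is a one-line change, and an unknown kind fails loudly, just like the `*)` branch of the shell script.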

CI/CD Configuration

We use Travis-CI for CI/CD. It performs two main functions: it tests a change, and it deploys the change once it is merged into master.

The Travis-CI configuration is straightforward and lives in a .travis.yml file. We keep it simple and identical for all repositories of a given kind. For example, for our Python application it looks like this:

# .travis.yml
---
dist: bionic
language: python
python: '3.7'
install:
    - make bootstrap
script:
    - make lint test
deploy:
    - provider: script
      skip_cleanup: true
      script: make package
      on:
          branch: master
    - provider: script
      skip_cleanup: true
      script: make upload
      on:
          branch: master
...
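Because all repositories of a kind share this file, it is easy to check it mechanically, for example to verify that every deploy step runs only on master. Here is a sketch that works on the parsed file (such as the dict returned by PyYAML’s yaml.safe_load); the function names are hypothetical. PyYAML follows YAML 1.1, where an unquoted on key is read as the boolean True, so the sketch looks under both keys.

```python
def _deploy_conditions(step):
    # PyYAML (YAML 1.1) parses an unquoted `on:` key as the boolean True,
    # so look for the conditions under both "on" and True.
    return step.get("on") or step.get(True) or {}


def unrestricted_deploys(travis_config, branch="master"):
    """Return indexes of deploy steps that are not limited to `branch`.

    The deploy section may be a single mapping or a list of mappings,
    so both shapes are accepted.
    """
    deploy = travis_config.get("deploy", [])
    steps = deploy if isinstance(deploy, list) else [deploy]
    return [
        index
        for index, step in enumerate(steps)
        if _deploy_conditions(step).get("branch") != branch
    ]


# The .travis.yml above, as PyYAML would return it.
config = {
    "deploy": [
        {"provider": "script", "skip_cleanup": True, "script": "make package",
         True: {"branch": "master"}},
        {"provider": "script", "skip_cleanup": True, "script": "make upload",
         True: {"branch": "master"}},
    ]
}
print(unrestricted_deploys(config))
```

An empty list means every deploy step is restricted to master.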

Documentation as Code

Documentation is configured for the repository as well, so all a developer needs to do is write it; Terraform takes care of all the setup overhead. If documenting is easy, developers are more likely to do it. That’s why Terraform. The repository comes with a Read the Docs configuration like this:

# .readthedocs.yml
---
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Optionally set the version of Python
# and requirements required to build your docs
python:
    install:
        - requirements: "requirements_dev.txt"
    version: 3.7

# Build documentation in the docs/ directory with Sphinx
sphinx:
    configuration: "docs/conf.py"

# Required
version: 2
...

Packaging

Packaging is done differently for different kinds of software. Let’s look at a few examples.

For a public Python library, Travis-CI publishes the package to PyPI.

# https://github.com/twindb/terraform-ci/blob/master/.travis.yml

deploy:
  provider: pypi
  user: twindb
  distributions: sdist bdist_wheel
  on:
    branch: master
    python: '3.7'

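A Terraform module needs to be uploaded to S3. Travis-CI does that only for tagged builds (tags: true) and passes the tag to terraform-cd as the module version.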

# .travis.yml
deploy:
  skip_cleanup: true
  provider: script
  script: terraform-cd --module-version $TRAVIS_TAG revdb-terraform-modules
  on:
    branch: master
    tags: true
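Since $TRAVIS_TAG becomes the module version, a stray tag would publish a stray version. Here is a hypothetical guard that could run before the upload step, assuming release tags look like 1.2.3 or v1.2.3 (that convention is an assumption for illustration, not a documented requirement of terraform-cd).

```python
import re

# Assumed release tag convention: MAJOR.MINOR.PATCH, optionally prefixed with "v".
VERSION_TAG = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_module_version(tag):
    """Return (major, minor, patch) for a release tag, or raise ValueError.

    An unset tag (None or "") is rejected as well.
    """
    match = VERSION_TAG.fullmatch(tag or "")
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


print(parse_module_version("v1.2.3"))
```

In a deploy script, the value of $TRAVIS_TAG would be passed to this function, and a ValueError would stop the upload before anything reaches S3.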

Chef cookbooks are uploaded to the Chef Server; for Terraform live repositories we run terraform apply, and so on.

The configuration depends on the repository kind and its name. All of this can easily be added to a cookie cutter template, and then you can be sure that the same rules and the same checks are used across the organization.
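Part of what depends on the name can be derived from the name itself. Terraform modules, for instance, commonly follow the Terraform Registry naming convention terraform-<PROVIDER>-<NAME>, as terraform-aws-orcherstrator does. Here is a hypothetical helper that a template hook or CI script could use to split such a name:

```python
import re

# Terraform Registry convention: terraform-<PROVIDER>-<NAME>.
# PROVIDER is a single word; NAME may contain hyphens.
MODULE_REPO = re.compile(r"terraform-([a-z0-9]+)-([a-z0-9][a-z0-9-]*)")


def split_module_repo(repo_name):
    """Return (provider, module_name) for a Terraform module repository name."""
    match = MODULE_REPO.fullmatch(repo_name)
    if match is None:
        raise ValueError(f"{repo_name!r} does not follow terraform-<PROVIDER>-<NAME>")
    return match.group(1), match.group(2)


print(split_module_repo("terraform-aws-orcherstrator"))
```

A name such as terraform-ci (a Python package, not a module) is rejected, which is a useful signal that the repository kind and the name disagree.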

Can you see now how powerful this mechanism is? A user requests a repository of kind “foo”, and Terraform creates a secure, compliant, fully configured repository with CI/CD, packaging, documentation, vulnerability scanning, and code style checks. All of this is possible thanks to Infrastructure as Code, and managing GitHub with Terraform is the right way to go.


© 2020 Revenants CIE LLC.
