Announcing Loom: Open Source Self-serve Kubernetes Cluster Provisioner for AWS

Kubernetes is awesome, and over the last twelve months the Kubernetes ecosystem has exploded in popularity. Developers love building and shipping containers, while operations teams love the power and flexibility Kubernetes adds to their toolbox for building resilient service architectures. However, deploying Kubernetes is not awesome if you attempt to run it on Amazon Web Services (“AWS”). Unlike Google Cloud or Microsoft Azure, which offer out-of-the-box Kubernetes cluster deployment, Amazon offers the Elastic Container Service (“ECS”), a proprietary, closed-development container orchestration system that runs on top of Amazon EC2. Because of that product decision, getting up and running with Kubernetes on AWS is much more complicated than on the other providers.

As the lonely Ops/Platform Engineer at Datawire, I faced the problem of enabling developers to use Kubernetes. I wanted developers to be able to help themselves instead of bugging me for every little issue. I started by writing some simple docs on how to set up a cluster with kops, but developers constantly complained about how hard it was to get an AWS Kubernetes cluster set up. Another problem with letting devs use kops directly was that kops has no concept of guardrails (e.g., limiting the size and number of nodes to control AWS spend), auditing (who is using or doing what), or access control (Developer Alice should not be able to mess with the staging or prod fabrics).

So I wrote Loom, a tool that wraps kops and gives you the core framework for adding these features.

How does Loom work?

Loom is designed to run as a private, isolated service within an AWS account. The idea is that an ops engineer starts Loom on an EC2 instance, assigns it a DNS name, and then tells developers how to access the API. It’s designed to be that simple. For example, requesting a cluster from Loom is as basic as the following curl request.

curl -X POST \
     -H "Content-Type: application/json" \
     -d '{"name": "plombardi", "model": "dev-v1"}' \
     localhost:7000/models

If you’re reading that, you’re probably wondering how Loom knows what to do when it receives the API request, since no actual configuration is provided. The magic is in the model named in the request, which an operations engineer has defined ahead of time. For example, the dev-v1 model looks something like the following. Any developer who requests a Kubernetes cluster using the dev-v1 model gets the exact same setup: three t2.small nodes (1 master, 2 workers).

{
  "name": "dev",
  "version": 1,
  "region": "us-east-1",
  "domain": "example.org",
  "masterCount": 1,
  "masterType": "t2.small",
  "sshPublicKey": "ssh-rsa ...",

  "networking": {
    "module": "github.com/datawire/loom//src/terraform/network-v2"
  },

  "nodeGroups": [
    {
      "name": "main",
      "nodeCount": 2,
      "nodeType": "t2.small"
    }
  ]
}
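
Because all of the infrastructure details live in the model, the request side stays trivial. A second developer could get her own cluster with the same three-node layout by reusing the earlier call and changing only the cluster name (the name “alice” here is just an illustration):

curl -X POST \
     -H "Content-Type: application/json" \
     -d '{"name": "alice", "model": "dev-v1"}' \
     localhost:7000/models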

Roadmap

We’re currently using Loom internally and have deployed it with a few other folks. I am open sourcing it now because it’s useful for us, although there’s a lot more work to be done.

Some of the things I hope to add in the near future include auditing, the ability to automatically clean up long-running Kubernetes clusters that might have been forgotten (useful for automated testing scenarios), provisioning of backing resources such as databases on AWS, and much, much more. (If you have ideas, file an issue!)

Getting started

To get started with Loom today, check out the quick start guide at loom.run. Loom ships as a simple Docker image that takes just a few minutes to get set up and running.
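
As a rough sketch of what that looks like (the image name and credential handling shown here are assumptions; the quick start guide is the authoritative reference), starting Loom can be as simple as running the container with your AWS credentials and exposing the API port used in the curl example above:

# Hypothetical invocation: the image name and environment variables are
# assumptions, so follow the quick start guide for the real instructions.
docker run -d \
     -p 7000:7000 \
     -e AWS_ACCESS_KEY_ID=<your access key> \
     -e AWS_SECRET_ACCESS_KEY=<your secret key> \
     datawire/loom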