Announcing Kubernaut: Instantaneous ephemeral Kubernetes clusters in the cloud

Here at Datawire, we’ve been hacking on some cloud applications and tools on Kubernetes. As our development team added services, we found a recurring need for instant, on-demand, ephemeral Kubernetes clusters that run in the cloud. We need these in a variety of situations:

  • As part of our CI workflow, where we want to do a clean install and test of our services. The popular Minikube fails us here because it needs to start a VM, and our CI jobs already run inside a container or VM where nested virtualization is not available.
  • Quickly standing up a specific version of our app to reproduce a problem, but in the cloud rather than locally, often so we can share it easily with other folks on the team.
  • Hacking around for fun and profit!

We tried using Google Container Engine. Our developers use Google as their primary development environment, so we figured it would be a natural fit. However, we found that automatically destroying clusters when a job completes is not what GKE was designed to do. Getting engineers to remember to do it was often a challenge, and sometimes automated test code fails and the cleanup never happens.

To solve this problem we wrote a web service and an accompanying command-line tool that hands out short-lived Kubernetes clusters. Handing out Kubernetes clusters is easy; doing it instantly is quite difficult, so we spent a bunch of time optimizing this flow. Behind the scenes we manage a pool of Kubernetes clusters, and when you do a kubernaut claim you are actually given the keys to a pre-allocated cluster. Every cluster is fresh, and when you are done with it we destroy the underlying machines, so nothing is preserved between claims or leaked to other users.
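
As a rough sketch, a CI job built on this flow might look like the following. (The discard subcommand and the kubeconfig path shown here are assumptions; check the repository for the exact CLI.)

    # Claim a pre-allocated cluster from the pool
    $ kubernaut claim

    # Point kubectl at the claimed cluster (this kubeconfig path is an assumption)
    $ export KUBECONFIG=${HOME}/.kube/kubernaut

    # Do a clean install and test of the services
    $ kubectl apply -f k8s/
    $ ./run-tests.sh

    # Release the claim; Kubernaut destroys the underlying machines
    $ kubernaut discard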

We wrote Kubernaut for internal usage, but thought it might be interesting to other folks, so I wanted to share it and see if anyone else finds it useful. I’m keeping the backing pool for Kubernaut small for now since this is an experimental project, so if you receive a “Kubernaut is at capacity” type of error, this is the reason.

If you’re interested in trying it out, go to the Kubernaut GitHub repository or join our Gitter chat if you have any questions.

  • David Ndungu

    I am faced with a similar challenge and am wondering why you did not choose to use one GKE cluster and have each instance be a new namespace. Each namespace would be created, tests run, and then the namespace would be deleted.

    Thanks.
    David

    • Richard Li

      Great question. Many organizations use namespaces already for other scenarios. For example, Ancestry (https://www.microservices.com/talks/ancestrys-journey-towards-microservices-containerization-kubernetes-paul-mackay/) uses namespaces to create isolation between services. In that scenario, you can’t use namespaces for something else.

      More generally, because Kubernaut gives you a genuinely unique Kubernetes instance, it is as real as you can get for a test scenario.

      • David Ndungu

        Thanks for your response.

        I think in my scenario I will create a dedicated cluster for testing and create namespaces for each new test environment. It will be much quicker to spin up and tear down such environments.
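
        Roughly, each test environment would look something like this (the names here are just illustrative):

          # Create an isolated environment for this test run
          $ kubectl create namespace test-run-42

          # Deploy the app under test into it and run the tests
          $ kubectl apply -n test-run-42 -f manifests/
          $ ./run-tests.sh

          # Tear the whole environment down in one call
          $ kubectl delete namespace test-run-42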

        I was trying to figure out a way to route HTTPS traffic to specific test environments and could not find a tool that could do it out of the box, so I started working on a POC: https://github.com/dndungu/facade. It is essentially a reverse proxy that auto-provisions wildcard subdomain SSL certificates (DONE) and then looks up a gRPC (TODO) back-end to find the right service ClusterIP to use as an upstream.

        David

    • Philip Lombardi

      David, I am the Kubernaut lead developer. It is a good question. There is a range of implementation strategies for doing this, and they all have different pros and cons. The two big reasons I implemented it the current way:

      1. I want users to be full cluster admins and not bound to a single namespace.
      2. I know the AWS APIs better, and the backend implementation was a breeze because of that.

      I am not quite sure how the pooling mechanism would have worked with GKE, or whether I could have gotten the desired level of isolation with the GKE approach you mentioned.

      Long term, I would like to make Kubernaut totally independent of the compute cloud it is running on. I am doing some research now on Kubernetes in Kubernetes (the great Kubeception!). This approach would allow us to eliminate the long-running instance pool on our side while simultaneously enabling us to run Kubernaut on any cloud. We could then easily offer Google, AWS, Azure, or hosted Kubernaut environments that let users do things like provision ephemeral RDS databases with their cluster when using the AWS Kubernaut, or access ephemeral BigTable on Google Cloud.