co-authored by Chris Ireland and Keith McClellan
In this blog post we explore one option for implementing a multi-cloud deployment of CockroachDB. CockroachDB is a great fit for Kubernetes, and we show how you can use Skupper to rapidly deliver a single CockroachDB cluster that is agnostic to the three most popular clouds. We start with a simple two-cloud example to demonstrate the basics before moving on to a true multi-cloud cluster. In 30 minutes or less you can have a CockroachDB cluster running across Azure, AWS and GCP.
CockroachDB is the solution for resilient data storage in the cloud. Over the past few years one might have been happy to have a CockroachDB cluster running across Availability Zones (AZs): such a cluster can survive the loss of a single AZ and continue to deliver data to applications. More recently, a multi-region cluster has been seen as the safer option: such a cluster can survive the loss of an entire region and continue to deliver a great experience.
Our clients build and deliver applications with sub-second latencies. Recent events have shifted the focus to the risk presented by the loss of an entire cloud: it is now a practical reality that a single cloud can cease to function, even if only for a short period of time. So we look beyond a single cloud to multi-cloud: a CockroachDB cluster that is cloud agnostic.
Skupper is an OSI layer 7 service interconnect. It enables secure communication across Kubernetes clusters with no VPNs or special firewall rules. With Skupper, a CockroachDB cluster can span multiple cloud providers, data centers, and regions.
Working across clouds presents a unique challenge: enabling the nodes of a CockroachDB cluster to communicate with each other. Skupper provides a solution to this challenge with a Virtual Application Network (VAN) that simply and securely connects the nodes running in different network locations.
Skupper is an open source tool for creating VANs in Kubernetes. A VAN connects the CockroachDB nodes into a virtual network so that they can communicate with each other as if they were all running in the same site. Using Skupper, we can create a distributed CockroachDB cluster composed of nodes running in different Kubernetes clusters on different clouds.
Layer 7 application routers form the backbone of a VAN in the same way that conventional network routers form the backbone of a VPN. However, instead of routing IP packets between network endpoints, Layer 7 application routers route messages between application endpoints (called Layer 7 application addresses).
A Layer 7 application address represents an endpoint, or destination, in the VAN. When a CockroachDB node sends a message to an address, the Layer 7 application routers deliver it to a CockroachDB node in the VAN that has that address.
In a Skupper network, each namespace contains a Skupper instance. When these Skupper instances connect, they continually share information about the CockroachDB nodes that each instance exposes. This means that each Skupper instance is always aware of every other CockroachDB node that has been exposed to the Skupper network, regardless of the namespace in which each CockroachDB node resides.
A Simple Multi-Cloud Deployment of CockroachDB
Connecting CockroachDB Kubernetes clusters together using Skupper is very straightforward. The main thing to remember is that Skupper creates a proxy in the first cluster for each of the pods in the second cluster. This makes referencing each node in the cockroach start command very easy. The downside is that you have to ensure that each of your pods has a unique name.
By default, every CockroachDB StatefulSet produces pods with the same names: cockroachdb-0, cockroachdb-1, and so on. We achieve uniqueness by giving each StatefulSet a distinct name, which becomes the prefix of every pod name it creates. So, in an AWS EKS StatefulSet the name would be, for example:
```yaml
kind: StatefulSet
metadata:
  name: cockroachdb-aws
```
Consequently, in an EKS cluster the first pod will be named cockroachdb-aws-0. We do something similar in GCP, so in a GKE cluster the first pod is named cockroachdb-gcp-0. The --join list passed to cockroach start can then reference nodes such as cockroachdb-aws-0 and cockroachdb-gcp-0.
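To make the naming concrete, here is a small shell sketch that builds the comma-separated value for cockroach start --join from the per-cloud StatefulSet names. It assumes the cockroachdb-<prefix> naming shown above, the default CockroachDB port 26257, and that each pod is reachable by its bare pod name through the Skupper proxies; build_join_list is a hypothetical helper, and in your environment you may need to append a service or namespace suffix to each host.

```shell
#!/usr/bin/env bash
# Sketch: build the --join value for `cockroach start` from per-cloud
# StatefulSet prefixes. Pod names follow <statefulset>-<ordinal>, with
# StatefulSets named cockroachdb-<prefix> (assumption: bare pod names resolve).
set -euo pipefail

# build_join_list REPLICAS PREFIX...
# Prints host:port for pods 0..REPLICAS-1 of each StatefulSet, comma-joined.
build_join_list() {
  local replicas=$1
  shift
  local hosts=()
  local prefix i
  for prefix in "$@"; do
    for ((i = 0; i < replicas; i++)); do
      hosts+=("cockroachdb-${prefix}-${i}:26257")
    done
  done
  # "${hosts[*]}" joins the array elements with the first character of IFS.
  local IFS=,
  printf '%s\n' "${hosts[*]}"
}

# Example: three replicas in each of the AWS and GCP StatefulSets.
build_join_list 3 aws gcp
```

The resulting string would be passed as --join=<value> in the cockroach start command of every StatefulSet, so that nodes in both clouds share the same list of peers.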
In overview, the steps to create a simple multi-cloud installation of CockroachDB across AWS and GCP are:

1. Create an EKS cluster in AWS.
2. Create a GKE cluster in GCP.
3. Install the Skupper CLI from http://www.skupper.io (we used version 0.5.1).
4. In each Kubernetes cluster:
   - Deploy a CockroachDB StatefulSet in your namespace (normally cockroachdb). Remember to use a separate prefix for each StatefulSet, and make sure the cockroach start command includes the identities of the nodes in both clusters.
   - Initialise Skupper in the namespace (skupper init).
   - Check the status of your Skupper instance (skupper status).
   - Expose the CockroachDB StatefulSet in this cluster (skupper expose statefulset <name> --headless --port 26257).
5. Link the two clusters together:
   - In one of the clusters, create a token that the other cluster will use for secure communication across the Skupper network (skupper token create <token-file>).
   - In the other cluster, use that token to link the two clusters (skupper link create <token-file>).
6. Once your nodes are deployed and the clusters are linked, run cockroach init once, against any single node of the cluster.
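The two-cloud steps above can be sketched as a single script. It is a dry run by default: each command is printed rather than executed unless DRY_RUN=0 is set. The kubeconfig context names, manifest file names and token path are hypothetical placeholders, the skupper --namespace and --context flags select the target site, and the final cockroach init assumes the StatefulSets run in insecure mode (use --certs-dir instead of --insecure in a secure deployment).

```shell
#!/usr/bin/env bash
# Dry-run sketch of the two-cloud setup. Prints commands unless DRY_RUN=0.
# Context names, manifest files and the token path are hypothetical.
set -euo pipefail

NAMESPACE=cockroachdb
AWS_CTX=aws-ctx                 # hypothetical kubeconfig context (EKS)
GCP_CTX=gcp-ctx                 # hypothetical kubeconfig context (GKE)
TOKEN=/tmp/aws-site-token.yaml  # hypothetical token file

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_site() {
  local ctx=$1 statefulset=$2
  # Deploy this cloud's uniquely named StatefulSet (hypothetical manifest name).
  run kubectl apply -f "${statefulset}.yaml" --namespace "$NAMESPACE" --context "$ctx"
  # Create the Skupper site in the namespace and check that it is running.
  run skupper init --namespace "$NAMESPACE" --context "$ctx"
  run skupper status --namespace "$NAMESPACE" --context "$ctx"
  # Expose every CockroachDB pod so the other site gets a proxy for it.
  run skupper expose statefulset "$statefulset" --headless --port 26257 \
    --namespace "$NAMESPACE" --context "$ctx"
}

setup_site "$AWS_CTX" cockroachdb-aws
setup_site "$GCP_CTX" cockroachdb-gcp

# Issue a token in the AWS site and redeem it in the GCP site.
run skupper token create "$TOKEN" --namespace "$NAMESPACE" --context "$AWS_CTX"
run skupper link create "$TOKEN" --namespace "$NAMESPACE" --context "$GCP_CTX"

# Initialise the CockroachDB cluster once, from a single node.
run kubectl exec cockroachdb-aws-0 --namespace "$NAMESPACE" --context "$AWS_CTX" \
  -- /cockroach/cockroach init --insecure
```

Running cockroach init only once matters: initialising each Kubernetes cluster separately would create two independent CockroachDB clusters that cannot join each other.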
If you then forward port 8080 from any of the pods, you will be able to open the CockroachDB DB Console and see that the pods in your cluster do indeed reside in separate clouds. In the following example we used the prefix g1 for the CockroachDB nodes in EKS and g2 for the CockroachDB nodes in GCP. There are 3 CockroachDB nodes in EKS and 2 CockroachDB nodes in GKE.
The CockroachDB nodes on EKS were located in the US (us-east-2 in Ohio) and the CockroachDB nodes on GKE were located in Europe (europe-west3 in Frankfurt). This gave rise to the following inter-node latencies:
As you can see, with Skupper the latency between the EKS and GKE nodes is about 104ms for traffic crossing two clouds and two continents!
Truly Cloud Agnostic - A Three Cloud Deployment
In order to be truly cloud agnostic we must deploy a CockroachDB cluster across all three main clouds: AWS, GCP and Azure. Moreover, CockroachDB needs at least three datacenters to survive the failure of an entire datacenter, so we need to scale up and run CockroachDB nodes in at least three different DCs.
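Why three datacenters? A short worked argument, assuming Raft majority quorums, CockroachDB's default replication factor of 3 and one replica of each range per datacenter, where n is the number of replicas of a range and q(n) is the number that must remain reachable for the range to stay available:

```latex
q(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1

\text{Three DCs, one replica each: } n = 3,\quad q(3) = 2,\quad 3 - 1 = 2 \ge q(3)

\text{Two DCs: one DC holds } k \ge \left\lceil \frac{n}{2} \right\rceil \text{ replicas, so } n - k \le \left\lfloor \frac{n}{2} \right\rfloor < q(n)
```

With only two datacenters, losing the one that holds more replicas always leaves a minority, whatever the replication factor.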
For this environment, we set up an EKS cluster in AWS using us-west-2, a GKE cluster in GCP using us-central1, and an AKS cluster in Azure using eastus.
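One common way to tell CockroachDB which cloud and region each node runs in is the --locality flag of cockroach start, with tiers ordered from most to least inclusive. The sketch below prints one such flag per StatefulSet for the three regions above; the tier keys cloud and region and the helper locality_flag are our own hypothetical choices, not necessarily the exact changes we made to our StatefulSet configs.

```shell
#!/usr/bin/env bash
# Sketch: per-StatefulSet --locality flag for `cockroach start`.
# Tier keys "cloud" and "region" are a hypothetical choice; tiers go
# from most inclusive to least inclusive.
set -euo pipefail

locality_flag() {
  local cloud=$1 region=$2
  # The leading "--" stops printf from treating the format as an option.
  printf -- '--locality=cloud=%s,region=%s\n' "$cloud" "$region"
}

# One value per StatefulSet, matching the regions used in this deployment.
locality_flag aws us-west-2
locality_flag gcp us-central1
locality_flag azure eastus
```

Each value would be added to the cockroach start arguments of the corresponding StatefulSet, since each cloud has its own StatefulSet.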
Also, because this is a multi-site deployment, we had to make some additional changes to the StatefulSet config for CockroachDB so that the nodes announce their location properly to the rest of the cluster. Using the CockroachDB Node Map (https://www.cockroachlabs.com/docs/v21.1/enable-node-map.html), we see:
Those StatefulSet configs run CockroachDB in secure mode, which means we also need to create TLS certificates for the database.
```shell
mkdir certs
mkdir my-safe-directory

# Create the CA certificate and key.
cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key

# Create the client certificate for the root user.
cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key

# Create the node certificate. Wildcard names are quoted so the shell does not glob them.
cockroach cert create-node --certs-dir=certs --ca-key=my-safe-directory/ca.key \
  localhost 127.0.0.1 \
  cockroachdb-public \
  cockroachdb-public.default \
  cockroachdb-public.default.svc.cluster.local \
  '*.cockroachdb' \
  '*.cockroachdb-internal-west' \
  '*.cockroachdb-internal-central' \
  '*.cockroachdb-internal-east' \
  '*.cockroachdb.default' \
  '*.cockroachdb-internal-west.cockroachdb.svc.cluster.local' \
  '*.cockroachdb-internal-central.cockroachdb.svc.cluster.local' \
  '*.cockroachdb-internal-east.cockroachdb.svc.cluster.local' \
  '*.cockroachdb.default.svc.cluster.local'
```
As in the two-cloud example above, you still need to initialize Skupper in each namespace on each Kubernetes cluster, but you also need to inject the TLS certificates you just generated into secrets in each namespace. It’ll look something like this for each cluster:
```shell
kubectl create secret generic cockroachdb.client.root --from-file=certs --namespace cockroachdb --context cockroachdb-skupper-aws.us-west-1.eksctl.io
kubectl create secret generic cockroachdb.node --from-file=certs --namespace cockroachdb --context cockroachdb-skupper-aws.us-west-1.eksctl.io
```
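Repeating those two commands for every cluster is easy to script. This dry-run sketch (set DRY_RUN=0 to execute) loops over one kubeconfig context per cloud; only the AWS context name comes from the example above, the GCP and Azure context names are hypothetical, and it assumes you run it from the directory that contains certs.

```shell
#!/usr/bin/env bash
# Dry-run sketch: create both certificate secrets in every cluster.
# Prints commands unless DRY_RUN=0. GCP and Azure contexts are hypothetical.
set -euo pipefail

NAMESPACE=cockroachdb
# AWS context from the example above, then hypothetical GCP and Azure contexts.
CONTEXTS=(
  cockroachdb-skupper-aws.us-west-1.eksctl.io
  gcp-ctx
  azure-ctx
)

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_cert_secrets() {
  local ctx=$1 secret
  for secret in cockroachdb.client.root cockroachdb.node; do
    # --from-file=certs packs every file in ./certs into the secret.
    run kubectl create secret generic "$secret" --from-file=certs \
      --namespace "$NAMESPACE" --context "$ctx"
  done
}

for ctx in "${CONTEXTS[@]}"; do
  create_cert_secrets "$ctx"
done
```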
You’re probably also going to want an enterprise license so you can try out the CockroachDB fine-grained placement controls; you can request a 30-day trial from the Cockroach Labs website: https://www.cockroachlabs.com/get-cockroachdb/enterprise/.
From there, the only other difference between this environment and the previous one is that you’re linking three clusters instead of two.
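A minimal sketch of linking three sites, again as a dry run by default. It assumes hypothetical context names and token paths, and it issues a separate token for each link because some Skupper versions limit how many times a token can be redeemed. It also links every pair of sites: in a hub-and-spoke layout, traffic between GCP and Azure would pass through the AWS router, so losing AWS would also cut the other two clouds off from each other, which defeats the point of a three-cloud cluster.

```shell
#!/usr/bin/env bash
# Dry-run sketch: fully mesh three Skupper sites. Prints commands unless
# DRY_RUN=0. Context names and token paths are hypothetical.
set -euo pipefail

NAMESPACE=cockroachdb
AWS_CTX=aws-ctx
GCP_CTX=gcp-ctx
AZURE_CTX=azure-ctx

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run one skupper subcommand against a given site.
sk() {
  local ctx=$1
  shift
  run skupper "$@" --namespace "$NAMESPACE" --context "$ctx"
}

# The issuer creates a fresh token; the redeemer uses it to link to the issuer.
link_pair() {
  local issuer=$1 redeemer=$2
  local token="/tmp/${issuer}-for-${redeemer}-token.yaml"
  sk "$issuer" token create "$token"
  sk "$redeemer" link create "$token"
}

link_all_sites() {
  link_pair "$AWS_CTX" "$GCP_CTX"
  link_pair "$AWS_CTX" "$AZURE_CTX"
  # Direct GCP <-> Azure link so those two sites do not depend on AWS.
  link_pair "$GCP_CTX" "$AZURE_CTX"
}

link_all_sites
```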
Reflections and Conclusions
It is very easy to set up a multi-cloud CockroachDB cluster using Skupper. We have found the Skupper team to be both very responsive and very supportive: during our testing we found a bug in 0.5.1 that was fixed less than a week later in their 0.5.3 release. It’s really great to see their commitment to making multi-cloud applications a reality. Their latest version as of writing is 0.7.0, which we can only assume is even better!
One of the things we learned as we scaled this up is there are certainly some performance gotchas with Skupper, particularly when we’re running the database in secure mode. We need to spend some more time with them to figure out the best way to reduce that overhead. One tip they did give us was to increase the resources for the Skupper router and the proxy pods, something like:
```shell
kubectl edit deployment skupper-router -n cockroachdb --context cockroachdb-skupper-aws.us-west-1.eksctl.io
kubectl edit statefulset cockroachdb-internal-west-proxy -n cockroachdb --context cockroachdb-skupper-aws.us-west-1.eksctl.io
```
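If you prefer a non-interactive alternative to kubectl edit, kubectl set resources can raise the requests and limits directly. The CPU and memory values below are hypothetical placeholders rather than figures from the Skupper team, and without a --containers filter kubectl updates every container in the pod template. As before, commands are only printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Dry-run sketch: raise resources on the Skupper router and proxy without
# an interactive editor. Prints commands unless DRY_RUN=0.
set -euo pipefail

NAMESPACE=cockroachdb
CTX=cockroachdb-skupper-aws.us-west-1.eksctl.io
REQUESTS=cpu=1,memory=1Gi   # hypothetical values
LIMITS=cpu=2,memory=2Gi     # hypothetical values

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

bump_resources() {
  local target
  for target in deployment/skupper-router statefulset/cockroachdb-internal-west-proxy; do
    run kubectl set resources "$target" \
      --requests="$REQUESTS" --limits="$LIMITS" \
      --namespace "$NAMESPACE" --context "$CTX"
  done
}

bump_resources
```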
What’s really great is the kind of projects this type of technology unlocks. We did a whole demo that included chaos testing in one DC while running load against the other DCs, all without incident; there’s a great video of it available on YouTube. And yes, we did use KubeDoom to do it: https://www.youtube.com/watch?v=_toFU3Wqfvo
All in all, while this is not quite ready for production usage today, we’re excited by the progress and are looking forward to continuing to work with the Skupper team.