Deploying JupyterHub to Kubernetes via Kustomize using SOPS Secret Management

This article describes our approach to deploying JupyterHub to Kubernetes via Kustomize from GitLab CI. To manage our secrets and credentials we use SOPS. The groundwork for deploying JupyterHub to Kubernetes was done by the people behind the Zero to JupyterHub with Kubernetes project.

About the Jupyter Project:

Jupyter is an open-source web application that lets users combine live code, equations, visualizations and documentation in one place. It can be used for cleaning and transforming data, numerical simulation, statistical modeling, machine learning and a lot more.

JupyterHub is the multi-user version of the Jupyter Notebook. It spins up a separate Jupyter instance for each user, which makes it well suited for companies, schools and research labs. Every user gets access to a complete development environment and to resources on shared hardware. Installation and maintenance are therefore kept away from the end user; the Hub is managed by administrators instead.

JupyterHub is customizable, so administrators can deliver preconfigured environments to its users. These environments can include different libraries and frameworks. It has built-in support for different authentication methods, such as LDAP, OAuth or GitHub. Finally, it is container-friendly, which makes it a good fit for deployment to Kubernetes.

The Zero to JupyterHub with Kubernetes project does exactly that. Given you already have a running Kubernetes Cluster with Helm and Tiller installed, you can be up and running within minutes. You just define a couple of configuration decisions within a config file and pass it to the Helm installation command. The rest is magic.

Instead of letting Helm resolve all dependencies and having the privileged Tiller service live inside our cluster to install JupyterHub, we went the hard way: we first converted the Helm chart to Kustomize and then looked at each resource it creates. We converted to Kustomize because we don’t need Helm to resolve our dependencies and act as our package manager. Furthermore, for security reasons we would rather not run the privileged Tiller service on our cluster.

Introduction to Kustomize

A common structure for a Kustomize project would look like this:

├── base
│   ├── deployment.yaml
│   ├── kustomization.yaml
│   └── service.yaml
└── overlays
    ├── dev
    │   ├── kustomization.yaml
    │   └── patch.yaml
    ├── prod
    │   ├── kustomization.yaml
    │   └── patch.yaml
    └── staging
        ├── kustomization.yaml
        └── patch.yaml

The patch.yaml file located in the Overlay folder patches the chosen values, e.g. the number of Replicas defined by the Deployment.

kustomize build overlays/staging

This command evaluates overlays/staging/kustomization.yaml, collects the bases it references and uses them to generate the output, which is written to stdout. Additional resources defined by the overlay are added to the base, and patches override base values.
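To make the layout above concrete, here is a minimal shell sketch that scaffolds a base with a single staging overlay and renders it if kustomize is on the PATH. The name demo-app, the nginx image and the replica counts are illustrative assumptions and not part of our project. The overlay uses the bases and patchesStrategicMerge fields of the Kustomize 2.x generation; newer releases may warn about them and prefer resources and patches.

```shell
#!/bin/sh
set -eu

# Scratch directory so the sketch never touches a real repository.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/base" "$demo_dir/overlays/staging"

# Base: one Deployment with a single replica (hypothetical demo-app).
cat > "$demo_dir/base/deployment.yaml" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: nginx:stable
EOF

cat > "$demo_dir/base/kustomization.yaml" <<'EOF'
resources:
- deployment.yaml
EOF

# Overlay: the patch only names the object and the value to change.
cat > "$demo_dir/overlays/staging/patch.yaml" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 3
EOF

cat > "$demo_dir/overlays/staging/kustomization.yaml" <<'EOF'
bases:
- ../../base
patchesStrategicMerge:
- patch.yaml
EOF

# Render the overlay when kustomize is installed; otherwise just report.
if command -v kustomize >/dev/null 2>&1; then
  kustomize build "$demo_dir/overlays/staging" || echo "kustomize build failed" >&2
else
  echo "kustomize not found; files written to $demo_dir"
fi
```

The patch repeats only apiVersion, kind and metadata.name so that Kustomize can match it to the base Deployment; everything else in the rendered output comes from the base.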

Kustomize Structure for our project

teuto-to-jupyterhub/
|---base/
|   |---kustomization.yaml
|   |---hub-config-values.yaml
|   |---hub-deploy.yaml
|   |---.dockerconf.sops.json
|   |---...
|---overlays/
    |---ldap-auth/
        |---kustomization.yaml
        |---hub-config-values.yaml
        |---hub-deploy.yaml
        |---hub-secret-values.sops.yaml

This is what our JupyterHub project looks like. We use the base for the groundwork and an overlay for adding and manipulating resources. In this case we add LDAP authentication using a custom overlay. You might notice a couple of files containing the letters “sops” in their filenames. Those are related to secret management and will be discussed later on. Let’s have a look inside base/kustomization.yaml, since it is the anchor Kustomize looks for when we point it to its target.

commonLabels:
  app: jupyterhub

namespace: jhub-develop

resources:
- hub-deploy.yaml

configMapGenerator:
- files:
  - values.yaml=hub-config-values.yaml
  name: hub-config

secretGenerator:
- files:
  - proxy.token
  - hub.cookie-secret
  name: hub-secret
  type: Opaque
- name: hub-image-credentials
  type: kubernetes.io/dockerconfigjson
  kvSources:
  - name: sopsfiles
    pluginType: go
    args:
    - .dockerconfigjson=.dockerconf.sops.json

Please note that at this point no authentication method is implemented. This will be done using overlays.

commonLabels adds the specified labels to all resources generated during kustomize build.

namespace adds the specified namespace to every resource generated during kustomize build.

resources lists the resource definitions to include, here files located in the same directory as kustomization.yaml. Any type of Kubernetes resource can be included in the kustomization process this way. After that come two kinds of generators, which we’ll discuss now.

configMapGenerator

A configMapGenerator produces objects of type ConfigMap. Each entry in the array results in one ConfigMap object. In this example we define only one. Its name is hub-config and it contains a single key-value pair. The key is values.yaml, since that is what is written on the left-hand side of the equals sign. The content of the file named on the right-hand side becomes the value.
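The key=file convention can be sketched in a few lines of shell. The snippet below splits the entry from our base at the first equals sign and, only if kubectl is installed and the file exists, asks kubectl for a client-side dry run. kubectl's --from-file option accepts the same key=path form, but it is used here purely as an illustration and is not how Kustomize generates the object.

```shell
#!/bin/sh
set -eu

# The generator entry from base/kustomization.yaml.
entry='values.yaml=hub-config-values.yaml'

# Everything before the first "=" is the ConfigMap key,
# everything after it is the path of the file that provides the value.
key=${entry%%=*}
file=${entry#*=}

echo "ConfigMap key: $key"
echo "source file:   $file"

# Optional preview; runs only when kubectl and the file are both present.
if command -v kubectl >/dev/null 2>&1 && [ -f "$file" ]; then
  kubectl create configmap hub-config --from-file="$entry" --dry-run=client -o yaml
fi
```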

secretGenerator

secretGenerator behaves very similarly. In our case we define two Secrets. Please note that the second Secret retrieves its value from a file named .dockerconf.sops.json. This file contains the credentials for accessing our private Docker registry. Obviously we can’t have those credentials lying around in our remote GitLab repository without encrypting them first. This is where SOPS comes into play, and this is also where we need a bridge to use SOPS in conjunction with Kustomize.

Kustomize & Secrets

Kustomize offers the opportunity to include go-plugins for our secretGenerator. Therefore we are able to call any Secret Management Tool we like to decrypt our secret values directly from Kustomize. In our case we wanted to try out Mozilla SOPS.

SOPS

SOPS is an editor for encrypted files. It supports YAML, JSON, ENV, INI and binary formats. It works with AWS KMS, GCP KMS, Azure Key Vault and also with PGP, which is what we’ll be using.

To encrypt a key-value file using SOPS we need to know the fingerprint of the public key that our secret file will be encrypted with. Once we know it, we insert it for $FP like so:

sops -e --pgp $FP myfile.yaml > myfile.sops.yaml

To decrypt our file we of course need the appropriate private key:

sops -d myfile.sops.yaml > myfile.yaml
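If you are unsure where to get $FP from, the following sketch reads it from gpg's machine-readable output, where fpr records carry the fingerprint in the tenth colon-separated field. Taking the first key in the keyring is an assumption made for brevity; with several keys you would select the right one explicitly. The encryption step runs only when gpg, sops and myfile.yaml are all present.

```shell
#!/bin/sh
set -eu

# Print the first fingerprint found in `gpg --with-colons` output on stdin.
# "fpr" records hold the fingerprint in field 10.
first_fingerprint() {
  awk -F: '$1 == "fpr" { print $10; exit }'
}

if command -v gpg >/dev/null 2>&1 \
   && command -v sops >/dev/null 2>&1 \
   && [ -f myfile.yaml ]; then
  # Assumption: the first public key in the keyring is the one to encrypt for.
  FP=$(gpg --list-keys --with-colons | first_fingerprint)
  if [ -n "$FP" ]; then
    sops -e --pgp "$FP" myfile.yaml > myfile.sops.yaml
    echo "encrypted myfile.yaml for $FP"
  else
    echo "no public key found in the keyring" >&2
  fi
else
  echo "gpg, sops or myfile.yaml missing; skipping encryption"
fi
```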

Kustomize Plugin for SOPS

You can look at the Kustomize-plugin we used right here on GitHub: https://github.com/goabout/goabout-kustomize-plugins

Just clone the repository and follow the instructions given there. After that we are able to call the plugin as shown above in our kustomization.yaml.

Glue it all together

Our GitLab CI (Continuous Integration) script defines three stages:

image: docker:latest
services:
  - docker:dind

stages:
  - build_jupyter_hub_image
  - deploy
  - destroy

The last stage is optional and has to be triggered manually. Before we can run this CI pipeline, we of course have to encrypt our secret files and place them in our remote GitLab repository. The private key needed for decrypting them has to be made available as a CI environment variable named $GPG_PRIV_KEY.

We define a few variables to avoid repeating ourselves:

variables:
  HUB_IMAGE: registry-gitlab.teuto.net/t2jupyterhub/t2jupyterhub-hub:latest
  K8S_CONFIG_FILE: /dev/shm/kubeconfig

Sidenote: To access our Kubernetes cluster from inside our CI runner, we have to make our decrypted kube config file available to kubectl. If the runner dies halfway through a job for some reason, it could leave secret information behind. To limit that risk we save the decrypted kube config to /dev/shm, which is held in memory rather than written to disk.
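On top of using /dev/shm, a job can also remove the decrypted file itself when it finishes. The following is a minimal sketch of that idea and not part of our pipeline: the helper name run_with_kubeconfig is hypothetical, and the placeholder line stands in for the real sops -d call. A trap cannot react to SIGKILL, which is one more reason to keep the file on memory-backed storage.

```shell
#!/bin/sh
set -eu

# Run a command with a temporary kube config that is deleted afterwards.
# Usage: run_with_kubeconfig <command> [args...]   (hypothetical helper)
run_with_kubeconfig() {
  (
    # Prefer memory-backed /dev/shm; fall back to the regular temp dir.
    if [ -d /dev/shm ] && [ -w /dev/shm ]; then
      dir=/dev/shm
    else
      dir=${TMPDIR:-/tmp}
    fi

    umask 077                      # owner-only permissions for new files
    K8S_CONFIG_FILE=$(mktemp "$dir/kubeconfig.XXXXXX")
    export K8S_CONFIG_FILE

    # Delete the file when this subshell exits, including after INT/TERM.
    trap 'rm -f "$K8S_CONFIG_FILE"' EXIT
    trap 'exit 130' INT
    trap 'exit 143' TERM

    # Placeholder for the real step:
    #   sops -d kubeconfig.sops > "$K8S_CONFIG_FILE"
    printf 'placeholder kube config\n' > "$K8S_CONFIG_FILE"

    # Run the wrapped command and hand its exit status back to the caller.
    status=0
    "$@" || status=$?
    exit "$status"
  )
}

run_with_kubeconfig sh -c 'echo "kube config staged at $K8S_CONFIG_FILE"'
```

Inside a job the wrapped command would be something like kubectl --kubeconfig="$K8S_CONFIG_FILE" apply -f -, so the file only exists while kubectl is running.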

Now we define our hidden base job that imports our kube credentials. The dot in front of its name is what makes it hidden: GitLab CI does not run such a job on its own, so it only serves as a template that other jobs can extend:

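.import_kube_credentials:
  before_script:
    # Import the private key, stripping carriage returns from the CI variable
    - gpg --import <(echo "$GPG_PRIV_KEY" | tr -d '\r')
    # Mark the imported key as ultimately trusted (ownertrust level 6)
    - gpg --list-secret-keys | grep "^sec" -A1 | tail -1 | awk '{print $1":6:"}' | gpg --import-ownertrust
    # Decrypt the kube config into memory-backed storage
    - sops -d kubeconfig.sops > /dev/shm/kubeconfig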

This job can be used to supply another job’s before_script. Its purpose is to import the contents of our $GPG_PRIV_KEY environment variable as our private key, trust it and then use it to decrypt our secret kube config, so that our jobs can access our Kubernetes cluster.
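The trust step is the least obvious part of this before_script, so here is a small sketch that runs its grep, tail and awk stages against sample text instead of a real keyring. The fingerprint and user ID below are made up. The result is a line in gpg's ownertrust format, where 6 stands for ultimate trust.

```shell
#!/bin/sh
set -eu

# Turn `gpg --list-secret-keys` output (stdin) into an ownertrust record.
# grep keeps the "sec" line plus the fingerprint line that follows it,
# tail keeps the fingerprint line, awk appends the trust level.
ownertrust_line() {
  grep -A1 '^sec' | tail -n 1 | awk '{print $1":6:"}'
}

# Made-up sample of what gpg prints for a single secret key.
sample='sec   rsa4096 2019-01-01 [SC]
      0123456789ABCDEF0123456789ABCDEF01234567
uid           [ultimate] CI Bot <ci@example.invalid>
ssb   rsa4096 2019-01-01 [E]'

printf '%s\n' "$sample" | ownertrust_line
# prints: 0123456789ABCDEF0123456789ABCDEF01234567:6:
# In the CI job this line is piped into: gpg --import-ownertrust
```

Note that tail keeps only the last match, so with several secret keys in the keyring only the last one listed would be trusted.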

build_dev:
  image: registry-gitlab.teuto.net/docker-base-images/custom-docker-container/cloud-tools:latest
  stage: build_jupyter_hub_image
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
    - docker build --pull -t $HUB_IMAGE images/hub/
    - docker push $HUB_IMAGE
  only:
    changes:
      - images/hub/*
    refs:
      - develop

If we changed our hub image (files under images/hub/), this job builds the new image and pushes it to our private registry. The registry credentials come from the CI environment variables $CI_REGISTRY_USER, $CI_REGISTRY_PASSWORD and $CI_REGISTRY, which are available in every CI run.

Finally the moment you might’ve been waiting for – The usage of Kustomize:

deploy_dev:
  stage: deploy
  image: registry-gitlab.teuto.net/kubernetes/k8s-ci-image:kustomize-sops-plugin
  environment:
    name: develop
  script:
    - printf "%s" "$(openssl rand -hex 32)" >> ./base/hub.cookie-secret
    - printf "%s" "$(openssl rand -hex 32)" >> ./base/proxy.token
    - kustomize build ./overlays/ldap-auth --enable_alpha_goplugins_accept_panic_risk | kubectl --kubeconfig=${K8S_CONFIG_FILE} apply -f -
  only:
    - develop
  extends: .import_kube_credentials

This job extends .import_kube_credentials, so our kube config is made available in /dev/shm. The image used for this job contains the tools we need, such as kubectl, Kustomize and the SOPS plugin for Kustomize. The first two script lines generate fresh random values for the hub cookie secret and the proxy token, which are the files that the secretGenerator in base/kustomization.yaml reads. The last line of the script section is where Kustomize does its magic. We provide it with our overlays/ldap-auth folder as its target and enable the (still alpha) usage of go-plugins for Kustomize. After that we pipe the output of Kustomize into kubectl, which sends our deployment to our Kubernetes cluster.

As a result we have a fully running JupyterHub application on our Kubernetes cluster, using the LDAP server we defined in our overlay for authentication.

We’d love to hear what you think about handling Secret Management and Kustomize this way. If you have any questions or suggestions, please do not hesitate to contact us. We’d love to get in touch with you!