Additional Information for Managed Kubernetes
API Documentation
Customers are given access to the standard Kubernetes API. Documentation can be found in the official Kubernetes documentation for the installed version. Additionally, users have access to the APIs of the CloudControllerManager, the CSI driver, and the CNI (Cilium).
Best Practices
To prevent common issues that could affect the availability of your application, we have created a list of best practices that you should implement for your applications.
Use Replicas Instead of Simple Pods
To ensure that pods are rescheduled on another node when a node fails or is replaced, Pod resources should not be created directly. Instead, use Deployments, StatefulSets, or other workload resources with at least 2 replicas.
Because individual nodes can be replaced at any time (see Volatility of Individual Nodes), it is important that your applications are resilient to this behavior. teuto.net recommends following these best practices for this purpose.
Please verify this especially before updates to ensure a smooth process.
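As a minimal sketch of this recommendation, the following Deployment runs two replicas of an application; the names and the image are hypothetical placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # hypothetical name
spec:
  replicas: 2                # at least 2, so one instance survives a node drain
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27  # example image
          ports:
            - containerPort: 80
```

With two replicas, the scheduler can place the pods on different nodes, so draining a single node does not take the application offline.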
Resource Requests and Limits
For the Kubernetes scheduler to estimate whether a pod fits on a node, it compares the resource requests and limits with those of other pods on the node, as well as with the available resources of the node.
To ensure that all applications run as expected, requests and limits should be configured for each workload resource. More details can be found in the Kubernetes documentation “Managing Resources for Containers”.
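A sketch of such a configuration, with hypothetical names and resource values that you should adapt to your workload's actual needs:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                  # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0   # placeholder image
          resources:
            requests:        # used by the scheduler for placement decisions
              cpu: 100m
              memory: 128Mi
            limits:          # enforced ceiling at runtime
              cpu: 500m
              memory: 256Mi
```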
PodDisruptionBudget
With the help of PodDisruptionBudgets, you can configure for an application how many replicas must be available at any given time. This ensures that, for example, during a Kubernetes upgrade when all nodes are replaced once, enough instances of the application continue to run.
If it is not possible to meaningfully fulfill a PodDisruptionBudget, for example if maxUnavailable: 0 is specified in the PodDisruptionBudget, the affected application will still be stopped after a timeout. This can lead to outages.
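A minimal PodDisruptionBudget for a two-replica application might look as follows (the name and selector labels are hypothetical):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb              # hypothetical name
spec:
  minAvailable: 1            # keep at least one replica running during node drains
  selector:
    matchLabels:
      app: web               # must match the labels of the target pods
```

With `minAvailable: 1` and two replicas, a node drain may evict one pod at a time, so the application stays reachable throughout a rolling node replacement.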
Provider Details
Cloud Integration
teuto.net uses OpenStack as the cloud foundation for Managed Kubernetes clusters. This enables the integration and easy use of several features available in your Kubernetes cluster.
Network
We equip the clusters with functional networking (CNI) and DNS. This allows your applications to communicate effortlessly both externally and within the cluster.
Load Balancer
We install the openstack-cloud-controller-manager, which provisions an Octavia load balancer for each Service of type: LoadBalancer. Each of these load balancers automatically receives a dedicated, externally accessible floating IP.
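A sketch of such a Service, with hypothetical names and ports:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-lb               # hypothetical name
spec:
  type: LoadBalancer         # triggers creation of an Octavia load balancer
  selector:
    app: web                 # must match the labels of the target pods
  ports:
    - port: 80               # port exposed on the floating IP
      targetPort: 8080       # port the pods listen on
```

Once the load balancer has been provisioned, the assigned floating IP appears under `EXTERNAL-IP` in the output of `kubectl get service web-lb`.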
Volumes
We install the cinder-csi-plugin and create the following appropriate StorageClasses:
teutostack-ssd (default)
This makes it possible to create and use PersistentVolumes through the use of PersistentVolumeClaims.
For more information on using PersistentVolumeClaims, please refer to the official documentation.
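A minimal PersistentVolumeClaim using the default StorageClass could look like this (the claim name and requested size are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                        # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce                 # volume is mounted by a single node
  storageClassName: teutostack-ssd  # default StorageClass of the cluster
  resources:
    requests:
      storage: 10Gi                 # example size; counts against your disk quota
```

The cinder-csi-plugin then dynamically provisions a matching Cinder volume and binds it to the claim.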
Proxy Cache
To reduce network traffic to frequently used container registries, teuto.net employs a proxy cache. This also helps to avoid rate limiting by individual registries. This proxy cache is configured automatically and works completely transparently. No image URLs need to be adjusted.
A proxy cache is set up for container images from the following (public) image repositories, among others:
docker.io
gcr.io
ghcr.io
k8s.gcr.io
nvcr.io
quay.io
registry.gitlab.com
registry.k8s.io
registry.opensource.zalan.do
registry.teuto.io
Volatility of Individual Nodes
Individual nodes (both control plane and worker nodes) can be replaced by us at any time. This happens, for example, to install updates to the operating system or the Kubernetes software being used, or to allow us to perform maintenance work on the underlying cloud infrastructure (hypervisor).
We use the normal Node drain process, which ensures that pods are stopped as configured and all PodDisruptionBudgets continue to be respected. This process has a timeout to prevent node rotation from entering an infinite loop due to faulty PodDisruptionBudgets.
MachineHealthChecks
To ensure the availability of the cluster and your applications, teuto.net uses so-called MachineHealthChecks. These continuously monitor the availability and function of a cluster’s nodes. If a node fails and does not recover within a certain timeframe, the management system will automatically replace the faulty node with a new one.
Update Procedure
teuto.net regularly installs updates on Managed Kubernetes clusters. Before updating to a new minor release, we wait at least until its second patch release has been published.
During updates, nodes are replaced one after another. A new node is always added to the cluster first before one of the existing nodes is replaced. In this way, your full resources are always available. teuto.net uses the normal Node drain process.
To perform updates smoothly, our best practices, especially the configuration of PodDisruptionBudgets, should be followed. Otherwise, outages or a non-graceful shutdown of the affected application may occur.
Limitations
Reserved Resources
For stable operation of the Kubernetes cluster, it is necessary for teuto.net to run additional software on each node, for example to mount storage volumes (see Volumes). For these purposes, only a portion of the node resources is freely available.
Known Error Messages
Quota Exceeded
Disk Quota
An exceeded disk quota typically manifests as a PersistentVolumeClaim stuck in Pending status; running describe on the PersistentVolumeClaim then shows an event referencing VolumeSizeExceedsAvailableQuota.
$ kubectl get pvc
NAME                 STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS     AGE
disk-quota-example   Pending                                      teutostack-hdd   25s
$ kubectl describe pvc disk-quota-example
Name:          disk-quota-example
Namespace:     default
StorageClass:  teutostack-hdd
Status:        Pending
...
Events:
  Type     Reason              Age   From   Message
  ----     ------              ----  ----   -------
Warning ProvisioningFailed 70s (x6 over 102s) cinder.csi.openstack.org_openstack-cinder-csi-controllerplugin-dbc57c7b9-mtczf_ca72589f-2cce-4f2d-b7b2-fc556d0b740e failed to provision volume with StorageClass "teutostack-hdd": rpc error: code = Internal desc = CreateVolume failed with error Expected HTTP response code [202] when accessing [POST https://api.ffm3.teutostack.de:8776/v3/613bec9239ad4973accbfb252cb53725/volumes], but got 413 instead
{"overLimit": {"code": 413, "message": "VolumeSizeExceedsAvailableQuota: Requested volume or snapshot exceeds allowed gigabytes_Ceph-HDD quota. Requested 500G, quota is 1000G and 1000G has been consumed.", "retryAfter": "0"}}
If you encounter a quota limit, please contact us; we are happy to help.