In the past I’ve played around with PKS (nowadays called TKGI) and with TKGS (Tanzu Kubernetes Grid Service, natively integrated into vSphere 7.0). Today I wanted to try out plain TKG Multi-cloud (TKGm) in my vSphere 7.0 environment.
My setup looks like this:
- vCenter 7.0.2
- TKGm 1.4
- My Macbook as the local Bootstrap Machine running Docker Desktop 4.3.0 (71786)
I chose to work with a TKGm Management Cluster rather than the TKGS / vSphere Supervisor Cluster. VMware’s preferred approach when deploying TKGm on vSphere 7 is to use the Supervisor Cluster, but working with a standalone TKGm Management Cluster is also supported.
Problem & Troubleshooting
During the deployment of my TKGm Management Cluster, I was greeted with the following error:
Error: unable to set up management cluster: unable to create bootstrap cluster: failed to create kind cluster tkg-kind-c6nn3ndmk1u63k77o6b0: failed to init node with kubeadm: command "docker exec --privileged tkg-kind-c6nn3ndmk1u63k77o6b0-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1
I did not have any clue as to what went wrong, so I tried deploying the Management Cluster with the highest logging level possible (9):
tanzu management-cluster create --file /Users/someUser/.config/tanzu/tkg/clusterconfigs/6p76tejulg.yaml -v 9
This gave me a ton of output; the error message was a bit more descriptive now:
an error has occurred: timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime. To troubleshoot, list all containers using your preferred container runtimes CLI. Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'

Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'

couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:114
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.
I noticed that when you deploy a TKGm Management Cluster, the Tanzu CLI pulls an image from the VMware registry (projects.registry.vmware.com/tkg/node:v1.21.2_vmware.1) and runs a Docker container from it. Inside this container, the install procedure sets up the bootstrap Kubernetes cluster by running the following command, as shown in the error message:
docker exec --privileged tkg-kind-c6nn3ndmk1u63k77o6b0-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6
So I opened a shell into the container that the TKG install procedure had created and checked the status of the kubelet. I was greeted with the same 'cgroups' error, together with containerd being unable to initialise a Kubernetes cluster. The commands I used:
systemctl status kubelet
journalctl -xeu kubelet
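If you want to reproduce this check yourself, the shell below is roughly what I ran. The container name comes from my error message above; yours will differ, so look it up with `docker ps` first:

```shell
# List the kind containers the installer created and note the control-plane name
docker ps --filter "name=tkg-kind" --format "{{.Names}}"

# Open a shell in the bootstrap container (name taken from my error message)
docker exec -it tkg-kind-c6nn3ndmk1u63k77o6b0-control-plane bash

# Inside the container, inspect the kubelet
systemctl status kubelet
journalctl -xeu kubelet
```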
Update 08/12/2021: It seems that the TKG kind images do not support cgroups v2 for the moment. This issue is being tracked on Github here. Docker Desktop for Mac uses cgroups v2 as of its latest release (4.3.0), which you can confirm by running 'docker info | grep -i cgroup'.
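A few quick ways to check which cgroups version your Docker daemon is using — the grep from above, Docker's own format flag (available since Docker 20.10), and, on a Linux host, the filesystem type mounted at /sys/fs/cgroup:

```shell
# Show the cgroup-related lines from docker info
docker info | grep -i cgroup

# Or query the version directly (prints "1" or "2")
docker info --format '{{.CgroupVersion}}'

# On a Linux host: "cgroup2fs" means cgroups v2, "tmpfs" means v1
stat -fc %T /sys/fs/cgroup/
```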
So how can we solve this?
- Create your own K8s cluster and let Tanzu CLI use it during deployment
- Or revert to an older version of Docker Desktop for Mac
Create your own K8s cluster and let Tanzu CLI use it during deployment
The TKG cli also supports deploying a Management Cluster using an existing K8s cluster as the Bootstrap Cluster. Meaning TKG will not try to deploy a new Bootstrap Cluster, but will use your existing K8s cluster instead. So I did exactly that.
Make sure to install 'kind' if you do not already have it (macOS command below):
brew install kind
Then simply create a K8s cluster with kind:
kind create cluster
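By default kind names the cluster 'kind', and its kubeconfig context becomes 'kind-kind' — which is exactly the name passed to the Tanzu CLI below. If you prefer an explicit name, a sketch (the name 'tkg-bootstrap' is just my example):

```shell
# Create a bootstrap cluster with an explicit name;
# the kubeconfig context will be "kind-<name>"
kind create cluster --name tkg-bootstrap

# Verify the cluster answers before handing it to the Tanzu CLI
kubectl cluster-info --context kind-tkg-bootstrap
```

With a named cluster you would pass `--use-existing-bootstrap-cluster kind-tkg-bootstrap` instead of `kind-kind`.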
And now tell the TKG cli to use your K8s cluster as the Bootstrap Cluster:
tanzu management-cluster create --file /Users/someUser/.config/tanzu/tkg/clusterconfigs/6p76tejulg.yaml --use-existing-bootstrap-cluster kind-kind -v 6
The TKG install procedure will now use your locally created K8s cluster as the Bootstrap Cluster and install the necessary pods etc. on it.
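You can watch the installer populate the bootstrap cluster from a second terminal. A sketch, assuming the default 'kind-kind' context — the Cluster API provider namespaces (for example capi-system and capv-system) appear as the deployment progresses:

```shell
# Watch the pods the TKG installer creates on the bootstrap cluster
kubectl get pods --all-namespaces --context kind-kind

# Once the Cluster API CRDs are installed, follow the management-cluster object
kubectl get clusters --all-namespaces --context kind-kind
```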
How do you get this TKG cli command? Well, if you used the UI option to deploy your TKGm Management Cluster, you can simply copy-paste it from the Review page or from your Failed Deployment page. You then only need to add the '--use-existing-bootstrap-cluster' option to it.
Would you like to know more about kind? Follow this link here.
After the deployment completes, you will have a fully running TKGm Management Cluster.
You can inspect your cluster by running the following command:
tanzu mc get
Revert your Docker Desktop for Mac to an earlier version
Earlier versions of Docker Desktop for Mac use cgroups v1, so reverting to an earlier version (e.g. Docker Desktop 3.6.0) also solves the issue. After you’ve reverted, you can check the cgroups version Docker is using by running 'docker info | grep -i cgroup' again.
Hope it helped! If you have any insight on this topic, don’t hesitate to let us know!
Have a nice day!