Deploying Rancher clusters

Update: January 5th, 2023

We all get older and wiser, and although the procedure below works, a co-worker asked me: “Why not just use the cloud-init image?” Information and downloads can be found here.

  • Grab the OVA
  • Deploy the OVA to vSphere
  • Mark it as a template
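For the record, that flow can be scripted with the govc CLI. This is a minimal sketch, assuming the Ubuntu 22.04 cloud image from cloud-images.ubuntu.com; the vCenter URL, credentials, datastore, and folder names are placeholders for my lab, not anything official:

# Download the Ubuntu 22.04 (jammy) cloud image OVA
wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.ova

# Point govc at vCenter (placeholder values)
export GOVC_URL='https://vcenter.example.local'
export GOVC_USERNAME='administrator@vsphere.local'
export GOVC_PASSWORD='changeme'

# Deploy the OVA and mark the resulting VM as a template
govc import.ova -ds=datastore1 -folder=templates -name=ubuntu-2204-cloudimg jammy-server-cloudimg-amd64.ova
govc vm.markastemplate ubuntu-2204-cloudimg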

The rest of the article continues…

After a long while of playing with templates, I finally have a working configuration, which I am documenting so that I don’t forget what I did.

Step 1: Packer

In trying to get a usable image, I started with Packer, following this tutorial: https://github.com/vmware-samples/packer-examples-for-vsphere. No dice, so I made sure I had all of the packages listed here: https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/launch-kubernetes-with-rancher/use-new-nodes-in-an-infra-provider/vsphere/create-a-vm-template. The only package missing was growpart.
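For reference, the install step on Ubuntu 22.04 looks roughly like this. The authoritative package list is in the Rancher doc linked above; the names below are the usual Ubuntu ones (growpart ships in cloud-guest-utils), so treat this as a sketch and check it against the doc:

apt-get update
apt-get install -y open-vm-tools cloud-init cloud-guest-utils cloud-image-utils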

I tried prepping the template with the steps above, but ended up using the following script: https://github.com/David-VTUK/Rancher-Packer/blob/main/vSphere/ubuntu_2204/script.sh

#!/bin/bash

# Apply updates and clean up the apt cache

apt-get update ; apt-get -y dist-upgrade
apt-get -y autoremove
apt-get -y clean
# apt-get install docker.io -y

# Disable swap - recommended for K8s; leave it enabled for other workloads
echo "Disabling Swap"
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Reset the machine-id value. Leaving this populated is known to cause
# issues with DHCP (clones can be handed duplicate leases)
echo "Reset Machine-ID"
truncate -s 0 /etc/machine-id
rm -f /var/lib/dbus/machine-id
ln -s /etc/machine-id /var/lib/dbus/machine-id

# Reset any existing cloud-init state so each clone runs cloud-init fresh
echo "Reset Cloud-Init"
rm -f /etc/cloud/cloud.cfg.d/*.cfg
cloud-init clean -s -l    # -s wipes the seed, -l wipes the logs
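Once a VM is cloned from the template, here is a quick sanity check that the reset actually took. These are my own checks, not part of the script above:

# Wait for cloud-init to finish on the clone and report its status
cloud-init status --wait

# Every clone should now have a unique machine-id and fresh instance state
cat /etc/machine-id
ls /var/lib/cloud/instances/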

With that, I was off to the races… only to hit another problem.

Troubleshooting

I found the following Reddit thread rather helpful: https://www.reddit.com/r/rancher/comments/tfxnzr/cluster_creation_works_in_rke_but_not_rke2/

# On the node, point kubectl at RKE2's kubeconfig and bundled binaries
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml; export PATH=$PATH:/var/lib/rancher/rke2/bin

# Check that the cattle-cluster-agent pod comes up, then read its logs
kubectl get pods -n cattle-system
kubectl logs <cattle-cluster-agent-pod> -n cattle-system

The above is an easy way to test nodes as they come up. Keep in mind that RKE2 bootstraps very differently from RKE: after the cloud-init stage, the RKE2 binaries and containerd are deployed, and only then do the agent pods (such as cattle-cluster-agent) start. Being able to watch those pods come up makes debugging much easier.
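On the node itself, before kubectl even works, there are two places to watch the bootstrap from. The paths below are the standard RKE2 locations (use rke2-agent instead of rke2-server on worker nodes):

# Follow the RKE2 service while it bootstraps
journalctl -u rke2-server -f

# Talk to RKE2's embedded containerd directly to see containers
# before the kube-apiserver is reachable
/var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps -a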

The last issue I encountered was that my /var filesystem didn’t have enough space. After fixing the disk layout in my template, I now have a running RKE2 cluster!
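For completeness, this is how I’d check and fix the space issue on a clone. The device and partition numbers are assumptions for a typical single-disk Ubuntu layout, so adjust for yours; this is also exactly what the growpart package from earlier exists for:

# See how much room /var actually has
df -h /var

# After enlarging the disk in vSphere, grow the partition and the filesystem
growpart /dev/sda 1
resize2fs /dev/sda1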