How to allow a worker to join a controller that lives in a WSL environment?

Hi,

When I create the join token for workers, the token uses my controller’s host IP. But if the controller lives in a VM environment, that IP is the VM’s internal IP, which is not publicly reachable. In my case, my controller lives in a WSL environment, and the publicly reachable IP is my Windows host’s IP.

Is there a way to choose a different IP when generating the token?

Thanks

In the config, spec.api.externalAddress is exactly for this; see Configuration Options - Documentation:

Configures all cluster components to connect to this address and also configures this address for use when joining new nodes to the cluster.
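
For example, something along these lines in the k0s config (the address below is only a placeholder for the publicly reachable Windows host IP, not a value from this thread):

apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  api:
    # placeholder address - use the IP that workers can actually reach
    externalAddress: 203.0.113.10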

So, all components will connect to the API server using my Windows host IP address, right?

I wonder how many firewall ports I need to open… :joy: Worth a try.

Thanks

Here’s my system configuration:

  • controller+worker: a WSL Linux in my Windows
  • worker: a Gentoo Linux
    • I set the os value to arch in the k0sctl config file (a rough sketch of the config follows below)

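To illustrate, a rough sketch of what the k0sctl config looks like for this setup (addresses, user names, and key paths are placeholders, not the real values):

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
    - role: controller+worker
      ssh:
        address: 203.0.113.10   # the WSL machine (reachable via the Windows host)
        user: me
        keyPath: ~/.ssh/id_rsa
    - role: worker
      os: arch                  # workaround, since Gentoo is not an officially listed os value
      ssh:
        address: 203.0.113.20   # the Gentoo machine
        user: me
        keyPath: ~/.ssh/id_rsa
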
After applying the config file with k0sctl, the controller+worker node started normally. The worker node is in the NodeReady status, but the pods fail.

There are 4 pods:

  • kube-proxy: the status is Running, but I cannot view the logs
  • coredns & konnectivity-agent: stuck at ContainerCreating
  • kube-router: stuck at CrashLoopBackOff

In the system log, I found many entries like these.

time="2023-08-26 13:20:02" level=info msg="E0826 13:20:02.131998    7255 kuberuntime_manager.go:1312] \"Failed to stop sandbox\" podSandboxID={Type:containerd ID:6186bb91c4bdd8f554f48b030a6334d8b13a047d4a43ba52a0568890f8f08a2c}" component=kubelet stream=stderr

time="2023-08-26 13:20:02" level=info msg="E0826 13:20:02.132121    7255 kubelet.go:1964] failed to \"KillPodSandbox\" for \"c494888c-a799-4a85-8cc2-2ff389c6f8b6\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for sandbox \\\"6186bb91c4bdd8f554f48b030a6334d8b13a047d4a43ba52a0568890f8f08a2c\\\": plugin type=\\\"bridge\\\" name=\\\"kubernetes\\\" failed (delete): no IP ranges specified\"" component=kubelet stream=stderr

time="2023-08-26 13:20:02" level=info msg="E0826 13:20:02.132200    7255 pod_workers.go:1294] \"Error syncing pod, skipping\" err=\"failed to \\\"KillPodSandbox\\\" for \\\"c494888c-a799-4a85-8cc2-2ff389c6f8b6\\\" with KillPodSandboxError: \\\"rpc error: code = Unknown desc = failed to destroy network for sandbox \\\\\\\"6186bb91c4bdd8f554f48b030a6334d8b13a047d4a43ba52a0568890f8f08a2c\\\\\\\": plugin type=\\\\\\\"bridge\\\\\\\" name=\\\\\\\"kubernetes\\\\\\\" failed (delete): no IP ranges specified\\\"\" pod=\"kube-system/konnectivity-agent-bhg6d\" podUID=c494888c-a799-4a85-8cc2-2ff389c6f8b6" component=kubelet stream=stderr

Also, I cannot view any pods’ logs, not even for those in the Running status. They all report No agent available.

The “No agent available” message indicates that the konnectivity agents on the workers aren’t able to connect to the controller. Please check the relevant ports listed in the required ports and protocols section of the docs. From Gentoo, you need to be able to connect to the API server port (6443 by default) and the Konnectivity port (8132 by default) on the Windows/WSL controller.

I think the real problem is that the kube-bridge interface cannot be brought up. I have already opened all the ports.

Later, I tried k3s, and everything worked. My WSL is the server+agent, and my Gentoo is the agent.

I also tried installing k0s on my Gentoo machine as controller+worker, and it works well. So I think that when installing as a worker only, some network-related configuration is not set up correctly.

Right. You mentioned that the kube-router pod is crash-looping. The konnectivity agent not being able to connect is probably not the root cause. Can you maybe inspect/share the kube-router pod logs? If they can’t be obtained via Kubernetes then you can find them directly on the worker node, try something along the lines of ls /var/log/containers/kube-router-*_kube-system_*. This should list the log files of the kube-router pod’s containers.

Other options you can try:

  • Disable konnectivity: Add the --disable-components=konnectivity-server command line argument to the k0s controller. This might not fix the kube-router issue, but it may give you API access to the pod logs.
  • Replace kube-router with Calico: Specify spec.network.provider: calico in the k0s config file (a sketch follows below).
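
As a sketch, assuming the default k0s config layout, the Calico option would look something like this:

apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  network:
    # replaces the default kube-router CNI with Calico
    provider: calico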

After playing with k3s for a few days, I gave up…

Even if I can get the agent to join the server, the bigger problem is pod-to-pod networking, which requires some kernel features that the default WSL kernel does not support.

Some posts suggest that building a customized WSL kernel could enable the required features. But I don’t have time for this.