Hello,
I’m struggling with the installation of k0s on bare metal servers. I did a few tests on VPSes without any issue. On bare metal I’m running into the following set of issues (which I suspect share the same root cause):
- I cannot connect to the CoreDNS pods via the service (tested with
nc -v -z -w 3 10.96.0.10 53
and the result seems to depend on which pod I get redirected to)
- metrics-server cannot scrape metrics from some nodes (I see a lot of messages like
E1225 22:05:32.032237 1 scraper.go:140] "Failed to scrape node" err="Get \"https://135.125.x9.x3:10250/metrics/resource\": context deadline exceeded" node="fra1"
and I’m not able to reach this port from the node where the metrics-server pod is running; the quick check I used is shown below)
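The quick check I used, run from the node hosting the metrics-server pod and reusing the obfuscated address from the log line above:
nc -v -z -w 3 135.125.x9.x3 10250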
I cannot see any errors in the konnectivity pods, k0scontroller, or k0sworker. I’m running out of ideas on how to resolve it.
Thank you for any advice.
Here is my k0sctl config file (some sensitive data has been obfuscated):
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-ovh
spec:
  hosts:
  - ssh:
      address: 1.2.3.4
      user: xxxx
      port: 22
      keyPath: xxx
    role: controller
  - ssh:
      address: 1.2.3.4
      user: xxxx
      port: 22
      keyPath: xxx
    role: worker
  - ssh:
      address: 1.2.3.4
      user: xxxx
      port: 22
      keyPath: xxx
    role: worker
  - ssh:
      address: 1.2.3.4
      user: xxxx
      port: 22
      keyPath: xxx
    role: worker
  - ssh:
      address: 1.2.3.4
      user: xxxx
      port: 22
      keyPath: xxx
    role: worker
  k0s:
    version: v1.28.4+k0s.0
    dynamicConfig: true
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      metadata:
        name: k0s-ovh
      spec:
        api:
          extraArgs:
            service-node-port-range: "80-32767"
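For context, a config like this is applied with k0sctl's standard apply command (assuming the file above is saved as k0sctl.yaml):
k0sctl apply --config k0sctl.yaml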
So from the worker node, you cannot connect to CoreDNS?
My first check would be firewalls: do you have any firewall running on the nodes?
If you do, you should allow the pod and service CIDRs through the firewall, and also open some common ports; see Networking (CNI) - Documentation for details.
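As an illustration only, assuming the default k0s CIDRs (10.244.0.0/16 for pods, 10.96.0.0/12 for services) and ufw as the firewall, the rules could look roughly like this; adapt them to whatever firewall you actually run:
ufw allow from 10.244.0.0/16
ufw allow from 10.96.0.0/12
ufw allow 179/tcp    # BGP used by the CNI
ufw allow 6443/tcp   # Kubernetes API
ufw allow 8132/tcp   # konnectivity
ufw allow 10250/tcp  # kubelet, needed by metrics-server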
Yes, I can reach neither of the pods (there are 2) nor the svc IP:
# kubectl -n kube-system get pods,svc -o wide -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/coredns-85df575cdb-j4nxt 1/1 Running 0 25h 10.244.3.3 fra2 <none> <none>
pod/coredns-85df575cdb-pxdl7 1/1 Running 0 25h 10.244.1.6 fra1 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 25h k8s-app=kube-dns
################################################################
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-797kn 1/1 Running 0 3m34s 10.244.0.20 gra2 <none> <none>
test-d67c6 1/1 Running 0 3m34s 10.244.1.20 fra1 <none> <none>
test-ngz67 1/1 Running 0 3m34s 10.244.3.21 fra2 <none> <none>
test-tkh98 1/1 Running 0 3m34s 10.244.2.22 gra1 <none> <none>
################################################################
## problematic node:
# kubectl exec -it test-797kn -- sh
/ # nc -v -z -w 2 10.244.3.3 53
nc: 10.244.3.3 (10.244.3.3:53): Operation timed out
/ # nc -v -z -w 2 10.244.1.6 53
nc: 10.244.1.6 (10.244.1.6:53): Operation timed out
/ # nc -v -z -w 2 10.96.0.10 53
nc: 10.96.0.10 (10.96.0.10:53): Operation timed out
################################################################
## working node
# kubectl exec -it test-d67c6 -- sh
/ # nc -v -z -w 2 10.244.3.3 53
10.244.3.3 (10.244.3.3:53) open
/ # nc -v -z -w 2 10.244.1.6 53
10.244.1.6 (10.244.1.6:53) open
/ # nc -v -z -w 2 10.96.0.10 53
10.96.0.10 (10.96.0.10:53) open
Exactly the same situation directly from the nodes:
vojbarz@fra1 ~> nc -v -z -w 2 10.244.3.3 53
Connection to 10.244.3.3 53 port [tcp/domain] succeeded!
vojbarz@fra1 ~> nc -v -z -w 2 10.244.1.6 53
Connection to 10.244.1.6 53 port [tcp/domain] succeeded!
vojbarz@fra1 ~> nc -v -z -w 2 10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
###########################################
vojbarz@gra2 ~> nc -v -z -w 2 10.244.3.3 53
nc: connect to 10.244.3.3 port 53 (tcp) timed out: Operation now in progress
vojbarz@gra2 ~ [1]> nc -v -z -w 2 10.244.1.6 53
nc: connect to 10.244.1.6 port 53 (tcp) timed out: Operation now in progress
vojbarz@gra2 ~ [1]> nc -v -z -w 2 10.96.0.10 53
nc: connect to 10.96.0.10 port 53 (tcp) timed out: Operation now in progress
Just curious, does it work from node gra1? Both pods are running on nodes in the “fra” network. Maybe the issues you’re facing are connected to inter-network traffic between the “gra” and “fra” networks. What happens if the CoreDNS pods run on gra1 and gra2? Does this break connectivity from the “fra” nodes?
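One way to try that, just as a sketch (assuming the CoreDNS deployment is simply named coredns, as the pod names suggest): cordon the fra nodes, restart the deployment so the replicas land on the gra nodes, then uncordon:
kubectl cordon fra1 fra2
kubectl -n kube-system rollout restart deployment coredns
kubectl uncordon fra1 fra2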
No, the same as gra2.
Looks like there is some networking issue in the gra zone. I’m guessing this based on the fact that if I move a pod to gra1, I’m able to reach it only from gra1, not from gra2 nor from the fra nodes.
Is there a way to find out what is wrong? Standard networking (tcp/udp) works fine between the nodes.
Does anybody know how to find out what is not working?
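One starting point I’m considering, assuming the default k0s CNI (kube-router, which speaks BGP between nodes) is in use, is the CNI pod logs on the gra nodes; this is only a guess at where to look:
kubectl -n kube-system get pods -o wide | grep kube-router
kubectl -n kube-system logs <kube-router-pod-on-gra2>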
Hi Vojbarzz,
Can you please verify that nodes can communicate with nodes in the other zone on port TCP/179?
If they have connectivity, we’ll probably need to do a traceroute to see the path the traffic takes and where it gets lost. This can be tricky without actually acquiring tcpdumps…
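As a rough sketch of what to capture (using the pod IPs from your earlier output): rerun the failing nc test from the test pod on gra2 while capturing on both nodes, then compare where the packets stop showing up:
# on gra2, while running "nc -v -z -w 2 10.244.3.3 53" in the test pod there
tcpdump -ni any host 10.244.3.3 and tcp port 53
# on fra2, where the 10.244.3.3 CoreDNS pod lives
tcpdump -ni any host 10.244.3.3 and tcp port 53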
Yes, all nodes can reach the others on TCP/179 (tested using netcat).
I can run tcpdumps. Can you help me with the scenario, i.e. how and what to capture?