Tons of OpenAPI errors

Hi,

I got the following errors flooding my system journal. I think the IP 10.101.213.69 refers to the metrics-server pod in my cluster.

Aug 14 11:23:46 l09853 k0s[441]: time="2023-08-14 11:23:46" level=info msg="E0814 11:23:46.339804     705 controller.go:116] loading OpenAPI spec for \"v1beta1.metrics.k8s.io\" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable" component=kube-apiserver stream=stderr
Aug 14 11:23:46 l09853 k0s[441]: time="2023-08-14 11:23:46" level=info msg=", Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]" component=kube-apiserver stream=stderr
Aug 14 11:23:46 l09853 k0s[441]: time="2023-08-14 11:23:46" level=info msg="I0814 11:23:46.341048     705 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue." component=kube-apiserver stream=stderr
Aug 14 11:23:50 l09853 k0s[441]: time="2023-08-14 11:23:50" level=info msg="E0814 11:23:50.340588     705 available_controller.go:460] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.213.69:443/apis/metrics.k8s.io/v1beta1: Get \"https://10.101.213.69:443/apis/metrics.k8s.io/v1beta1\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" component=kube-apiserver stream=stderr
Aug 14 11:23:55 l09853 k0s[441]: time="2023-08-14 11:23:55" level=info msg="E0814 11:23:55.347965     705 available_controller.go:460] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.213.69:443/apis/metrics.k8s.io/v1beta1: Get \"https://10.101.213.69:443/apis/metrics.k8s.io/v1beta1\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" component=kube-apiserver stream=stderr
Aug 14 11:23:55 l09853 k0s[441]: time="2023-08-14 11:23:55" level=info msg="I0814 11:23:55.396666     705 handler_discovery.go:325] DiscoveryManager: Failed to download discovery for kube-system/metrics-server:443: 503 error trying to reach service: EOF" component=kube-apiserver stream=stderr
Aug 14 11:23:55 l09853 k0s[441]: time="2023-08-14 11:23:55" level=info msg="I0814 11:23:55.396712     705 handler.go:232] Adding GroupVersion metrics.k8s.io v1beta1 to ResourceManager" component=kube-apiserver stream=stderr

However, the log level of these messages is info. So, maybe I can ignore them?

But I hope there’s a better way to resolve it.

Thanks.

kube-apiserver is trying to connect to metrics-server, but apparently that is failing. It connects to it via the kube-system/metrics-server ClusterIP service.
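
You can check what the api-server sees with something like the following (standard kubectl commands; the pod label selector is an assumption, adjust it to your deployment):

    # Is the aggregated metrics API marked Available by kube-apiserver?
    kubectl get apiservice v1beta1.metrics.k8s.io

    # The ClusterIP service and endpoints that the api-server dials
    kubectl -n kube-system get service metrics-server
    kubectl -n kube-system get endpoints metrics-server

    # The metrics-server pod itself (label selector assumed)
    kubectl -n kube-system get pods -l k8s-app=metrics-server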

Do you run plain controller node(s) (i.e. no worker enabled on the controller) in your cluster? If yes, then this is often a symptom of the konnectivity agents on the workers not being able to connect to the konnectivity-server on the controllers.

Check the following docs for some hints on the config:
https://docs.k0sproject.io/stable/high-availability/
https://docs.k0sproject.io/stable/nllb/
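
If that is your setup, a quick sanity check is to look at the konnectivity agents (assuming the default k0s deployment, where the agents run as pods in kube-system while konnectivity-server is run by k0s itself on the controllers):

    # The agents should be Running on every worker node
    kubectl -n kube-system get pods -o wide | grep konnectivity-agent

    # Connection errors towards konnectivity-server show up here (pod name is a placeholder)
    kubectl -n kube-system logs <konnectivity-agent-pod>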

No, I am running a single-node configuration. Anyway, after I reset the cluster a few times, the metrics server seems to be working.
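
For what it's worth, a quick way to confirm it really is working:

    # Both should return data once the v1beta1.metrics.k8s.io APIService is Available
    kubectl top nodes
    kubectl top pods -A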

Maybe it has something to do with kube-router. I remember that in one experiment I disabled it by setting metricsPort to 0.
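
In the k0s configuration that would look roughly like this (a sketch; the spec.network.kuberouter.metricsPort field path is my assumption, verify it against the k0s configuration reference for your version):

    # excerpt of k0s.yaml (assumed field path)
    spec:
      network:
        provider: kuberouter
        kuberouter:
          metricsPort: 0   # 0 disables kube-router's metrics port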

Thanks

It’s common to see these errors while the cluster is booting up, until all the pods etc. are properly up and running.

Metrics-server registers itself as an API extension on the api-server, and based on our testing, getting everything “ready” can take some minutes in some cases. The API extension is registered BEFORE the pods etc. are properly running. I think the api-server has some backoff policy when connecting to / discovering the extensions, so it takes a while, at least the first time when images still need to be pulled.
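
In practice you can just watch the aggregated API flip to Available once metrics-server is actually serving:

    # AVAILABLE goes from False to True when metrics-server becomes reachable
    kubectl get apiservice v1beta1.metrics.k8s.io -w

    # The exact failure reason (e.g. FailedDiscoveryCheck) is visible in the conditions
    kubectl describe apiservice v1beta1.metrics.k8s.io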