Kubernetes in the Wild - Envoy proxy is NOT ready


In this episode of Kubernetes in the Wild, we are delving into the world of service meshes and specifically Istio. Istio has been installed into a two-node Kubernetes cluster following the setup guide here, and a container has been deployed, but we are not able to access the container. The first port of call is to Isito Envoy sidecar container within the pod in question; its logs are checked using the following command

kubectl logs web-85bfdf78c7-nhjdw --container=istio-proxy

The output starts to reveal the problem:

2019-07-25T12:43:00.048749Z info Monitored certs: []string{"/etc/certs/cert-chain.pem", "/etc/certs/key.pem", "/etc/certs/root-cert.pem"}
2019-07-25T12:43:00.048764Z info PilotSAN []string(nil)
2019-07-25T12:43:00.048830Z info Opening status port 15020

2019-07-25T12:43:00.049637Z info Starting proxy agent
2019-07-25T12:43:00.049875Z info watching /etc/certs for changes
2019-07-25T12:43:00.049884Z info Received new config, resetting budget
2019-07-25T12:43:00.049888Z info Reconciling retry (budget 10)
2019-07-25T12:43:00.049897Z info Epoch 0 starting
2019-07-25T12:43:02.059251Z info Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster web.default --service-node sidecar~10.244.1.37~web-85bfdf78c7-nhjdw.default~default.svc.cluster.local --max-obj-name-len 189 --local-address-ip-version v4 --allow-unknown-fields -l warning --component-log-level misc:error --concurrency 2]
[2019-07-25 12:43:02.103][15][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, no healthy upstream
[2019-07-25 12:43:02.104][15][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:49] Unable to establish new stream
2019-07-25T12:43:02.204407Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
[2019-07-25 12:43:03.306][15][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure
2019-07-25T12:43:04.203438Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
[2019-07-25 12:43:05.658][15][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure
2019-07-25T12:43:06.203678Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2019-07-25T12:43:08.202934Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected

The error is pretty direct, and so our next step is to investigate the Istio Pilot pod. Using the below command, we can indeed see something is up with the Pilot pod.

kubectl get pods -n istio-system

Which outputs the below:

istio-system grafana-6575997f54-prdmq 1/1 Running 0 24h
istio-system istio-citadel-894d98c85-zz5z8 1/1 Running 0 24h
istio-system istio-galley-5b984f89b-sbbs6 1/1 Running 0 24h
istio-system istio-ingressgateway-6599d6749-6b5df 0/1 Running 0 24h
istio-system istio-init-crd-10-f2rkx 0/1 Completed 0 26h
istio-system istio-init-crd-11-xldll 0/1 Completed 0 26h
istio-system istio-init-crd-12-zn67s 0/1 Completed 0 26h
istio-system istio-pilot-7ccff5dbdc-v9klv 0/2 Pending 0 24h
istio-system istio-policy-77bbfdbd6-p8sh6 2/2 Running 1 24h
istio-system istio-sidecar-injector-7b98dd6bcc-p7txp 1/1 Running 0 24h
istio-system istio-telemetry-7f8d5c5b74-6scsb 2/2 Running 0 24h
istio-system prometheus-7d7b9f7844-586hm 1/1 Running 0 24h

The Pilot pod is currently pending; looking at the details of the Pilot pod reveals the source of the problem.

kubectl describe pods istio-pilot-7ccff5dbdc-v9klv -n istio-system

Which outputs the below:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m30s (x1031 over 24h) default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 2 Insufficient memory.

Now we have the root cause. Since this is a small lab deployment, the worker node has only the bare minimum CPU and memory allocations to run Kubernetes. Bumping them up enables the Pilot pod to move into a running state.

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m43s (x1039 over 24h) default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 2 Insufficient memory.
Warning FailedScheduling 73s (x5 over 3m) default-scheduler 0/2 nodes are available: 2 Insufficient memory.
Warning FailedScheduling 42s (x2 over 42s) default-scheduler 0/2 nodes are available: 1 Insufficient memory, 1 node(s) had taints that the pod didn't tolerate.
Normal Scheduled 41s default-scheduler Successfully assigned istio-system/istio-pilot-7ccff5dbdc-v9klv to kubworker
Warning FailedMount 38s (x2 over 40s) kubelet, kubworker MountVolume.SetUp failed for volume "config-volume" : couldn't propagate object cache: timed out waiting for the condition
Warning FailedMount 36s (x3 over 40s) kubelet, kubworker MountVolume.SetUp failed for volume "istio-certs" : couldn't propagate object cache: timed out waiting for the condition
Warning FailedMount 36s (x3 over 40s) kubelet, kubworker MountVolume.SetUp failed for volume "istio-pilot-service-account-token-64fvq" : couldn't propagate object cache: timed out waiting for the condition
Normal Pulling 32s kubelet, kubworker Pulling image "docker.io/istio/pilot:1.2.2"
Normal Pulled 15s kubelet, kubworker Successfully pulled image "docker.io/istio/pilot:1.2.2"
Normal Created 15s kubelet, kubworker Created container discovery
Normal Started 14s kubelet, kubworker Started container discovery
Normal Pulled 14s kubelet, kubworker Container image "docker.io/istio/proxyv2:1.2.2" already present on machine
Normal Created 14s kubelet, kubworker Created container istio-proxy
Normal Started 14s kubelet, kubworker Started container istio-proxy

For reference, the worker node was provisioned with 4 vCPU's and 6 GB of memory to facilitate Istio starting correctly.