Debugging an istiod failure: ‘why is it so hard to find out about disk pressure?’

How a simple disk space issue turned into a 3-hour debugging session

I have been using the Istio service mesh in a Kubernetes cluster for my research project. A service mesh is an application networking infrastructure layer that transparently manages all service-to-service communication in a microservices architecture, enabling traffic control (request routing, request rate limiting, etc.), security (like mutual TLS), and observability without requiring changes to application code. It has become pretty popular, but it is definitely not easy to manage. How tricky and annoying is it? There are companies that make millions of dollars by providing service mesh as a service (Solo.io, Tetrate, Buoyant, HashiCorp, etc.). It adds complexity to the infrastructure with a sidecar layer, the Istio gateway, and the istiod control plane. This is the cost you pay for the nice network layer abstraction.

And I also paid a lot of that cost (my time) to use it. This blog post covers one of those payments.

One day, while I was running experiments, the istiod control plane went down, and Istio ingress gateway pods were failing.

TLDR: The root cause was disk pressure on the Kubernetes node—the istiod pod was evicted due to low ephemeral storage, which caused a cascade failure of all ingress gateway pods that couldn’t connect to istiod. The debugging took 3 hours because Kubernetes doesn’t surface disk pressure issues clearly: kubectl get pods shows cryptic statuses like Completed, ContainerStatusUnknown, and Evicted without explaining why, and kubectl describe deployment provides no useful information. The fix required cleaning up Docker images to free disk space and manually deleting all evicted pods (since Kubernetes doesn’t automatically clean them up). The real issue: Kubernetes knows about disk pressure but buries this critical infrastructure information deep in kubectl describe node, making it unnecessarily difficult to diagnose what should be a simple problem.

Let me start with the beautiful k8s pod status (kubectl get pods -n istio-system). What a great, reliable system. Maybe it is built to panic the user when something goes wrong. Maybe it is a design choice? (A new note on 2026-01-19: I was mad at the time this issue happened since it made me spend so many hours figuring out the root cause. K8s is great..)
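
Just to get a quick count of how many pods sit in each of those statuses, a one-liner like this helps (plain shell, nothing Istio-specific; the status is the third column of the kubectl get pods output):

kubectl get pods -n istio-system --no-headers | awk '{print $3}' | sort | uniq -c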

What I did


Before I started debugging, my first guess was that istio-ingressgateway had been overloaded and failed, especially because that had actually happened before. But this time, it was not an istio-ingressgateway problem.

I started by checking the Envoy proxy logs:

warning envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_stream.h:190 StreamAggregatedResources gRPC config stream to xds-grpc closed since 354s ago: 14, connection error: desc = "transport: Error while dialing: dial tcp 10.97.219.77:15012: connect: connection refused
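
(For reference, this is roughly how I pull the gateway’s Envoy logs; the deploy/ shorthand assumes a default Istio install where the gateway deployment is named istio-ingressgateway.)

kubectl logs -n istio-system deploy/istio-ingressgateway --tail=100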

Then I checked where this ‘‘10.97.219.77:15012’’ address came from: it is the cluster IP of the istiod service.

k get svc -n istio-system
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                      AGE
istio-ingressgateway   LoadBalancer   10.100.183.27   <pending>     15021:31560/TCP,80:32462/TCP,443:32155/TCP   9d
istiod                 ClusterIP      10.97.219.77    <none>        15010/TCP,15012/TCP,443/TCP,15014/TCP        9d

So I checked the istiod deployment, and none of its pods were ready. And not just the istiod deployment: all the pods of the istio-ingressgateway deployment were not ready either. This is why I first suspected istio-ingressgateway, since it is the component that processes incoming traffic and forwards it to the upstream app services. istiod, on the other hand, is the control plane; it uses far fewer resources and is not even on the request-critical path. That is why istiod usually does not even run multiple replicas.

kubectl get deploy -n istio-system
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
istio-ingressgateway   0/10    10           0           9d
istiod                 0/1     1            0           9d

I described the istiod deployment to see what was going on. Basically, it does not say why 0 pods are available and 1/1 is unavailable. It just says “NewReplicaSetAvailable” and “MinimumReplicasUnavailable”. NICE. GREAT. You know what, either Kubernetes or the service mesh should have clearly described why the deployment is not available at this point. It is not good that the user has to dig further than this to figure out the cause of pod unavailability.

k describe deploy istiod -n istio-system
...
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  6m23s  deployment-controller  Scaled up replica set istiod-7fb964cc7b to 1
  Normal  ScalingReplicaSet  116s   deployment-controller  Scaled down replica set istiod-67c6566457 to 0 from 1
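
For what it’s worth, dumping the deployment’s status conditions directly is slightly more informative than the events, though it still doesn’t point at the node. A sketch, assuming jq is installed:

kubectl get deploy istiod -n istio-system -o json | jq '.status.conditions'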

I racked my brain and thought, “okay, let’s check another k8s controller. What about the ReplicaSet controller? That’s what controls the pod replicas.”

k get rs -n istio-system
NAME                              DESIRED   CURRENT   READY   AGE
istio-ingressgateway-5fc67fbd74   0         0         0       9d
istio-ingressgateway-8676d66897   0         0         0       8d
istio-ingressgateway-86c4b5c6f    10        10        0       8d
istiod-67c6566457                 0         0         0       8d
istiod-7fb964cc7b                 1         1         0       7m23s
istiod-bc4584967                  0         0         0       9d
$ k describe rs istiod-7fb964cc7b -n istio-system
...

Events:
  Type    Reason            Age    From                   Message
  ----    ------            ----   ----                   -------
  Normal  SuccessfulCreate  8m35s  replicaset-controller  Created pod: istiod-7fb964cc7b-gqxz5
  Normal  SuccessfulCreate  3m36s  replicaset-controller  Created pod: istiod-7fb964cc7b-7r4z7

Great. Everything also looks normal in istiod’s ReplicaSet (rs).

Okay, fine. I never expected the ReplicaSet to give useful information anyway. Let’s look at something more detailed than what kubectl describe gives: the actual istiod pod log.

I have attached the entire raw output of kubectl logs istiod. It is very long, and it is almost impossible to parse any useful information out of it because it is so messy. Logs are verbose by design; they are supposed to be comprehensive rather than concise and readable. But it was still hard to use for debugging. To give you a feel for it, here is the log.

Basically, a bunch of network connection failures.
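
The only way I could skim it was to filter for warnings and errors, nothing fancy, just grep (assuming the pod is still around to serve logs):

kubectl logs -n istio-system deploy/istiod --tail=300 | grep -iE 'warn|error'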

Okay, so the deployment description is not enough, but the pod log is too much. Let’s check kubectl describe pod.

$ k describe pod istiod-7fb964cc7b-7r4z7 -n istio-system
...
Status:           Failed
Reason:           Evicted
Message:          The node was low on resource: ephemeral-storage. Threshold quantity: 10057513974, available: 9670764Ki.
...
Events:
  Type     Reason               Age   From               Message
  ----     ------               ----  ----               -------
  Warning  FailedScheduling     17m   default-scheduler  0/4 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..
  Warning  FailedScheduling     12m   default-scheduler  0/4 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..
  Normal   Scheduled            10m   default-scheduler  Successfully assigned istio-system/istiod-7fb964cc7b-7r4z7 to node0.bufferbloater.istio-pg0.clemson.cloudlab.us
  Normal   Pulling              10m   kubelet            Pulling image "docker.io/istio/pilot:1.20.3"
  Warning  Evicted              10m   kubelet            The node was low on resource: ephemeral-storage. Threshold quantity: 10057513974, available: 9670764Ki.
  Normal   Pulled               10m   kubelet            Successfully pulled image "docker.io/istio/pilot:1.20.3" in 5.863s (17.44s including waiting)
  Normal   Created              10m   kubelet            Created container discovery
  Normal   Started              10m   kubelet            Started container discovery
  Normal   Killing              10m   kubelet            Stopping container discovery
  Warning  ExceededGracePeriod  10m   kubelet            Container runtime did not kill the pod within specified grace period.

Nice, there it is. The istiod pod was evicted due to low ephemeral storage on the node, and the istio-ingressgateway pods could not connect to istiod because the only istiod pod was unavailable. The istio-ingressgateway pods never became ready, were terminated as unhealthy, were recreated by the ReplicaSet, and the cycle repeated. The root cause was the ephemeral storage shortage on the k8s node where istiod was running.
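
As a side note, if you have istioctl installed, proxy-status shows which Envoy proxies are (not) connected to istiod, which would have surfaced this disconnect much earlier; I just didn’t think of it at the time:

istioctl proxy-status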

To confirm, I checked node 0’s status. The node appears healthy (STATUS: ready), yet no pods can be scheduled due to disk pressure. This discrepancy is problematic because users would expect a “ready” node to be schedulable. Since disk pressure directly affects pod schedulability, this critical information should be more prominently displayed beyond just kubectl describe node.

gangmuk@node0:~$ kubectl describe node node0.bufferbloater.istio-pg0.clemson.cloudlab.us
...
Unschedulable:      false
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 16 Jul 2024 04:59:58 +0000   Tue, 16 Jul 2024 04:59:58 +0000   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Thu, 25 Jul 2024 21:47:46 +0000   Tue, 16 Jul 2024 04:59:22 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  **DiskPressure         True    Thu, 25 Jul 2024 21:47:46 +0000   Thu, 25 Jul 2024 21:45:12 +0000   KubeletHasDiskPressure       kubelet has disk pressure**
  PIDPressure          False   Thu, 25 Jul 2024 21:47:46 +0000   Tue, 16 Jul 2024 04:59:22 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 25 Jul 2024 21:47:46 +0000   Tue, 16 Jul 2024 04:59:55 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
...
Events:
  Type     Reason                 Age                    From     Message
  ----     ------                 ----                   ----     -------
  Normal   NodeHasNoDiskPressure  50m (x19 over 3h8m)    kubelet  Node node0.bufferbloater.istio-pg0.clemson.cloudlab.us status is now: NodeHasNoDiskPressure
  Warning  EvictionThresholdMet   10m (x336 over 3h15m)  kubelet  Attempting to reclaim ephemeral-storage
  Warning  FreeDiskSpaceFailed    14s (x31 over 150m)    kubelet  (combined from similar events): Failed to garbage collect required amount of images. Attempted to free 3077055283 bytes, but only found 0 bytes eligible to free.

FYI, I installed k8s on bare-metal machines, so a k8s node here is a bare-metal machine, not a virtual machine. And the CloudLab disk partition where the root dir is mounted is damn small. Why?

The result: high disk space pressure.

gangmuk@node0:~$ df -h
Filesystem                               Size  Used Avail Use% Mounted on
tmpfs                                     26G  2.9M   26G   1% /run
/dev/sda3                                 63G   51G  9.1G  85% /

But it seems there is still 9GB (15% of the disk) available. Why is that not enough? The ephemeral storage used by the pods was nearly exhausted, and the kubelet eviction policy is set to evict pods when less than 15% of disk space is available.

It is because of the kubelet configuration for disk pressure situations. To find it, we need to dump the kubelet config. To do that, run kubectl proxy and then curl the kubelet configz endpoint.

kubectl proxy

Then, in another terminal, run the following command:

gangmuk@node0:~$ curl -X GET "http://localhost:8001/api/v1/nodes/<node_name>/proxy/configz" | jq . > proxy_config.txt

You can see the eviction policy in the dumped config (proxy_config.txt). It says that if imagefs.available drops below 15% of the disk, pods get evicted.

{
  "kubeletconfig": {
    "enableServer": true,
    "staticPodPath": "/etc/kubernetes/manifests",
    ...
    "evictionHard": {
      "imagefs.available": "15%",
      "memory.available": "100Mi",
      "nodefs.available": "10%",
      "nodefs.inodesFree": "5%"
    },
    "evictionPressureTransitionPeriod": "5m0s",
    ...
  }
}
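
If you only care about the eviction thresholds, you can pull just that part out of the same endpoint (assuming kubectl proxy is still running on its default port 8001):

curl -sS "http://localhost:8001/api/v1/nodes/<node_name>/proxy/configz" | jq '.kubeletconfig.evictionHard'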

Anything can easily consume 60GB. In my case, the major culprit was Docker images. I had accumulated a lot of unused Docker images over time, which were taking up significant disk space.

So the first thing I did to free up disk space was to clean up unused Docker images.

docker system prune -a
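
If you want to see where the space actually went before pruning, these are the kinds of commands I would reach for (the paths assume the default Docker and kubelet data directories):

docker system df
sudo du -sh /var/lib/docker /var/lib/kubelet
df -h /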

Okay, WTF. Even after that, none of the pods were coming back to the Ready state. W H Y

Whatever, let’s restart

kubectl rollout restart deployment istiod -n istio-system
kubectl rollout restart deployment istio-ingressgateway -n istio-system

BUT it still didn’t bring the pods back. All the pods were Completed, ContainerStatusUnknown, or Evicted.

WHAT THE HELL.

The key issue was that evicted pods are NOT automatically cleaned up!!!!! Evicted pods persist until their count exceeds the --terminated-pod-gc-threshold parameter of kube-controller-manager (default is 12,500 pods), and they hang around with a status of “Failed” and a reason of “Evicted”. I am not sure this is how it is supposed to be, but anyway, it is the current k8s behavior.

Then why didn’t kubectl rollout restart <deployment_name> resolve the issue?

Rollout restart creates NEW ReplicaSets but doesn’t clean up old evicted pods from previous ReplicaSets. When a deployment has replicas, evicted pods are typically not deleted automatically. Therefore, the evicted/failed pods (potentially hundreds of them) still remain in the cluster, taking up API server resources and potentially interfering with scheduling decisions. Even after scaling the deployment down to 0 replicas, evicted pods from old ReplicaSets don’t get cleaned up.
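
If you don’t want to nuke every pod in the namespace, a more targeted cleanup should also work, assuming the stuck pods all report phase Failed (which evicted pods do):

kubectl delete pods -n istio-system --field-selector=status.phase=Failed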

Why kubectl delete pods --all worked.

This command immediately removed ALL problematic pods—evicted, completed, failed, and unknown status pods. It forced the ReplicaSet controllers to create completely fresh pods without any historical baggage, effectively cleaning the slate so no old evicted pods were cluttering the namespace.

So, it finally worked when I ran

kubectl delete pods --all -n istio-system
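
Then it is just a matter of watching everything come back (standard commands, nothing specific to this incident):

kubectl get pods -n istio-system -w
kubectl rollout status deployment/istiod -n istio-system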

MY STRONG OPINION about all this stuff: I think it is such a mess and so unreliable. K8s knows what is happening in great detail. It shouldn’t be this difficult to find the root cause of such a simple issue (disk pressure). In particular, node disk space is an infrastructure-layer resource, not an application-layer one. The user shouldn’t be bothered this much when K8s already has all this information. K8s should automatically detect the root cause, notify the user, and provide a solution, or even just fix it automatically.