Blog

Technical blog posts.

Don't mount disk on EC2
Roofline analysis for prefill and decoding
My Way of Understanding vLLM V1 Scheduling Algorithm
Kubernetes Scalability
How low disk space breaks the Envoy Wasm cache and eventually makes HTTP request routing fail (AIBrix LLM inference infra)
Analysis on Various Overload Control Systems and their Limitations
Debugging Forever Terminating Pods in Kubernetes
Deploying modified server without build, push, pull, and restart
Debugging HTTP Connection Overhead
Debugging istiod failure: 'why is it so hard to find out disk pressure?'
DCQCN, RDMA, CXL
K8S cheat sheet
Intuition behind NVIDIA GPU architecture and CUDA programming model