Blog

Technical blog posts.

Don't mount disk on EC2
vLLM error and network configs
Roofline analysis for prefill and decoding
Comparison between TP and PP in LLM inference
My Way of Understanding vLLM V1 Scheduling Algorithm
GPU Communication Architecture: NCCL, NVSHMEM, and NVLink
Memory bandwidth and latency for KV load
Kubernetes Scalability
How low disk space breaks the Envoy Wasm cache and eventually makes HTTP request routing fail (AIBrix LLM inference infra)
Analysis of Various Overload Control Systems and their Limitations
Debugging Forever Terminating Pods in Kubernetes
Deploying modified server without build, push, pull, and restart
Debugging HTTP Connection Overhead
Debugging istiod failure: 'Why is it so hard to find out about disk pressure?'
DCQCN, RDMA, CXL
K8S cheat sheet
Intuition behind NVIDIA GPU architecture and CUDA programming model