Debugging HTTP Connection Overhead: When the Measurement Was the Problem
Intro
I have been working on an LLM inference routing project: a new request routing stack built on AIBrix. Specifically, the existing AIBrix gateway (an Envoy proxy) calls my routing agent service over HTTP, and both components run as pods in the AIBrix Kubernetes infrastructure (i.e., a microservice architecture). I observed high latency on the call to the routing agent service. To find the source of the overhead, I logged the time taken by every step. First, I needed to determine whether the latency came from HTTP connection overhead or from the routing agent's routing logic computation. This is serving infrastructure, so many concurrent requests hit the system, and it needs to process them (i.e., make routing decisions) with low overhead. Based on my logging, the HTTP connection between the AIBrix gateway and the routing agent service was showing high latency (~500ms), and I spent quite a lot of time trying to reduce that connection overhead. But it was not the root cause… My measurement (the time logging) was wrong… It was measuring connection time outside the HTTP client’s connection pool, creating artificial overhead that didn’t exist in the actual requests. Very frustrating. (And yes, in 2025, this stupid mistake was obviously made with the help of an AI hallucination… though essentially, I am the one who was stupid.)
This is what I was measuring. I was trying to measure in as much detail as possible: from DNS resolution, to connection establishment, to sending the request, to receiving the response.
func SendRequestWithTiming(
	client *http.Client,
	rawURL string,
	method string,
	reqBody []byte,
	headers map[string]string,
	requestID string,
	timeout time.Duration,
) ([]byte, TimingResult, error) {
	var result TimingResult

	// Parse the URL to get the hostname
	parsedURL, parseErr := url.Parse(rawURL)
	if parseErr != nil {
		return nil, result, parseErr
	}
	hostName := parsedURL.Hostname()

	// Measurement 1: DNS resolution
	dnsStart := time.Now()
	hosts, dnsErr := net.LookupHost(hostName)
	result.DNSTime = time.Since(dnsStart)
	if dnsErr != nil {
		log.Printf("DNS lookup failed: %v", dnsErr)
	}

	// Measurement 2: Connection test - THE PROBLEMATIC PART
	result.ConnectionTime = 0
	if len(hosts) > 0 {
		connStart := time.Now()
		port := "80"
		if strings.HasPrefix(rawURL, "https://") {
			port = "443"
		}
		// This dials a brand-new TCP connection, completely separate from the
		// http.Client's connection pool
		conn, dialErr := net.DialTimeout("tcp", hosts[0]+":"+port, timeout/2)
		result.ConnectionTime = time.Since(connStart)
		if dialErr != nil {
			log.Printf("Connection test failed: %v", dialErr)
		} else if conn != nil {
			conn.Close() // Immediately close this test connection
		}
	}

	// Measurement 3: The actual HTTP request. This goes through the client's
	// connection pool, so an existing connection is reused whenever possible,
	// and a reused connection adds essentially no connection overhead.
	req, reqErr := http.NewRequest(method, rawURL, bytes.NewBuffer(reqBody))
	if reqErr != nil {
		return nil, result, reqErr
	}
	for key, value := range headers {
		req.Header.Set(key, value)
	}

	reqStart := time.Now()
	resp, err := client.Do(req)
	result.RequestTime = time.Since(reqStart)
	// ... rest of the request handling ...
When I was sending requests, this is what I saw:
I0514 09:09:32.662052 1 latency_prediction_based.go:620] requestID: 132 - Timing breakdown took, DNS: 4ms, Conn: 500ms, Req: 0ms, Send: 31ms, Read: 0ms, Parse: 0ms, Total: 536ms
With both services running in the same Kubernetes cluster and all k8s nodes in the same rack, 500ms of connection overhead is definitely excessive and not normal. The problem had to be in the software stack.
What I Was Actually Measuring (And Why It Was Wrong)
Look again at the net.DialTimeout() call from the measurement code above:
conn, dialErr := net.DialTimeout("tcp", hosts[0]+":"+port, timeout/2)
What happens:
- DNS resolution (if needed)
- TCP 3-way handshake (SYN → SYN-ACK → ACK)
- Returns a raw TCP connection (net.Conn)
- Critical: this connection is NOT in http.Client’s connection pool
This creates a brand new TCP connection every single time I measure. For HTTPS endpoints, add another 200-300ms for TLS handshake on top. Total: 200-500ms.
However, when I then call client.Do(req) for the actual request, the http.Client uses a completely different connection, one from its connection pool. If connection pooling is working (which it was), http.Client reuses an existing connection with zero connection overhead.
So what was I measuring?
- The net.DialTimeout() call: creating a throwaway connection just for timing → 500ms
- The client.Do(req) call: using a pooled connection for the actual request → 0ms
I was measuring Connection A while optimizing for Connection B. They’re not the same connection!
What I should have done: Use net/http/httptrace.ClientTrace to instrument the actual connection that client.Do() uses:
var connectStart time.Time // declared outside so the callbacks can share it
trace := &httptrace.ClientTrace{
	GotConn: func(info httptrace.GotConnInfo) {
		// info.Reused tells you if the connection came from the pool:
		// true = ~0ms overhead, false = a new connection was created
		result.ConnectionReused = info.Reused
		result.IdleTime = info.IdleTime
	},
	ConnectStart: func(network, addr string) {
		connectStart = time.Now()
	},
	ConnectDone: func(network, addr string, err error) {
		result.ConnectionTime = time.Since(connectStart)
	},
}
req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
Simply put, this is how to get the timing of the connections the HTTP client actually uses: the trace callbacks fire on whatever connection client.Do() ends up using, whether it is pooled or newly created.
It looks very obvious now, but it wasn’t at the time, because (obviously) I had used an LLM to write the time-logging code, with a prompt along the lines of “I want to measure detailed overhead including connection and request sending overhead…”. (Shamelessly,) I didn’t read the code thoroughly, and at a glance it made sense, unless you understand what net.DialTimeout() actually does. It also built successfully and printed numbers. And that is how a stupid performance-debugging effort for a problem that didn’t exist got started.
(I used Claude 3.5 Sonnet)
The rest of the post is the detailed story of how I went about performance debugging and optimizing a non-existent problem. It is mostly about sharing the lessons learned.
What happened next
I pasted my code and logs into the AI and told it, “I want to first understand what connection overhead is and how to solve it.”
The AI assistant explained connection overhead in detail—the time needed for TCP handshakes, TLS negotiation, and session establishment. It confidently analyzed my code and suggested several improvements to my HTTP transport configuration:
- Increasing MaxIdleConns and MaxIdleConnsPerHost
- Extending IdleConnTimeout
- Implementing a more aggressive connection warmup strategy
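For reference, the transport-level tuning looked roughly like the sketch below. The helper name and the specific values are illustrative, not the exact settings from my deployment.

// Illustrative only: a hypothetical helper showing the kind of transport
// tuning the AI suggested and I applied. Values are examples, not my real config.
func newRoutingClient() *http.Client {
	transport := &http.Transport{
		MaxIdleConns:        200,              // raise the global cap on idle (keep-alive) connections
		MaxIdleConnsPerHost: 100,              // keep more warm connections to the routing agent service
		IdleConnTimeout:     90 * time.Second, // hold idle connections longer before closing them
	}
	return &http.Client{
		Transport: transport,
		Timeout:   2 * time.Second, // overall per-request timeout, also illustrative
	}
}

None of it moved the numbers, because (as it turned out later) the pooled connections were already being reused.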
But most critically, the AI endorsed and expanded upon my existing measurement, which used this function:
func SendRequestWithTiming(
	client *http.Client,
	rawURL string,
	method string,
	reqBody []byte,
	headers map[string]string,
	requestID string,
	timeout time.Duration,
) ([]byte, TimingResult, error) {
	// ... other code ...

	// 1. DNS Resolution
	dnsStart := time.Now()
	hosts, dnsErr := net.LookupHost(hostName)
	result.DNSTime = time.Since(dnsStart)

	// 2. Connection Test - THIS IS THE PROBLEMATIC PART
	result.ConnectionTime = 0
	if len(hosts) > 0 {
		connStart := time.Now()

		// Extract port if included in URL
		port := "80"
		if strings.HasPrefix(rawURL, "https://") {
			port = "443"
		}
		// ... port extraction logic ...

		conn, dialErr := net.DialTimeout("tcp", hosts[0]+":"+port, timeout/2)
		result.ConnectionTime = time.Since(connStart)
		if dialErr != nil {
			// Log error
		} else if conn != nil {
			conn.Close()
		}
	}

	// ... HTTP request code ...
}
The AI analyzed this code and even suggested enhancements, adding instrumentation to check if connections were being reused. I implemented these changes, but the problem persisted—we were still seeing 500ms connection times, even with ConnReuse: true in the logs.
The Rabbit Hole Got Deeper
Over several days, I spent hours implementing the AI’s suggestions:
- I modified the connection pool settings
- I implemented more aggressive warmup strategies (sketched after this list)
- I added detailed logging
- I increased timeouts
- I experimented with HTTP/2 vs HTTP/1.1
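The “warmup” part, for what it’s worth, had roughly the shape sketched below: fire a handful of concurrent requests at startup so the transport already holds established connections in its pool. The function name, the health-check path, and the request count are all hypothetical, made up for illustration.

// Hypothetical warmup sketch (not my exact code): issue n concurrent requests
// at startup so the client's pool ends up holding several established connections.
func warmUpConnections(client *http.Client, baseURL string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := client.Get(baseURL + "/healthz") // assumed health endpoint
			if err != nil {
				return
			}
			io.Copy(io.Discard, resp.Body) // drain the body so the connection returns to the pool
			resp.Body.Close()
		}()
	}
	wg.Wait()
}

In hindsight it was pointless: the pool was already reusing connections, and the 500ms was never coming from connection setup in the first place.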
Each time, I returned to the AI with my results, and each time it offered new theories and more sophisticated approaches. We discussed:
- Network path analysis
- Connection pipelining
- DNS caching
- Server-side TCP settings
The AI’s suggestions became increasingly complex, requiring significant code changes. I was spending more and more time chasing a performance issue that seemed stubborn and mysterious.
A Different Approach: Alternative Measurement Method
After lots of unproductive optimization attempts, I asked the AI for other ways to measure connection overhead. The AI suggested using Go’s httptrace package:
func SendRequestWithTiming(
	client *http.Client,
	rawURL string,
	method string,
	reqBody []byte,
	headers map[string]string,
	requestID string,
	timeout time.Duration,
) ([]byte, TimingResult, error) {
	// ... initial setup ...

	// Variables shared by the httptrace callbacks
	var dnsStart, dnsEnd, connectStart, connectEnd, tlsStart, tlsEnd time.Time

	// Create HTTP request
	req, reqErr := http.NewRequest(method, rawURL, bytes.NewBuffer(reqBody))
	// ... error handling ...

	// Set up HTTP tracing
	trace := &httptrace.ClientTrace{
		DNSStart: func(info httptrace.DNSStartInfo) {
			dnsStart = time.Now()
			// Log DNS start
		},
		DNSDone: func(info httptrace.DNSDoneInfo) {
			dnsEnd = time.Now()
			// Log DNS completion
		},
		ConnectStart: func(network, addr string) {
			connectStart = time.Now()
			// Log connection start
		},
		ConnectDone: func(network, addr string, err error) {
			connectEnd = time.Now()
			// Log connection completion
		},
		GotConn: func(info httptrace.GotConnInfo) {
			// Log connection details; info.Reused reports whether the
			// connection came from the client's pool
			result.ConnectionReused = info.Reused
		},
		// More trace points...
	}

	// Apply trace to request
	ctx := httptrace.WithClientTrace(req.Context(), trace)
	req = req.WithContext(ctx)

	// Send request and calculate timings...
}
I implemented this approach, ran the code, and was surprised by the results:
I0514 18:45:12.882154 1 latency_prediction_based.go:646] requestID: 451 - Timing breakdown took, DNS: 0ms, Conn: 0ms, ConnReuse: true, Req: 0ms, Send: 18ms, Read: 2ms, Parse: 0ms, Total: 21ms
Connection time: 0ms. The 500ms overhead had completely disappeared.
The Realization (Ahhhhhh!!!! human beings are becoming lazy and passive…)
I shared these results with the AI, asking why there was such a discrepancy. The AI explained something that should have been obvious earlier:
“The key difference between your old version and your new version is that your original measurement technique was flawed. You were creating a separate TCP connection just for measurement, completely separate from the one that the HTTP client was using.”
It turned out that while I thought I was measuring the connection overhead of my HTTP requests, I was actually measuring the time to establish a new TCP connection that wasn’t even being used for the actual request. Meanwhile, the HTTP client was correctly reusing connections from its pool, resulting in near-zero connection times.
Simply put, I had spent days trying to optimize a non-existent problem.
How AI and I Got Confused Together
Looking back, I realized several factors led to this misdirection:
- Uncritical acceptance: The AI examined my flawed measurement code and treated it as valid, even suggesting enhancements rather than questioning its fundamental approach
- Plausible problem: The connection overhead issue seemed reasonable enough that neither I nor the AI questioned its existence
- Expertise gap: I lacked the specific networking expertise to spot the flaw in the measurement approach, and relied on the AI’s apparent confidence
- Reinforcing loop: As I reported back with continued issues, the AI offered increasingly complex solutions, reinforcing the idea that we were solving a real problem
- Authoritative tone: The AI’s clear, confident explanations made me trust its analysis more than I should have
Lessons for Working with AI on Technical Problems
This routine debugging exercise taught me several valuable lessons about working with AI assistants:
- Scrutinize methodology: Before implementing any optimization, verify that your measurement approach is actually measuring what you think it is—even when an AI endorses it.
- Understand AI limitations: AI assistants don’t necessarily recognize conceptual flaws in your approach, especially when those flaws aren’t obvious syntax or logic errors.
- Check assumptions regularly: If you’re spending too much time optimizing without seeing results, revisit your fundamental assumptions about the problem itself.
- Use multiple measurement methods: Different measurement approaches provide valuable cross-checks. If they give significantly different results, investigate why before proceeding.
- Be aware of overconfidence: AI assistants can appear authoritative even when working from incorrect premises. Their confident tone doesn’t always reflect accuracy.
The Technical Root Cause
For those interested in the technical details, the original measurement approach was flawed because:
- We were creating a fresh TCP connection using net.DialTimeout() just for measurement
- This connection was completely separate from the pooled connections the HTTP client used
- While the HTTP client was efficiently reusing connections (0ms connection time), our measurement was repeatedly establishing new connections (500ms)
- We were spending time optimizing the connection pool based on measurements from outside the pool (the sketch below reproduces this mismatch)
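If you want to see the mismatch in isolation, here is a minimal, self-contained sketch. It runs against a local test server, so the absolute numbers are tiny, but the pattern is the same: a throwaway net.DialTimeout measurement pays connection setup on every iteration, while httptrace on the pooled client reports the connection as reused after the first request.

// Minimal repro sketch: contrast a throwaway dial-based "measurement" with
// httptrace on the connection the pooled http.Client actually uses.
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"time"
)

func main() {
	// Local test server so the example is self-contained.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	client := srv.Client() // one client, one connection pool, reused across requests

	for i := 0; i < 3; i++ {
		// Measurement A (flawed): dial a brand-new TCP connection just for timing.
		dialStart := time.Now()
		if conn, err := net.DialTimeout("tcp", srv.Listener.Addr().String(), time.Second); err == nil {
			conn.Close()
		}
		dialTime := time.Since(dialStart)

		// Measurement B (correct): trace the connection client.Do actually uses.
		var reused bool
		trace := &httptrace.ClientTrace{
			GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
		}
		req, _ := http.NewRequest("GET", srv.URL, nil)
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
		resp, err := client.Do(req)
		if err == nil {
			io.Copy(io.Discard, resp.Body) // drain so the connection goes back to the pool
			resp.Body.Close()
		}

		fmt.Printf("request %d: throwaway dial took %v, pooled connection reused=%v\n", i, dialTime, reused)
	}
}

In the real cluster, Measurement A was the 500ms I kept chasing, and Measurement B was the ~0ms that was actually happening.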
Lessons Learned…
We know LLMs hallucinate for fundamental reasons; I am not going to complain about that, and I think it is wrong to argue that hallucination itself was the problem here. Personally, I believe my lack of understanding of the LLM-generated code was the problem. At the same time, AI use will only become more common (likely exponentially), and it is not right to expect users to fully understand every piece of generated code, even when those users are software engineers. That is essentially the opposite of what we envision in an AI-ubiquitous world. The LLM, in the first place, should have asked me whether that was what I wanted when I didn’t provide enough context in the prompt. So the ability of an AI to tell whether it has enough context, and to ask the user the right clarifying question when it doesn’t, is going to become more and more critical. This is probably not a new concern, but this ability in LLMs does not seem to be getting enough attention.