Every now and then I find myself needing to remember how HTTP request timeouts work in Go, and how to configure them. This has changed over time as http.Client gained the Timeout option, the various Timeout options you can set on the http.Transport and its underlying net.Dialer, and eventually the introduction of the context package in Go 1.7, which allows us to set a timeout/deadline on the request and in turn led to the deprecation of http.Request.Cancel.
This post aims to be practically correct, as in you can use the model here to reason about the lifetime of an HTTP request performed by the net/http package. But there might still be some slight nuance to it. This post is written with HTTP/1 and HTTP/2 in mind and should still apply to HTTP/3 as well.
If you notice an error in any of this, please do let me know. There are a lot of moving pieces to all of this, so it’s easy to misunderstand something or miss a particular branch that behaves differently.
Lifecycle of a request
Once you call http.Client.(Do|Get|Post|...), roughly the following things happen in order to establish a connection with the other side:
- Translate the hostname to an IP address (DNS lookup)
- Connect to the IP (TCP handshake)
- Agree on encryption method and parameters (TLS handshake)
- Send an HTTP request
- Receive a response
Some of these steps are optional. If you’re passing in an IP instead of a hostname, we can skip the DNS resolution. If you’re not using https, then the TLS handshake doesn’t take place, but HTTP/2 and up are encrypted by default.
In the case of QUIC, it skips the TCP handshake since it runs over UDP. Instead it performs a QUIC handshake. This includes negotiating connection encryption, so you can think of it as collapsing the TCP and TLS handshakes into one while doing it faster.
In the remainder of the post I might interchangeably use ‘timeout’ and ‘deadline’. They’re effectively the same, representing a point in time in the future after which we no longer want things to happen. If you look at the implementation of context.WithTimeout, you’ll see that it creates a context that sets a deadline.
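To make the timeout/deadline equivalence concrete, here is a minimal sketch. The helper name deadlineFor is made up for this illustration; it only uses context.WithTimeout and Context.Deadline from the standard library.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// deadlineFor (hypothetical helper) creates a context with the given timeout
// and reports how much time is left until its deadline, and whether a
// deadline is set at all.
func deadlineFor(timeout time.Duration) (time.Duration, bool) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	deadline, ok := ctx.Deadline()
	return time.Until(deadline), ok
}

func main() {
	left, ok := deadlineFor(2 * time.Second)
	fmt.Println("has deadline:", ok)
	fmt.Println("time left fits in the timeout:", left > 0 && left <= 2*time.Second)
}
```

A context created with a timeout reports a deadline roughly that far in the future, which is why the two terms can be used interchangeably here.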
Timeout bonanza
When doing a request, the mental model you want to use is: a request will time out based on whichever is smaller (timeout) or comes sooner (deadline):
- the Timeout on http.Client
- any timeout/deadline set on the context.Context passed to the request
- a “more specific” timeout/deadline that controls a part of the request lifecycle, if that timeout is reached during that part of the lifecycle
This means that the Timeout value on http.Client is something you can always set, as the absolute maximum you expect a request to ever take. If the HTTP client is doing a request where the request’s context timeout/deadline happens earlier, then that one is used instead.
So, for example, in this case the request will time out after 1 second, not 10:
c := &http.Client{Timeout: 10 * time.Second}
ctx, cancel := context.WithTimeout(context.Background(), 1 * time.Second)
defer cancel()
req, _ := http.NewRequestWithContext(ctx, http.MethodGet, "https://example.com", nil)
c.Do(req)
When you call Get on an http.Client, which doesn’t let you pass in an http.Request with a context, the timeout will be the one set on http.Client. In the case of this example, that’s after 10 seconds. Remember that if you do http.Get, you’re using http.DefaultClient, which doesn’t have a Timeout configured at all. This is bad, but setting a default is deemed a backwards incompatible change by the Go team. Because of this, it’s strongly advised to always create your own http.Client and set a timeout on it, or let the caller pass one in.
If instead you were to pass in an http.Request with a context whose timeout was 1 minute, the request would still time out after 10s. The effective timeout is always the smaller of the two. If you can guarantee that your request always has a timeout set, then you could set the Timeout on http.Client to 0, which disables it. However, it’s probably a good idea to have some kind of global timeout, so instead set it to something big like 5 minutes.
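The “smaller one wins” rule can be checked against a local test server. The sketch below is an illustration only: slowRequest is a hypothetical helper, the server comes from net/http/httptest, and the durations are shortened so it runs quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// slowRequest (hypothetical helper) sends a GET to a local test server that
// never answers in time. It reports how long the client waited and whether
// it gave up because of a timeout.
func slowRequest(clientTimeout, ctxTimeout time.Duration) (time.Duration, bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done(): // the client gave up
		case <-time.After(2 * time.Second):
		}
	}))
	defer srv.Close()

	client := &http.Client{Timeout: clientTimeout}
	ctx, cancel := context.WithTimeout(context.Background(), ctxTimeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
	if err != nil {
		return 0, false
	}

	start := time.Now()
	resp, err := client.Do(req)
	elapsed := time.Since(start)
	if err == nil {
		resp.Body.Close()
		return elapsed, false
	}

	// The *url.Error returned by the client implements net.Error.
	var netErr net.Error
	return elapsed, errors.As(err, &netErr) && netErr.Timeout()
}

func main() {
	// Context tighter than the client: gives up after roughly 100ms.
	elapsed, timedOut := slowRequest(1*time.Second, 100*time.Millisecond)
	fmt.Println(timedOut, elapsed.Round(10*time.Millisecond))

	// Client tighter than the context: also gives up after roughly 100ms.
	elapsed, timedOut = slowRequest(100*time.Millisecond, 1*time.Second)
	fmt.Println(timedOut, elapsed.Round(10*time.Millisecond))
}
```

In both calls the request should be abandoned after about 100ms, regardless of which of the two settings carries the smaller value.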
The timeout on http.Client or on the context.Context of a request covers the full lifecycle of the request. However, it’s possible to customise the http.Client’s underlying http.Transport, which lets you set timeouts on specific parts of the request lifecycle:
- TLSHandshakeTimeout specifies how long the TLS handshake can take
- ResponseHeaderTimeout specifies how long to wait for the server’s response headers after the request has been fully written (it doesn’t cover reading the response body)
- ExpectContinueTimeout specifies how long the client will wait to receive an HTTP 100 in order to proceed sending the body (this is usually only used for large uploads so the server can indicate it won’t service the request)
In all of these cases, if the timeout on the request context is smaller than the timeout value specified here, then the context value is the effective timeout.
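As a rough sketch, the three Transport options can be set as follows. The helper names newClient and headerTimeoutHit and all durations are assumptions made for this example, not recommendations; the slow server is a local httptest server.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// newClient (hypothetical helper) returns a client with an overall Timeout
// plus the more specific Transport timeouts. The values are illustrative.
func newClient(headerTimeout time.Duration) *http.Client {
	return &http.Client{
		Timeout: 30 * time.Second, // upper bound for the whole request
		Transport: &http.Transport{
			TLSHandshakeTimeout:   5 * time.Second,
			ResponseHeaderTimeout: headerTimeout,
			ExpectContinueTimeout: 1 * time.Second,
		},
	}
}

// headerTimeoutHit requests a local server that delays its response headers
// by serverDelay and reports whether the client gave up with a timeout.
func headerTimeoutHit(headerTimeout, serverDelay time.Duration) bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done(): // the client gave up
		case <-time.After(serverDelay):
		}
	}))
	defer srv.Close()

	resp, err := newClient(headerTimeout).Get(srv.URL)
	if err == nil {
		resp.Body.Close()
		return false
	}
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

func main() {
	fmt.Println("slow headers time out:", headerTimeoutHit(50*time.Millisecond, time.Second))
	fmt.Println("fast headers time out:", headerTimeoutHit(time.Second, 10*time.Millisecond))
}
```

Here the 30s client Timeout never comes into play: the request to the slow server is cut short by ResponseHeaderTimeout, which is the “more specific” timeout for that part of the lifecycle.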
TCP handshake timeout
In order to control how long the TCP handshake can take, a custom net.Dialer with a Timeout can be configured on the Transport. That looks like this:
&http.Client{
	Transport: &http.Transport{
		DialContext: (&net.Dialer{
			Timeout: 5 * time.Second,
		}).DialContext,
	},
}
This means that if we make a request whose own timeout is set to 30s, we’ll only spend up to 5s trying to perform the TCP handshake. If we don’t complete a handshake in 5s, the request is aborted and an error will be returned.
DNS timeout
Even though you can control how long you’re willing to wait on a DNS response when calling methods on net.Dialer through the context you pass in, you can’t control this behaviour from an http.Client or by customising the Resolver of the net.Dialer used by the http.Transport. It gets more complicated too when the libc resolver is used instead of the native Go one.
Let’s start with net.Resolver. When calling any method like LookupHost, you can pass in a context. If that context has a timeout, it’ll be respected, so if the server takes too long to respond the call will fail. However, when the http.Client eventually calls LookupHost etc., the context will be the HTTP request context. There is no Timeout on net.Resolver to set a “more specific” timeout. This does mean the DNS lookup can’t take longer than the maximum time you’re willing to wait for the request to complete, but you can’t set it to something lower than that.
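When calling the resolver directly, the context is the only knob. A minimal sketch, where lookupWithTimeout is a hypothetical helper around net.DefaultResolver and the durations are arbitrary:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// lookupWithTimeout (hypothetical helper) resolves host with the default
// resolver and gives up once timeout has elapsed. The limit comes purely
// from the context; net.Resolver has no Timeout field of its own.
func lookupWithTimeout(host string, timeout time.Duration) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return net.DefaultResolver.LookupHost(ctx, host)
}

func main() {
	// An IP literal needs no DNS query, so this returns right away.
	addrs, err := lookupWithTimeout("127.0.0.1", time.Second)
	fmt.Println(addrs, err)

	// A real hostname is subject to the 2s budget (needs network access).
	addrs, err = lookupWithTimeout("example.com", 2*time.Second)
	fmt.Println(addrs, err)
}
```

Inside an HTTP request there is no equivalent of the second argument: the lookup simply inherits whatever time is left on the request context.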
The net.Resolver has a Dial field itself, where we can pass in a custom net.Dialer.DialContext with a Timeout. It’s very similar to how you configure the dialer on http.Transport:
&net.Resolver{
	Dial: (&net.Dialer{Timeout: 1 * time.Second}).DialContext,
}
That timeout applies to how long it takes to connect to the resolver though, not how long it takes to do a DNS lookup. This gets a little extra interesting in that when you’re talking over UDP this Timeout has no effect, since you don’t establish a connection in that case. Of course, if you set the timeout to something like 1 * time.Nanosecond it’ll still affect a UDP “dial” since the context will have expired before we get around to doing anything.
In many cases, Go will do DNS resolution through your libc, instead of using a Go native resolver. This also partially depends on how the binary was built. The Go native resolver can be picked by setting PreferGo to true, but this isn’t available on every platform. When DNS resolution happens through cgo, the net.Dialer of the net.Resolver is never used and so any timeout on it doesn’t have any effect. Go will call the C getaddrinfo function instead. Its timeouts and retry behaviour are governed by two constants in resolv.h of your libc, typically a 5s timeout and 2 attempts. You can sometimes tweak this by setting options (such as timeout and attempts) in resolv.conf.
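Putting the pieces from this section together, a resolver that prefers the Go implementation can be attached to the dialer the Transport uses. This is a sketch under assumptions: newDialer and newClient are hypothetical names and the durations are arbitrary. As described above, the inner timeout only limits connecting to the DNS server, and nothing here applies when resolution goes through libc.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newDialer (hypothetical helper) returns a net.Dialer whose Resolver asks
// for the Go resolver and limits how long connecting to the DNS server may
// take. It does not put a bound on the DNS lookup as a whole.
func newDialer() *net.Dialer {
	return &net.Dialer{
		Timeout: 5 * time.Second, // TCP handshake with the target host
		Resolver: &net.Resolver{
			PreferGo: true, // honoured only where the platform allows it
			Dial:     (&net.Dialer{Timeout: 1 * time.Second}).DialContext,
		},
	}
}

// newClient (hypothetical helper) wires the dialer into an http.Client.
func newClient() *http.Client {
	return &http.Client{
		Timeout:   30 * time.Second, // still the overall upper bound
		Transport: &http.Transport{DialContext: newDialer().DialContext},
	}
}

func main() {
	d := newDialer()
	c := newClient()
	fmt.Println("prefers Go resolver:", d.Resolver.PreferGo)
	fmt.Println("overall timeout:", c.Timeout)
}
```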
Not all timeouts matter every time
The way the timeouts stagger seems simple enough, but not every timeout applies every time. DNS lookup results may be cached so they can complete faster after the first time or not happen at all. Through HTTP persistent connections, a connection may be reused as long as it hasn’t sat idle for longer than IdleConnTimeout, sidestepping certain parts of the request lifecycle.
Let’s assume we do a request with a timeout set to 30s. The first time, we have to do the DNS resolution as well as the TCP and TLS handshakes. Let’s say that takes 10s together, leaving us 20s for the rest of the HTTP exchange. The next request happens after the first has completed but before the IdleConnTimeout has elapsed. In this case, that same request with a 30s timeout has almost the full 30s for the HTTP exchange. Even if a new TCP connection needs to be set up, if TLS session resumption is available the time spent on the TLS handshake will be much shorter than the first time.
Remember that in order to be able to benefit from connection reuse, you have to drain and Close the http.Response.Body. However, if the body is really big, you might not want to read it all and instead pay the price of establishing a new connection. To control this, you can use an io.LimitReader:
defer func() {
	_, _ = io.Copy(io.Discard, io.LimitReader(resp.Body, 4*1000*1000))
	_ = resp.Body.Close()
}()
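Whether a request actually ran on a reused connection can be observed with net/http/httptrace. The following is a sketch against a local httptest server; get and reuseDemo are hypothetical helpers written for this illustration.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// get (hypothetical helper) performs a GET, drains and closes the body, and
// reports whether the request ran on a previously used connection.
func get(client *http.Client, url string) (bool, error) {
	var reused bool
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}
	ctx := httptrace.WithClientTrace(context.Background(), trace)

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	// Drain (up to a limit) so the connection can return to the idle pool.
	_, err = io.Copy(io.Discard, io.LimitReader(resp.Body, 4*1000*1000))
	return reused, err
}

// reuseDemo sends two sequential requests to a local test server and returns
// the Reused flag observed for each of them.
func reuseDemo() (first, second bool, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	client := srv.Client()
	if first, err = get(client, srv.URL); err != nil {
		return first, second, err
	}
	second, err = get(client, srv.URL)
	return first, second, err
}

func main() {
	first, second, err := reuseDemo()
	fmt.Println("first reused:", first, "second reused:", second, "err:", err)
}
```

The first request has to set up a connection, while the second should pick up the idle one, so it skips the DNS, TCP and TLS steps entirely.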
Conclusion
Now that you’ve read through all of this, you should have a mental model of how timeouts work in Go’s HTTP client. You know which timeouts exist, to which part of the request lifecycle they apply, and how you can configure and potentially control them at the request level.
For cancellation, the same mental model applies. A context that’s only cancellable can be thought of as having no timeout at all. In that case, the http.Request will be bound by the timeout on http.Client and the timeouts on the underlying http.Transport, if any. The request can still be cancelled before the timeout elapses, by calling the cancel function that context.WithCancel returned.
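A small sketch of cancelling an in-flight request, again against a local httptest server. The helpers cancelInFlight and wasCanceled are hypothetical, and the delay is arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// cancelInFlight (hypothetical helper) starts a request against a local
// server that stalls, cancels the request context after delay, and returns
// the error reported by the client.
func cancelInFlight(delay time.Duration) error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done(): // the client went away
		case <-time.After(2 * time.Second):
		}
	}))
	defer srv.Close()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	time.AfterFunc(delay, cancel) // cancel while the request is in flight

	client := &http.Client{Timeout: 10 * time.Second} // still the upper bound
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// wasCanceled reports whether err was caused by context cancellation.
func wasCanceled(err error) bool {
	return errors.Is(err, context.Canceled)
}

func main() {
	err := cancelInFlight(50 * time.Millisecond)
	fmt.Println("cancelled:", wasCanceled(err))
}
```

The context carries no deadline here, so without the call to cancel the request would only be bounded by the 10s Timeout on the client.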
Without implementing a solution yourself, there currently is no way to specify a timeout for only the HTTP exchange, from the moment we start writing the headers to the moment we’re done receiving the body. If you want to be sure you have 30s for the actual HTTP exchange, you need to set the timeout a bit higher, probably 5s or so more.
This nuance is often lost in the way people report metrics, as it’s common to start a timer before doing the request and then calculate the elapsed time until a response is received. This is probably fine, as through connection reuse you’re amortising that cost so the data in general still gives you the right picture. You can get an exact picture by using a ClientTrace from net/http/httptrace, or some other form of request tracing like OpenTelemetry, assuming it hooks into all the right places. Also keep in mind that with HTTP/2, requests can be multiplexed over a single connection, meaning that compared to HTTP/1 the head-of-line blocking issue moves from HTTP to TCP. This too might be something you want to distinguish in your metrics.