How do HTTP servers figure out Content-Length?
HTTP servers determine Content-Length by measuring the size of the response body. Small responses get an explicit Content-Length header, while larger ones use chunked transfer encoding, which sends the data in segments.
HTTP servers determine the Content-Length of a response by calculating the size of the response body before sending it. In simple implementations, the server can write the response in parts, as demonstrated in a Go program that handles HTTP requests. When a response is small enough to fit into a buffer, the server can easily calculate its length and include it in the Content-Length header.

If the response exceeds the buffer size, however, the server uses chunked transfer encoding, which allows it to send the response in smaller segments without needing to know the total length in advance. This method was introduced in HTTP/1.1 and is widely supported. The response includes hexadecimal numbers indicating the size of each chunk, allowing the client to reconstruct the full message. Chunked responses can also include trailers: headers sent after the body, useful for scenarios like digital signatures.

While HTTP/2 and HTTP/3 do not support chunked transfer encoding, they have their own mechanisms for streaming data. Understanding these underlying processes is crucial for developers working with HTTP in languages like Go.
- HTTP servers calculate Content-Length based on the response body size.
- Small responses can use a Content-Length header, while larger ones use chunked transfer encoding.
- Chunked transfer encoding allows sending data in segments without knowing the total length beforehand.
- Trailers can be included in chunked responses for additional metadata.
- HTTP/2 and HTTP/3 utilize different streaming mechanisms and do not support chunked transfer encoding.
Related
Trailer (As Opposite to HTTP Header)
The Trailer response header in HTTP allows senders to add extra fields at the end of chunked messages for metadata like integrity checks. The TE request header must be set to "trailers" to enable this feature.
Httpwtf?
HTTP has hidden features like cache directives, trailers for metadata, and 1XX codes. Websockets bypass CORS, X-* headers allow custom extensions. Despite quirks, HTTP is vital for client-server communication.
Why your website should be under 14kB in size
Websites should be under 14kB to optimize loading speed, influenced by the TCP slow start algorithm, especially important for high-latency connections. Minimizing unnecessary elements enhances user experience.
HTTP Trailer Header
The Trailer header in HTTP enables additional metadata in chunked messages, requiring the TE request header to be set to "trailers," while prohibiting certain headers like authentication and content-related ones.
Always support compressed response in an API service
Enabling compressed responses in API services reduces bandwidth costs and enhances user experience. Frameworks like Gin and Nginx facilitate gzip compression, making it a straightforward enhancement for web services.
- Many commenters share their experiences with different HTTP server implementations, noting variations in how Content-Length is handled.
- There is a discussion about the complexities of managing content length, especially with compression and chunked transfer encoding.
- Several users highlight the importance of correctly calculating Content-Length to avoid issues with browsers and data transmission.
- Some comments touch on the challenges of streaming large responses without buffering, emphasizing the need for efficient handling.
- There is a mention of the historical context of HTTP and its evolving complexities, particularly with newer protocols like HTTP/2.
In most HTTP server implementations in other languages I've worked with, I recall having to either:
- explicitly define the Content-Length up-front (clients then usually don't like it if you send too little and servers don't like it if you send too much)
- have a single "write" operation with an object where the Content-Length can be figured out quite easily
- turn on chunking myself and handle the chunk writing myself
I don't recall having seen the kind of automatic chunking described in the article before (and I'm not too sure whether I'm a fan of it).
However, you better be right! I just found a bug in some really old code that gzipped every response when appropriate (i.e., requested, textual, etc.) but ignored the Content-Length header. So if one was set manually, it would be wrong after compression. That caused insidious bugs for years. The fix, obviously, was to delete that manual header if the stream was going to be compressed.
https://www.oreilly.com/library/view/high-performance-web/97...
Also in 2018, some fun where when downloading a file, browsers report bytes written to disk vs content-length, which is wildly out when you factor in gzip https://x.com/jaffathecake/status/996720156905820160
It may be better now, but a huge number of libraries and frameworks would either include the terminating NUL byte in the count but not send it, or leave the terminator out of the count but include it in the stream.
https://notes.benheater.com/books/web/page/multipart-forms-a...
Buffering can be appropriate for small responses; or at least convenient. But for bigger responses this can be error prone. If you do this right, you serve the first byte of the response to the user before you read the last byte from wherever you are reading (database, file system, S3, etc.). If you do it wrong, you might run out of memory. Or your user's request times out before you are ready to respond.
This is a thing that's gotten harder with non-blocking frameworks. Spring Boot in particular can be a PITA on this front if you use it with non-blocking IO. I had some fun figuring that out some years ago. Using Kotlin makes it slightly easier to deal with low level Spring internals (fluxes and what not).
Sometimes the right answer is that it's too expensive to figure out the content length, or a content hash. Whatever you do, you need to send the headers with that information before you send anything else. And if you need to read everything before you can calculate that information and send it, your choices are buffering or omitting that information.
E.g., I drafted this a long time ago, because if you generate something live and send it in a streaming fashion, you can't have progress reporting: the client doesn't know the final size in bytes, even though server-side you know how far into generating you are.
This was used for multiple things, like generating CSV exports from a bunch of RDBMS records, compressed tarballs from a set of files, or a bunch of other silly things like generating sequences (Fibonacci, random integers, whatever...) that could take "a while" (as in, long enough to be friendly and report progress).
https://github.com/lloeki/http-chunked-progress/blob/master/...
To me the more interesting question is how web servers receive an incoming request. You want to read the whole thing into a single buffer, but you don't know how long it's going to be until you actually read some of it. I learned recently that libc has a way to "peek" at some data without removing it from the recv buffer... I'm curious whether this is ever used to optimize the receive process?
It's not. Like, hell no. That is so complex: multiplexing, the underlying TCP specifications, Server Push, stream prioritization (vs. "priorization"!), encryption (ALPN or NPN?), extensions like HSTS, CORS, WebDAV or HLS, ...
It's a great protocol, nowhere near simple.
> Basically, it’s a text file that has some specific rules to make parsing it easier.
Nope: since HTTP/2, that is just a textual representation, not the actual on-the-wire format. HTTP/2 is 10 years old now.