Falsehoods programmers believe about TCP
The discussion covers the relative merits of NetworkManager and `wpa_supplicant` for wireless connections, misconceptions about TCP reliability, and the complexity of achieving consensus over unreliable links, which feeds into problems like bufferbloat.
The discussion revolves around the use of NetworkManager versus networkd for network management, particularly in the context of wireless connectivity issues. A user shared their experience of switching from NetworkManager to `wpa_supplicant` due to unreliable wireless connections, which led to applications misinterpreting network status during packet loss. This prompted a broader commentary on misconceptions about TCP reliability, highlighting several common false beliefs programmers hold regarding TCP's behavior and reliability. The conversation also touched on the challenges of achieving consensus over unreliable links and the limitations of TCP in ensuring message delivery. Another participant argued that while it is difficult to guarantee consensus on all bytes transferred, it is possible to agree on a subset of bytes that have been successfully received. The discussion concluded with a note on how misconceptions about network protocols can lead to issues like bufferbloat, particularly when router manufacturers overlook these complexities.
- The debate centers on the effectiveness of NetworkManager versus `wpa_supplicant` for managing wireless connections.
- Misconceptions about TCP reliability and behavior are highlighted, indicating a need for better understanding among developers.
- Achieving consensus on data transfer over unreliable links is complex, with some bytes being reliably acknowledged while others remain ambiguous.
- The conversation underscores the impact of network protocol misunderstandings on real-world issues like bufferbloat.
Related
Timeliness without datagrams using QUIC
The debate between TCP and UDP for internet applications emphasizes reliability and timeliness. UDP suits real-time scenarios like video streaming, while QUIC with congestion control mechanisms ensures efficient media delivery.
Beyond bufferbloat: End-to-end congestion control cannot avoid latency spikes
End-to-end congestion control methods like TCP and QUIC face challenges in preventing latency spikes, especially in dynamic networks like Wi-Fi and 5G. Suggestions include anticipating capacity changes and prioritizing latency-sensitive traffic for a reliable low-latency internet.
Golang is evil on shitty networks (2022)
The impact of Golang's default setting disabling Nagle's algorithm on network performance is discussed. Concerns include slow uploads, increased latency, and network saturation, questioning the decision's efficiency and suggesting considerations for optimization.
Private Internet
The article highlights the inadequacies of current internet protocols regarding security and privacy, advocating for a new protocol with features like non-sensitive addresses and DoS resistance, while suggesting onion routing.
I wish (Linux) WireGuard had a simple way to restrict peer public IPs
Chris Siebenmann highlights WireGuard's limitations in restricting peer public IP addresses, suggesting the need for multiple ports or interfaces for security, and proposes potential kernel-level extensions for better peer management.
- Several commenters express frustration with the article's lack of clarity and depth, particularly regarding the "falsehoods" it presents.
- There is a discussion about the nature of TCP packets and the misconceptions surrounding them, with some arguing that the article's statements are contradictory.
- Commenters highlight the importance of error correction and the complexities of network communication, especially in unreliable environments.
- Real-world examples are shared, illustrating the challenges of message delivery over flaky connections.
- Some suggest that the article oversimplifies or misrepresents technical concepts, leading to misunderstandings among readers.
> 5. There is a such thing as a TCP packet
> 6. There is no such thing as a TCP packet
I don't understand this at all. Either the concept of a TCP packet exists, or the concept does not exist. Even if it's not being used in certain scenarios, I don't see how you can argue that "there's no such thing" any of the time. This might just be me misunderstanding whatever point they're trying to make, but I don't remember ever having such philosophical confusion from anything in any other "falsehoods programmers believe about..." article before.
Additionally, I'm sure they're aware that HTTP over TLS has encrypted data frames, which would be unreceivable in a lot of cases if these situations arose a bunch. And considering how much of the modern Internet is built on this paradigm, I think that many of these points are rare and probably extremely pedantic.
This is coming from someone who agrees with much of the nuance implied (but not explained!) by the post.
All great technical writing (which I assume these clickbait articles are at least attempting to be) is written with mutual discovery and deeper understanding in mind, and if you leave no actual explanation in the post, you can't really achieve either of those.
The real question is: why should this be a problem that TCP must solve? TCP gives you a bidirectional, water-flow-like pipe, and that's enough for you to create many useful applications. TCP never provided a guarantee of correct delivery; that's your job.
For example, if an HTTP request is interrupted before the response is received, the sender should assume the request never reached the server and try again with a new connection, while the server should mitigate duplicate requests (reject them or return a success code).
Well, maybe that's the point of the article, because many web pages get confused if you send duplicate requests to them.
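As a rough sketch of the server side of that idea (the handler, route, and in-memory store here are illustrative, not from the comment): the service keys each request on a client-supplied `Idempotency-Key` header so that retries become safe.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"
)

// seen remembers idempotency keys that have already been processed, so a
// client retry of the same logical request does not apply its side effect
// twice. A real service would persist this and expire old entries.
var (
	mu   sync.Mutex
	seen = map[string]bool{}
)

func sendHandler(w http.ResponseWriter, r *http.Request) {
	key := r.Header.Get("Idempotency-Key")
	if key == "" {
		http.Error(w, "missing Idempotency-Key", http.StatusBadRequest)
		return
	}
	mu.Lock()
	duplicate := seen[key]
	seen[key] = true
	mu.Unlock()

	if duplicate {
		// The earlier attempt already went through; just confirm it.
		fmt.Fprintln(w, "already processed")
		return
	}
	// ... perform the side effect exactly once here ...
	fmt.Fprintln(w, "ok")
}

func main() {
	http.HandleFunc("/send", sendHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```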
“Mostly” because you still care about bandwidth limits and packet RPS limits and latency of course.
The problem: you're on a subway train and you send a message as it departs a station. The request does get to the server, but by the time the response arrives, the train is already in the tunnel and you don't have a signal any more. So the client thinks that the message failed to send, but it was, in fact, sent successfully. The client would retry when it's back online, and would send another copy of that message.
The solution was to send a client-generated "random ID" with each request. I much later learned that this is conventionally called an "idempotency token". This worked, except there was now another problem: you sometimes receive your own message over the long-polling thing before the response to the request that sent it. You don't know for sure whether it's the message you just sent, or something else sent by a different client on the same account, because you don't know the ID of your message yet. This was solved by me delaying the processing of outgoing messages on the client side until all outstanding messages are fully sent and their IDs are known.
Telegram solved this much more elegantly: when the client reconnects to the server, the server sends it all the responses that were not acknowledged during the previous connection. MTProto has its own acknowledgement mechanism in addition to TCP's.
So yeah, instant messaging seems trivial at first glance, but it turns out that TCP is a leaky enough abstraction that you need to somehow plug those leaks at the application level.
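A minimal sketch of the client side of that "random ID"/idempotency-token approach (the endpoint, JSON shape, and helper names are hypothetical): the client picks the message ID before the first send, so a retry after a lost response carries the same ID and the server can deduplicate it.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"log"
	"net/http"
	"time"
)

// newMessageID generates the client-side "random ID" (idempotency token)
// before the first send attempt, so every retry reuses the same ID.
func newMessageID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// sendMessage keeps retrying until it actually sees a response. Because the
// ID never changes across retries, the server can drop duplicates when the
// first request made it through but the response was lost in the tunnel.
func sendMessage(text string) error {
	id := newMessageID()
	body := fmt.Sprintf(`{"id":%q,"text":%q}`, id, text)
	for {
		resp, err := http.Post("https://chat.example/send",
			"application/json", bytes.NewBufferString(body))
		if err == nil {
			resp.Body.Close()
			return nil // a response arrived, so the server has seen this ID
		}
		time.Sleep(2 * time.Second) // wait until we're back online
	}
}

func main() {
	if err := sendMessage("hello from the subway"); err != nil {
		log.Fatal(err)
	}
}
```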
Recently a new rate limiter for TCP went by that was so terribly, terribly broken, and I cannot help but imagine that most of the containers of the world suffer from Bufferbloat in general.
In what way is that a falsehood?
but you can get round that in a lot of cases by just having a load of TCP connections in parallel.
TCP is cheap and well optimised, especially if you are keeping a bunch of connections open. (opening can be expensive)
so if you have a high-latency connection, or a bit of packet loss, and you want to reach line speed without having to figure out corner cases with UDP, just open up 100-1k TCP connections and multiplex them.
bish bash bosh, mostly line speed over a high-latency line (mind you, this was in the days of 100m-500m cross-Atlantic internet; you'll probably need more connections to saturate a 10-gig line.)
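A rough sketch of that parallel-connections trick in Go (the host, port, and connection count are placeholders, and a real transfer would also need to split and reassemble the data): open many streams and read them concurrently, so no single connection's congestion window caps the aggregate throughput.

```go
package main

import (
	"io"
	"log"
	"net"
	"sync"
)

func main() {
	const conns = 100 // scale up for fatter or lossier pipes
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int64
	)

	for i := 0; i < conns; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c, err := net.Dial("tcp", "data.example.com:9000")
			if err != nil {
				log.Println(err)
				return
			}
			defer c.Close()
			// Each stream pulls its own share; the aggregate is no longer
			// limited by a single connection's congestion window.
			n, _ := io.Copy(io.Discard, c)
			mu.Lock()
			total += n
			mu.Unlock()
		}()
	}
	wg.Wait()
	log.Printf("received %d bytes over %d connections", total, conns)
}
```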
The hashgraph algorithm is pretty sweet too and doesn't have the issue of a single write leader like Paxos and Raft. Basically multi-writers / leaderless
https://www.swirlds.com/downloads/SWIRLDS-TR-2016-01.pdf
But to be fair, I'm not certain that CAP theorem and partition tolerance really belong in a conversation about TCP anyway
Regarding the ack not being received by the sender when the connection breaks: it's a weak and dishonest argument, made in the belief that it strengthens their position while completely ignoring the fact that TCP's reliability depends on the simple and obvious fact that the connection exists!
I mean, that's true, insofar as pipes have incredibly weak guarantees too — after all, the other end of a pipe might be a program reading from/writing to a network socket, or other unreliable transport. Whenever you let your program be plugged into an arbitrary pipe, you have to expect all that same flakiness and then some.
Yeah pretty much.
maybe don't write contradictory unexplained nonsense.
Now that is a very interesting one!
It's sort of related to the question:
"How much of the Internet is accessible from any given point (location, locality, etc.) at any given point of time?"
Which is sort of unknowable, at least, without attempting to connect with every possible connection point on the Internet, which (if it could be done) would still consist of a range of time, and every point in time following that point would bring changes, perhaps small relative to the whole -- but accruing over time -- more and more, as more time elapses...
Observation: That same (or possibly similar!) phenomenon would seem to be at play with respect to the measurement (observation) of quantum systems, i.e., the more certain you are of position, the less certain you are of velocity, and vice-versa...
Well, the more you measure the connectivity to all points of the Internet at one point in time, the less certain you might be of the state of the entire system as more time elapses from that point in time...
But now, why?
Observation: Generally speaking, the larger a system is, the more degrees of freedom it has; and in attempting to "lock down" (know by observation, be "certain" of) the entire state of that system at one point in time, the more those parts of the system with degrees of freedom (how many degrees of freedom does the entire Internet have?) will change/evolve/move/"be subject to change" as more time evolves the state of the system... in other words, if you can know position (instantaneous state) with certainty, then you can't know velocity (where it's heading and/or its future state and/or that which predicts its future state) with certainty!
Sort of like you can know the instantaneous state of the Stock Market and its history... but no one can exactly predict its future (it has many, many degrees of freedom, all of which are subject to change in various unpredictable and bizarre ways!)
Which brings us back to #7:
>"7. If we fail to connect to a well-known remote host, then we must be offline."
We might be offline... but then again, we might not be! (Ping, ICMP, UDP, Telnet and Gopher anyone?)
But then again, we might be!
The Internet's online/offline status (is it really off if it is off? Is it really on if its on?) -- is much like some modern relationships, that is, "It's complicated!" :-)
The Internet is a Black Box!
It's Schrodinger's Internet!
You know, "if a TCP packet travelling at 99.44% of the speed of light on a westbound train track meets a UDP packet travelling at 99.43% of the speed of light on an eastbound train track, then when do they meet?"
You know, "solve for x..."
You know, "assume that the speed of light is constant and that quantum effects are not present!" :-)
A common problem is points which aren't really falsehoods, but from which people frequently draw false conclusions.
E.g. if you ask whether TCP is reliable, especially outside the context of a CS paper, the answer is yes. That is, iff you take a reasonable definition of reliable (one which doesn't expect literally impossible things) and a reasonable interpretation of mostly. Just listing it as a falsehood fails to point out that there are two potential issues with your understanding, while creating the risk that someone with expertise in that sub-field of IT ends up thinking TCP is quite unreliable when it isn't. I mean, the most common usage of the word reliable is a gradient, with its meaning in a yes/no question being a short form of "reliable _enough_". Furthermore, for most use-cases the "unreliable" aspect of TCP isn't even the main relevant misunderstanding people can have with "TCP is mostly reliable" (though for some use cases it is).
The main troublesome misinterpretation is what mostly means. I.e. if you were to give it a rigorous definition, it would be something like "if sampling typical devices used in typical situations across some target audience, then for most target audiences (weighted by audience relevance) most of the sampled devices will, in a sufficiently large long-term moving average, be reliable enough".
What that mainly means:
- even if it's mostly reliable, there will be devices for which it is reliable, devices for which it is unreliable, and everything in between
- similarly, even if it's mostly reliable for a given device, that isn't necessarily the case all the time
- nor does it say anything about the pattern of when the "mostly" doesn't apply, i.e. TCP might be mostly reliable on a device except for 30s every Sunday at 3am, and that would still count as "mostly"
- there are use-cases where unreliability is much more common
- there are audiences for which unreliability is much more common
etc.
Similarly for points 5 and 6 about TCP packets: they are definitely a thing, and there is no falsehood there. The falsehood is in believing you can reliably control them, or that your OS or some middleware isn't messing with them (e.g. splitting/combining/rewriting). So in some situations it's best to pretend there are none, but in other situations you have to care, and this might differ for different parts of the same protocol. So points 5 and 6 make sense, but don't point in a helpful direction.
To be clear, that doesn't mean lists are bad, or that this list is particularly bad, but I wish they had more references/details, even if short and compact, and more clearly separated things.
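As a concrete illustration of the splitting/combining point above (my own sketch, not from the comment): TCP hands the application a byte stream, so two `Write` calls on one side may arrive as a single `Read` on the other, or one `Write` may be split across several reads. Any message boundaries have to be imposed by the application itself, for example with a length prefix.

```go
package framing

import (
	"encoding/binary"
	"io"
	"net"
)

// writeFrame sends a 4-byte big-endian length prefix followed by the payload.
func writeFrame(c net.Conn, payload []byte) error {
	if err := binary.Write(c, binary.BigEndian, uint32(len(payload))); err != nil {
		return err
	}
	_, err := c.Write(payload)
	return err
}

// readFrame reconstructs one message from the byte stream. It never assumes
// that one Write on the sender maps to one Read here: the kernel and any
// middleboxes are free to split or coalesce TCP segments, so io.ReadFull
// keeps reading until the whole frame has arrived.
func readFrame(c net.Conn) ([]byte, error) {
	var length uint32
	if err := binary.Read(c, binary.BigEndian, &length); err != nil {
		return nil, err
	}
	buf := make([]byte, length)
	if _, err := io.ReadFull(c, buf); err != nil {
		return nil, err
	}
	return buf, nil
}
```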