September 20th, 2024

Discord Reduced WebSocket Traffic by 40%

Discord reduced websocket traffic by 40% by implementing zstandard compression, improving performance and bandwidth usage, especially on mobile, after adding streaming support and optimizing compression settings.

Discord reduced its WebSocket traffic by 40% by replacing zlib with zstandard compression. The goal was to improve performance and cut bandwidth usage, particularly on mobile platforms. In initial tests zstandard performed worse than zlib, primarily because the zstandard setup lacked streaming compression while the zlib setup had it. After the team forked the ezstd library to add streaming support, zstandard achieved better compression ratios and faster compression times than zlib.

The team also experimented with various compression settings and explored dictionaries as a way to optimize compression further. The dictionary approach showed mixed results, but zstandard streaming on its own produced a notable reduction in payload sizes. Discord additionally investigated upgrading zstandard buffers during off-peak hours to make use of excess memory; this had less impact than anticipated because of memory fragmentation.
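To illustrate why streaming support matters for small payloads, here is a minimal sketch that compares compressing each message with a fresh context against compressing all messages through one long-lived context. It uses Python's standard `zlib` module as a stand-in, because zstandard bindings are not in the standard library; the principle (later messages can reference bytes from earlier ones) is the same, but this is not Discord's code. The payload shape, the `EXAMPLE_EVENT` name, and the helper names are hypothetical.

```python
import json
import zlib


def make_messages(count):
    """Build small JSON payloads that share keys (hypothetical shape)."""
    return [
        json.dumps({
            "t": "EXAMPLE_EVENT",  # hypothetical event name
            "d": {
                "channel_id": str(1000 + i % 3),
                "user": f"user{i}",
                "content": f"hello number {i}",
            },
        }).encode()
        for i in range(count)
    ]


def per_message_size(messages):
    """Total bytes when every message gets a brand-new compression context."""
    return sum(len(zlib.compress(m)) for m in messages)


def streaming_size(messages):
    """Total bytes when one context is reused for the whole connection."""
    comp = zlib.compressobj()
    total = 0
    for m in messages:
        # Z_SYNC_FLUSH emits everything buffered so far, so the receiver can
        # decode this message now, while the history window is kept.
        total += len(comp.compress(m) + comp.flush(zlib.Z_SYNC_FLUSH))
    return total


def streaming_roundtrip(messages):
    """Decode each flushed frame with one long-lived decompressor."""
    comp = zlib.compressobj()
    decomp = zlib.decompressobj()
    decoded = []
    for m in messages:
        frame = comp.compress(m) + comp.flush(zlib.Z_SYNC_FLUSH)
        decoded.append(decomp.decompress(frame))
    return decoded


if __name__ == "__main__":
    msgs = make_messages(50)
    print("per-message:", per_message_size(msgs))
    print("streaming:  ", streaming_size(msgs))
```

With payloads this small, a fresh context sees the repeated keys only once per message, so it has little redundancy to exploit; the shared context can refer back to earlier messages on the same connection.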

- Discord reduced websocket traffic by 40% using zstandard compression.

- Initial tests showed zstandard underperformed compared to zlib until streaming support was added.

- Compression settings were fine-tuned to optimize performance.

- Dictionary support was explored but ultimately deemed too complex for significant gains.

- Upgrading zstandard buffers during off-peak hours was less effective than expected due to memory fragmentation.
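The dictionary idea mentioned above can be sketched in the same way. A preset dictionary seeds the compressor with bytes that are expected to appear in payloads, which mainly helps the first messages on a connection, before a streaming context has built up any history. This sketch again uses `zlib` (its `zdict` parameter) as a stand-in for zstandard dictionaries; the dictionary contents, payload shape, and function names are assumptions made for illustration, not anything taken from Discord's implementation.

```python
import json
import zlib

# Hypothetical dictionary: the JSON skeleton shared by the example payloads.
ZDICT = (
    b'{"t": "EXAMPLE_EVENT", "d": {"channel_id": "", '
    b'"user": "", "content": ""}}'
)


def compress_with_dict(payload, zdict):
    """Compress one payload with a compressor primed by a preset dictionary."""
    comp = zlib.compressobj(zdict=zdict)
    return comp.compress(payload) + comp.flush()


def decompress_with_dict(blob, zdict):
    """The receiver must be configured with the exact same dictionary bytes."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    payload = json.dumps({
        "t": "EXAMPLE_EVENT",
        "d": {"channel_id": "1001", "user": "alice", "content": "hello"},
    }).encode()
    print("no dictionary:  ", len(zlib.compress(payload)))
    print("with dictionary:", len(compress_with_dict(payload, ZDICT)))
```

The operational cost is that both ends must ship and version the same dictionary, which is one reason a dictionary can add complexity out of proportion to its gains once streaming is already in place.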

9 comments
By @transcriptase - 4 months
When can we expect discord not to take 20-30 seconds to launch on a $5000 PC? What exactly is it doing, recompiling the client from source using a single core each time it opens?
By @paxys - 4 months
Reading through the post they seem to have been hyper focused on compression ratios and reducing the payload size/network bandwidth as much as possible, but I don't see a single mention of CPU time or evidence of any actual measurable improvement for the end user. I have been involved with a few such efforts at my own company, and the conclusion always was that the added compression/decompression overhead on both sides resulted in worse performance. Especially considering we are talking about packets at the scale of bytes or a few kilobytes at most.
By @bri3d - 4 months
Interesting way to approach this (dictionary based compression over JSON and Erlang ETF) vs. moving to a schema-based system like Cap'n Proto or Protobufs where the repeated keys and enumeration values would be encoded in the schema explicitly.

Also would be interested in benchmarks between Zstandard vs. LZ4 for this use case - for a very different use case (streaming overlay/HUD data for drones), I ended up using LZ4 with dictionaries produced by the Zstd dictionary tool. LZ4 produced similar compression at substantially higher speed, at least on the old ARM-with-NEON processor I was targeting.

I guess it's not totally wild but it's a bit surprising that common bootstrapping responses (READY) were 2+MB, as well.

By @fearthetelomere - 4 months
>Diving into the actual contents of one of those PASSIVE_UPDATE_V1 dispatches, we would send all of the channels, members, or members in voice, even if only a single element changed.

> the metrics that guided us during the [zstd experiment] revealed a surprising behavior

This feels so backwards. I'm glad that they addressed this low-hanging fruit, but I wonder why they didn't do this metrics analysis from the start, instead of during the zstd experiment.

I also wonder why they didn't just send deltas from the get-go. If PASSIVE_UPDATE_V1 was initially implemented "as a means to scale Discord servers to hundreds of thousands of users", why was this obvious optimization missed?

By @RainyDayTmrw - 4 months
Something important that didn't get mentioned, neither in the post nor in the comments, is whether this is safe in the face of compression oracle attacks[1] like BREACH[2]. Given how much effort it seems Discord put into the compression rollout, I would be inclined to believe that they surely must have considered this, and I wish that they had written something more specific.

[1]: https://en.wikipedia.org/wiki/Oracle_attack
[2]: https://en.wikipedia.org/wiki/BREACH

By @acer4666 - 4 months
Anytime I have a discord tab open it noticeably grinds my computer to a halt
By @jimmyl02 - 4 months
One thing I appreciate very much about this article is that they describe things they tried and didn't work as well. It's becoming increasingly rare (and understandably why) for articles to describe failed attempts but it's very interesting and helpful as someone unfamiliar with the space!
By @nickphx - 4 months
mIRC did it better.