July 22nd, 2024

The Linux audio stack demystified

The article explores the Linux audio stack, detailing sound properties, digital audio sampling, and components like ALSA and PulseAudio, providing a comprehensive guide to audio processing in Linux systems.

The article provides an in-depth exploration of the Linux audio stack, aiming to clarify the complexities of digital audio processing. It begins with a fundamental understanding of sound, describing it as vibrations that travel through mediums like air, and explains key properties such as period, amplitude, and frequency. The human auditory system's processing of sound waves is detailed, highlighting how sound is captured, amplified, and converted into electrical signals that the brain interprets.

The discussion then shifts to digital audio, emphasizing the importance of sampling, which converts analog sound into digital data. The Nyquist-Shannon sampling theorem is introduced: a signal can be reconstructed accurately only if it is sampled at more than twice its highest frequency. Quantization is also covered, illustrating how digital audio represents each sample with one of a limited number of discrete values determined by bit depth, with 16-bit and 24-bit audio being common standards.
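To make the sampling and quantization ideas concrete, here is a minimal Python sketch. It is not taken from the article: the function names are hypothetical, the 20 kHz figure is the commonly cited upper limit of human hearing, and the quantizer is a simple symmetric one that ignores dithering.

```python
import math

def nyquist_min_rate(max_freq_hz: float) -> float:
    """Sampling rate that must be exceeded to capture max_freq_hz (Nyquist-Shannon)."""
    return 2.0 * max_freq_hz

def quantization_levels(bit_depth: int) -> int:
    """Number of discrete values available at a given bit depth."""
    return 2 ** bit_depth

def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range: 20 * log10(2^N), roughly 6.02 dB per bit."""
    return 20.0 * math.log10(2 ** bit_depth)

def quantize(sample: float, bit_depth: int) -> int:
    """Map a sample in [-1.0, 1.0] to a signed integer (symmetric, no dither)."""
    max_int = 2 ** (bit_depth - 1) - 1
    clipped = max(-1.0, min(1.0, sample))  # out-of-range input is clipped
    return round(clipped * max_int)

if __name__ == "__main__":
    # Assumption: 20 kHz as the highest audible frequency.
    print(nyquist_min_rate(20_000))
    for bits in (16, 24):
        print(bits, quantization_levels(bits), round(dynamic_range_db(bits), 1))
```

The common 44.1 kHz and 48 kHz sampling rates sit just above the 40 kHz bound this yields for a 20 kHz signal.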

The article then covers the components of the Linux audio stack, including ALSA, JACK, PulseAudio, and PipeWire, each serving a distinct role in audio management. It explains what sound servers do, such as mixing multiple input streams and providing volume control. The piece concludes by encouraging readers to consider which sound server best suits their needs, tying sound theory to practical audio processing on Linux.
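As a rough illustration of the mixing and volume control described above, the sketch below sums several integer PCM streams with a per-stream gain and clips the result to the sample range. It is an assumption-laden toy, not how PulseAudio or PipeWire are implemented: real sound servers also handle resampling, format conversion, and timing, and the name `mix_streams` is hypothetical.

```python
def mix_streams(streams, volumes, bit_depth=16):
    """Mix equal-length signed-integer PCM streams.

    streams: list of lists of int samples (same length, same sample rate)
    volumes: one linear gain factor per stream (1.0 = unchanged)
    Returns one list of samples, clipped to the signed range of bit_depth.
    """
    hi = 2 ** (bit_depth - 1) - 1
    lo = -(2 ** (bit_depth - 1))
    mixed = []
    for frame in zip(*streams):
        # Apply each stream's volume, then sum the contributions.
        total = sum(sample * gain for sample, gain in zip(frame, volumes))
        # Hard clipping keeps the sum inside the representable range.
        mixed.append(max(lo, min(hi, round(total))))
    return mixed

if __name__ == "__main__":
    music = [1000, 30000]
    alert = [2000, 30000]
    print(mix_streams([music, alert], [1.0, 0.5]))
```

The second output sample clips because the summed value exceeds the 16-bit maximum, which is one reason a mixer needs headroom or per-stream volume control.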

Real-time audio programming 101: time waits for nothing

Real-time audio programming demands adherence to principles like avoiding blocking operations and prioritizing worst-case execution time to prevent glitches. Buffer sizes of 1-5ms are typical for low-latency software.

The Linux audio stack demystified (and more)

Digital audio processing in Linux is complex, involving ALSA, JACK, PulseAudio, and PipeWire. Understanding requires knowledge of sound basics, human perception, and digital workings, including sampling and quantization.
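The 1-5 ms buffer sizes quoted above for low-latency software translate into frame counts through simple arithmetic. The sketch below shows the conversion; the helper names are hypothetical, and the 48 kHz rate is only an example value, not something stated in the summaries.

```python
def buffer_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Time covered by one buffer of `frames` samples per channel."""
    return 1000.0 * frames / sample_rate_hz

def frames_for_latency(latency_ms: float, sample_rate_hz: int) -> int:
    """Buffer size in frames, rounded to the nearest frame, for a target latency."""
    return round(sample_rate_hz * latency_ms / 1000.0)

if __name__ == "__main__":
    rate = 48_000  # assumed example sample rate
    for ms in (1, 5):
        print(ms, "ms ->", frames_for_latency(ms, rate), "frames")
    print(256, "frames ->", round(buffer_latency_ms(256, rate), 2), "ms")
```

The audio callback has to finish its work within that window every time, which is why worst-case execution time matters more than the average.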

AI: What people are saying

The comments reflect a mix of appreciation and critique regarding the Linux audio stack article.

Readers found the article informative, especially for understanding the Linux audio stack and its components.
There are requests for more detailed information on specific topics, such as the rating chart and Bluetooth audio issues.
Some users express frustration with the complexity of audio processing on Linux compared to other systems.
Concerns about real-time audio processing and the challenges of achieving low latency are highlighted.
Several comments mention the need for clarity and simplicity in the audio stack, with nostalgia for older systems like OSS.

13 comments

By @ruffyx64 - 9 months

Wrote this blog article as I needed to get a better understanding of the audio stack on Linux (esp. PipeWire, PulseAudio, ALSA, etc.). The article turned out to be a lengthy in-depth explanation of how audio works, how digital audio works, and what sound servers on Linux actually do. Tried to write it in a way so it is accessible and understandable for beginners but also enlightening for experienced users. Hope it's helpful to HN.

By @amy-petrik-214 - 9 months

I can explain it much more simply

"At first Linus created /dev/dsp, and the user did smile upon him, and the user did see that it was good, and the user did see that it was simple, and people did use their sound, and people did pipe in and out sound as they did please, and Ken Thompson Shined upon them for following the way"

"Then the fiends got in on it and ruined it all, with needless complexities and configurations and situationships, with servers and daemons, and server and daemon wrappers to wrap the servers and daemons, and wrappers for those server wrappers, and then came security permissions for the server wrapper wrapper wrappers, why doesn't my sound work anymore, and then the server wrapper server wrapper wrapper server did need to be managed for massive added complexity, so initd was replaced by systemd, which solves the server wrapper wrapper server server wrapper through a highly complicated system of servers and services and wrappers"

RIP /dev/dsp you will be missed

- Kernighan 3:16

By @anvuong - 9 months

Thanks for the nice writing. But do you have any insight into why Bluetooth audio is so clunky on Linux? I'm using a pair of Sony XM4 and I have never had any problems on my 4 Windows machines. But on Ubuntu (both 22.04 and 24.04), I have had to jump through many hoops, from editing a bunch of config files and changing kernel flags to disabling and enabling a bunch of things I don't understand (mostly from reading the Arch Wiki), just to get it working some of the time. Some days it will just outright refuse to connect, sometimes it connects but doesn't play anything (switching the audio device to it generates some undecipherable error logs), and (probably worst) sometimes it connects very quickly but stays locked in low-fidelity mode instead of A2DP sink. I'm so fed up that I just switch to wired headphones every time I use my Ubuntu.

By @epx - 9 months

I miss the simplicity of OSS :\

By @Zamiel_Snawley - 9 months

An informative article for the Linux parts, I skipped the basics/intro.

I’d like to see some more detail on the rating chart, particularly on the axes where PipeWire doesn’t surpass JACK/PulseAudio.

As an embedded software engineer who deals with processing at hundreds of kilohertz, it is funny hearing anything running Linux called “real time”.

If it’s not carefully coded on bare metal for well understood hardware, it’s not real time, it’s just low latency. No true Scotsman though (looking over my shoulder for the FPGA programmers).

By @mannyv - 9 months

So far the audio section is a great intro to audio and digitization, and applies to any A-to-D process at some level. Looking forward to plowing through the rest.

The problem with audio is it's realtime (isochronous), which means good audio processing requires a guarantee of sorts. To get that guarantee requires a path through the system that's clear, which can be difficult to construct.

By @ladzoppelin - 9 months

"Professional audio will typically utilize 24-bit. Everything higher than that is usually bogus. Bogus where only audiophiles will hear a difference." Does he mean internal DAW bit rates like 64/32-bit float are bogus? I am probably reading it wrong.

By @Voklen - 9 months

Very nice article, I love posts that go right from the basics and build up to answer the question. And I certainly have a better understanding of DACs as a bonus!

By @g15jv2dp - 9 months

Dupe from three days ago by the same author https://news.ycombinator.com/item?id=41042753

By @Venn1 - 9 months

No mention of AoIP. I make heavy use of Netjack2 in my production / streaming studio. Great way to move 25/30 channels of audio between 5 PCs in real-time.

Beats the pants off DANTE.

By @lofaszvanitt - 9 months

Well, the most confusing part of Linux is definitely the audio stack. Thanks for the writeup.
