July 23rd, 2024

The Linux audio stack demystified (and more)

Digital audio processing in Linux is complex, involving ALSA, JACK, PulseAudio, and PipeWire. Understanding requires knowledge of sound basics, human perception, and digital workings, including sampling and quantization.

Digital audio processing in Linux involves a complex audio stack comprising ALSA, JACK, PulseAudio, and PipeWire. Understanding this system requires knowledge of sound basics, human sound perception, and how digital audio works. Sound is a vibration that propagates as an acoustic wave through a medium such as air; its key properties are period and amplitude. Human hearing ranges from about 20 Hz to 20 kHz and is most sensitive between roughly 2,000 Hz and 5,000 Hz. Digital audio is produced by sampling analog sound waves into digital data, and the Nyquist-Shannon theorem dictates the minimum sampling rate for accurate reconstruction: more than twice the highest frequency in the signal. Quantization limits precision in digital systems, with bit depth determining the number of discrete levels available to represent sound amplitude. Stored digital audio is a structured sequence of samples, which the article illustrates with a CSV-like format. The overview aims to demystify the Linux audio stack by explaining its components and how they interact.
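The sampling and quantization ideas in the summary can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: the function names, the `time_s,value` column names, and the choice of a symmetric signed integer range are all assumptions made for the example.

```python
import csv
import io
import math


def nyquist_rate_hz(max_frequency_hz: float) -> float:
    """Return twice the highest frequency; the sampling rate must exceed this."""
    return 2.0 * max_frequency_hz


def quantization_levels(bit_depth: int) -> int:
    """Number of discrete amplitude levels available at a given bit depth."""
    return 2 ** bit_depth


def quantize(sample: float, bit_depth: int) -> int:
    """Map a sample in [-1.0, 1.0] to a signed integer code (assumed symmetric range)."""
    max_code = 2 ** (bit_depth - 1) - 1
    clipped = max(-1.0, min(1.0, sample))
    return round(clipped * max_code)


def sample_sine(frequency_hz: float, sample_rate_hz: float, count: int, bit_depth: int) -> list[int]:
    """Sample a unit-amplitude sine wave and quantize each sample."""
    return [
        quantize(math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz), bit_depth)
        for n in range(count)
    ]


def samples_to_csv(samples: list[int], sample_rate_hz: float) -> str:
    """Serialize samples as CSV text with hypothetical columns time_s,value."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["time_s", "value"])
    for n, value in enumerate(samples):
        writer.writerow([n / sample_rate_hz, value])
    return buffer.getvalue()
```

For example, `nyquist_rate_hz(20_000)` gives the threshold that the common 44.1 kHz sampling rate sits just above, and `quantization_levels(16)` gives the number of levels available to 16-bit audio.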

18 comments
By @kmarc - 7 months
I hear many complaining (even here) about "the mess" of linux audio.

First, in the article, [1] shows in one single diagram where the complexity is coming from; the audio system has to handle a good deal of different hardware on many different systems and also provide extra functionality for multiplexing, network features, wireless headsets and their codecs, etc. All this: open source.

Second: linux is the only platform where everything works right now flawlessly for me: My Bose joins without any problems, switches to headset mode during zoom calls, switches back to high definition audio otherwise. I can select a different sink, even networked, whenever I want the output to appear on a different networked device. MacOS sometimes needs a reboot so that the bluetooth subsystem works, what the hell.

And all this worked with PulseAudio, and now works with Pipewire, which is an even higher quality iteration of PA.

I don't complain. I wish MacOS/Windows had such a versatile, configurable, but sanely-working-out-of-the-box audio system as an off-the-shelf Fedora, or even freaking Arch linux has.

HTH

[1]: https://blog.rtrace.io/images/linux-audio-stack-demystified/...

By @probablybetter - 7 months
The "mess" of Linux audio is due to ONE reason: single-client ALSA driver model.

Every other layer is a coping mechanism, and the plural, divergent FOSS community has responded in various ways:

- JACK

- PulseAudio

- PipeWire

I am unclear why Jaroslav Kysela chose to make ALSA single-client, but Apple's CoreAudio multi-client driver model is the right way to do digital audio on general-purpose computing devices running multi-tasking OSes on application processors, in my opinion.

Current issues this article does not address that actually constitute large parts of the "mess" of Linux Audio:

- channel mapping that is not transparent nor clearly assigned anywhere in userspace. (aka, why does my computer insist that my multi-input pro-audio interface is a surround-sound interface? I don't WANT high-pass-filters on the primary L/R pair of channels. I am not USING a subwoofer. WTF)

- the lack of a STANDARD for channel-mapping, vs the Alsa config standards, /etc/asound.conf etc.

- the lack of friendly nomenclature on hardware inputs/outputs for DAW software, whether on the ALSA layer, or some sound-server layer. (not to mention that ALSA calls an 8-channel audio-interface "4 stereo devices")

- probably more, but I can't remember. My current audio production systems have the DAW software directly opening an ALSA device. I cannot listen to audio elsewhere until I quit my DAW. This works and I can set my latency as low as the hardware will allow it.

This is the thing: more than about 10 ms of latency is unacceptable for multitrack audio recording.
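To put the roughly 10 ms figure in context, buffering latency can be estimated from the buffer size and the sample rate. The following is a rough sketch under stated assumptions: the function and parameter names are hypothetical, the period/buffer terminology is borrowed from ALSA, and only one direction of software buffering is counted (converter and driver delays are ignored).

```python
def buffer_latency_ms(frames_per_period: int, periods: int, sample_rate_hz: int) -> float:
    """Time needed to play out a full buffer of periods * frames_per_period frames."""
    return 1000.0 * frames_per_period * periods / sample_rate_hz


def max_frames_for_budget(budget_ms: float, periods: int, sample_rate_hz: int) -> int:
    """Largest period size (in frames) whose full buffer fits within the latency budget."""
    return int(budget_ms * sample_rate_hz / (1000.0 * periods))
```

With two periods at 48 kHz, a 256-frame period already exceeds a 10 ms budget (about 10.7 ms), while a 128-frame period stays well inside it (about 5.3 ms).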

By @laserbeam - 7 months
Here's my background

1. I have to modify my audio settings every time I start a call in Teams on Linux because it keeps losing my audio device.

2. In my audio settings UI, half the time I switch my devices the speaker test doesn't work.

3. In my audio settings UI, whenever I switch my mic I hear myself. The mic feedback only disappears 30 seconds after I close the settings UI.

4. My work headsets have a robotic sound (likely caused by an incorrect bitrate or buffer size). I can only use work bluetooth headsets via their dedicated dongle.

This was my default experience on a popular Debian-based distro, and it mirrors the general experience I see online. Things are unstable and a mess.

I started reading this article and it's embellished with phrases like: "is a professional-grade audio server", "widely used in professional audio production environments", and general language that sounds like a sales pitch. This does not fit with anything I'm familiar with.

I would have preferred a neutral and semi technical approach, with 10% of the buzzwords. As written, I trust nothing.

By @Cloudef - 7 months
Pipewire has pretty much unified the userland Linux audio stack (and supports video as a bonus). On the kernel side it has always been ALSA. There's TinyAlsa, so you don't have to use libasound to interface with the kernel ALSA. (Userland ALSA is quite a PITA.)
By @gen2brain - 7 months
Whenever someone mentions Linux and audio I always remember this image (made by Adobe, I think): https://harmful.cat-v.org/software/operating-systems/linux/a.... It is missing Pipewire, but it should be easy to add a dozen new lines. This is the reason why I simply use plain ALSA without any sound daemons.
By @lynx23 - 7 months
Well, we have had a lot of layers in Linux audio over the last 30 years. But when PulseAudio was forced into the world by a c-section, with everyone in the LA community already knowing it's a still-birth, I kind of lost trust in coordinated project creation. PA makes me so unhappy that I totally uninstall it wherever I see it. Good for me that I am just a console user, because the damn beast is all over the GUI space.

Fact is, RT audio is hard, and the people behind JACK have cared about the underlying problems for a long time already.

Maybe PipeWire, but to be honest, it reminds me too much of PA.

I guess I will stay with plain JACK and SuperCollider as my toolbelt, and not care about PA or PW. Like the grumpy old hacker I am.

By @shmerl - 7 months
> ALSA is the core layer of the Linux audio stack. It provides low-level audio hardware control, including drivers for sound cards and basic audio functionality.

> ...

> Unlike PulseAudio and JACK, PipeWire does not require ALSA on a system, in fact if ALSA is installed the output of ALSA is very likely pushed through PipeWire

I don't get this part. If ALSA represents the kernel level hardware drivers for audio, how does Pipewire bypass it? Does it implement an alternative set of kernel drivers? I assumed Pipewire still relies on ALSA base.

By @Galicarnax - 7 months
The text has a strong GPT-ish flavor.
By @sihox - 7 months
It's a pretty nice article, but for me it works only as an introduction. It shows how sound and digital audio work and which basic libraries and tools we have in Linux to deal with sound. But I'm still lost when it comes to details and user interface tools. The article(s) I would really love to see would be, on one hand, more technically detailed and, on the other hand, a broad survey of the options I have as an end user: from basic tools (CLI, GUI) for simple purposes like volume control and stream selection to complex pro-audio scenarios. For me there are too many tools and options for audio in Linux, and this is why I sometimes get lost. Of course, for daily use I have pipewire with the pulse, alsa and jack "plugins", which gives me seamless cooperation with lots of apps and controls, but maybe I can get rid of some module or app...
By @codedokode - 7 months
I don't really understand how JACK was supposed to be used. On Windows or Mac you typically run a DAW and load plugins into it. But on Linux the user is supposed to run every plugin as a separate application and connect them using JACK? Doesn't this mean there would be a lot of context switches? Also, in a DAW you can save your configuration, but how do you do this with JACK and a bunch of independent applications?

Also, given that PulseAudio and Pipewire both support ALSA clients, does it mean that the preferred API for applications should be ALSA? This way they can play sound on any system, even where there is no audio daemon.

By @cod1r - 7 months
An issue I've experienced very often is that when my laptop goes to sleep and I wake it up, the speakers occasionally aren't switched to unless I restart pipewire. Same thing for headphones: sometimes when I plug them in, they aren't switched to unless I replug them a couple of times. It might be hardware related, but situations like this make me feel like I should just use Linux for servers instead of for a personal computer.
By @probablybetter - 7 months
Am I the only one that sees unfriendly input/output channel names with Pipewire in client software?

(Bitwig, Ardour, Reaper, more? I would like to see "Input 1" or "Channel 1" and not some strange ciphers when trying to assign things in a little dropdown selector in a DAW)

By @thriftwy - 7 months
I remember there was oss and alsa. Then on top of that you had esd or artsd, with incompatible APIs and userland delays. Then at some point they were replaced with pulseaudio. I've just noticed now that pulseaudio is also replaced with pipewire in latest Ubuntu.

That does not instill confidence in that they have any idea what they are doing.

I remember that as far back as 2012 some Linux game ports from Steam (before Proton was a thing) failed to play sound.

Other than that, it took them a decade to figure out that sound should switch to HDMI when it is plugged in. It may still require arcane config changes and may break down.

I've opened the post to read about pipewire, and it seems that clicking an anchor does nothing. So it's not only sound they can't get right.

By @moogly - 7 months
I want to move my DAW to Linux, but giving up TotalMix is quite a blow...

Not sure how well the NI Native Access crap is working on WINE either. Does anyone know?

By @g15jv2dp - 7 months
I now have flashbacks to when I tried to get sound working on my Gentoo install 15 years ago. It seems that now everything has changed (wtf is pipewire) and still requires arcane knowledge to get everything working smoothly...
By @hcfman - 7 months
Beautiful work
By @poakjsn - 7 months
Linux distros would be best off lifting as much as possible from the Android Open Source Project, i.e. a professional and streamlined Linux-based system that actually works and isn't just a mishmash of incompatible, poorly designed hobbyist trash.

The year of the Linux desktop never arrived but most of the world has Linux sitting in our palms. Let's build on that instead of the dead ends of Debian, Slackware, etc.