June 23rd, 2024

I found an 8 years old bug in Xorg

An 8-year-old Xorg bug related to epoll misuse was found by a picom developer. The bug caused windows to disappear while the server was locked, and was traced to CloseDownClient calls. Although its impact is limited, the developer is looking for ways to update the window tree without the server lock. The finding underlines the value of testing and debugging tools.

An 8-year-old Xorg bug was recently discovered by a developer of picom, an X11 compositor. picom uses the GrabServer request, which is supposed to lock the X server, to fetch the window tree without interruption. The developer found that windows were nonetheless disappearing while the server was locked, which pointed to an Xorg bug related to epoll misuse. Using eBPF and uprobes, the developer traced the destroyed windows to CloseDownClient calls triggered by connection errors. Although the bug had been present for 8 years, its impact was limited because of the unique role X11 compositors play in the system. The developer is now exploring ways to update the window tree without relying on the server lock. The discovery highlights the value of thorough testing and of advanced debugging tools.

Related

Schema changes and the Postgres lock queue

Schema changes in Postgres can cause downtime due to locking issues. Tools like pgroll help manage migrations by handling lock acquisition failures, preventing application unavailability. Setting lock_timeout on DDL statements is crucial for smooth schema changes.

20x Faster Background Removal in the Browser Using ONNX Runtime with WebGPU

Using ONNX Runtime with WebGPU and WebAssembly in browsers achieves 20x speedup for background removal, reducing server load, enhancing scalability, and improving data security. ONNX models run efficiently with WebGPU support, offering near real-time performance. Leveraging modern technology, IMG.LY aims to enhance design tools' accessibility and efficiency.

Andrew S. Tanenbaum Receives ACM Software System Award

Andrew S. Tanenbaum, known for MINIX, receives ACM Software System Award for shaping OS education and influencing Linux's design. His microkernel work continues to impact OS development globally.

FreeBSD Bhyve Companion Tools

The author details transitioning from VirtualBox to FreeBSD Bhyve, praising Bhyve's benefits in a FreeBSD setting. Tools like VNC connection and pause/resume scripts optimize Bhyve operations, simplifying VM management.

Getting 100% code coverage doesn't eliminate bugs

Achieving 100% code coverage doesn't ensure bug-free software. A blog post illustrates this with a critical bug missed despite full coverage, leading to a rocket explosion. It suggests alternative approaches and a 20% coverage minimum.

14 comments
By @thenbe - 5 months
Picom has an awesome feature [0] that, for the sake of all our eyes, should come by default on every device with a screen. It can continuously adjust the brightness of individual windows by averaging all the pixels in that window. It's great for defending against "flashbangs" (when a new tab burns your eyes with a blank white screen).

0: https://github.com/yshui/picom/blob/ae73f45ad9e313091cdf720d...
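The averaging idea described in this comment can be sketched in a few lines. This is an illustration only, not picom's actual implementation: the function names `mean_brightness` and `dim_factor`, the RGBA8 buffer layout, and the Rec. 709 luma weights are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Mean perceived brightness (0.0 to 1.0) of an RGBA8 pixel buffer.
 * The Rec. 709 luma weights are an illustrative choice and are not
 * taken from picom. The alpha channel is ignored. */
double mean_brightness(const uint8_t *rgba, size_t pixel_count)
{
    if (pixel_count == 0)
        return 0.0;

    double sum = 0.0;
    for (size_t i = 0; i < pixel_count; i++) {
        const uint8_t *p = rgba + 4 * i;
        sum += 0.2126 * p[0] + 0.7152 * p[1] + 0.0722 * p[2];
    }
    return sum / (255.0 * (double)pixel_count);
}

/* Factor to multiply a window's colors by so that its mean brightness
 * does not exceed `cap`. Windows already at or below the cap are left
 * untouched (factor 1.0). */
double dim_factor(double brightness, double cap)
{
    if (brightness <= cap || brightness <= 0.0)
        return 1.0;
    return cap / brightness;
}
```

With a cap of 0.5, a fully white window would be scaled by 0.5, while a window whose mean brightness is 0.25 would be left as it is.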

By @asveikau - 5 months
> Basically when a window is created, we receive an event. After getting that event, we lock the X server, then ask it about the new window. And sometimes, the window is just not there

Relying on this sounds like a race condition even if the lock is working. In the time between processing the event and getting the lock, the window could have been destroyed.
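The defensive pattern implied by this point is to treat "window not found" as a normal outcome of handling a creation event. The sketch below is a toy model rather than real X11 client code: `model_query_window` stands in for a server round trip, and all names in it are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { WIN_TRACKED, WIN_ALREADY_GONE } track_result;

/* Stand-in for a server round trip that asks about a window.
 * `live` models the set of window ids the server currently knows.
 * Returns false when the window no longer exists. */
bool model_query_window(const uint32_t *live, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (live[i] == id)
            return true;
    return false;
}

/* Handle a "window created" event that may already be stale:
 * the window can be destroyed between the event being delivered
 * and the client asking about it. */
track_result model_handle_create(const uint32_t *live, size_t n, uint32_t id)
{
    if (!model_query_window(live, n, id))
        return WIN_ALREADY_GONE; /* expected outcome, not an error */
    return WIN_TRACKED;
}
```

The design choice here is that a stale event is simply dropped; the matching destroy event will arrive later and can be ignored for a window that was never tracked.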

By @Jasper_ - 5 months
The immediate destruction of client resources even under server grab is not new functionality to the epoll port; it behaved the same under the old select-based loop, too. It might be a bug, but that's how it's always behaved.

It shouldn't be too hard to fix if you want to try, either; check client->ignoreCount > 0 when handling X_NOTIFY_ERROR before calling CloseDownClient.
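The shape of the suggested check can be shown with a toy model. This is not Xorg source: `model_client` is a stand-in for the server's client record, `model_close_down` stands in for CloseDownClient, and deferring the teardown until the client is attended again is an assumption added for the example, since the comment only says to check `ignoreCount` first.

```c
#include <stdbool.h>

/* Toy stand-in for a server-side client record; not the real Xorg struct. */
typedef struct {
    int  ignoreCount;   /* assumed > 0 while the client is being ignored */
    bool closed;        /* set once the client's resources are torn down */
    bool close_pending; /* hypothetical: error seen while ignored */
} model_client;

/* Stand-in for CloseDownClient: only records that teardown happened. */
static void model_close_down(model_client *c)
{
    c->closed = true;
    c->close_pending = false;
}

/* Called when the client's connection reports an error. */
void model_on_connection_error(model_client *c)
{
    if (c->ignoreCount > 0) {
        /* Client is ignored (for example during another client's grab):
         * remember the error instead of destroying resources now. */
        c->close_pending = true;
        return;
    }
    model_close_down(c);
}

/* Called when the client stops being ignored. */
void model_attend(model_client *c)
{
    if (c->ignoreCount > 0)
        c->ignoreCount--;
    if (c->ignoreCount == 0 && c->close_pending)
        model_close_down(c);
}
```

In this model a client that errors out while ignored keeps its windows until `model_attend` brings its count back to zero, which is the behavior a grabbing client would expect.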

By @wavemode - 5 months
By Xorg standards, "8 years old" is basically brand new.

By @jraph - 5 months
> I could attach a debugger to the X server, however, debugging the X server pauses it, which would be a problem if I was debugging from inside that X session. Beside that, window destruction happens quite often, which can be prohibitive for manual debugging. It's still possible with a remote ssh connection, and gdb scripting, but it's inconvenient.

I will definitely look into eBPF and uprobe mentioned in the article; I didn't know about them. But wouldn't something like Xnest (or Xwayland?) help with debugging X? You'd debug an X server running in a window comfortably from your real environment and display server without it being disrupted by the debugging session.

By @TillE - 5 months
> It's still possible with a remote ssh connection, and gdb scripting, but it's inconvenient.

In this situation, don't gdb breakpoint command lists give you exactly what you need (printing stack traces at certain points) with minimal effort? Nothing wrong with using different tools, but it's not clear if that option was considered.

By @gpvos - 5 months
So, how has this been resolved? Has the bug been fixed now?

By @hulitu - 5 months
> To put it simply, picom needs to fetch the window tree from X. But there is no way to get the whole tree in one go

There are a lot of WMs that offer a window list (twm, fvwm). Maybe the author should take a look at how they do it.

And requests to the X server are not synchronous.

By @jeffbee - 5 months
There are undoubtedly flaws in Xorg dating to this and each of the previous four decades.

By @proneb1rd - 5 months
Planning on fixing this bug? :-)

By @nairboon - 5 months
Interesting article, although it could use some more hyperlinks, e.g. to the bug.

By @przemub - 5 months
In Xorg timescales that's probably one of the youngest bugs :D

By @firesteelrain - 5 months
I don’t understand why he mentions using .NET with Wine if he isn't going to tell us why. Just silly.

EDIT: Downvote but don't respond. Even sillier!