July 10th, 2024

Show HN: Dut, a fast Linux disk usage calculator

Codeberg.org hosts "dut," a disk usage calculator for Linux. It accurately counts hard links, offers customizable output, and outperforms similar tools in speed and efficiency, making it valuable for Linux users.

The website Codeberg.org hosts a disk usage calculator for Linux called "dut." This tool accurately counts hard links and provides ASCII output compatible with Linux terminals. Users can customize the output format by adjusting command-line arguments. The calculator displays the size of entries on the disk, accounting for shared space due to hard links. It offers options to limit the rows and depth shown. The tool is a single C source file and can be compiled with a C11 compiler. Benchmark tests show that "dut" performs well, especially after Linux disk caches are populated, outperforming programs like du from coreutils, dua, pdu, dust, and gdu in various scenarios.
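
Since the hard-link handling is the headline feature, a short illustration may help. The following is a minimal sketch of the usual deduplication technique, an assumption on my part rather than dut's actual code: charge an inode's blocks only once, no matter how many directory entries link to it, keyed by its (st_dev, st_ino) pair.

    /* Hypothetical sketch of hard-link-aware counting; not dut's actual code. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>

    struct key { dev_t dev; ino_t ino; };

    static struct key *seen;   /* linear array for brevity; a real tool wants a hash set */
    static size_t nseen;

    /* Return 1 the first time a given (dev, ino) pair is encountered. */
    static int first_sighting(dev_t dev, ino_t ino)
    {
        for (size_t i = 0; i < nseen; i++)
            if (seen[i].dev == dev && seen[i].ino == ino)
                return 0;
        seen = realloc(seen, (nseen + 1) * sizeof *seen);
        seen[nseen++] = (struct key){ dev, ino };
        return 1;
    }

    int main(int argc, char **argv)
    {
        long long total = 0;
        for (int i = 1; i < argc; i++) {
            struct stat st;
            if (lstat(argv[i], &st) != 0)
                continue;
            /* Only multiply-linked inodes need deduplication. */
            if (st.st_nlink > 1 && !first_sighting(st.st_dev, st.st_ino))
                continue;                 /* already charged via another link */
            total += (long long)st.st_blocks * 512;  /* st_blocks is in 512-byte units */
        }
        printf("%lld bytes\n", total);
        return 0;
    }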

Related

Htop explained – everything you can see in htop on Linux (2019)

This article explains htop, a Linux system monitoring tool. It covers uptime, load average, processes, memory usage, and more. It details htop's display, load averages, process IDs, procfs, and process tree structure. Practical examples are provided for system analysis.

Diff-pdf: tool to visually compare two PDFs

The GitHub repository offers "diff-pdf," a tool for visually comparing PDF files. Users can highlight differences in an enhanced PDF or use a GUI. Precompiled versions are available for various systems, with installation instructions.

Background of Linux's "file-max" and "nr_open" limits on file descriptors (2021)

The Unix background of Linux's 'file-max' and 'nr_open' kernel limits on file descriptors dates back to early Unix implementations like V7. These limits, set during kernel compilation, evolved to control resource allocation efficiently.

My List of CLI Gems

The article discusses various CLI gems for package management, categorized into sections like Utilities, Git tools, and more. Highlighted gems include fzf, bat, lazygit, tmux, and dua-cli for different functionalities.

Show HN: Xcapture-BPF – like Linux top, but with Xray vision

0x.tools simplifies Linux application performance analysis without requiring upgrades or heavy frameworks. It offers thread monitoring, CPU usage tracking, system call analysis, and kernel wait location identification. The xcapture-bpf tool enhances performance data visualization through eBPF. Installation guides are available for RHEL 8.1 and Ubuntu 24.04.

42 comments
By @montroser - 5 months
Nice work. Sometimes I wonder if there's any way to trade away accuracy for speed. Like, often I don't care _exactly_ how many bytes the biggest user of space is; I just want to see some orders of magnitude.

Maybe there could be an iterative breadth-first approach, where first you quickly identify and discard the small unimportant items, passing over anything that can't be counted quickly. Then with what's left you identify the smallest of those and discard, and then with what's left the smallest of those, and repeat and repeat. Each pass through, you get a higher resolution picture of which directories and files are using the most space, and you just wait until you have the level of detail you need, but you get to see the tally as it happens across the board. Does this exist?
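
Sketching the breadth-first part of this idea in C (hypothetical; no existing tool is claimed to work this way): scan one directory level per pass and print the running tally after each level, so coarse totals appear early and sharpen as the scan goes deeper. The discard-the-small-stuff refinement would sit on top of this loop.

    /* Hypothetical sketch: breadth-first scan with a progressive tally.
     * Error handling and directories' own blocks are omitted for brevity. */
    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/stat.h>

    struct q { char **v; size_t n, cap; };

    static void push(struct q *q, const char *path)
    {
        if (q->n == q->cap) {
            q->cap = q->cap ? q->cap * 2 : 64;
            q->v = realloc(q->v, q->cap * sizeof *q->v);
        }
        q->v[q->n++] = strdup(path);
    }

    int main(int argc, char **argv)
    {
        struct q cur = {0}, next = {0};
        long long total = 0;

        push(&cur, argc > 1 ? argv[1] : ".");
        for (int depth = 0; cur.n > 0; depth++) {
            for (size_t i = 0; i < cur.n; i++) {
                DIR *d = opendir(cur.v[i]);
                struct dirent *e;
                if (d) {
                    while ((e = readdir(d)) != NULL) {
                        char path[4096];
                        struct stat st;
                        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
                            continue;
                        snprintf(path, sizeof path, "%s/%s", cur.v[i], e->d_name);
                        if (lstat(path, &st) != 0)
                            continue;
                        if (S_ISDIR(st.st_mode))
                            push(&next, path);   /* defer to the next pass */
                        else
                            total += (long long)st.st_blocks * 512;
                    }
                    closedir(d);
                }
                free(cur.v[i]);
            }
            /* each level deep, the running total gets more complete */
            printf("after depth %d: %lld bytes so far\n", depth, total);
            free(cur.v);
            cur = next;
            next = (struct q){0};
        }
        return 0;
    }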

By @robocat - 5 months
> but I don't like how unintuitive the readout is

The best disk usage UI I ever saw was this one: https://www.trishtech.com/2013/10/scanner-display-hard-disk-... The inner circle is the top-level directories, and each ring outwards is one level deeper in the directory hierarchy. You would mouse over large subdirectories to see what they were, or double-click to drill down into a subdirectory. Download it and try it - it is quite spectacularly useful on Windows (although I'm not sure how well it handles terabyte-size drives - I haven't used Windows for a long time).

Hard to do a circular graph in a terminal...

Isn't it very similar to a flame graph? Perhaps look at how flame graphs are drawn by other terminal performance tools.

By @Neil44 - 5 months
On Windows I always used to use Windirstat but it was slow, then I found Wiztree which is many orders of magnitude faster. I understand it works by directly reading the NTFS tables rather than spidering through the directories laboriously. I wonder if this approach would work for ext4 or whatever.
By @nh2 - 5 months
> I don't know why one ordering is better than the other, but the difference is pretty drastic.

I have the suspicion that some file systems store stat info next to the getdents entries.

Thus cache locality would kick in if you stat a file right after receiving it via getdents (and counterintuitively, smaller getdents buffers then make it faster). Also, in such cases it would be important not to sort combined getdents outputs before starting to stat them (which would destroy the locality again).

I found such a situation with CephFS but don't know what the layout is for common local file systems.
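
To make the access pattern concrete, here is a minimal sketch of the strategy being described (on Linux, readdir() is a thin wrapper over getdents): each name is stat'ed immediately, in the order the kernel returns it, with no sorting in between.

    /* Sketch: stat entries in getdents order to exploit possible cache locality. */
    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        DIR *d = opendir(argc > 1 ? argv[1] : ".");
        struct dirent *e;
        long long total = 0;

        if (!d)
            return 1;
        while ((e = readdir(d)) != NULL) {
            struct stat st;
            if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
                continue;
            /* stat right away, in the order the kernel returned the entry */
            if (fstatat(dirfd(d), e->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0)
                total += (long long)st.st_blocks * 512;
        }
        closedir(d);
        printf("%lld\n", total);
        return 0;
    }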

By @jeffbee - 5 months
It's also interesting that the perf report for running dut on my homedir shows that it spends virtually all of the time looking for, not finding, and inserting entries in dentry cache slabs, where the entries are never found again, only inserted :-/ Great cache management by the kernel there.

ETA: Apparently the value in /proc/sys/vm/vfs_cache_pressure makes a huge difference. With the default of 100, my dentry and inode caches never grow large enough to contain the ~15M entries in my homedir. Dentry slabs get reclaimed to stay < 1% of system RAM, while the xfs_inode slab cache grows to the correct size. The threads in dut are pointless in this case because the access to the xfs inodes serializes.

If I set this kernel param to 15, then the caches grow to accommodate the tens of millions of inodes in my homedir. Ultimately the slab caches occupy 20GB of RAM! When the caches are working the threading in dut is moderately effective, job finishes in 5s with 200% CPU time.

By @INTPenis - 5 months
Reminds me of someone's script I have been using for over a decade.

    #!/bin/sh
    # du in kilobytes, sorted largest-first, with sizes rescaled to KB/MB/GB/TB
    du -k --max-depth=1 "$@" | sort -nr | awk '
         BEGIN {
            split("KB,MB,GB,TB", Units, ",");
         }
         {
            u = 1;
            while ($1 >= 1024) {
               $1 = $1 / 1024;
               u += 1
            }
            $1 = sprintf("%.1f %s", $1, Units[u]);
            print $0;
         }
        '
By @IAmLiterallyAB - 5 months
I'm surprised statx was that much faster than fstatat. fstatat looks like a very thin wrapper around statx; it just calls vfs_statx and copies the result out to user space.
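
For reference, the two calls side by side in a toy sketch ("some_file" is an illustrative path; statx needs Linux 4.11+ and glibc 2.28+). One speculative explanation for a gap is that statx lets the caller request only the fields it needs via the mask, so the kernel or filesystem may be able to skip filling the rest.

    /* Sketch comparing fstatat and statx for a du-style tool; the statx mask
     * asks only for size and block count. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat st;
        struct statx stx;

        /* classic call: fills the entire struct stat */
        if (fstatat(AT_FDCWD, "some_file", &st, AT_SYMLINK_NOFOLLOW) == 0)
            printf("fstatat: %lld blocks\n", (long long)st.st_blocks);

        /* statx: request only the fields a disk-usage tool needs */
        if (statx(AT_FDCWD, "some_file", AT_SYMLINK_NOFOLLOW,
                  STATX_SIZE | STATX_BLOCKS, &stx) == 0)
            printf("statx:   %lld blocks\n", (long long)stx.stx_blocks);
        return 0;
    }
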
By @mg - 5 months
I have this in my bashrc:

    alias duwim='du --apparent-size -c -s -B1048576 * | sort -g'
It produces a similar output, showing a list of directories and their sizes under the current dir.

The name "duwim" stands for "du what I mean". It came naturally after I dabbled for quite a while to figure out how to make du do what I mean.

By @laixintao - 5 months
> Anyone have ideas for a better format?

Hi, how about a flamegraph? I've always wanted to display the file hierarchy in a flamegraph-like format.

- previous discussion: https://x.com/laixintao/status/1744012609983295816

- my project that displays flamegraphs in the terminal: https://github.com/laixintao/flameshow

By @kccqzy - 5 months
I'm away from my Linux machine now, but I'm curious whether/how you handle reflinks. On a supported file system such as Btrfs, which I use, how does `cp --reflink` get counted? Similar to hard links? I'm curious because I use this feature extensively.
By @inbetween - 5 months
I often want to know why there is sudden growth in disk usage over the last month/week/etc., i.e. what suddenly takes up space. In those cases I find myself wishing that du and friends would cache their last few runs and offer a diff against them, thus easily listing the new disk-eating files or directories. Could dut evolve to do something like that?
By @timrichard - 5 months
Looks nice, although a feature I like in ncdu is the 'd' key to delete the currently highlighted file or directory.
By @teamspirit - 5 months
Nice job. I've been using dua[0] and have found it to be quite fast on my MacBook Pro. I'm interested to see how this compares.

[0] https://github.com/Byron/dua-cli

By @shellfishgene - 5 months
What I need is a du that caches the results somewhere and then does not rescan the 90% of dirs that have not changed when I run it again a month later...
By @imiric - 5 months
I've been using my own function with `du` for ages now, similar to others here, but I appreciate new tools in this space.

I gave `dut` a try, but I'm confused by its output. For example:

  3.2G    0B |- .pyenv
  3.4G    0B | /- toolchains
  3.4G    0B |- .rustup
  4.0G    0B | |- <censored>
  4.4G    0B | /- <censored>
  9.2G    0B |- Work
  3.7G    0B |   /- flash
  3.8G    0B | /- <censored>
   16G  4.0K |- Downloads
  5.1G    0B | |- <censored>
  5.2G    0B | /- <censored>
   16G    0B |- Projects
  3.2G   42M | /- <censored>
   17G  183M |- src
   17G    0B | /- <censored>
   17G    0B |- Videos
  3.7G    0B | /- Videos
   28G    0B |- Music
  6.9G    0B | |- tmp
  3.4G    0B | | /- tmp
  8.8G    0B | |- go
  3.6G    0B | |   /- .versions
  3.9G    0B | | |- go
  8.5G    0B | | |     /- dir
  8.5G    0B | | |   /- vfs
  8.5G    0B | | | /- storage
  8.5G    0B | | /- containers
   15G  140M | /- share
   34G  183M /- .local
  161G    0B .
- I expected the output to be sorted by the first column, yet some items are clearly out of order. I don't use hard links much, so I wouldn't expect this to be because of shared data.

- The tree rendering is very confusing. Some directories are several levels deep, but in this output they're all jumbled, so it's not clear where they exist on disk. Showing the full path with the `-p` option, and removing indentation with `-i 0` somewhat helps, but I would almost remove tree rendering entirely.

By @bitwize - 5 months
Neat, a new C program! I get a little frisson of good vibes whenever someone announces a new project in C, as opposed to Rust or Python or Go. Even though C is pretty much a lost cause at this point. It looks like it has some real sophisticated performance optimizations going on too.
By @anon-3988 - 5 months
I have been using diskonaut; it's fast enough given that it also produces a nice visual output.
By @jonhohle - 5 months
Did you consider the fts[0] family of functions for traversal? I use that along with a work queue for filtered entries to get pretty good performance with dedup[1]. For my use case I could avoid any separate stat call altogether; the FTSENT already provided everything I needed (see the sketch after the links below).

0 - https://linux.die.net/man/3/fts_read

1 - https://github.com/ttkb-oss/dedup/blob/6a906db5a940df71deb4f...
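
For readers who haven't used fts, a minimal sketch of the pattern being described: fts_read() returns FTSENT nodes whose fts_statp is already populated (unless FTS_NOSTAT is passed), so a du-style total needs no separate stat call per entry.

    /* Sketch: fts-based traversal totaling regular-file blocks. */
    #include <fts.h>
    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        char *default_path[] = { ".", NULL };
        char **paths = argc > 1 ? &argv[1] : default_path;
        FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
        FTSENT *ent;
        long long total = 0;

        if (!fts)
            return 1;
        while ((ent = fts_read(fts)) != NULL) {
            if (ent->fts_info == FTS_F)   /* regular file; stat data is pre-filled */
                total += (long long)ent->fts_statp->st_blocks * 512;
        }
        fts_close(fts);
        printf("%lld bytes\n", total);
        return 0;
    }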

By @sandreas - 5 months
Nice work! There is also gdu[1], where the UI is heavily inspired by ncdu and somehow feels way faster...

1: https://github.com/dundee/gdu

By @tambourine_man - 5 months
> https://dev.yorhel.nl/doc/ncdu2

I wasn't aware that there was a rewrite of ncdu in Zig. That link is a nice read.

By @hsbauauvhabzb - 5 months
This looks handy. Do you have any tips for stuff like queued ‘mv’ or similar? If I’m moving data around on 3-4 drives, it’s common that I’ll stack commands where the 3rd command frees up the space the 4th needs to run successfully. I use && to ensure a halt on failure, but I have to mentally calculate the free space as I write the commands, since the free space after the third mv will be different from what ‘df’ reports before any of the commands have run.
By @cycomanic - 5 months
This looks awesome!

One comment: I find the benchmark results really cumbersome to read. Why not make a graph (e.g. a bar plot) that would make the results obvious at a glance? I'm a strong believer in presenting numerical data graphically whenever possible; it avoids many mistakes and misunderstandings.

By @bArray - 5 months
I think 'ls' should also evaluate the size of the files contained within. The size and number of contained files/folders really does reveal a lot about the contents of a directory without peeking inside. The speed of this is what would be most concerning, though.
By @chomp - 5 months
GPLv3, you love to see it. Great work.
By @jftuga - 5 months
You should include the "How to build" instructions near the beginning of the main.c file.
By @mixmastamyk - 5 months
Not as featureful, but what I've been using. If you can't install this tool for some reason, it's still useful. I call it usage:

    #!/bin/bash

    du -hs * .??* 2> /dev/null | sort -h | tail -22
By @frumiousirc - 5 months
dut looks very nice.

One small surprise came when I had a symlink to a directory and referred to it with a trailing "/": dut doesn't follow the link in order to scan the real directory. I.e., I have this symlink:

    ln -s /big/disk/dev ~/dev
then

    ./dut ~/dev/
returns zero size while

    du -sh ~/dev/
returns the full size.

I'm not sure how widespread this convention of resolving symlinks to their target directories when named with a trailing "/" is, but it's one my fingers have memorized.

In any case, this is another tool for my toolbox. Thank you for sharing it.

By @trustno2 - 5 months
Does it depend on Linux functionality, or can I use it on macOS?

Well I can just try :)

By @tonymet - 5 months
Great app. Very fast at scanning nested dirs. I often need recursive disk usage when I suddenly run out of space and scramble to clean up while everything is crashing.
By @jbaber - 5 months
I always want treemaps.

- console (Rust): cargo install diskonaut
- console (Python): pip install ohmu
- GUI: gdmap
- Windows: windirstat
- Mac: grand perspective (I seem to recall)

By @jmakov - 5 months
Would be great to have a TUI interface for browsing like ncdu.
By @tiku - 5 months
Ncdu is easy to remember and use, clicking through etc. It would be cool to find a faster replacement with the same usage, rather than a new tool with parameters to remember.
By @pmdfgy - 5 months
Nice work. I really miss the simplicity of C. One file, one Makefile, and that's it. Has anyone tested with a node_modules folder?
By @tamimio - 5 months
Someone, please create a Gdut - a fork that produces graphs for a quick and easy read. The current output is almost impossible to read on small vertical screens.
By @notarealllama - 5 months
If this accurately shows hidden stuff, such as docker build cache and old kernels, then it will become my go-to!
By @classified - 5 months
I get boatloads of "undefined reference" errors. Where's the list of dependencies?
By @jepler - 5 months
Did you consider using io_uring? If not, was there a reason other than portability?
By @rafaelgoncalves - 5 months
Neat tool - congrats on the release, and thank you for this and the analysis/comparison.
By @oigursh - 5 months
ncdu has been my go-to for years. Pleased to have a modern alternative.
By @dima55 - 5 months
Ideas for a better format: do what xdiskusage does.
By @miew - 5 months
Why C and not Rust or even Zig?