October 3rd, 2024

The Heart of Unix (2018)

Unix remains a powerful programming environment, allowing users to extend functionality through shell scripts. However, it has outdated components, and modern Linux distributions complicate user interactions, suggesting a need for evolution.


Unix, despite its age and shortcomings, remains a powerful and flexible environment for programming. Eric Normand emphasizes that Unix's programmability stems from its ability to allow users to create and extend functionality through simple shell scripts and text streams. The Unix philosophy advocates for writing small, efficient programs that can work together, leveraging a universal interface based on text streams. This design enables interoperability among various programming languages, allowing users to choose the best tool for their tasks without being constrained by language-specific limitations. However, Normand also points out the outdated aspects of Unix, such as its file system and terminal interface, which have not evolved significantly despite advancements in technology. He critiques the layering of modern Linux distributions, which often complicate rather than simplify user interactions with the system. Normand suggests that while Unix has a solid foundation, there is potential for improvement, particularly in how programs communicate and handle data. He advocates for evolving the stdin/stdout model to accommodate more complex data interactions and integrating structured text and binary formats into the Unix ecosystem. Ultimately, he envisions a future where Unix can adapt to modern needs while maintaining its core principles of simplicity and efficiency.
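
The composition model the article celebrates is easy to demonstrate with the classic word-frequency pipeline, which chains single-purpose tools over a plain text stream (essay.txt is a placeholder for any text file):

    # Top five most frequent words: each stage does one small job.
    tr -cs '[:alpha:]' '\n' < essay.txt |
      tr '[:upper:]' '[:lower:]' |
      sort | uniq -c | sort -rn | head -5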

- Unix's programmability allows users to create and extend functionality easily.

- The Unix philosophy promotes writing small, efficient programs that work together.

- Despite its strengths, Unix has outdated components that need modernization.

- Modern Linux distributions often complicate user interactions with unnecessary layers.

- There is potential for evolving Unix's data handling to include structured and binary formats.

AI: What people are saying
The comments reflect a diverse range of opinions on Unix and its programming environment.
  • Many users appreciate Unix's programmability and the power of shell scripting, emphasizing its flexibility and efficiency.
  • Criticism arises regarding the limitations of text-based data exchange in shells, which can complicate error handling and debugging.
  • Some commenters express a desire for improved interoperability between programming languages and Unix tools.
  • There is a call for modernization of Unix components to keep pace with contemporary computing needs.
  • Several users highlight the importance of maintaining Unix's core principles while evolving its functionality.
21 comments
By @chubot - 7 months
I generally agree with this article in that PROGRAMMABILITY is the core of Unix, and it is why I've been working on https://www.oilshell.org/ for many years

However I think the counterpoint is maybe a programming analog of Doctorow's "Civil War on General Purpose Computing"

I believe the idea there was that we would all have iPads and iPhones, with content delivered to us, but we would not have the power to create our own content, or do arbitrary things with computers

I think some of that has come to pass, at least for some fairly large portions of the population

(though people are infinitely creative -- I found this story of people writing novels on their phones with Google Docs, and selling them via WhatsApp, interesting and cool - https://theweek.com/culture-life/books/the-rise-of-the-whats... )

---

The Unix/shell version of that is that valuable and non-trivial logic/knowledge will be hidden in cloud services, often behind a YAML interface.

And your job is now to LLM the YAML that approximates what you want to do

Not actually doing any programming, the kind of work that can lead to adjacent thoughts the cloud/YAML owners didn't think of

In some cases there is no such YAML, or it's been trained out of the LLM, so you can't think that thought

---

There's an economic sense to this, in some ways, but personally I don't want to live in that world :)

By @coliveira - 7 months
The biggest disadvantage of the shell is that, by exchanging data as text, you lose opportunities to check for errors in the output. If you call a function in a programming language and it produces erroneous output, you get a crash or an exception. In a shell, you'll get empty lines or, worse, incorrect lines that propagate to the rest of the script. This makes it impractical to write large scripts, and debugging them gets more and more complicated. The shell works well for a few lines of script; any more than that and it becomes a frustrating experience.
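
A sketch of this failure mode in bash (the log path is deliberately misspelled):

    # The typo'd path makes grep fail, but the pipeline carries on with empty input:
    grep ERROR /var/log/syslgo | wc -l     # prints 0; exit status is wc's, i.e. success

    # pipefail surfaces the failure instead of hiding it:
    set -o pipefail
    grep ERROR /var/log/syslgo | wc -l || echo 'a pipeline stage failed'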
By @nxobject - 7 months
I have a slight bone to pick with the author's statement that Unix is homoiconic – sure, I can tail and patch a file, but it doesn't mean that I can seamlessly and quickly manipulate, generate, and execute the canonical representation of executable code in the same way that I can with s-expressions, quote/quasiquote, and eval. I think the bar for meaningful homoiconicity should at least be raised to include that.

If "I can read my source just into the canonical datatype" was the standard for an environment to be meaningfully homoiconic, you could easily argue that bare metal was homoiconic for the same reason. And in fact I'd argue that it would be easier to hand-assemble VAX instructions to write more opcodes in memory (because the VAX-11 had such an extensive and convenient instruction set, especially all of the three-operand bit swizzling and manipulation instructions) than to do C code generation with a base Unix environment.

By @anthk - 7 months
Today the Unix philosophy is better realized in 9front than in the Unix clones themselves.

>Functional + universal data structure + homoiconic = power

If everything used TSV or tabular data, yes. But that is not the case. With Lisp you can always be sure.
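
When tools do agree on TSV, generic relational operations compose; a small sketch with hypothetical records:

    # Sort tab-separated (name, age) records numerically by the second column:
    printf 'alice\t30\nbob\t25\n' |
      sort -t "$(printf '\t')" -k2,2n |
      cut -f1                           # -> bob, then alice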

>I edit my entries in Emacs.

Emacs can do dired (ls+vidir), eshell, rsync maybe to S3 (Emacs package + rclone), Markdown to HTML (and more with Org Mode), and tons more with Elisp. With Org you can basically define your blog, and with a little Elisp you could upload it when you finish.

>21st Century Terminal

Eshell, or Emacs itself.

> What if we take the idea of Unix programs as pure functions over streams of data a little further? What about higher-order functions? Or function transformations? Combinators?

Hello Elisp. On combinators, maybe that shell from Dave from CCA. MPSH? https://www.cca.org/mpsh/

By @mmcgaha - 7 months
I always tell people the best way to learn how to use Linux is to read The Unix Programming Environment.
By @niobe - 7 months
Great article. I was only just thinking this week, "are there really still only 3 channels?".

But short of a massive overhaul and in spite of the shortcomings the current system still _works_ better than any other platform.

I would like to see Unix stay relevant for the long term, however. It's possible these shortcomings one day make the trade-off against newer systems not worth making, or leave it simply incompatible.

By @buescher - 7 months
With image-capable terminals and funky enhanced cli utilities we are sort of slouching towards something like a CLIM listener or a notebook interface at the shell. What would something in that vein that was really, really nice look like?
By @gavinhoward - 7 months
> I hope to see more "sugar" in languages to take advantage of calling out to other programs for help.

How about [1] and [2]?

My language has those because its first program was its own build script, which requires calling out to a C compiler. It had that before printing to stdout.

Turns out, that made it far more powerful than I imagined without a standard library. Calling out to separate programs is far better than a standard library.

[1]: https://git.yzena.com/Yzena/Yc/src/commit/95904ef79701024857...

[2]: https://git.yzena.com/Yzena/Yc/src/commit/95904ef79701024857...
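
For comparison, the shell's own sugar for the same idea, command substitution, splices a program's output straight into the language (a minimal sketch, assuming a cc on PATH):

    # $(...) is shell's "call out to another program" sugar:
    cc_version=$(cc --version | head -n 1)
    echo "building with: $cc_version"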

By @emmelaich - 7 months
Nice article.

The criticism of the file system as overly simple or archaic has often been made, ever since the 70s. However, the fact is that it IS usable as a base for ACID-capable software. Plenty of real-world evidence attests to that.

I remember in Rochkind's book[0] there is a quote criticising Unix as inferior to IBM's MVS because it didn't have locking. As Rochkind retorts, MVS didn't either! Not as a kernel feature, that is; it was done via user-space software, which is eminently doable in Unix too.

[0] https://www.oreilly.com/library/view/advanced-unix-programmi...
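
On modern Linux the user-space approach is a one-liner, e.g. with util-linux's flock(1):

    # Advisory locking from user space; a second invocation waits for the lock:
    flock /tmp/build.lock -c 'make install'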

By @dvektor - 7 months
This was a fantastic post. It sums up many of the reasons why I love daily driving Linux, and why the majority of my workflow happens in the terminal. Vim + Unix is the best IDE
By @golly_ned - 7 months
What does this article add to the countless others espousing the Unix model for exactly the same thing?
By @sim7c00 - 7 months
I'm not sure why, but I kind of feel this is just the general purpose of an operating system: to provide an environment for things to 'be done in' using the system's resources. And you can't pre-build everything that needs to be done, so it's important to make it extensible and allow for interoperability between programs relatively easily.

Windows does this too... it's just less touted as such an environment, since it ships a lot of applications in binary form which do a lot of work for you. But essentially, most of its functionality is exposed via scripting interfaces, and a lot of programs can also be extended with simple scripts. This is even without bringing PowerShell into the mix, which lets you really go the next mile. You can even CreateRemoteThread (maybe a bad idea, but an example of how extensible it is! ;D)... It's not POSIX etc., but definitely programmable.

Don't get me wrong, I do love that people on Unix aim to make things pipeable. I'd hope someday they will make their outputs easier to parse, though, rather than having to have cut, awk, and sed in the mix every time to reformat stuff into a structure more easily interpreted. A common difficult task is parsing output from 'ls'. The underlying data is quite organized at every level, but the tool's output is a big struggle to parse if you don't want to rely on strict formatting of filenames.
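
One robust workaround today, assuming GNU findutils, is to skip ls entirely and request exactly the fields you need:

    # Size, mtime, and path as tab-separated fields, sorted by size:
    find . -maxdepth 1 -type f -printf '%s\t%TY-%Tm-%Td\t%p\n' | sort -k1,1n
    # (filenames containing tabs or newlines would still need -print0-style handling)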

This last bit, of course, can be said about many aspects of computers and data storage/exchange: it's kind of always a mess. There are so many standard (and non-standard) ways to output things that it's just a zoo of stuff to parse...

I'd be delighted if someday there's an OS which requires things sent to another program to adhere to an open and well-defined data formatting standard, and just one at that. I guess no one wants to reinvent the wheel, though, and make each piped piece of data strictly JSON or something like that. It would make life a lot easier and could even be serialized/deserialized fairly generically to optimize transfer where needed...

It's what I want to do for my own OS; sadly no one will ever use that... :D But I am free to dream as I type a million lines of defines and bit twiddles! :D

By @metadat - 7 months
> We see that languages like Perl and Python have huge numbers of libraries for doing all sorts of tasks. Those libraries are only accessible through the programming language they were developed for. This is a missed opportunity for the languages to interoperate synergistically with the rest of the Unix ecosystem.

What would this interoperability look like, in practical terms?

For example, how would you invoke a program in language A from language B, other than the typical existing `system.exec(...)'.
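
In practice today it is exec plus an agreed text format; a sketch of the status quo (not a proposal):

    # "Language B calls language A": exec the interpreter, agree on JSON in between.
    python3 -c 'import json,sys; json.dump({"n": 42}, sys.stdout)' | jq '.n'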

By @whartung - 7 months
I'm on board with this.

Unix is my favorite OS.

I like that its fundamental unit of work is the process, and that, as users, we have ready access to processes. Processes are cheap and easy.

I can stack them together with a | character. I can shove them in the background with a & (or ^Z and bg, or whatever). Cron is simple. at(1) and batch(1) are simple.

On the early machines I worked on, processes were a preallocated thing at boot. They weren't some disposable piece of work. You could do a lot with them, but it's not the same.

Even when I was working on VMS, I "never" started new processes. Not like you do in Unix. Not ad hoc, "just for a second". No, I just worked directly with what I had. I could not compose new workflows readily out of processes.

Processes give a lot of isolation and safety. If a process goes mad, it's (usually) easily killed with little impact to the overall system. Thus it's cheap and forgiving to mess up with processes.

inetd was a great idea. Tie stdin/stdout to a socket. Anyone and their brother Frank could write a service managed by inetd -- in anything. CGI-BIN is the same way. The http server does the routing, the process manages the rest. Can you imagine shared hosting without processes? I shudder at the thought.
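
The entire contract is small enough to sketch; the service name and paths below are hypothetical:

    #!/bin/sh
    # An inetd-managed service: the socket IS stdin/stdout.
    read -r line
    printf 'you said: %s\n' "$line"

    # Corresponding /etc/inetd.conf entry ("myecho" registered in /etc/services):
    # myecho stream tcp nowait nobody /usr/local/bin/myecho myecho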

Binary processes are cheap too, with shared code segments making easy forks, fast startup, low system impact. The interpreters, of course, wrecked that whole thing. And, arguably, the systems were "fast enough" to make that impact low.

But inetd, running binary processes? That is not a slow server. It can be faster (pre-forking, threads, dedicated daemons), but that combo is not necessarily slow. I think the sqlite folks basically do this with Fossil on their server.

Note, I'm not harping on "one process, one thing", that's different. Turns out when processes are cheap and nimble, then that concept kind of glitters at the bottom of the pan. But that's policy, not capability.

But the Unix system is just crazy malleable and powerful. People talk about a post-holocaust system, how they want something like CP/M cuz it's simple. But, really? What a horrific system! Yes, a "unix-like system" is an order of magnitude more complex than something like CP/M. But it's far more than an order of magnitude more capable. It's worth the expense.

Even something weak, like Coherent on a 286. Yea, it had its limitations, but the fundamentals were there. At the end of the world, just give me a small kernel, sh, vi, cc, and ld -- I can write the rest of the userland -- poorly :).

By @anthk - 7 months
On shells for Unix, this can be really useful to cut script regex matching in half:

https://www.cca.org/mpsh/docs-08.html

By @mustache_kimono - 7 months
> Compare that to Clojure, where you constantly define and redefine functions at the REPL.

It's an interactive shell FFS, does it get more REPL than that?!

`set -x` is what you want brother.

By @zzo38computer - 7 months
Being a programmable environment is one of the good benefits of UNIX, and piping programs together is also a good benefit of UNIX.

"Write programs that do one thing and do it well" and "Write programs to work together" are good ideas, too (unfortunately many programs don't).

I think that using a text stream for everything is not the best idea though. In many cases binary formats will do better. I think XML and JSON are not that good either.

I think "cache your compiler output to disk so you wouldn't have to do a costly compile step each time you ran a program" is a good idea, although this should not be required; REPL and other stuff they mention there is also very helpful.

They say the file system is also old. My idea is a transactional hypertext file system. It doesn't have metadata (or even file names), but a file can contain multiple numbered forks and you can store extra data in there.

(Transactional file system is something that I think is useful and that UNIX doesn't do.)

They are also right that the terminal is old, although some of the newer things that some people have tried have different sets of problems.

They also say another unfortunate thing is layering, and I agree that this layering is excessive.

Interoperating without needing FFI is also helpful (and see below what I mention about typed initial messages, too).

About the stuff listed in "Text streams, evolved", my idea of the operating system design, involves the "Common Data Format" (which is a binary format, somewhat like ASN.1 BER but different), and most data, including the command shell and most files, would use it; this also allows for common operations.

I agree with "a program which displays all of the thumbnails of the files listed on stdin would be much more useful to me than a mouse-oriented file browser", and I do not have a GUI file browser anyways. I do use command-line programs for most things, even though I have X Windows to run some GUI programs and to be able to have multiple xterms at once (I often have many xterms at once). However, it could be improved as I describe above, too.

They mention the shell. I agree that it could be greatly improved, and I think that would go with the other improvements above. My operating system design effectively requires "programs as pure functions over streams of data" (although it is functions over "capabilities", and not necessarily "streams of data") due to the way the capability-based security works, and the way linking and capability passing work also allows something like higher-order functions and transformations and all of that stuff. My idea even involves message passing (all I/O is done by passing messages between capabilities), too.

I had also considered programs that require types. One of the forks (as I mentioned above) of an executable file can specify the expected type of the initial message, and the command shell can use this to effectively make programs like typed functions.

Something they don't mention is security. That can also be improved; the capability-based security that I mention above, if you have proxy capabilities too, will improve it. There is also the possibility that users can use the command shell and write other programs to make up their own proxy capabilities, which allows programs to be used to do things that they were not necessarily designed to do, in addition to improving security. Instead of merely a user account, it might e.g. allow writing to only one file, or allow connecting to only one remote computer (without the program knowing which one it is, and perhaps even with data compression that the application program is unaware of), etc.

I still think that, even if you have powerful computers, you should still program it efficiently anyways.

The new one won't be UNIX; it will be something else.

By @paulddraper - 7 months
> Unix is homoiconic

Wild, very cool

By @gregw2 - 7 months
The author of the article seems unaware of awk or jq or perl one-liners for handling JSON or other forms of data from the UNIX command line.
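
For example, a jq one-liner (the URL is a placeholder):

    # Flatten a JSON array into tab-separated name/email pairs:
    curl -s https://api.example.com/users | jq -r '.[] | [.name, .email] | @tsv'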