September 5th, 2024

A tiny self-remaking C program

The article introduces a self-rebuilding C program using a minimal shell script, emphasizing the build process as computation, the importance of caching, and the need for improved security scrutiny in build systems.

The article describes a one-file C program that rebuilds itself by way of a minimal shell script. It works only under specific environmental conditions, notably with GNU Coreutils 8.30 or on FreeBSD, and the author describes the script as a hack that is not intended for serious use. The author then reflects on the conceptual framework of build systems, suggesting that the build process should be viewed as a computation, much like execution. On this view, caching intermediate results is what keeps execution fast. The author also raises questions about integrating testing into the build process, especially in light of recent security concerns such as the xz backdoor incident. The discussion highlights the need for closer scrutiny of build systems to prevent vulnerabilities, while acknowledging how complex it is to ensure security.

- The article presents a self-rebuilding C program using a minimal shell script.

- It emphasizes viewing the build process as a computation akin to execution.

- Caching intermediate results is suggested to improve execution speed.

- The author questions the integration of testing into build systems due to security concerns.

- The need for better scrutiny in build processes is highlighted to prevent vulnerabilities.

14 comments
By @d99kris - 4 months
I do something similar (but less portable and more verbose) in C++ sometimes when I want to prototype something. My boilerplate is something like this:

  #if 0
  TMP=$(mktemp -d);
  c++ -std=c++11 -o ${TMP}/a.out ${0} && ${TMP}/a.out ${@:1}; RV=${?};
  rm -rf ${TMP};
  exit ${RV};
  #endif
    
  #include <iostream>
    
  int main()
  {
    std::cout << "Hello, world!\n";
  }
(the trailing semi-colons in the script part are there to make my editor indent the C++ code properly)
By @seanw265 - 4 months
Based on the title, I was expecting this to be about quines [1].

If you aren't familiar, quines are programs which produce their own source as their only output. They're quite interesting and worth a dive if you haven't explored them before.

My personal favorites are the radiation-hardened variety, which still produce the original pre-modified source even when any single character is removed before the program is run.

[1]: https://en.wikipedia.org/wiki/Quine_(computing)

By @codethief - 4 months
> the right conceptual basis for build systems: “build is the first stage of execution”

I have long been thinking the same. And also: "Running tests is the first stage of deploying in production" etc.

In other words: There is often a whole dependency graph between various stages (install toolchain, install 3rd-party dependencies, do code generation, compile, link/bundle, test, run, …) and each of those stages should ideally be a pure function mapping inputs to outputs. In my experience, we often don't do a good job making this graph explicit, nor do we usually implement build steps as pure functions or think of build systems as another piece of software which needs to be subject to the same quality criteria that we apply to any other line of code we write. As a result, developer experience ends up suffering.

Large JavaScript projects are particularly bad in this regard. Dependencies, auto-generated code and build output live right alongside source code, essentially making them global state from the point of view of the build system. The package.json contains dozens of "run" commands which more often than not are an arcane mix of bash scripts invoking JS code. Even worse, those commands typically need to be run in juuust the right order because they all operate on the same global state. There is no notion of one command depending on another. No effort put into isolating build tasks from each other and making them pure. No caching of intermediate results. Pipelines take ages even though they wouldn't have to. Developers get confused because a coworker introduced a new npm command yesterday which now needs to be run before anything else. Ugghhh.
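
As an illustration of the explicit dependency graph described above, here is a toy sketch in C in which each stage names the stages it depends on and runs at most once per build. It is not modeled on any real build tool: the `Stage` struct, `MAX_DEPS`, and `run_stage` are hypothetical names, and the graph is assumed to be acyclic.

```c
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical stage description: a name, the stages it depends on,
 * and a flag recording whether it has already run in this build. */
typedef struct Stage {
    const char   *name;
    struct Stage *deps[MAX_DEPS];   /* unused slots are NULL */
    int           done;
} Stage;

/* Depth-first walk: run every dependency first, then the stage itself.
 * A stage that is already done is skipped, so shared dependencies run
 * only once. Assumes the graph has no cycles. The names of executed
 * stages are recorded in `order` (up to `cap` entries); a real tool
 * would run the stage's command at that point. */
void run_stage(Stage *s, const char **order, size_t *count, size_t cap)
{
    if (s->done)
        return;
    for (size_t i = 0; i < MAX_DEPS && s->deps[i] != NULL; i++)
        run_stage(s->deps[i], order, count, cap);
    if (*count < cap)
        order[(*count)++] = s->name;
    s->done = 1;
}
```

Because the ordering is derived from the declared dependencies, nobody has to remember to run commands in just the right sequence, and the `done` flag is the simplest possible form of caching: a stage requested twice is executed once.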

By @actionfromafar - 4 months
"TCC can also be used to make C scripts, i.e. pieces of C source that you run as a Perl or Python script. Compilation is so fast that your script will be as fast as if it was an executable."

https://bellard.org/tcc/tcc-doc.html

By @LorenDB - 4 months
In D, this is a fully-supported use case. The DMD compiler provides an executable called rdmd that can be used as the shebang executable at the top of any D file, and the shebang itself is also codified as valid D syntax.
By @rwmj - 4 months
This is completely off topic for the actual article, but I had a bit of fun a while back working out how to make self-executing C and OCaml scripts/programs.

It's an interesting exercise working out what is both a comment in the target language but is also an executable shell script. For C it's reasonably straightforward but for OCaml it's quite subtle:

https://libguestfs.org/nbdkit-cc-plugin.3.html#C-plugin-as-a...

https://libguestfs.org/nbdkit-cc-plugin.3.html#Using-this-pl...

By @lboc - 4 months
Seems to be replicating mainframe JCL and in-stream data sets. Processing instructions and input are combined in a single file. Used all the time for compiling, running utilities etc.

I'm guessing that this (IBM) example is setting the delimiter to '@@' to avoid problems with the comment - JCL also understands the '/*' sequence. I've not seen it used with other languages (Cobol etc.)

    //jobname    JOB   acctno,name...
    //COMPILE    EXEC  PGM=CCNDRVR,
    // PARM='/SEARCH(''CEE.SCEEH.+'') NOOPT SO OBJ'
    //STEPLIB    DD    DSNAME=CEE.SCEERUN,DISP=SHR
    //           DD    DSNAME=CEE.SCEERUN2,DISP=SHR
    //           DD    DSNAME=CBC.SCCNCMP,DISP=SHR
    //SYSLIN     DD    DSNAME=MYID.MYPROG.OBJ(MEMBER),DISP=SHR
    //SYSPRINT   DD    SYSOUT=*
    //SYSIN      DD    DATA,DLM=@@
      #include <stdio.h>
      ⋮
      int main(void)
      {
      /*  comment   */
      ⋮
      }
    @@
    //SYSUT1     DD    DSN=...
    ⋮
    //*
https://en.wikipedia.org/wiki/Job_Control_Language#In-stream...*
By @necovek - 4 months
The way I see it, this is something a true built-from-source system could do with their packaging system to enable no-effort code changes for any system utility and true trust in you running what you have source for (other than backdoored hardware).

Debian is pretty far off from this vision (if we also want performant execution), but I wonder how Gentoo, Arch Linux and Nix fare in this regard. Is this something that could be viably built with their current packaging formats?

By @o11c - 4 months
Sometimes it's useful to do something like:

  /*usr/bin/env echo 'Hello World!' #*/
Even if, for some reason, there are multiple `usr` folders, the use of `env` means it will eventually call the executable.

As for getting rid of the shebang - swapping the ! with a / means that the line and character counts don't change so you get meaningful error messages.

By @hippich - 4 months
Somewhat adjacent: I recently discovered https://github.com/rofl0r/rcb2 - it can take you quite far without using a makefile. And similarly to OP, it lets you keep relevant build info right in the source code. (Rcb2 is great at the prototype stage, but obviously at some point makefiles are worth spending time on.)
By @slippy - 4 months
Didn't this exist conceptually anyway as the C shell (csh) where the scripting language was "closer to" C?

https://en.wikipedia.org/wiki/C_shell

It seems like you are on your way to making the C++ shell.

By @trollied - 4 months
You might all enjoy The International Obfuscated C Code Contest https://www.ioccc.org/

https://www.ioccc.org/years.html

By @dorianmariefr - 4 months

    "$0".bin: -c: line 0: unexpected EOF while looking for matching `''
    "$0".bin: -c: line 1: syntax error: unexpected end of file
By @z4ziggy - 4 months
//bin/env gcc $0 -g -o ${0%.*} && ./${0%.*} ; exit

That's the one I've been using. Feel free to adopt and change.