July 8th, 2024

Making Python Less Random

Andrew Healey detailed his debugging journey with a Python game prototype that suffered from randomness issues. He used ptrace to intercept system calls, producing consistent results without changing the code and showcasing advanced system call tracing techniques.


In July 2024, Andrew Healey shared his experience debugging a Python game prototype with hard-to-reproduce randomness issues. After discovering multiple sources of randomness in his code, he explored ways to achieve deterministic debugging without altering it. By intercepting the getrandom system call with ptrace, he controlled the bytes returned to the process, ensuring consistent results from both os.urandom and random.randint. The technique effectively supplied fixed seeds to the Python process without modifying the original code, and his deep dive into system call tracing not only resolved the randomness issue but also sparked further interest in how tracing tools work under the hood.

12 comments
By @fiddlerwoaroof - 5 months
Another way to do this that covers more sources of non-determinism would be to run your python code under Meta’s Hermit: https://developers.facebook.com/blog/post/2022/11/22/hermit-...
By @nbadg - 5 months
I'm... confused. Being able to intercept and modify syscalls is a neat trick, but why is it applicable here?

In python you generally have two kinds of randomness: cryptographically-secure randomness, and pseudorandomness. The general recommendation is: if you need a CSRNG, use ``os.urandom`` -- or, more recently, the stdlib ``secrets`` module. But if it doesn't need to be cryptographically secure, you should use the stdlib ``random`` module.

The thing is, the ``random`` module gives you the ability to seed and re-seed the underlying PRNG state machine. You can even create your own instances of the PRNG state machine, if you want to isolate yourself from other libraries, and then you can seed or reseed that state machine at will without affecting anything else. So for pseudorandom "randomness", the stdlib already exposes a purpose-built function that does exactly what the OP needs. Also, within individual tests, it's perfectly possible to monkeypatch the root PRNG in the random module with your own temporary copy, modify the seed, etc, so you can even make this work on a per-test basis, using completely bog-standard python, no special sauce required. Well-written libraries even expose this as a primitive for dependency injection, so that you can have direct control over the PRNG.

Meanwhile, for applications that require CSRNG... you really shouldn't be writing code that is testing for a deterministic result. At least in my experience, assuming you aren't testing the implementation of cryptographic primitives, there are always better strategies -- things like round-trip tests, for example.
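The round-trip idea can be illustrated without pinning any random bytes at all: the test asserts a property rather than a fixed output, so unpredictable input is fine (a generic sketch, not tied to the article's code).

```python
import base64
import os

def encode(data: bytes) -> bytes:
    return base64.b64encode(data)

def decode(blob: bytes) -> bytes:
    return base64.b64decode(blob)

# Assert decode(encode(x)) == x rather than a fixed result, so a
# cryptographically random payload needs no mocking or seeding.
payload = os.urandom(32)
assert decode(encode(payload)) == payload
```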

So... are the 3rd-party deps just "misbehaving" and calling ``os.urandom`` for no reason? Does the OP author not know about ``random.seed``? Does the author want to avoid monkeypatches in tests (which are completely standard practice in python)? Is there something else going on entirely? Intercepting syscalls to get deterministic randomness in python really feels like bringing an atom bomb to a game of fingerguns.

By @red_admiral - 5 months
Maybe I'm missing something, but if you can set os.urandom to a custom function, why not implement your own stateful PRNG in python and patch urandom to point to that? Then you can, among other things, seed the PRNG yourself in unit tests, all from within python and without touching syscalls.
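A minimal sketch of that patch, assuming Python 3.9+ for ``Random.randbytes`` (note it only affects Python-level callers of ``os.urandom``; C-level consumers, such as the ``random`` module's import-time seeding, bypass it):

```python
import os
import random

class SeedablePRNG:
    """Stateful stand-in for os.urandom (illustration only)."""
    def __init__(self, seed):
        self._rng = random.Random(seed)

    def __call__(self, n):
        return self._rng.randbytes(n)

os.urandom = SeedablePRNG(1234)   # patch; a unit test would undo this afterwards
first_run = os.urandom(8)

os.urandom = SeedablePRNG(1234)   # re-seed, e.g. at the start of the next test
assert os.urandom(8) == first_run  # deterministic across "runs"
```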
By @Neywiny - 5 months
Python randomness is something I've fought with for a few years. A while back (it's on my GitHub, I can find it if any replies care) I had an issue with distributed Monte Carlo sims all ending up with the same seed or something. More recently I've had an issue where I wanted a large number of random bytes generated the same across multiple programs. Thinking about it now I could have used an LFSR or similar, but I just seeded the random module and it went fine.
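The seeded-module approach generalizes: a private ``random.Random`` with a shared seed yields an identical byte stream in every program (``randbytes`` needs Python 3.9+; on older versions ``getrandbits`` can substitute).

```python
import random

def shared_bytes(seed, n):
    # Every program calling this with the same seed gets the same bytes.
    return random.Random(seed).randbytes(n)

assert shared_bytes(7, 16) == shared_bytes(7, 16)
assert shared_bytes(7, 16) != shared_bytes(8, 16)
```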

Editing to add that another thing that trips me up every few years is that the built-in hash function isn't repeatable between runs. Meaning if you run the program and record a hash of an object, then run it again, they'll be different. This is good for more secure dicts and such, but not good if you think you can store the hashes to a file and use them later.
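That behavior is CPython's hash randomization (on by default since 3.3); it can be pinned with the PYTHONHASHSEED environment variable. A sketch that checks this by spawning fresh interpreters:

```python
import os
import subprocess
import sys

def str_hash(s, hash_seed):
    """Hash s in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({s!r}))"],
        capture_output=True, text=True, env=env, check=True,
    )
    return out.stdout.strip()

# With a fixed seed, string hashes are stable across runs;
# with PYTHONHASHSEED unset (or "random"), they almost always differ.
assert str_hash("spam", "0") == str_hash("spam", "0")
```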

By @Joker_vD - 5 months
Of course, this doesn't help with someone (e.g. me) who prefers to get their random numbers by reading them from /dev/random:

    $ strace python3 -c 'with open("/dev/random", "rb") as f: print(f.read(8))'
    [snip-snip]
    openat(AT_FDCWD, "/dev/random", O_RDONLY|O_CLOEXEC) = 3
    newfstatat(3, "", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x8), ...}, AT_EMPTY_PATH) = 0
    ioctl(3, TCGETS, 0x7ffd8198d640)        = -1 EINVAL (Invalid argument)
    lseek(3, 0, SEEK_CUR)                   = 0
    read(3, "\366m@\t5Q9\206\341\316/pXK\266\273~J\27\321:\34\330VL\253L\34\217\264L\373"..., 4096) = 4096
    write(1, "b'\\xf6m@\\t5Q9\\x86'\n", 19b'\xf6m@\t5Q9\x86'
    ) = 19
    close(3)                                = 0
There is also /dev/urandom.
By @ijustlovemath - 5 months
Cool deep dive into syscalls! We've built a deterministic simulator in Python to test the performance of our medical device under different scenarios, and have handled this problem with a few very simple approaches:

1. Run each simulation in its own process, using e.g. multiprocessing.Pool

2. Processes receive a specification for the simulation as a simple dictionary, one key of which is "seeds"

3. Seed the global RNGs we use (random and np.random) at the start of each simulation

4. For some objects, we seed the state separately from the global seeds, run the random generation, then save the RNG state to restore later so we can have truly independent RNGs

5. Spot check individual simulations by running them twice to ensure they have the same results (1/1000, but this is customizable)

This has worked very well for us so far, and is dead simple.
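Step 4 above — saving and restoring RNG state for truly independent streams — can be sketched with the stdlib's getstate/setstate (a generic illustration, not the poster's actual code):

```python
import random

random.seed(99)                 # global stream used by the simulation
before = random.random()

saved = random.getstate()       # snapshot the global state
random.seed(7)                  # independent, separately-seeded draw
isolated = random.random()
random.setstate(saved)          # restore: the global stream continues as if
after = random.random()         # the isolated draw never happened

# Verify: re-running without the detour yields the same sequence.
random.seed(99)
assert random.random() == before
assert random.random() == after
```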

By @mianos - 5 months
This is utterly insane:

    import os
    os.urandom = lambda n: b'\x00' * n
    import random
    random.randint = lambda a, b: a
I love it!
By @xeyownt - 5 months
Maybe it doesn't completely fit the author's needs, but an even less intrusive way to control randomness is to seed it manually.
By @sltkr - 5 months
I know people hate “enterprise”-type software design, but this is a typical case where Dependency Injection would have made the solution trivial without the need for any OS-specific hacks.
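A sketch of what that dependency injection might look like (hypothetical class names):

```python
import random

class Dice:
    """The RNG is a constructor parameter, not a hidden global."""
    def __init__(self, rng=None):
        self._rng = rng or random.Random()

    def roll(self):
        return self._rng.randint(1, 6)

# Production code takes the default; tests inject a seeded instance
# and get full determinism with no OS-level interception at all.
assert Dice(random.Random(42)).roll() == Dice(random.Random(42)).roll()
```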

And while the article serves as a nice introduction to ptrace(), I think as a solution to the posted problem it's strictly more complicated than just replacing the getrandom() implementation with LD_PRELOAD (which the author also mentions as an option). For reference, that can be done as follows:

    % cat getrandom.c 
    
    #include <string.h>
    #include <sys/types.h>
    
    ssize_t getrandom(void *buf, size_t buflen, unsigned int flags) {
      memset(buf, 0, buflen);
      return buflen;
    }
    
    % cc getrandom.c -shared -o getrandom.so
    
    % LD_PRELOAD=./getrandom.so python3 -c 'import os; print(os.urandom(8))'
    b'\x00\x00\x00\x00\x00\x00\x00\x00'
Note that these solutions work slightly differently: ptrace() intercepts the getrandom() syscall, but LD_PRELOAD replaces the getrandom() implementation in libc.so (which normally invokes the getrandom() syscall on Linux).
By @cozzyd - 5 months
Rather than writing a program you can also just use gdb and do it interactively...
By @k_sze - 5 months
Just me, or will the solution work on anything that depends on the SYS_getrandom syscall?