October 6th, 2024

AVX Bitwise ternary logic instruction busted

The blog post examines the AVX-512 vpternlogd instruction for complex Boolean logic operations, comparing it to the Amiga blitter chip and providing methods for calculating minterm values.

The blog post discusses the AVX-512 instruction set architecture, specifically the vpternlogd instruction, which performs bitwise ternary logic operations on three input sources. This instruction allows complex Boolean logic to be executed in a single instruction, processing up to 512 bits at once. The author draws a parallel between this modern instruction and the 1985 Amiga blitter chip, which also used an 8-bit value to control logical operations among three bitmap sources. The post highlights the difficulty programmers had in calculating minterm values for the Amiga blitter, often relying on a few well-known values rather than understanding the underlying logic. The author provides a method for calculating these values that also applies to the vpternlogd instruction, making it easier for programmers to define complex logical functions. The post concludes with a humorous observation about the possible influence of retro computing on modern Intel documentation, particularly the choice of example values.
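To make the role of the 8-bit value concrete, here is a minimal scalar sketch in C of the per-bit semantics, working on one 64-bit word instead of a 512-bit register. The name `ternlog64` is hypothetical, and the sketch assumes the usual operand convention in which the first, second and third inputs supply bits 2, 1 and 0 of the table index (the convention behind GCC's and Clang's `_MM_TERNLOG_A/B/C` constants 0xF0, 0xCC and 0xAA).

```c
#include <stdint.h>

/* Hypothetical scalar model of vpternlogd on one 64-bit word.
 * For every bit position i, the three input bits form the 3-bit index
 * (a << 2) | (b << 1) | c, and result bit i is bit `index` of imm8. */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t result = 0;
    for (unsigned i = 0; i < 64; i++) {
        unsigned index = (unsigned)((((a >> i) & 1u) << 2) |
                                    (((b >> i) & 1u) << 1) |
                                    ((c >> i) & 1u));
        result |= (uint64_t)((imm8 >> index) & 1u) << i;
    }
    return result;
}
```

Passing 0xF0, 0xCC and 0xAA as the three inputs makes bit i of the result (for i < 8) equal to bit i of imm8, so the low byte of the result is the immediate itself; all higher bit positions see index 0.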

- The vpternlogd instruction in AVX-512 allows complex Boolean logic operations using three inputs.

- The instruction operates on 512-bit registers, applying the same three-input function to every bit position in one instruction.

- The author compares the vpternlogd instruction to the Amiga blitter chip, which also used an 8-bit value for logical operations.

- A method for calculating minterm values is provided, applicable to both the Amiga blitter and modern AVX instructions.

- The post humorously suggests a retro influence in Intel's documentation choices.
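The post's own calculation method is not reproduced here. As a hedged stand-in, the sketch below derives an immediate mechanically by enumerating the truth table of a three-input function; `imm_from_truth_table`, `mux` and `xor3` are hypothetical names, and the bit ordering assumes the same (a << 2) | (b << 1) | c index as vpternlogd.

```c
#include <stdbool.h>
#include <stdint.h>

typedef bool (*bool3_fn)(bool a, bool b, bool c);

/* Hypothetical helper: evaluate f on all 8 input combinations and store
 * f(a, b, c) at bit position (a << 2) | (b << 1) | c of the immediate. */
static uint8_t imm_from_truth_table(bool3_fn f)
{
    uint8_t imm = 0;
    for (unsigned index = 0; index < 8; index++) {
        bool a = (index >> 2) & 1u;
        bool b = (index >> 1) & 1u;
        bool c = index & 1u;
        if (f(a, b, c))
            imm |= (uint8_t)(1u << index);
    }
    return imm;
}

/* Example functions to feed into the helper. */
static bool mux(bool a, bool b, bool c)  { return b ? a : c; }   /* A if B else C */
static bool xor3(bool a, bool b, bool c) { return a ^ b ^ c; }   /* three-way XOR */
```

With this ordering, `mux` (A if B else C) yields 0xE2 and three-way XOR yields 0x96.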

AI: What people are saying
The comments on the blog post about the AVX-512 vpternlogd instruction reveal several key themes and insights from readers.
  • Many commenters appreciate the connection between the AVX instruction and historical hardware like the Amiga blitter chip, sharing personal experiences and nostalgia.
  • There is a discussion about the practicality and implementation of the instruction in compilers, with some questioning whether compilers can effectively utilize it.
  • Several users highlight the concept of using lookup tables for Boolean operations, drawing parallels to FPGAs and other technologies.
  • Some commenters clarify the terminology around "ternary logic," noting that it typically refers to three truth values, while the instruction handles binary logic with three inputs.
  • Overall, the article is well-received, with many expressing gratitude for the informative content.
27 comments
By @mmozeiko - 3 months
There is a simple way to get the immediate from the expression you want to calculate. For example, if you want to calculate the following expression:

    (NOT A) OR ((NOT B) XOR (C AND A))
then you simply write

    ~_MM_TERNLOG_A | (~_MM_TERNLOG_B ^ (_MM_TERNLOG_C & _MM_TERNLOG_A))
That is literally the expression you want to calculate. It evaluates to the immediate via the _MM_TERNLOG_A/B/C constants defined in the intrinsic headers, at least for GCC and Clang:

    typedef enum {
      _MM_TERNLOG_A = 0xF0,
      _MM_TERNLOG_B = 0xCC,
      _MM_TERNLOG_C = 0xAA
    } _MM_TERNLOG_ENUM;
For MSVC you define them yourself.
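The trick can be checked without any intrinsics. This sketch uses hypothetical names `TL_A`, `TL_B` and `TL_C` with the same values as the GCC/Clang enum (identifiers beginning with an underscore and a capital letter are reserved in C, which is one reason not to redefine `_MM_TERNLOG_*` by hand). Because `~` on an `int` also sets bits above bit 7, the result is masked to 8 bits; for the example expression it comes out as 0x9F.

```c
/* Hypothetical stand-ins with the same values as _MM_TERNLOG_A/B/C. */
enum { TL_A = 0xF0, TL_B = 0xCC, TL_C = 0xAA };

/* (NOT A) OR ((NOT B) XOR (C AND A)), written literally with the constants.
 * ~ on an int sets the high bits too, so keep only the low 8 bits. */
#define EXAMPLE_IMM ((~TL_A | (~TL_B ^ (TL_C & TL_A))) & 0xFF)

/* Compile-time check of the derived immediate (C11). */
_Static_assert(EXAMPLE_IMM == 0x9F, "unexpected ternary-logic immediate");
```

With AVX-512 enabled, the value would then be passed as the last argument of an intrinsic such as `_mm512_ternarylogic_epi32(a, b, c, imm)`.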
By @Sniffnoy - 3 months
Oh, I thought the title was saying that the instruction doesn't work properly! (The article actually just explains how it works.)
By @Lerc - 3 months
My teenage self did not write "CRAP!" on that page of the hardware manual, but I stared at it for so long trying to figure it out.

In the end I did what pretty much everyone else did: found the BLTCON0 values for Bobs and straight copies, and then pretended I never saw the thing.

I did however get an A+ in computational logic at university years later, so maybe some of the trauma turned out to be beneficial.

By @cubefox - 3 months
About the title: "ternary logic" usually means "logic with three truth values". But this piece covers a CPU instruction that can compute any binary logic function of three inputs.
By @red_admiral - 3 months
Is this similar to the Windows BitBlt function (around since at least 3.1, I think?), which takes a raster-operation (`rop`) parameter to decide how to combine the source, the destination and the pattern (brush)?

I remember there are names for some of the codes, like BLACKNESS for producing black whatever the inputs are, SRCCOPY to just copy the source to the destination, etc. I always thought BLACKNESS and WHITENESS had a kind of poetic ring to them.

As far as I know (I think this is from Petzold), it's implemented in software, but the opcode is actually converted into custom assembly code inside the function when you call it: a rare example of self-modifying code in the Windows operating system.
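For comparison: in a Win32 ternary raster-operation code, bits 16 to 23 hold the truth table of the pattern (P), source (S) and destination (D) inputs, with P = 0xF0, S = 0xCC and D = 0xAA, so that byte reads exactly like a vpternlogd immediate. The constants below use the values I believe `wingdi.h` defines for BLACKNESS, SRCCOPY, SRCINVERT and WHITENESS; the `ROP_` prefix and `rop_truth_table` are hypothetical names so the sketch compiles without `windows.h`.

```c
#include <stdint.h>

/* Ternary raster-operation codes (values as in wingdi.h, names prefixed
 * with ROP_ here to avoid clashing with the real macros). */
#define ROP_BLACKNESS 0x00000042u   /* 0        */
#define ROP_SRCCOPY   0x00CC0020u   /* S        */
#define ROP_SRCINVERT 0x00660046u   /* S xor D  */
#define ROP_WHITENESS 0x00FF0062u   /* 1        */

/* Hypothetical helper: bits 16..23 are the 8-bit truth table over P, S, D;
 * the low 16 bits are not needed to read it off. */
static uint8_t rop_truth_table(uint32_t rop)
{
    return (uint8_t)((rop >> 16) & 0xFFu);
}
```

Mapping P, S and D onto the first, second and third vpternlogd inputs, SRCINVERT's byte 0x66 is the same immediate as `_MM_TERNLOG_B ^ _MM_TERNLOG_C`.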

By @kens - 3 months
I'll point out that this is the same way that FPGAs implement arbitrary logic functions, as lookup tables (LUTs).
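A generic model of a k-input LUT, for comparison: the configuration is a 2^k-bit truth table and the input bits select one entry. Real FPGA primitives differ in naming and input ordering, and `lut_eval` is a hypothetical name. The main difference from vpternlogd is that one lookup produces one output bit, while vpternlogd applies the same 3-input table to every bit position of its operands.

```c
#include <stdint.h>

/* Hypothetical k-input LUT (k <= 6, so the table fits in 64 bits).
 * `inputs` packs the k input bits; they index into the truth table. */
static unsigned lut_eval(uint64_t table, unsigned k, unsigned inputs)
{
    unsigned index = inputs & ((1u << k) - 1u);  /* keep the low k input bits */
    return (unsigned)((table >> index) & 1u);
}
```

For example, a 3-input LUT loaded with 0xE2 behaves as a multiplexer, and a 4-input LUT loaded with 0x8000 is a 4-input AND.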
By @anon2024user - 3 months
Head over to https://www.sandpile.org, and find VPTERNLOG on the 3-byte opcode page https://www.sandpile.org/x86/opc_3.htm and you will not only see Intel's apparent past plan for the variants with byte and word masking (AVX512BITALG2), but also the links from the Ib operand to the ternary logic table page https://www.sandpile.org/x86/ternlog.htm with all 256 cases.
By @abecedarius - 3 months
Re the choice of function "E2" for the example in the docs: it's sort of the most basic, canonical boolean function on 3 inputs, named mux: A if B else C. It's universal -- you don't need to be an Amiga fan to pick it (though for all I know they might've been).
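A quick check of the mux claim in plain C (the helper name `bit_select` is hypothetical): taking A where B is 1 and C where B is 0, applied to the constants 0xF0, 0xCC and 0xAA, produces 0xE2.

```c
#include <stdint.h>

/* Bitwise "A if B else C": each set bit of b selects the bit from a,
 * each clear bit of b selects the bit from c. */
static uint64_t bit_select(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & b) | (c & ~b);
}
```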
By @fallingsquirrel - 3 months
Another example of packing bitwise ops into an integer is win32's GDI ROP codes: https://learn.microsoft.com/en-us/windows/win32/gdi/ternary-...
By @Findecanor - 3 months
I didn't have the official Amiga hardware manual, but instead the book "Mapping the Amiga". It said the same thing in a slightly more verbose way. I don't remember which minterms I used back then, but I think I managed to work things out from this book to do shadebobs, bobs, XOR 3D line drawing and other things.

The page in Mapping the Amiga: https://archive.org/details/1993-thomson-randy-rhett-anderso...

By @leogao - 3 months
Nvidia SASS has a similar instruction too (LOP3.LUT)
By @worstspotgain - 3 months
It's nice that they're finally starting to "compress" the instruction space.

To take a related concept further, it would be nice if there were totally unportable, chip-superspecific ways of feeding uops directly, particularly with raw access to the unrenamed register file.

Say you have an inner loop, and a chip is popular. Let your compiler take a swing at it. If it's way faster than the ISA translation, add a special case to the fat binary for a single function.

Alas, it will probably never happen due to security, integrity, and testing costs.

By @unwind - 3 months
As someone who fits the description rather too well (although neither my teenage nor my current self would ever use a marker in the Hardware Reference, omg) this was really nice and satisfying to read.

In a weird sense it kind of helped me feel that, yes, I would probably understand stuff better if I tried re-learning the Amiga hardware today and also like I got a bit of it for free already! Is there such a thing as being protected from a nerd snipe? "This article was my nerd trench" ... or something. Thanks! :)

By @pwrrr - 3 months
Holy cow. I remember reading that page in the Amiga reference manual, thinking it was utter crap and made up my own way of calculating the value (which worked, lol).
By @makapuf - 3 months
In fact, that means there is a dedicated AVX instruction for elementary cellular automata (https://en.wikipedia.org/wiki/Elementary_cellular_automaton).
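A scalar sketch of the idea, assuming a 64-cell ring stored in one `uint64_t`. Wolfram's rule number indexes a neighbourhood as 4*left + 2*center + right, which matches the ternary-logic index when the left neighbour, the cell and the right neighbour are the first, second and third inputs, so the rule number can be used directly as the immediate. `ternlog64` and `eca_step` are hypothetical helpers; on real hardware the rotations and the table lookup would be separate vector instructions.

```c
#include <stdint.h>

/* Hypothetical scalar model of vpternlogd on one 64-bit word. */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t result = 0;
    for (unsigned i = 0; i < 64; i++) {
        unsigned index = (unsigned)((((a >> i) & 1u) << 2) |
                                    (((b >> i) & 1u) << 1) |
                                    ((c >> i) & 1u));
        result |= (uint64_t)((imm8 >> index) & 1u) << i;
    }
    return result;
}

/* One step of an elementary cellular automaton on a 64-cell ring.
 * Bit i's left neighbour is bit i+1 and its right neighbour is bit i-1
 * (most significant bit drawn on the left), with wraparound. */
static uint64_t eca_step(uint64_t cells, uint8_t rule)
{
    uint64_t left  = (cells >> 1) | (cells << 63);  /* bit i <- bit i+1 */
    uint64_t right = (cells << 1) | (cells >> 63);  /* bit i <- bit i-1 */
    return ternlog64(left, cells, right, rule);
}
```

Starting from a single live cell, one step of rule 90 leaves its two neighbours alive, and one step of rule 30 gives three adjacent live cells.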
By @ChuckMcM - 3 months
This is an instruction I would like to implement in RISC-V if it isn't there already (which, yeah, I know, isn't very RISC-like):

   movei (%r1),(%r2),(%r3),value
Move the contents of memory pointed to by r1 to the memory pointed to by r2, applying the Boolean operator <value> with the memory pointed to by r3. Then increment all three registers by 4 to point to the next word. There was something similar to this in the Intel 82786 graphics chip, which had a sort of minimal CPU part that could run simple "programs".

And yeah, I really enjoyed the blitter on the Amiga. It was a really cool bit of hardware.
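A software sketch of the proposed `movei` semantics. The comment leaves the exact inputs open, so this assumes a blitter-style reading: the three inputs are *r1, the old *r2 and *r3, combined with vpternlogd-style indexing and written back to *r2, after which each pointer advances by one 32-bit word. `ternlog32` and `movei_loop` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of ternary logic on one 32-bit word. */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t result = 0;
    for (unsigned i = 0; i < 32; i++) {
        unsigned index = (unsigned)((((a >> i) & 1u) << 2) |
                                    (((b >> i) & 1u) << 1) |
                                    ((c >> i) & 1u));
        result |= (uint32_t)((imm8 >> index) & 1u) << i;
    }
    return result;
}

/* Run the assumed movei semantics `count` times: *r2 = f(*r1, *r2, *r3),
 * then advance all three pointers by one word (4 bytes). */
static void movei_loop(const uint32_t *r1, uint32_t *r2, const uint32_t *r3,
                       uint8_t value, size_t count)
{
    for (size_t n = 0; n < count; n++) {
        *r2 = ternlog32(*r1, *r2, *r3, value);
        r1++;
        r2++;
        r3++;
    }
}
```

Under this reading, value 0xF0 is a plain copy from r1 and 0xCC leaves the destination untouched.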

By @notfed - 3 months
Couldn't every Boolean operation be "busted" as a lookup table?
By @londons_explore - 3 months
Do compilers actually output this instruction?

So many super-clever instructions are next to impossible for compilers to automatically use.

By @pjmlp - 3 months
> The Amiga blitter user manual didn’t help much either. The “Amiga Hardware Reference Manual” from 1989 tried to explain minterm calculation using confusing symbols, which frustrated many young demo makers at the time.

That is super normal logical calculus that any worthwhile CS degree teaches.

Granted, probably not what a teenager without access to a BBS, or Aminet, would be able to figure out.
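The logical calculus in question is a sum of minterms: every input combination (a, b, c) for which the function is 1 contributes the bit at position 4a + 2b + c of the 8-bit value. A worked example, using the majority function (my choice here, not one from the post):

```latex
\mathrm{imm8} = \sum_{a,b,c \in \{0,1\}} f(a,b,c)\, 2^{4a + 2b + c}

f_{\mathrm{maj}}(a,b,c) = (a \land b) \lor (a \land c) \lor (b \land c)
\quad\Rightarrow\quad f_{\mathrm{maj}} = 1 \text{ for } (a,b,c) \in \{(0,1,1), (1,0,1), (1,1,0), (1,1,1)\}

\mathrm{imm8}_{\mathrm{maj}} = 2^{3} + 2^{5} + 2^{6} + 2^{7} = 8 + 32 + 64 + 128 = 232 = \mathtt{0xE8}
```

Evaluating the same expression on the constants 0xF0, 0xCC and 0xAA agrees: 0xC0 | 0xA0 | 0x88 = 0xE8.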

By @transfire - 3 months
Great little article! Thank you.
By @ggerules - 3 months
It looks like someone paid attention in their undergraduate Discrete Math class.
By @stevefan1999 - 3 months
If you want to calculate the minterms why don't you just get a K-Map?
By @486sx33 - 3 months
it’s fundamentally just a lookup table
By @hvenev - 3 months
> an obscure instruction

Come on, vpternlog* is not obscure. It subsumes _all_ bitwise instructions, even loading the constant (-1) into a register.
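The claim is easy to check with a table of immediates for ordinary bitwise operations, each derived from A = 0xF0, B = 0xCC and C = 0xAA. The `IMM_` names and the scalar `ternlog64` model are hypothetical; inputs that an operation does not use are simply ignored by its table.

```c
#include <stdint.h>

/* Immediates for common bitwise operations (hypothetical names). */
enum {
    IMM_ZERO  = 0x00,                /* constant 0                 */
    IMM_ONES  = 0xFF,                /* constant -1 (all bits set) */
    IMM_NOT_A = 0x0F,                /* ~A                         */
    IMM_AND   = 0xF0 & 0xCC,         /* A & B  = 0xC0              */
    IMM_OR    = 0xF0 | 0xCC,         /* A | B  = 0xFC              */
    IMM_XOR   = 0xF0 ^ 0xCC,         /* A ^ B  = 0x3C              */
    IMM_ANDN  = ~0xF0 & 0xCC & 0xFF  /* ~A & B = 0x0C (and-not)    */
};

/* Hypothetical scalar model of vpternlogd on one 64-bit word. */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t result = 0;
    for (unsigned i = 0; i < 64; i++) {
        unsigned index = (unsigned)((((a >> i) & 1u) << 2) |
                                    (((b >> i) & 1u) << 1) |
                                    ((c >> i) & 1u));
        result |= (uint64_t)((imm8 >> index) & 1u) << i;
    }
    return result;
}
```

The all-ones case is the "load -1" idiom: with immediate 0xFF the result is all ones regardless of the register contents.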