July 10th, 2024

Weird things I learned while writing an x86 emulator

The article explores writing an x86 and amd64 emulator for Time Travel Debugging, emphasizing x86 encoding, prefixes, flag behaviors, shift instructions, segment overrides, FS and GS segments, TEB structures, CPU configuration, and segment handling nuances in 32-bit and 64-bit modes.

The article discusses the author's experience writing an x86 and amd64 emulator for Time Travel Debugging, focusing on the peculiarities of x86 encoding, instruction prefixes, flag behaviors, shift instructions, and segment overrides. It highlights how x86 instructions can have multiple encodings for the same operation, the impact of prefixes on instruction behavior, and the nuances of flag settings by different instructions. The author also delves into the intricacies of shift instructions, segment overrides in 32-bit and 64-bit code, and the use of FS and GS segments for thread local storage. Additionally, the article touches on accessing TEB structures using FS and GS registers, the role of CPU configuration in determining segment base addresses, and the differences in segment handling between 32-bit and 64-bit modes. The discussion provides insights into the complexities of CPU emulation and the detailed understanding required to effectively write a CPU emulator.
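
To make the shift-instruction point concrete, here is a minimal sketch (not the article's code) of how an emulator might model SHL on 32- and 64-bit operands. The function name `emulate_shl` and its return shape are assumptions made for illustration. It relies on two documented behaviours: the CPU masks the shift count to 5 bits (6 bits for 64-bit operands), and a masked count of zero leaves the flags unchanged. The 8- and 16-bit cases, where the masked count can exceed the operand size and CF becomes undefined, are deliberately left out.

```python
def emulate_shl(value, count, bits=32, carry_in=0):
    """Return (result, carry_flag) for SHL on a 32- or 64-bit operand.

    Hypothetical helper for illustration only; other flags are not modelled.
    """
    if bits not in (32, 64):
        raise ValueError("this sketch only covers 32- and 64-bit operands")
    mask = (1 << bits) - 1
    value &= mask
    # The CPU masks the count: 6 bits for 64-bit operands, 5 bits otherwise.
    count &= 0x3F if bits == 64 else 0x1F
    if count == 0:
        # A zero (masked) count leaves the destination and the flags untouched.
        return value, carry_in
    # CF receives the last bit shifted out of the top of the operand.
    carry = (value >> (bits - count)) & 1
    return (value << count) & mask, carry
```

With this model, a 32-bit shift by 32 behaves like a shift by 0, which is the kind of surprise the article describes: `emulate_shl(1, 32, bits=32, carry_in=1)` hands back the value and the incoming carry unchanged.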

Related

Investigating SSMEC's (State Micro) 486s with the UCA

An investigation into State Microelectronics Co. Ltd.'s SM486 CPUs reveals they closely mimic Intel's 486 CPUs but may use more advanced processes. Questions arise about their legality and potential applications in China's vintage microprocessor market.

The Byte Order Fiasco

Handling endianness in C/C++ programming poses challenges, emphasizing correct integer deserialization to prevent undefined behavior. Adherence to the C standard is crucial to avoid unexpected compiler optimizations. Code examples demonstrate proper deserialization techniques using masking and shifting for system compatibility. Mastery of these concepts is vital for robust C code, despite available APIs for byte swapping.

A blast from the past: Disassembling DOS (2020)

The text explores disassembling MS-DOS, focusing on INT 21h functions and dissecting files like IO.SYS. It discusses reverse engineering, legal aspects, and the microkernel nature of DOS for deeper insights.

Do not taunt happy fun branch predictor

The author shares insights on optimizing AArch64 assembly code by reducing jumps in loops. Replacing ret with br x30 improved performance, leading to an 8.8x speed increase. Considerations on branch prediction and SIMD instructions are discussed.

Re-visiting VM/386 (2023)

The author shares experiences with VM/386, an emulation software from 1988. Running on 86box, it enables multitasking on an 80386 processor with graphical PC programs, despite limitations hindering broader success.

22 comments
By @aengelke - 5 months
Bonus quirk: there's BSF/BSR, for which the Intel SDM states that on zero input, the destination has an undefined value. (AMD documents that the destination is not modified in that case.) And then there's glibc, which happily uses the undocumented fact that the destination is also unmodified on Intel [1]. It took me quite some time to track down the issue in my binary translator. (There's also TZCNT/LZCNT, which is BSF/BSR encoded with F3-prefix -- which is silently ignored on older processors not supporting the extension. So the same code will behave differently on different CPUs. At least, that's documented.)

Encoding: People often complain about prefixes, but IMHO, that's by far not the worst thing. It is well known and somewhat well documented. There are worse quirks: For example, REX/VEX/EVEX.RXB extension bits are ignored when they do not apply (e.g., MMX registers); except for mask registers (k0-k7), where they trigger #UD -- also fine -- except if the register is encoded in ModRM.rm, in which case the extension bit is ignored again.

APX takes the number of quirks to a different level: the REX2 prefix can encode general-purpose registers r16-r31, but not xmm16-xmm31; the EVEX prefix has several opcode-dependent layouts; and the extension bits for a register used depend on the register type (XMM registers use X3:B3:rm and V4:X3:idx; GP registers use B4:B3:rm, X4:X3:idx). I can't give a complete list yet, I still haven't finished my APX decoder after a year...

[1]: https://sourceware.org/bugzilla/show_bug.cgi?id=31748
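
As a rough illustration of the BSF/TZCNT difference described in this comment, the sketch below models both instructions in Python. The function names and the `(result, zf)` return shape are assumptions for illustration. The zero-input branch of `emulate_bsf` follows the "destination unmodified" behaviour that AMD documents and that, per the comment, Intel hardware also exhibits even though the SDM calls the value undefined.

```python
def emulate_bsf(dest, src, bits=32):
    """Return (new_dest, zf) for BSF. Hypothetical helper, illustration only."""
    src &= (1 << bits) - 1
    if src == 0:
        # Zero input: ZF is set and the destination keeps its old value
        # (documented by AMD; undefined per the Intel SDM).
        return dest, 1
    # Index of the lowest set bit: isolate it, then take its position.
    return (src & -src).bit_length() - 1, 0


def emulate_tzcnt(src, bits=32):
    """Return the TZCNT result only; flags are not modelled in this sketch."""
    src &= (1 << bits) - 1
    if src == 0:
        # TZCNT is fully defined on zero: it returns the operand size.
        return bits
    return (src & -src).bit_length() - 1
```

The two agree on every non-zero input, so code that runs an F3-prefixed BSF on a CPU without the extension only diverges when the source is zero: one path leaves the old destination in place, the other writes the operand size.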

By @sdsd - 5 months
What a cool person. I really enjoy writing assembly, it feels so simple and I really enjoy the vertical aesthetic quality.

The closest I've ever come to something like OP (which is to say, not close at all) was when I was trying to help my JS friend understand the stack, and we ended up writing a mini vm with its own little ISA: https://gist.github.com/darighost/2d880fe27510e0c90f75680bfe...

This could have gone much deeper - i'd have enjoyed that, but doing so would have detracted from the original educational goal lol. I should contact that friend and see if he still wants to study with me. it's hard since he's making so much money doing fancy web dev, he has no time to go deep into stuff. whereas my unemployed ass is basically an infinite ocean of time and energy.

By @AstroJetson - 5 months
Check out Justine Tunney and her emulator. https://justine.lol/blinkenlights/

The docs are an amazing tour of how the cpu works.

By @pm2222 - 5 months
Prior discussion here https://news.ycombinator.com/item?id=34636699

Cannot believe it’s been 16 months. How time flies.

By @trollied - 5 months
> Writing a CPU emulator is, in my opinion, the best way to REALLY understand how a CPU works

Hard disagree.

The best way is to create a CPU from gate level, like you do on a decent CS course. (I really enjoyed making a cut down ARM from scratch)

By @dmitrygr - 5 months
I've written fast emulators for a dozen non-toy architectures and a few JIT translators for a few as well. x86 still gives me PTSD. I have never seen a messier architecture. There is history, and a reason for it, but still ... damn
By @lifthrasiir - 4 months
I recently implemented a good portion of an x86(-64) decoder for a side project [1] and was kinda surprised at how much more complicated it has become in recent years. Sandpile.org [2] was really useful for my purpose.

[1] Namely, a version of Fabian Giesen's disfilter for x86-64, for yet another side project which is still not in public: https://gist.github.com/lifthrasiir/df47509caac2f065032ef72e...

[2] https://sandpile.org/

By @t_sea - 4 months
> Writing a CPU emulator is, in my opinion, the best way to REALLY understand how a CPU works.

The 68k disassembler we wrote in college was such a Neo “I know kung fu” moment for me. It was the missing link that let me reason about code from high-level language down to transistors and back. I can only imagine writing a full emulator is an order of magnitude more effective. Great article!

By @jmspring - 4 months
Apparently my memory is faulty: I originally thought the Salsa20 variants and machine code were on cryp.to, but Dan Bernstein's site is https://cr.yp.to/

This was at a startup, when we were looking at data-at-rest encryption, streaming encryption and other such things. Dan had a page with different implementations (cross compiled from his assembler representation) targeting different chipsets and instruction sets. Using VMs (this was the early/mid 2000s) and such, it was interesting to see which of those instruction sets were supported. In testing, there would be occasional hiccups where an implementation wasn't fully supported even though the VM claimed it was.

By @boricj - 5 months
It's funny to me how much grief x86 assembly generates when compared to RISC here, because I have the opposite problem when delinking code back into object files.

For this use-case, x86 is really easy to analyze whereas MIPS has been a nightmare to pull off. This is because all I mostly care about are references to code and data. x86 has pointer-sized immediate constants and MIPS has split HI16/LO16 relocation pairs, which leads to all sorts of trouble with register usage graphs, code flow and branch delay instructions.

That should not be construed as praise on my end for x86.
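
For readers unfamiliar with the HI16/LO16 split mentioned in this comment, here is a small sketch of the arithmetic, assuming the usual `lui` plus sign-extending `addiu` pair. The function names are made up for illustration. Because the low half is sign-extended when it is added, the high half has to be pre-adjusted, and that is why the two relocations cannot be resolved independently of each other.

```python
def split_hi_lo(address):
    """Split a 32-bit address into (hi16, lo16) halves for lui/addiu."""
    lo = address & 0xFFFF
    # Adding 0x8000 compensates for the sign extension of the low half.
    hi = ((address + 0x8000) >> 16) & 0xFFFF
    return hi, lo


def combine_hi_lo(hi, lo):
    """Recombine the halves the way lui followed by addiu would."""
    lo_signed = lo - 0x10000 if lo & 0x8000 else lo
    return ((hi << 16) + lo_signed) & 0xFFFFFFFF
```

For example, `split_hi_lo(0x8001ABCD)` gives a high half of `0x8002`, not `0x8001`, because the low half `0xABCD` is negative once sign-extended. A tool that rewrites addresses has to locate both instructions of a pair before it can patch either one, whereas an x86 pointer-sized immediate sits in one place.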

By @ale42 - 4 months
Shouldn't it be (2023) rather than (2013)?
By @fjfaase - 5 months
Interesting read. I have a lot of respect for people who develop emulators for x86 processors. It is a complicated processor, and from first-hand experience I know that developing and debugging emulators for CPUs can be very challenging. In the past year, I spent some time developing a very limited i386 emulator [1], including some system calls, for executing the first steps of live-bootstrap [2], primarily to figure out how it works. I learned a lot about system calls and ELF.

[1] https://github.com/FransFaase/Emulator/

[2] https://github.com/fosslinux/live-bootstrap/

By @SunlitCat - 5 months
Haha! Writing an x86 emulator! I still remember writing a toy emulator which was able to execute something around the first 1000-ish lines of a real BIOS (and then it got stuck or looped when it started to access ports or so; I can't remember, it was too long ago, and I didn't continue it as I started to get into DirectX and modern C++ more).
By @waynecochran - 5 months
Intel architecture is loaded with historical artifacts. The switch in how segment registers were used as you went from real mode to protected mode was an incredible hardware hack to keep older software working. I blame Intel for why so many folks avoid assembly language. I programmed in assembly for years using TI's 34010 graphics chips and the design was gorgeous -- simple RISC instruction set, flat address space, and bit addressable! If during the earlier decades folks had been programming on chips with more elegant designs, far more folks would be programming in assembly language (or at least would know how to).
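
A simplified sketch of the real-mode versus protected-mode difference this comment refers to. The function names and the `descriptor_bases` dictionary are hypothetical stand-ins for illustration; limit checks, privilege levels and the GDT/LDT distinction are all ignored. In real mode the segment register value is itself part of the address, while in protected mode the same register holds a selector that indexes a descriptor table.

```python
def real_mode_linear(segment, offset):
    """Real mode: the segment value is shifted left by 4 and added to the offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def protected_mode_linear(selector, offset, descriptor_bases):
    """Protected mode: the selector picks a descriptor that supplies the base.

    descriptor_bases is a hypothetical {index: base} stand-in for a
    descriptor table; real descriptors also carry limits and access rights.
    """
    index = (selector & 0xFFFF) >> 3  # the low 3 bits hold the TI and RPL fields
    return (descriptor_bases[index] + offset) & 0xFFFFFFFF
```

So the same 16-bit register value means "paragraph number" in one mode and "table index plus a few flag bits" in the other.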
By @Sparkyte - 4 months
The footnotes are glorious. "He was convinced that using a shift would work and didn’t believe me when I said it wasn’t possible."
By @was_a_dev - 5 months
Off topic, but I like this blog style/layout. I can imagine it isn't everyone's taste, but it just works for me
By @djaouen - 4 months
“I don’t believe in emulatores.” - 0x86
By @Quekid5 - 5 months
Just as an adjacent aside from a random about learning by doing:

Implementing a ThingDoer is a huge learning experience. I remember doing co-op "write-a-compiler" coursework with another person. We were doing great, everything was working and then we got to the oral exam...

"Why is your Stack Pointer growing upwards?"

... I was kinda stunned. I'd never thought about that. We understood most of the things, but sometimes we kind of just bashed at things until they worked... and it turned out upward-growing SP did work (up to a point) on the architecture our toy compiler was targeting.