Weird things I learned while writing an x86 emulator
The article explores writing an x86 and amd64 emulator for Time Travel Debugging, emphasizing x86 encoding, prefixes, flag behaviors, shift instructions, segment overrides, FS and GS segments, TEB structures, CPU configuration, and segment handling nuances in 32-bit and 64-bit modes.
The article discusses the author's experience writing an x86 and amd64 emulator for Time Travel Debugging, focusing on the peculiarities of x86 encoding, instruction prefixes, flag behaviors, shift instructions, and segment overrides. It highlights how x86 instructions can have multiple encodings for the same operation, the impact of prefixes on instruction behavior, and the nuances of flag settings by different instructions. The author also delves into the intricacies of shift instructions, segment overrides in 32-bit and 64-bit code, and the use of FS and GS segments for thread local storage. Additionally, the article touches on accessing TEB structures using FS and GS registers, the role of CPU configuration in determining segment base addresses, and the differences in segment handling between 32-bit and 64-bit modes. The discussion provides insights into the complexities of CPU emulation and the detailed understanding required to effectively write a CPU emulator.
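As one illustration of the shift and flag nuances mentioned above, here is a minimal sketch of a 32-bit SHL. It is not taken from the article; the function name shl32 and its return shape are assumptions made for this example. It models only two documented behaviors: the shift count is masked to 5 bits for 32-bit operands, and a masked count of zero leaves the flags untouched. OF and the other arithmetic flags are deliberately left out.

```python
MASK32 = 0xFFFFFFFF

def shl32(value, count, cf_in):
    """Hypothetical helper: emulate 32-bit SHL.

    Returns (result, carry_flag, flags_written).
    """
    count &= 0x1F          # the CPU masks the count to 5 bits for 32-bit operands
    value &= MASK32
    if count == 0:
        # A zero (masked) count changes neither the operand nor any flag.
        return value, cf_in, False
    # CF receives the last bit shifted out of the top of the operand.
    cf = (value >> (32 - count)) & 1
    result = (value << count) & MASK32
    return result, cf, True

print(shl32(0x80000001, 1, 0))   # top bit moves into CF
print(shl32(1, 32, 1))           # count 32 masks to 0: nothing changes
```

An emulator that always recomputes flags after a shift would get the second case wrong, which is why the sketch reports whether the flags were written at all.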
Related
Investigating SSMEC's (State Micro) 486s with the UCA
An investigation into State Microelectronics Co. Ltd.'s SM486 CPUs reveals they closely mimic Intel's 486 CPUs but may use more advanced processes. Questions arise about their legality and potential applications in China's vintage microprocessor market.
The Byte Order Fiasco
Handling endianness in C/C++ programming poses challenges, emphasizing correct integer deserialization to prevent undefined behavior. Adherence to the C standard is crucial to avoid unexpected compiler optimizations. Code examples demonstrate proper deserialization techniques using masking and shifting for system compatibility. Mastery of these concepts is vital for robust C code, despite available APIs for byte swapping.
A blast from the past: Disassembling DOS (2020)
The text explores disassembling MS-DOS, focusing on INT 21h functions and dissecting files like IO.SYS. It discusses reverse engineering, legal aspects, and the microkernel nature of DOS for deeper insights.
Do not taunt happy fun branch predictor
The author shares insights on optimizing AArch64 assembly code by reducing jumps in loops. Replacing ret with br x30 improved performance, leading to an 8.8x speed increase. Considerations on branch prediction and SIMD instructions are discussed.
Re-visiting VM/386 (2023)
The author shares experiences with VM/386, an emulation software from 1988. Running on 86box, it enables multitasking on an 80386 processor with graphical PC programs, despite limitations hindering broader success.
Encoding: People often complain about prefixes, but IMHO, that's by far not the worst thing. It is well known and somewhat well documented. There are worse quirks: For example, REX/VEX/EVEX.RXB extension bits are ignored when they do not apply (e.g., MMX registers); except for mask registers (k0-k7), where they trigger #UD -- also fine -- except if the register is encoded in ModRM.rm, in which case the extension bit is ignored again.
APX takes the number of quirks to a different level: the REX2 prefix can encode general-purpose registers r16-r31, but not xmm16-xmm31; the EVEX prefix has several opcode-dependent layouts; and the extension bits for a register used depend on the register type (XMM registers use X3:B3:rm and V4:X3:idx; GP registers use B4:B3:rm, X4:X3:idx). I can't give a complete list yet, I still haven't finished my APX decoder after a year...
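The simplest of the quirks above, REX.R and REX.B being ignored for MMX registers, can be sketched roughly as follows. This is an illustrative fragment, not a real decoder: the function name decode_modrm_regs and the reg_class parameter are hypothetical, it handles only a register-direct ModRM byte (mod = 3), and it does not model the mask-register #UD case or anything from VEX, EVEX, or APX.

```python
def decode_modrm_regs(rex, modrm, reg_class="gpr"):
    """Hypothetical helper: return (reg_index, rm_index) for a
    register-direct ModRM byte.

    rex is the REX prefix byte (0x40-0x4F), or 0 when no prefix is present.
    """
    rex_r = (rex >> 2) & 1    # REX.R extends ModRM.reg
    rex_b = rex & 1           # REX.B extends ModRM.rm
    reg = (modrm >> 3) & 7
    rm = modrm & 7
    if reg_class == "mmx":
        # Only mm0-mm7 exist, so the extension bits are silently ignored.
        return reg, rm
    return reg | (rex_r << 3), rm | (rex_b << 3)

print(decode_modrm_regs(0x45, 0xD8))          # general-purpose: r11, r8
print(decode_modrm_regs(0x45, 0xD8, "mmx"))   # MMX: mm3, mm0
```

The same prefix and ModRM bytes name different register numbers depending on the register class, which is the kind of opcode-dependent behavior a decoder has to carry around.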
The closest I've ever come to something like OP (which is to say, not close at all) was when I was trying to help my JS friend understand the stack, and we ended up writing a mini vm with its own little ISA: https://gist.github.com/darighost/2d880fe27510e0c90f75680bfe...
This could have gone much deeper - i'd have enjoyed that, but doing so would have detracted from the original educational goal lol. I should contact that friend and see if he still wants to study with me. it's hard since he's making so much money doing fancy web dev, he has no time to go deep into stuff. whereas my unemployed ass is basically an infinite ocean of time and energy.
The docs are an amazing tour of how the cpu works.
Cannot believe it’s been 16 months. How time flies.
Hard disagree.
The best way is to create a CPU from gate level, like you do on a decent CS course. (I really enjoyed making a cut down ARM from scratch)
[1] Namely, a version of Fabian Giesen's disfilter for x86-64, for yet another side project which is still not in public: https://gist.github.com/lifthrasiir/df47509caac2f065032ef72e...
The 68k disassembler we wrote in college was such a Neo “I know kung fu” moment for me. It was the missing link that let me reason about code from high-level language down to transistors and back. I can only imagine writing a full emulator is an order of magnitude more effective. Great article!
While I was at a startup, we were looking at data-at-rest encryption, streaming encryption and other such things. Dan had a page with different implementations (cross compiled from his assembler representation) to target chipsets and instruction sets. Using VMs (this was the early/mid 2000s) and such, it was interesting to see which of those instruction sets were supported. In testing, there would be occasional hiccups where an implementation wasn't fully supported even though the VM claimed it was.
For this use-case, x86 is really easy to analyze whereas MIPS has been a nightmare to pull off. This is because all I mostly care about are references to code and data. x86 has pointer-sized immediate constants and MIPS has split HI16/LO16 relocation pairs, which leads to all sorts of trouble with register usage graphs, code flow and branch delay instructions.
That should not be construed as praise on my end for x86.
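To make the HI16/LO16 point concrete, here is a small sketch of why the two halves cannot be analyzed independently. It assumes the usual lui + addiu pattern, where the low half is sign-extended when added, so the high half has to be pre-adjusted. The helper names split_hi_lo and join_hi_lo are made up for this example.

```python
def split_hi_lo(addr):
    """Hypothetical helper: split a 32-bit address the way %hi/%lo do."""
    lo = addr & 0xFFFF
    # Add 0x8000 first so the high half compensates for lo being
    # sign-extended when it is added back.
    hi = ((addr + 0x8000) >> 16) & 0xFFFF
    return hi, lo

def join_hi_lo(hi, lo):
    """Rebuild the address as lui (hi << 16) followed by addiu (signed lo)."""
    lo_signed = lo - 0x10000 if lo & 0x8000 else lo
    return ((hi << 16) + lo_signed) & 0xFFFFFFFF

hi, lo = split_hi_lo(0x12348000)
print(hex(hi), hex(lo))            # the high half is 0x1235, not 0x1234
print(hex(join_hi_lo(hi, lo)))
```

A tool that reads only the lui immediate would see the wrong upper half here, so it has to find the matching low-half instruction first. On x86 the whole pointer sits in a single immediate.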
Implementing a ThingDoer is a huge learning experience. I remember doing co-op "write-a-compiler" coursework with another person. We were doing great, everything was working and then we got to the oral exam...
"Why is your Stack Pointer growing upwards?"
... I was kinda stunned. I'd never thought about that. We understood most of the things, but sometimes we kind of just bashed at things until they worked... and it turned out upward-growing SP did work (up to a point) on the architecture our toy compiler was targeting.