x86 CPUs usually identify themselves and their features using the
instruction. But even without looking at
their self-reported identities or timing behavior, it is possible to tell CPU
Take for example the
ud0 instruction. This instruction is used to generate an
Invalid Opcode Exception (#UD). It is
encoded with the two bytes
If we place this instruction at the end of an executable page in memory and the
following page is not executable, we see differences across x86
microarchitectures. On my Goldmont
NUC, executing this
instruction will indeed cause an #UD exception. On Linux, this exception is
If I retry the same setup on my
the result is a
SIGSEGV instead. This signal is caused by a page fault during
instruction fetch. This means that the CPU did not manage to decode this
instruction with just the two bytes and tried to fetch more bytes. My somewhat
laptop has the same behavior.
Using baresifter, we can reverse engineer
(more on that in a future blog post) that Skylake and Broadwell actually try to
ud0 as if it had source and destination operands. After the the two
opcode bytes, they expect a ModR/M
byte and as many
additional immediate or displacement bytes as the ModR/M byte indicate.
I have put the code for this example on Github.
Why would this matter? Afterall, this behavior is now even documented in the Intel Software Developer’s Manual:
Some older processors decode the UD0 instruction without a ModR/M byte. As a result, those processors would deliver an invalid-opcode exception instead of a fault on instruction fetch when the instruction with a ModR/M byte (and any implied bytes) would cross a page or segment boundary.
I have picked an easy example for this post. Beyond this documented difference,
there are many other undocumented differences in instruction fetch behavior for
other illegal opcodes that makes it fairly easy to figure out what
microarchitecture we are dealing with. This still applies when a hypervisor
cpuid and changes the (virtual) CPU’s self-reported identity. It is
also possible to fingerprint different x86 instruction decoding libraries using
this approach and narrow down which hypervisor software stack is used.
One usecase I can think of is to build malware that is tailored to recognize its target using instruction fetch fingerprinting. Let’s say the malware’s target is an embedded system with an ancient x86 CPU. If it is actively fingerprinting the CPU, it can avoid deploying its payload in an automated malware anlysis system and be discovered, unless the malware analysis is performed on the exact same type of system targeted by the malware.