Emulating Ghidra's PCODE
Ryan Torvik - March 24, 2023 - 5 minute read
Ghidra’s PCODE is a register transfer language that describes precisely how a machine code instruction modifies the registers and memory of a computing system. It is the fundamental technology underneath Ghidra’s binary analysis tools.
At Tulip Tree Technology, we use PCODE as the primary instruction set for our emulators. As we will see, PCODE is what makes our strategic runtime analysis possible. Our full-system emulator, Emerson, uses Ghidra’s C++ library to translate raw bytes into PCODE operations. In this post, we feature our instruction set simulator, Hawthorne, which uses our own Rust version of this library to do all of the translation and execution in the browser.
By using PCODE, our emulators will function the same on any architecture supported by Ghidra. This means we can reuse our taint-tracking algorithm without having to tweak it to handle different architectures.
PCODE’s most valuable feature for us is that a single operation has a single output. Let’s look at an example from x86. Navigate to Hawthorne at tuliptreetech.com/hawthorne, enter the bytes 83ec0a, select x86 from the list of architectures, and see what happens.
Hawthorne has loaded the bytes into memory and begins execution at address 0x0. The next instruction it’s going to execute is SUB ESP,0xa.
Using the command ui 0 1 (disassemble one instruction at address 0) lists the PCODE operations behind that instruction.
Here you can see that this simple SUB instruction is significantly more complicated than just subtracting 10 from ESP. Computing all the flags and doing the subtraction is actually 9 separate PCODE operations. Let’s pull that apart.
- The carry flag (CF) is set if the original ESP, as an unsigned integer, is less than 10.
- The overflow flag (OF) is set if subtracting 10 from the original ESP would produce a signed borrow.
- ESP is set to ESP minus 10.
- The sign flag (SF) is set if the new ESP, as a signed integer, is less than 0.
- The zero flag (ZF) is set if the new ESP equals 0.
- The parity flag (PF) is calculated by several operations that determine whether the number of bits set in the low byte of the result is even or odd. An even number of bits sets PF to 1. An odd number sets it to 0.
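To make the breakdown concrete, here is a minimal Python sketch that models the nine steps as plain arithmetic on 32-bit values. It is not Hawthorne’s code. The function name sub_esp_imm and the helper to_signed32 are made up for illustration, the opcode names in the comments are our reading of which PCODE operation each step corresponds to, and the starting value of ESP in the usage lines is an assumption.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Reinterpret a 32-bit unsigned value as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def sub_esp_imm(esp, imm):
    """Model SUB ESP, imm as nine single-output steps (illustrative only)."""
    out = {}
    # 1. Unsigned compare of the original operands.
    out["CF"] = int((esp & MASK32) < (imm & MASK32))          # INT_LESS
    # 2. Signed borrow: the true signed difference does not fit in 32 bits.
    diff = to_signed32(esp) - to_signed32(imm)
    out["OF"] = int(not (-(1 << 31) <= diff < (1 << 31)))     # INT_SBORROW
    # 3. The subtraction itself, wrapped to 32 bits.
    out["ESP"] = (esp - imm) & MASK32                         # INT_SUB
    # 4. and 5. Flags derived from the new value.
    out["SF"] = int(to_signed32(out["ESP"]) < 0)              # INT_SLESS
    out["ZF"] = int(out["ESP"] == 0)                          # INT_EQUAL
    # 6. to 9. Parity of the low byte, built from four small steps.
    low_byte = out["ESP"] & 0xFF                              # INT_AND
    bit_count = bin(low_byte).count("1")                      # POPCOUNT
    is_odd = bit_count & 1                                    # INT_AND
    out["PF"] = int(is_odd == 0)                              # INT_EQUAL
    return out


# Assumption: ESP starts at 0, so subtracting 10 wraps around.
result = sub_esp_imm(0, 0xA)
print({name: hex(value) for name, value in result.items()})
```

Each step writes exactly one value, either a register or a temporary, which is the property the rest of this post relies on.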
Examining the value of the CF register shows that it is zero.
Let’s execute this instruction and see what happens.
After executing this instruction, you’ll notice that 10 has been subtracted from ESP and EIP has been incremented by 3 bytes, the size of the SUB instruction. Examining the CF register shows that the subtraction needed a borrow. The PF register is also set. Any future conditional branch can take those flags into account.
NOTE: An eagle-eyed reverser may notice that CF has changed, but our display of the eflags register has not. Ghidra’s processor specification language SLEIGH doesn’t allow for values smaller than a byte, so each flag lives in its own register. We need to pack and unpack the flags/eflags/rflags register in our registers display so it accurately reflects the system state. The PCODE operations all use the individual flag registers, so they are not affected by what eflags actually is.
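As a rough illustration of that packing step, the sketch below folds individual one-byte flag values into a single eflags word and splits it back out. The bit positions are the standard x86 ones (CF at bit 0, PF at bit 2, AF at bit 4, ZF at bit 6, SF at bit 7, OF at bit 11, with bit 1 always reading as 1). The function names pack_eflags and unpack_eflags are hypothetical and do not come from Hawthorne or Emerson.

```python
# Standard x86 EFLAGS bit positions for the arithmetic status flags.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "OF": 11}


def pack_eflags(flags, base=0x2):
    """Fold per-flag values (0 or 1) into an eflags word.

    `base` supplies the bits we do not model; bit 1 is reserved and reads as 1.
    """
    value = base
    for name, bit in FLAG_BITS.items():
        if flags.get(name, 0):
            value |= 1 << bit
        else:
            value &= ~(1 << bit)
    return value


def unpack_eflags(value):
    """Split an eflags word back into one value per flag."""
    return {name: (value >> bit) & 1 for name, bit in FLAG_BITS.items()}


# Flags left behind by the SUB example: CF, PF, and SF are set.
packed = pack_eflags({"CF": 1, "PF": 1, "SF": 1})
print(hex(packed))
```

A display layer could call something like pack_eflags whenever it redraws the registers view, leaving the PCODE operations to work on the individual flags.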
While our emulators execute an entire instruction on each step, they execute each PCODE operation individually. Each operation changes a single value on the system, which makes taint tracking relatively simple. Emerson’s current algorithm evaluates the inputs to an operation. If any of the bytes of any of the inputs are in the taint tracker, it puts the corresponding output bytes in the taint tracker. If none of the inputs are in there, it removes the output.
More will be done in the future to provide more metadata about the tracked data as it moves through the system.
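The propagation rule described above can be sketched in a few lines of Python. This is not Emerson’s implementation. The Varnode and TaintTracker names are hypothetical, the register offsets are placeholders, and the sketch makes one simplifying assumption: if any input byte is tainted, every byte of the output is marked as tainted, rather than mapping input bytes to output bytes one by one.

```python
from typing import NamedTuple


class Varnode(NamedTuple):
    """A PCODE operand: an address space, an offset, and a size in bytes."""
    space: str
    offset: int
    size: int

    def byte_addresses(self):
        return [(self.space, self.offset + i) for i in range(self.size)]


class TaintTracker:
    def __init__(self):
        self.tainted = set()  # set of (space, byte offset) pairs

    def taint(self, varnode):
        self.tainted.update(varnode.byte_addresses())

    def is_tainted(self, varnode):
        return any(b in self.tainted for b in varnode.byte_addresses())

    def propagate(self, inputs, output):
        """Apply the rule to one PCODE operation with a single output."""
        if any(self.is_tainted(v) for v in inputs):
            self.tainted.update(output.byte_addresses())
        else:
            self.tainted.difference_update(output.byte_addresses())


# Placeholder operands for the first operation of the SUB example.
esp = Varnode("register", 0x10, 4)
cf = Varnode("register", 0x200, 1)
ten = Varnode("const", 0xA, 4)

tracker = TaintTracker()
tracker.taint(esp)                 # pretend ESP holds attacker-controlled data
tracker.propagate([esp, ten], cf)  # CF = ESP < 10
print(tracker.is_tainted(cf))
```

Because every operation has a single output, propagate never has to reason about an instruction as a whole. The same loop works unchanged for any architecture that has been translated to PCODE.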
This example was run in our instruction set simulator Hawthorne. PCODE also comes with a piece of functionality called pcodeop or CALLOTHER, which implements the range of processor-level system calls that are necessary for full-system emulation. Our full-system emulator Emerson takes advantage of these operations to handle CPU exceptions and the like. We’ll discuss our implementation of these some other time.
Ghidra’s register transfer language PCODE is an extremely valuable tool for vulnerability research. Leveraging PCODE, Tulip Tree Technology has created state-of-the-art emulators that give our customers visibility into, and assurance about, how the data in their systems is being used.