Emulating Ghidra's PCODE

Ryan Torvik - March 24, 2023 - 5 minute read

Tulip Tree Technology

Ghidra’s PCODE is a register transfer language that describes precisely how a machine code instruction modifies the registers and memory of a computing system. It is the fundamental technology underneath Ghidra’s binary analysis tools.

At Tulip Tree Technology, we use PCODE as the primary instruction set for our emulators. As we will see, PCODE is what makes our strategic runtime analysis possible. Our full-system emulator Emerson uses Ghidra’s C++ library to translate raw bytes into PCODE operations. In this post, we feature our instruction set simulator Hawthorne, which uses our own Rust version of this library to do all of the translation and execution in the browser.

By using PCODE, our emulators will function the same on any architecture supported by Ghidra. This means we can reuse our taint-tracking algorithm without having to tweak it to handle different architectures.

PCODE’s most valuable feature for us is that a single operation has a single output. Let’s look at an example from x86. Navigate to Hawthorne at tuliptreetech.com/hawthorne, enter the bytes 83ec0a, select x86 from the list of architectures, and see what happens.

[Figure: Starting Hawthorne]

[Figure: Loading Hawthorne]

Hawthorne has loaded the bytes into memory and will begin execution at address 0x0. The next instruction it will execute is SUB ESP,0xa.

Using the command ui 0 1 (disassemble one instruction at address 0) gives you this.

[Figure: Disassembling]

Here you can see that this simple SUB instruction is significantly more complicated than just subtracting 10 from ESP. Computing all the flags and doing the subtraction is actually 9 separate PCODE operations. Let’s pull that apart.
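To make the breakdown concrete, here is a minimal Python sketch of one plausible nine-operation expansion of SUB ESP,0xa. The operation names in the comments follow Ghidra's PCODE opcode names, but the exact sequence and ordering Hawthorne prints may differ. The `regs` dictionary, the helper names, and the starting value ESP = 0 are assumptions made for illustration. Every numbered step writes exactly one value.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit unsigned value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def sub_esp_imm(regs, imm):
    """Apply SUB ESP, imm as nine single-output steps (illustrative only)."""
    esp = regs["ESP"]
    # 1. INT_LESS: the unsigned subtraction borrows when ESP < imm
    regs["CF"] = int(esp < imm)
    # 2. INT_SBORROW: signed overflow of the subtraction
    diff = to_signed32(esp) - to_signed32(imm)
    regs["OF"] = int(not (-(1 << 31) <= diff < (1 << 31)))
    # 3. INT_SUB: the subtraction itself, truncated to 32 bits
    regs["ESP"] = (esp - imm) & MASK32
    # 4. INT_SLESS: sign flag, result < 0 when read as signed
    regs["SF"] = int(to_signed32(regs["ESP"]) < 0)
    # 5. INT_EQUAL: zero flag
    regs["ZF"] = int(regs["ESP"] == 0)
    # 6. INT_AND: isolate the low byte of the result (temporary)
    low = regs["ESP"] & 0xFF
    # 7. POPCOUNT: count the set bits in that byte (temporary)
    ones = bin(low).count("1")
    # 8. INT_AND: keep only the lowest bit of the count (temporary)
    odd = ones & 1
    # 9. INT_EQUAL: parity flag is set when the count is even
    regs["PF"] = int(odd == 0)
    return regs


# Assumed starting state: ESP = 0 with all flags clear.
regs = sub_esp_imm({"ESP": 0, "CF": 0, "OF": 0, "SF": 0, "ZF": 0, "PF": 0}, 0xA)
print({name: hex(value) for name, value in regs.items()})
```

Because the steps that compute CF and OF read ESP before step 3 overwrites it, the order of the operations matters, which is one reason an emulator executes them one at a time.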

Examining the value of the CF register shows that it is zero.

[Figure: Examining CF]

Let’s execute this instruction and see what happens.

[Figure: Stepping]

After executing this instruction, you’ll notice that 10 has been subtracted from ESP and EIP has advanced by 3 bytes, the size of the SUB instruction. Examining the CF register shows that it is now set, because the unsigned subtraction needed a borrow. The PF register is also set. Any future conditional branch can take those flags into account.

[Figure: Examining the results]

NOTE: An eagle-eyed reverser may notice that CF has changed, but our display of the eflags register has not. Ghidra’s processor specification language SLEIGH doesn’t allow for values smaller than a byte, so each flag lives in its own register. We still need to pack and unpack the flags/eflags/rflags register in our registers display so that it accurately reflects the system state. The PCODE operations all use the individual flag registers, so they are not affected by what eflags actually contains.
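As a rough illustration of what that packing and unpacking could look like, the Python sketch below combines individual flag values into an EFLAGS-style word and splits it back out. The bit positions are the architectural x86 ones; the function names and the dictionary representation are hypothetical and are not taken from Hawthorne or Emerson.

```python
# Architectural bit positions of the x86 status flags inside EFLAGS.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "OF": 11}

# Bit 1 of EFLAGS is reserved and always reads as 1.
RESERVED_ALWAYS_SET = 0x2


def pack_eflags(flags, base=RESERVED_ALWAYS_SET):
    """Fold individual flag registers into one EFLAGS value."""
    value = base
    for name, bit in FLAG_BITS.items():
        if flags.get(name, 0):
            value |= 1 << bit      # flag register is non-zero: set the bit
        else:
            value &= ~(1 << bit)   # flag register is zero: clear the bit
    return value


def unpack_eflags(value):
    """Split an EFLAGS value back into individual flag values (0 or 1)."""
    return {name: (value >> bit) & 1 for name, bit in FLAG_BITS.items()}


# Flags left behind by the SUB example: CF, PF and SF set.
eflags = pack_eflags({"CF": 1, "PF": 1, "SF": 1})
print(hex(eflags))
```

A display layer could call something like `pack_eflags` whenever it renders the register view, leaving the individual flag registers as the source of truth for execution.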

While our emulators execute an entire instruction on a step, they individually execute each PCODE operation. Each operation changes a single value on the system, which makes taint tracking relatively simple. Emerson’s current algorithm evaluates the inputs to an operation. If any of the bytes of any of the inputs are in the taint tracker, put the corresponding output byte in the taint tracker. If none of the inputs are in there, remove the output.
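The rule described above can be sketched in a few lines of Python. This is not Emerson's code: the `Varnode` and `PcodeOp` types, the register offsets, and the function name are hypothetical, and the sketch reads "corresponding" coarsely, so any tainted input byte taints every byte of the output. It also assumes every operation has an output, which is not true of operations such as stores and branches.

```python
from typing import NamedTuple, Tuple


class Varnode(NamedTuple):
    """A run of bytes in an address space (register, ram, const, ...)."""
    space: str
    offset: int
    size: int

    def byte_addrs(self):
        return [(self.space, self.offset + i) for i in range(self.size)]


class PcodeOp(NamedTuple):
    opcode: str
    output: Varnode
    inputs: Tuple[Varnode, ...]


def propagate_taint(tainted, op):
    """Update the set of tainted bytes after one operation executes."""
    # Evaluate the inputs first, since the output may overlap an input.
    any_tainted = any(
        addr in tainted for vn in op.inputs for addr in vn.byte_addrs()
    )
    for addr in op.output.byte_addrs():
        if any_tainted:
            tainted.add(addr)      # taint flows into the output
        else:
            tainted.discard(addr)  # output overwritten with clean data
    return tainted


# Hypothetical layout: ESP occupies register offsets 0x10..0x13.
ESP = Varnode("register", 0x10, 4)
TEN = Varnode("const", 10, 4)

tainted = {("register", 0x10)}  # one byte of ESP starts out tainted
propagate_taint(tainted, PcodeOp("INT_SUB", ESP, (ESP, TEN)))
after_sub = set(tainted)        # all four bytes of ESP are now tainted

propagate_taint(tainted, PcodeOp("COPY", ESP, (TEN,)))
after_copy = set(tainted)       # a constant overwrote ESP: taint removed
```

Because each operation has a single output, the tracker only ever has to decide the fate of one destination per step, regardless of which architecture the original instruction came from.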

More will be done in the future to provide more metadata about the tracked data as it moves through the system.

This example was run in our instruction set simulator Hawthorne. PCODE also comes with a piece of functionality called pcodeop, or CALLOTHER, which implements the range of processor-level system calls that are necessary for full-system emulation. Our full-system emulator Emerson takes advantage of these operations to handle CPU exceptions and the like. We’ll discuss our implementation of these some other time.

Ghidra’s register transfer language PCODE is an extremely valuable tool for vulnerability research. Leveraging PCODE, Tulip Tree Technology has created state-of-the-art emulators that give our customers visibility into, and assurance about, how the data processed in their systems is being used.
