RISC-V: The (Almost) Unused Bit in JALR
In the RISC-V architecture, you have excellent support for embedding information into code by choosing compressed or uncompressed instructions. While being a typical RISC with fixed 32-bit instruction length, RISC-V allows certain common instructions to be encoded as compressed 16-bit instructions to improve code density. Each compressed instruction has a functionally identical 32-bit cousin.
If you are interested in how that is used to embed information into a binary, you can check out my x86 instruction set steganography post from a couple of years ago, which uses a similar property of the x86 instruction set to do exactly this.
What I found more interesting, when reading the RISC-V User-Level
ISA
specification, is that the jalr
(“Jump and Link Register”)
instruction has an essentially unused bit that can be used to embed
information as well.
To see why this bit is essentially unused, consider how jalr
works. jalr
computes its jump target by adding an immediate value to
a source register. This immediate is unlike other jump immediates
not encoded as multiples of 2. The specification says that the
lowest bit of the sum is ignored and treated as zero. Since the source
register is practically always aligned and its lowest bit is zero,
this means that the lowest bit of the jalr
is ignored in practice.
That there is a unused bit in the instruction encoding is unusual.
Typically, all the available space is used to encode bigger
immediates. But for the jalr
instruction the RISC-V designers
decided to go for simplicity. Here is an excerpt from the
spec
(page 16):
Note that the JALR instruction does not treat the 12-bit immediate as multiples of 2 bytes, unlike the conditional branch instructions. This avoids one more immediate format in hardware. In practice, most uses of JALR will have either a zero immediate or be paired with a LUI or AUIPC, so the slight reduction in range is not significant.
The JALR instruction ignores the lowest bit of the calculated target address. This both simplifies the hardware slightly and allows the low bit of function pointers to be used to store auxiliary information. Although there is potentially a slight loss of error checking in this case, in practice jumps to an incorrect instruction address will usually quickly raise an exception.
The nice thing about this unused bit is that we can use it to embed information without changing the size of the instruction itself. This makes it more useful than selecting different-length encodings of the same instruction, because we can do so after compiling an application. Choosing different instruction sizes has to be done at compilation time, because it will shift around function addresses and jump targets.
Of course, this only works as long as no one is actually storing information in the low bit of function pointers. But this is rare in practice.
So how much information can we embed using this method? Let’s look at GCC as a medium-sized application. Let’s see how much we have to work with for a RISC-V 32-bit GCC:
$ readelf -l gcc
Elf file type is EXEC (Executable file)
Entry point 0x292bc
There are 11 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x00010034 0x00010034 0x00160 0x00160 R 0x4
INTERP 0x000194 0x00010194 0x00010194 0x00075 0x00075 R 0x1
RISCV_ATTRIBUT 0x1948c6 0x00000000 0x00000000 0x00057 0x00000 R 0x1
> LOAD 0x000000 0x00010000 0x00010000 0x18e47c 0x18e47c R E 0x1000 <
LOAD 0x18eb38 0x0019fb38 0x0019fb38 0x05d7c 0x0a290 RW 0x1000
DYNAMIC 0x192ee8 0x001a3ee8 0x001a3ee8 0x00118 0x00118 RW 0x4
NOTE 0x00020c 0x0001020c 0x0001020c 0x00020 0x00020 R 0x4
TLS 0x18eb38 0x0019fb38 0x0019fb38 0x00000 0x00008 R 0x4
GNU_EH_FRAME 0x1566d4 0x001666d4 0x001666d4 0x07b34 0x07b34 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10
GNU_RELRO 0x18eb38 0x0019fb38 0x0019fb38 0x044c8 0x044c8 R 0x1
There are 0x18e47c
bytes of executable code (the LOAD
segment with
Execute permission). So there are roughly 1.5 MiB of code to work
with. Let’s see how much jalr
instructions we have:
$ objdump -Mno-aliases -d gcc | grep -E "[^.]jalr" | wc -l
190
There are 190 jalr
instructions in these 1.5 MiB of code. That means
we can embed 190 bits using this method into GCC. Not a lot. It turns
out that jalr
almost exclusively used for function entry stubs in
the
PLT. So
there is also no hope of orders of magnitude more in larger binaries.
If we use the obvious method of switching between compressed instructions and normal instructions in RISC-V we have much more to work with. Let’s count the compressed instructions in the GCC binary:
$ objdump -Mno-aliases -d gcc | grep -F "c." | wc -l
186412
That make makes 186412 bits of information (around 23 KiB). Much more useful!
Finally, why would you want to embed information into binaries? I can only think of contrived examples, but they are fun. Consider an air-gapped build system that produces signed binaries. You can only put source code in on one side and you get a signed binary out on the other side. An attacker that manages to exploit this system can covertly smuggle the signing key out by embedding it into the signed binaries itself!
Maybe it is time to insist on reproducible builds instead of air-gapped build systems. 😼