Introduction

This book explores how different Rust constructs translate to ARM64/AArch64 assembly.

⚠🚧 The book is still under construction. New chapters will be added and the existing ones might be modified.

Since compilation involves multiple intermediate steps, we will trace through HIR, MIR and LLVM IR when it helps explain the final assembly output. We will only discuss the Rust frontend, not the LLVM backend. Still, for completeness, good documentation for the LLVM backend can be found here.

The Rust compiler overview can be found here.

Prerequisites

Rust compiler

$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

AArch64 musl libraries

$ rustup target install aarch64-unknown-linux-musl

Rust nightly toolchain

This is necessary because we will be using some nightly features.

$ rustup install nightly
$ rustup default nightly

Alternatively, just override the default toolchain in your working directory:

$ rustup override set nightly

Rust default config

$ export CARGO_BUILD_TARGET=aarch64-unknown-linux-musl
$ export CARGO_TARGET_AARCH64_UNKNOWN_LINUX_MUSL_LINKER=aarch64-linux-gnu-gcc
$ export CARGO_TARGET_AARCH64_UNKNOWN_LINUX_MUSL_RUNNER="qemu-aarch64"

Alternatively, add the following to .cargo/config.toml:

[build]
target = "aarch64-unknown-linux-musl"

[target.aarch64-unknown-linux-musl]
linker = "aarch64-linux-gnu-gcc"
runner = "qemu-aarch64"

AArch64 cross-compiler + binutils + sysroot

$ sudo dnf install gcc-aarch64-linux-gnu
$ sudo dnf install binutils-aarch64-linux-gnu
$ sudo dnf install sysroot-aarch64-fc41-glibc

rustfilt (optional)

If you need to manually demangle a symbol, rustfilt is very convenient:

$ cargo install rustfilt
$ echo _ZN8rust_lab4main17hf9a0ba7e2c977e69E | rustfilt
rust_lab::main

QEMU

$ sudo dnf install qemu-user

Ghidra

$ wget https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.3.2_build/ghidra_11.3.2_PUBLIC_20250415.zip

Test the cross-compilation setup

With a Cargo project:

$ cargo init
$ cargo build
$ cargo run --quiet
Hello, world!

Without a Cargo project:

$ echo 'fn main() { println!("Hello ARM64!"); }' > test.rs
$ rustc --target aarch64-unknown-linux-musl -C linker=aarch64-linux-gnu-gcc test.rs
$ qemu-aarch64 ./test
Hello ARM64!

Hello, world!

Source

Initialize a new workspace with cargo init.

fn main() {
    println!("Hello, world!");
}

println! is a macro that wraps the _print function.

$ cargo rustc --release --quiet -- -Z unpretty=expanded
#![feature(prelude_import)]
#[prelude_import]
use std::prelude::rust_2024::*;
#[macro_use]
extern crate std;
fn main() { { ::std::io::_print(format_args!("Hello, world!\n")); }; }

Alternatively, you can see all macro expansions (including built-in ones) in the HIR:

$ cargo rustc --release --quiet -- -Z unpretty=hir
#[prelude_import]
use std::prelude::rust_2024::*;
#[macro_use]
extern crate std;
fn main() {
    { ::std::io::_print(format_arguments::new_const(&["Hello, world!\n"])); };
}
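
As a small, hedged illustration of the expansion above: format_args! produces a std::fmt::Arguments value, and for a literal without placeholders the stable as_str() method returns the whole string as a single static piece. The helper name literal_piece is hypothetical and exists only for this sketch.

```rust
use std::fmt::Arguments;

/// Hypothetical helper: returns the single static piece of `args` when the
/// format string has nothing to interpolate.
fn literal_piece(args: Arguments<'_>) -> Option<&'static str> {
    args.as_str()
}

fn main() {
    // No placeholders: the whole string is one static piece.
    println!("{:?}", literal_piece(format_args!("Hello, world!\n")));

    // With a runtime argument there is no single static string to return;
    // the exact result is not asserted here.
    let name = String::from("ARM64");
    println!("{:?}", literal_piece(format_args!("Hello, {name}!\n")));
}
```

This matches what the HIR shows: without arguments, the only data the Arguments value needs is the slice of string pieces.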

Build

$ cargo rustc --release

Ghidra

Load the binary into Ghidra and auto-analyze it.

rust_lab::main

As we saw above, println! is expanded to a _print call, which accepts an Arguments struct.

While reconstructing the Arguments type, the type size information is very useful. Note that the compiler might reorder the struct fields.

$ cargo rustc --release --quiet -- -Z print-type-sizes 
print-type-size type: `core::fmt::rt::Placeholder`: 48 bytes, alignment: 8 bytes
print-type-size     field `.precision`: 16 bytes
print-type-size     field `.width`: 16 bytes
print-type-size     field `.position`: 8 bytes
print-type-size     field `.flags`: 4 bytes
print-type-size     end padding: 4 bytes
print-type-size type: `std::fmt::Arguments<'_>`: 48 bytes, alignment: 8 bytes
print-type-size     field `.pieces`: 16 bytes
print-type-size     field `.args`: 16 bytes
print-type-size     field `.fmt`: 16 bytes
...
print-type-size type: `core::fmt::rt::Argument<'_>`: 16 bytes, alignment: 8 bytes
print-type-size     field `.ty`: 16 bytes
print-type-size type: `core::fmt::rt::ArgumentType<'_>`: 16 bytes, alignment: 8 bytes
print-type-size     variant `Placeholder`: 16 bytes
print-type-size         field `.value`: 8 bytes
print-type-size         field `.formatter`: 8 bytes
print-type-size         field `._lifetime`: 0 bytes
print-type-size     variant `Count`: 10 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.0`: 2 bytes, alignment: 2 bytes
print-type-size type: `core::fmt::rt::Count`: 16 bytes, alignment: 8 bytes
print-type-size     discriminant: 2 bytes
print-type-size     variant `Param`: 14 bytes
print-type-size         padding: 6 bytes
print-type-size         field `.0`: 8 bytes, alignment: 8 bytes
print-type-size     variant `Is`: 2 bytes
print-type-size         field `.0`: 2 bytes
print-type-size     variant `Implied`: 0 bytes
print-type-size type: `std::option::Option<&[core::fmt::rt::Placeholder]>`: 16 bytes, alignment: 8 bytes
print-type-size     variant `Some`: 16 bytes
print-type-size         field `.0`: 16 bytes
print-type-size     variant `None`: 0 bytes
...

The simplified Arguments type can be represented like this (explained in detail later). This is not valid C syntax of course, as &, [] or <> cannot be used in C struct names.

struct &[&str] {
    ptr64 ptr;
    usize len;
};

struct &[Argument] {
    ptr64 ptr;
    usize len;
};

struct Option<&[Placeholder]> {
    ptr64 ptr;
    usize len;
};

struct Arguments {
    struct &[&str] pieces;
    struct &[Argument] args;
    struct Option<&[Placeholder]> fmt;
};

Listing:

                             **************************************************************
                             * rust_lab::main                                             *
                             **************************************************************
                             undefined __rustcall main()
             undefined         <UNASSIGNED>   <RETURN>
             undefined8        Stack[-0x10]:8 local_10                                XREF[2]:     00401af4(W), 
                                                                                                   00401b1c(R)  
             Arguments         Stack[-0x40]   arguments                               XREF[1,2]:   00401b04(W), 
                                                                                                   00401b14(W), 
                                                                                                   00401b10(W)  
                             _ZN8rust_lab4main17hf9a0ba7e2c977e69E           XREF[3]:     main:00401b38(*), 00453c6c, 
                             rust_lab::main                                               004648a8(*)  
        00401af0 ff 03 01 d1     sub        sp,sp,#0x40
        00401af4 fe 1b 00 f9     str        x30,[sp, #local_10]
        00401af8 68 03 00 90     adrp       x8,0x46d000
        00401afc 08 61 1c 91     add        x8,x8,#0x718
        00401b00 29 00 80 52     mov        w9,#0x1
                             store pieces.ptr and pieces.len
        00401b04 e8 27 00 a9     stp        x8=>PTR_s_Hello,_world!_0046d718,x9,[sp]=>argu   = 0044c1a0
        00401b08 08 01 80 52     mov        w8,#0x8
                             move the struct address to the first argument
        00401b0c e0 03 00 91     mov        x0,sp
                             zero out args.len and fmt.ptr
        00401b10 ff ff 01 a9     stp        xzr,xzr,[sp, #arguments+0x18]
                             store args.ptr
        00401b14 e8 0b 00 f9     str        x8,[sp, #arguments.args.ptr]
        00401b18 da 63 00 94     bl         std::io::stdio::_print                           undefined _print()
        00401b1c fe 1b 40 f9     ldr        x30,[sp, #local_10]
        00401b20 ff 03 01 91     add        sp,sp,#0x40
        00401b24 c0 03 5f d6     ret

The logic is simple: it constructs an Arguments struct on the stack and passes its address to the _print function in x0 (copied from sp).

Decompiled code (after creating the Arguments type in the Structure Editor and applying it in the code):

/* WARNING: Unknown calling convention: __rustcall */
/* rust_lab::main */

void __rustcall rust_lab::main(void)

{
  Arguments arguments;
  
                    /* store pieces.ptr and pieces.len */
  arguments.pieces.ptr = (ptr64)&PTR_s_Hello,_world!_0046d718;
  arguments.pieces.len = 1;
                    /* move the struct address to the first argument */
                    /* zero out args.len and fmt.ptr */
  arguments.args.len = 0;
  arguments.fmt.ptr = (ptr64)0x0;
                    /* store args.ptr */
  arguments.args.ptr = (ptr64)0x8;
  std::io::stdio::_print(&arguments);
  return;
}

From the Rust reference:

Though you should not rely on this, all pointers to DSTs are currently twice the size of the size of usize and have the same alignment.

In practice, this means that the fields of the struct Arguments are 16 bytes in memory: an 8 byte pointer and an 8 byte length. This is confirmed by the output of -Z print-type-sizes above.

pieces is a reference to a slice of str references (&str). In this case, pieces references only 1 &str which is also an 8 byte pointer and an 8 byte length.

                             PTR_s_Hello,_world!_0046d718                    XREF[1]:     main:00401b04(*)  
        0046d718 a0 c1 44        addr       s_Hello,_world!_0044c1a0                         = "Hello, world!\n"
                 00 00 00 
                 00 00
        0046d720 0e              ??         0Eh
        0046d721 00              ??         00h
        0046d722 00              ??         00h
        0046d723 00              ??         00h
        0046d724 00              ??         00h
        0046d725 00              ??         00h
        0046d726 00              ??         00h
        0046d727 00              ??         00h

args is a reference to a slice of Argument items, and in this example it references an empty slice. The length of an empty slice is 0, but its pointer is not null: it is a dangling, non-null, well-aligned address, which in practice equals the alignment of the element type (8 bytes here).

print-type-size type: `core::fmt::rt::Argument<'_>`: 16 bytes, alignment: 8 bytes
print-type-size     field `.ty`: 16 bytes

fn main() {
    let empty_u8: &[u8] = &[];      // 1-byte aligned
    let empty_u32: &[u32] = &[];    // 4-byte aligned  
    let empty_u64: &[u64] = &[];    // 8-byte aligned
   
    println!("u8 address: {}", empty_u8.as_ptr() as usize);
    println!("u32 address: {}", empty_u32.as_ptr() as usize);
    println!("u64 address: {}", empty_u64.as_ptr() as usize);
}

$ cargo run --release --quiet
u8 address: 1
u32 address: 4
u64 address: 8

fmt is an optional reference to a slice of Placeholder items. For Option<&[T]>, Rust typically applies the null pointer optimization: a reference can never be null, so None is represented by a null pointer. Therefore, the length field is irrelevant for None and is not populated in the current example.

print-type-size type: `std::option::Option<&[core::fmt::rt::Placeholder]>`: 16 bytes, alignment: 8 bytes
print-type-size     variant `Some`: 16 bytes
print-type-size         field `.0`: 16 bytes
print-type-size     variant `None`: 0 bytes
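
The 16-byte sizes reported above can be cross-checked without the nightly -Z flag. The following is only a sketch; the helper names slice_ref_size and opt_slice_ref_size are hypothetical, and the sizes are expressed relative to usize so the check also holds on hosts other than AArch64.

```rust
use std::mem::size_of;

/// Size of a shared slice reference: pointer plus length.
fn slice_ref_size<T>() -> usize {
    size_of::<&[T]>()
}

/// Size of an optional shared slice reference.
fn opt_slice_ref_size<T>() -> usize {
    size_of::<Option<&[T]>>()
}

fn main() {
    // A fat pointer occupies two machine words (16 bytes on AArch64).
    println!("&[u64]:         {} bytes", slice_ref_size::<u64>());
    // No extra discriminant: None reuses the otherwise invalid null pointer.
    println!("Option<&[u64]>: {} bytes", opt_slice_ref_size::<u64>());
}
```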

rust-gdb

We can verify the results of our static analysis using rust-gdb (or rust-lldb) which supports Rust types.

First we need to create a debug build, in which the function new_const that constructs the Arguments struct is neither optimized away nor inlined.

$ cargo rustc

Then we start a GDB server in one terminal (QEMU waits for a debugger to connect) and connect to it from another terminal with the rust-gdb client. We will examine the Arguments struct returned by new_const.

$ qemu-aarch64 -g 1234 target/aarch64-unknown-linux-musl/debug/rust-lab
$ rust-gdb -q -ex "target remote localhost:1234" target/aarch64-unknown-linux-musl/debug/rust-lab
Reading symbols from target/aarch64-unknown-linux-musl/debug/rust-lab...
Remote debugging using localhost:1234

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
0x00000000004019bc in _start ()
(gdb) b rust_lab::main
Breakpoint 1 at 0x401bd4: file src/main.rs, line 2.
(gdb) c
Continuing.

Breakpoint 1, rust_lab::main () at src/main.rs:2
2	    println!("Hello, world!");
(gdb) disas
Dump of assembler code for function _ZN8rust_lab4main17hb3ccde9ab543d852E:
   0x0000000000401bc0 <+0>:	sub	sp, sp, #0x50
   0x0000000000401bc4 <+4>:	stp	x29, x30, [sp, #64]
   0x0000000000401bc8 <+8>:	add	x29, sp, #0x40
   0x0000000000401bcc <+12>:	add	x8, sp, #0x10
   0x0000000000401bd0 <+16>:	str	x8, [sp, #8]
=> 0x0000000000401bd4 <+20>:	adrp	x0, 0x46d000
   0x0000000000401bd8 <+24>:	add	x0, x0, #0x710
   0x0000000000401bdc <+28>:	bl	0x401b54 <_ZN4core3fmt2rt38_$LT$impl$u20$core..fmt..Arguments$GT$9new_const17h2005e5bc47942c4fE>
   0x0000000000401be0 <+32>:	ldr	x0, [sp, #8]
   0x0000000000401be4 <+36>:	bl	0x41abf0 <_ZN3std2io5stdio6_print17h5a3b0843896b0124E>
   0x0000000000401be8 <+40>:	ldp	x29, x30, [sp, #64]
   0x0000000000401bec <+44>:	add	sp, sp, #0x50
   0x0000000000401bf0 <+48>:	ret
End of assembler dump.
(gdb) si 3
core::fmt::Arguments::new_const<1> (pieces=0x7f867d3fe830)
    at /home/gemesa/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/rt.rs:226
226	    pub const fn new_const<const N: usize>(pieces: &'a [&'static str; N]) -> Self {
(gdb) fin
Run till exit from #0  core::fmt::Arguments::new_const<1> (pieces=0x7f867d3fe830)
    at /home/gemesa/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/rt.rs:226
0x0000000000401be0 in rust_lab::main () at src/main.rs:2
2	    println!("Hello, world!");
Value returned is $1 = core::fmt::Arguments {pieces: &[&str](size=1) = {"Hello, world!\n"}, fmt: core::option::Option<&[core::fmt::rt::Placeholder]>::None, args: &[core::fmt::rt::Argument](size=0)}
(gdb)

The value returned matches the expected Arguments struct.

Vec

Source

Initialize a new workspace with cargo init.

fn main() -> std::process::ExitCode {
    let mut vec = vec![1, 2, 3];
    vec.push(4);
    let ret = vec.pop().unwrap();
    std::process::ExitCode::from(ret)
}

vec! is a macro which generates different code based on the passed argument. In our test code we pass a list of elements so the following pattern matches:

    ($($x:expr),+ $(,)?) => (
        <[_]>::into_vec($crate::boxed::box_new([$($x),+]))
    );

$ cargo rustc --release --quiet -- -Z unpretty=hir
#[prelude_import]
use std::prelude::rust_2024::*;
#[macro_use]
extern crate std;
fn main()
    ->
        std::process::ExitCode {
    let mut vec = <[_]>::into_vec(::alloc::boxed::box_new([1, 2, 3]));
    vec.push(4);
    let ret = vec.pop().unwrap();
    std::process::ExitCode::from(ret)
}
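
The same construction can be approximated on stable Rust. This is a sketch under one assumption: the internal box_new helper is replaced by the public Box::new, which is not what the macro itself uses. build_like_vec_macro is a hypothetical name.

```rust
/// Hypothetical helper: builds a vector roughly the way the expansion does,
/// with the public `Box::new` standing in for the internal `box_new`.
fn build_like_vec_macro() -> Vec<u8> {
    // The boxed array is unsized into a boxed slice...
    let boxed: Box<[u8]> = Box::new([1u8, 2, 3]);
    // ...and the boxed slice is converted into a Vec, taking over its heap buffer.
    boxed.into_vec()
}

fn main() {
    let v = build_like_vec_macro();
    println!("{:?} (len = {})", v, v.len());
}
```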

Build

$ cargo rustc --release

Ghidra

Load the binary into Ghidra and auto-analyze it.

rust_lab::main

The official docs explain Vec in detail. The most important parts for us:

The capacity of a vector is the amount of space allocated for any future elements that will be added onto the vector. This is not to be confused with the length of a vector, which specifies the number of actual elements within the vector. If a vector’s length exceeds its capacity, its capacity will automatically be increased, but its elements will have to be reallocated.

Vec is and always will be a (pointer, capacity, length) triplet.

The order of these fields is completely unspecified

If a Vec has allocated memory, then the memory it points to is on the heap

We can verify this:

$ cargo rustc --release --quiet -- -Zprint-type-sizes
...
print-type-size type: `std::vec::Vec<u8>`: 24 bytes, alignment: 8 bytes
print-type-size     field `.buf`: 16 bytes
print-type-size     field `.len`: 8 bytes
print-type-size type: `alloc::raw_vec::RawVec<u8>`: 16 bytes, alignment: 8 bytes
print-type-size     field `.inner`: 16 bytes
print-type-size     field `._marker`: 0 bytes
print-type-size type: `alloc::raw_vec::RawVecInner`: 16 bytes, alignment: 8 bytes
print-type-size     field `.cap`: 8 bytes
print-type-size     field `.ptr`: 8 bytes
print-type-size     field `.alloc`: 0 bytes
...

There are some abstractions involved, but ultimately the memory layout looks like this:

struct Vec {
    usize cap;
    ptr64 ptr;
    usize len;
};
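
The three fields can also be observed from safe Rust. A minimal sketch follows; vec_parts is a hypothetical helper, and the tuple order simply mirrors the layout shown above, not any guarantee of the language.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (capacity, pointer, length), in the order
/// reported by -Z print-type-sizes for this particular build.
fn vec_parts(v: &Vec<u8>) -> (usize, *const u8, usize) {
    (v.capacity(), v.as_ptr(), v.len())
}

fn main() {
    let v: Vec<u8> = vec![1, 2, 3];
    let (cap, ptr, len) = vec_parts(&v);
    // Three machine words: 24 bytes on a 64-bit target.
    println!("size_of::<Vec<u8>>() = {}", size_of::<Vec<u8>>());
    println!("cap = {cap}, ptr = {ptr:p}, len = {len}");
}
```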

With this background information, we are ready to look at the decompiled code. Note that this is not the raw decompiled code. First we needed to create the Vec<u8> type in the Structure Editor and apply it in the code, then make some further modifications, e.g., fix the prototype of __rust_alloc.

/* WARNING: Unknown calling convention: __rustcall */
/* rust_lab::main */

u8 __rustcall rust_lab::main(void)

{
  Vec<u8> vec;
  u8 ret;
  
  vec.ptr = __rustc::__rust_alloc(3,1);
  if (vec.ptr != (u8 *)0x0) {
    vec.ptr[2] = '\x03';
    vec.ptr[0] = '\x01';
    vec.ptr[1] = '\x02';
    vec.cap = 3;
    vec.len = 3;
                    /* try { // try from 00401958 to 00401967 has its CatchHandler @ 004019ac */
    alloc::raw_vec::RawVec<T,A>::grow_one(&vec,&PTR_s_src/main.rs_0046d860);
    vec.ptr[3] = 4;
    vec.len = 3;
    ret = vec.ptr[3];
    __rustc::__rust_dealloc(vec.ptr,vec.cap,1);
    return ret;
  }
                    /* WARNING: Subroutine does not return */
  alloc::alloc::handle_alloc_error(1,3);
}

The code is fairly self-explanatory at this point. But, for completeness, the logic is the following:

  • allocate a 3 byte section on the heap
  • initialize the heap data, capacity and length
  • increase the capacity (normally, length is increased as well, but in this case it is optimized out since we immediately pop the last element)
  • initialize the new heap data
  • save the last element in the return variable
  • deallocate the heap data
  • return the return variable
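
The steps above can be sketched by hand with the global allocator API. This is an approximation, not the code the compiler generates: the new capacity of 8 is hard-coded instead of being computed by grow_one, and manual_push_pop is a hypothetical name.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, realloc, Layout};

/// Hand-written approximation of the optimized `main`.
fn manual_push_pop() -> u8 {
    let old_layout = Layout::from_size_align(3, 1).expect("valid layout");
    let new_cap = 8; // assumption: the capacity chosen by grow_one
    let new_layout = Layout::from_size_align(new_cap, 1).expect("valid layout");

    // SAFETY: both layouts have a non-zero size, every pointer is checked for
    // null before use, all accesses stay inside the allocation, and the block
    // is freed exactly once with the layout it currently has.
    unsafe {
        // allocate a 3 byte section on the heap
        let ptr = alloc(old_layout);
        if ptr.is_null() {
            handle_alloc_error(old_layout);
        }
        // initialize the heap data (cap = 3, len = 3)
        ptr.write(1);
        ptr.add(1).write(2);
        ptr.add(2).write(3);

        // increase the capacity
        let ptr = realloc(ptr, old_layout, new_cap);
        if ptr.is_null() {
            handle_alloc_error(new_layout);
        }
        // initialize the new heap data (push), then read it back (pop)
        ptr.add(3).write(4);
        let ret = ptr.add(3).read();

        // deallocate the heap data and return the saved element
        dealloc(ptr, new_layout);
        ret
    }
}

fn main() {
    println!("{}", manual_push_pop());
}
```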

The only thing we might need to discuss is grow_one. If we check the implementation, we can see that it calls grow_amortized.

fn grow_amortized(
    &mut self,
    len: usize,
    additional: usize,
    elem_layout: Layout,
) -> Result<(), TryReserveError> {
...
    let cap = cmp::max(self.cap.as_inner() * 2, required_cap);
    let cap = cmp::max(min_non_zero_cap(elem_layout.size()), cap);
...
}

By default, this function doubles the capacity. In our case, since the old capacity is 3, the new one would be 6, but since the Layout size is 1, the capacity is raised to the minimum non-zero capacity of 8.

// Tiny Vecs are dumb. Skip to:
// - 8 if the element size is 1, because any heap allocators is likely
//   to round up a request of less than 8 bytes to at least 8 bytes.
// ...
const fn min_non_zero_cap(size: usize) -> usize {
    if size == 1 {
        8
    } else if size <= 1024 {
        4
    } else {
        1
    }
}
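
To make the arithmetic concrete, here is a sketch that reproduces only the two cmp::max lines. It assumes that required_cap is len + additional (this is computed in the elided part of grow_amortized) and omits all overflow handling; grown_capacity is a hypothetical helper.

```rust
use std::cmp;

/// Copy of the `min_non_zero_cap` helper quoted above.
const fn min_non_zero_cap(size: usize) -> usize {
    if size == 1 {
        8
    } else if size <= 1024 {
        4
    } else {
        1
    }
}

/// Hypothetical helper: the capacity chosen for a growth request.
/// Assumption: required_cap = len + additional, with no overflow checks.
fn grown_capacity(old_cap: usize, len: usize, additional: usize, elem_size: usize) -> usize {
    let required_cap = len + additional;
    let cap = cmp::max(old_cap * 2, required_cap);
    cmp::max(min_non_zero_cap(elem_size), cap)
}

fn main() {
    // Vec<u8> with cap = 3 and len = 3, pushing one element: max(8, max(6, 4)).
    println!("u8:  {}", grown_capacity(3, 3, 1, 1));
    // The same request with 4 byte elements: max(4, max(6, 4)).
    println!("u32: {}", grown_capacity(3, 3, 1, 4));
}
```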

Call graph of grow_one:

grow_one
    finish_grow
        __rust_realloc
        __rust_alloc

Raw decompiled code for reference:

/* WARNING: Unknown calling convention: __rustcall */
/* rust_lab::main */

undefined1 __rustcall rust_lab::main(void)

{
  undefined1 uVar1;
  undefined2 *puVar2;
  undefined8 local_38;
  undefined2 *local_30;
  undefined8 local_28;
  
  puVar2 = (undefined2 *)0x3;
  __rustc::__rust_alloc(3,1);
  if (puVar2 != (undefined2 *)0x0) {
    *(undefined1 *)(puVar2 + 1) = 3;
    *puVar2 = 0x201;
    local_38 = 3;
    local_28 = 3;
                    /* try { // try from 00401958 to 00401967 has its CatchHandler @ 004019ac */
    local_30 = puVar2;
    alloc::raw_vec::RawVec<T,A>::grow_one(&local_38,&PTR_s_src/main.rs_0046d860);
    *(undefined1 *)((long)local_30 + 3) = 4;
    local_28 = 3;
    uVar1 = *(undefined1 *)((long)local_30 + 3);
    __rustc::__rust_dealloc(local_30,local_38,1);
    return uVar1;
  }
                    /* WARNING: Subroutine does not return */
  alloc::alloc::handle_alloc_error(1,3);
}

Now that we see the big picture, we are ready to go through the listing. To easily match the listing with the decompiled code, it is annotated with pre-comments.

There is one additional piece of information we need in order to fully understand all of the instructions. __rust_no_alloc_shim_is_unstable is an internal variable; the load ldrb wzr,[x8]=>__rust_no_alloc_shim_is_unstable forces the linker to include the alloc shim. The tracking issue is here for anyone who might be interested.

Listing:

                             **************************************************************
                             * rust_lab::main                                             *
                             **************************************************************
                             undefined __rustcall main()
             undefined         <UNASSIGNED>   <RETURN>
             undefined8        Stack[-0x10]:8 local_10                                XREF[2]:     0040191c(W), 
                                                                                                   00401994(R)  
             undefined8        Stack[-0x20]:8 local_20                                XREF[2]:     00401918(W), 
                                                                                                   00401990(*)  
             Vec<u8>           Stack[-0x38]   vec                                     XREF[2,3]:   00401950(W), 
                                                                                                   0040197c(R), 
                                                                                                   00401968(R), 
                                                                                                   00401954(W), 
                                                                                                   00401980(W)  
             u8                HASH:5f59380   ret
                             _ZN8rust_lab4main17h4c095f7be815a79eE           XREF[4]:     main:004019f8(*), 004528cc, 
                             rust_lab::main                                               00453a8f(*), 00464cc0(*)  
        00401914 ff 03 01 d1     sub        sp,sp,#0x40
        00401918 fd 7b 02 a9     stp        x29,x30,[sp, #local_20]
        0040191c f3 1b 00 f9     str        x19,[sp, #local_10]
        00401920 fd 83 00 91     add        x29,sp,#0x20
        00401924 68 03 00 d0     adrp       x8,0x46f000
                             arg0: 3
        00401928 60 00 80 52     mov        w0,#0x3
                             arg1: 1
        0040192c 21 00 80 52     mov        w1,#0x1
        00401930 08 a1 47 f9     ldr        x8,[x8, #0xf40]=>->__rust_no_alloc_shim_is_uns   = 00470ae4
        00401934 73 00 80 52     mov        w19,#0x3
                             force the linker to include the alloc shim
        00401938 1f 01 40 39     ldrb       wzr,[x8]=>__rust_no_alloc_shim_is_unstable       = ??
        0040193c 34 00 00 94     bl         __rustc::__rust_alloc                            u8 * __rust_alloc(usize size, us
        00401940 00 03 00 b4     cbz        x0,LAB_004019a0
        00401944 28 40 80 52     mov        w8,#0x201
                             heap_data[2] = 3
        00401948 13 08 00 39     strb       w19,[x0, #0x2]
                             heap_data[0] = 1
                             heap_data[1] = 2
        0040194c 08 00 00 79     strh       w8,[x0]
                             init vec.cap and vec.ptr
        00401950 f3 83 00 a9     stp        x19,x0,[sp, #vec.cap]
                             init vec.len
        00401954 f3 0f 00 f9     str        x19,[sp, #vec.len]
                             try { // try from 00401958 to 00401967 has its CatchHandler @
                             LAB_00401958                                    XREF[1]:     00453a88(*)  
        00401958 61 03 00 90     adrp       x1,0x46d000
                             arg1: source location for debugging
        0040195c 21 80 21 91     add        x1=>PTR_s_src/main.rs_0046d860,x1,#0x860         = 0044aea0
                             arg0: address of vec
        00401960 e0 23 00 91     add        x0,sp,#0x8
                             increase cap from 3 to 8
        00401964 fd 0d 01 94     bl         alloc::raw_vec::RawVec<T,A>::grow_one            undefined grow_one()
                             } // end try from 00401958 to 00401967
                             LAB_00401968                                    XREF[1]:     00453a8d(*)  
        00401968 e8 0b 40 f9     ldr        x8,[sp, #vec.ptr]
        0040196c 89 00 80 52     mov        w9,#0x4
                             arg2: 1
        00401970 22 00 80 52     mov        w2,#0x1
                             vec.ptr[3] = 4
        00401974 09 0d 00 39     strb       w9,[x8, #0x3]
        00401978 68 00 80 52     mov        w8,#0x3
                             arg0: vec.ptr
                             arg1: vec.cap
        0040197c e1 83 40 a9     ldp        x1,x0,[sp, #vec.cap]
                             vec.len = 3
        00401980 e8 0f 00 f9     str        x8,[sp, #vec.len]
                             save return value
        00401984 13 0c 40 39     ldrb       w19,[x0, #0x3]
        00401988 22 00 00 94     bl         __rustc::__rust_dealloc                          void __rust_dealloc(u8 * ptr, us
                             return value
        0040198c e0 03 13 2a     mov        w0,w19
        00401990 fd 7b 42 a9     ldp        x29=>local_20,x30,[sp, #0x20]
        00401994 f3 1b 40 f9     ldr        x19,[sp, #local_10]
        00401998 ff 03 01 91     add        sp,sp,#0x40
        0040199c c0 03 5f d6     ret
                             LAB_004019a0                                    XREF[1]:     00401940(j)  
        004019a0 20 00 80 52     mov        w0,#0x1
        004019a4 61 00 80 52     mov        w1,#0x3
        004019a8 0f fe ff 97     bl         alloc::alloc::handle_alloc_error                 undefined handle_alloc_error()
                             -- Flow Override: CALL_RETURN (CALL_TERMINATOR)

rust-lldb

We can verify the results of our static analysis using rust-lldb.

Start a GDB server and connect to it with the rust-lldb client. We will examine our Vec struct just before deallocation.

$ qemu-aarch64 -g 1234 target/aarch64-unknown-linux-musl/release/rust-lab
$ rust-lldb --source-quietly -o "gdb-remote localhost:1234" target/aarch64-unknown-linux-musl/release/rust-lab
Current executable set to '/home/gemesa/git-repos/rust-lab/target/aarch64-unknown-linux-musl/release/rust-lab' (aarch64).
Process 1447629 stopped
* thread #1, stop reason = signal SIGTRAP
    frame #0: 0x00000000004017d4 rust-lab`_start
rust-lab`_start:
->  0x4017d4 <+0>:  mov    x29, #0x0 ; =0 
    0x4017d8 <+4>:  mov    x30, #0x0 ; =0 
    0x4017dc <+8>:  mov    x0, sp
    0x4017e0 <+12>: adrp   x1, 0
(lldb) b _ZN8rust_lab4main17h4c095f7be815a79eE
Breakpoint 2: where = rust-lab`rust_lab::main::h4c095f7be815a79e, address = 0x0000000000401914
(lldb) c
Process 1447629 resuming
Process 1447629 stopped
* thread #1, stop reason = breakpoint 2.1
    frame #0: 0x0000000000401914 rust-lab`rust_lab::main::h4c095f7be815a79e
rust-lab`rust_lab::main::h4c095f7be815a79e:
->  0x401914 <+0>:  sub    sp, sp, #0x40
    0x401918 <+4>:  stp    x29, x30, [sp, #0x20]
    0x40191c <+8>:  str    x19, [sp, #0x30]
    0x401920 <+12>: add    x29, sp, #0x20
(lldb) disas
rust-lab`rust_lab::main::h4c095f7be815a79e:
->  0x401914 <+0>:   sub    sp, sp, #0x40
    0x401918 <+4>:   stp    x29, x30, [sp, #0x20]
    0x40191c <+8>:   str    x19, [sp, #0x30]
    0x401920 <+12>:  add    x29, sp, #0x20
    0x401924 <+16>:  adrp   x8, 110
    0x401928 <+20>:  mov    w0, #0x3 ; =3 
    0x40192c <+24>:  mov    w1, #0x1 ; =1 
    0x401930 <+28>:  ldr    x8, [x8, #0xf40]
    0x401934 <+32>:  mov    w19, #0x3 ; =3 
    0x401938 <+36>:  ldrb   wzr, [x8]
    0x40193c <+40>:  bl     0x401a0c       ; __rustc::__rust_alloc
    0x401940 <+44>:  cbz    x0, 0x4019a0 ; <+140>
    0x401944 <+48>:  mov    w8, #0x201 ; =513 
    0x401948 <+52>:  strb   w19, [x0, #0x2]
    0x40194c <+56>:  strh   w8, [x0]
    0x401950 <+60>:  stp    x19, x0, [sp, #0x8]
    0x401954 <+64>:  str    x19, [sp, #0x18]
    0x401958 <+68>:  adrp   x1, 108
    0x40195c <+72>:  add    x1, x1, #0x860
    0x401960 <+76>:  add    x0, sp, #0x8
    0x401964 <+80>:  bl     0x445158       ; alloc::raw_vec::RawVec$LT$T$C$A$GT$::grow_one::h19885d150c1bd8f5
    0x401968 <+84>:  ldr    x8, [sp, #0x10]
    0x40196c <+88>:  mov    w9, #0x4 ; =4 
    0x401970 <+92>:  mov    w2, #0x1 ; =1 
    0x401974 <+96>:  strb   w9, [x8, #0x3]
    0x401978 <+100>: mov    w8, #0x3 ; =3 
    0x40197c <+104>: ldp    x1, x0, [sp, #0x8]
    0x401980 <+108>: str    x8, [sp, #0x18]
    0x401984 <+112>: ldrb   w19, [x0, #0x3]
    0x401988 <+116>: bl     0x401a10       ; __rustc::__rust_dealloc
    0x40198c <+120>: mov    w0, w19
    0x401990 <+124>: ldp    x29, x30, [sp, #0x20]
    0x401994 <+128>: ldr    x19, [sp, #0x30]
    0x401998 <+132>: add    sp, sp, #0x40
    0x40199c <+136>: ret    
    0x4019a0 <+140>: mov    w0, #0x1 ; =1 
    0x4019a4 <+144>: mov    w1, #0x3 ; =3 
    0x4019a8 <+148>: bl     0x4011e4       ; alloc::alloc::handle_alloc_error::h3005aad4027c4877
    0x4019ac <+152>: ldr    x1, [sp, #0x8]
    0x4019b0 <+156>: mov    x19, x0
    0x4019b4 <+160>: cbz    x1, 0x4019c4 ; <+176>
    0x4019b8 <+164>: ldr    x0, [sp, #0x10]
    0x4019bc <+168>: mov    w2, #0x1 ; =1 
    0x4019c0 <+172>: bl     0x401a10       ; __rustc::__rust_dealloc
    0x4019c4 <+176>: mov    x0, x19
    0x4019c8 <+180>: bl     0x433688       ; _Unwind_Resume
(lldb) b *0x401988
Breakpoint 3: where = rust-lab`rust_lab::main::h4c095f7be815a79e + 116, address = 0x0000000000401988
(lldb) c
Process 1447629 resuming
Process 1447629 stopped
* thread #1, stop reason = breakpoint 3.1
    frame #0: 0x0000000000401988 rust-lab`rust_lab::main::h4c095f7be815a79e + 116
rust-lab`rust_lab::main::h4c095f7be815a79e:
->  0x401988 <+116>: bl     0x401a10       ; __rustc::__rust_dealloc
    0x40198c <+120>: mov    w0, w19
    0x401990 <+124>: ldp    x29, x30, [sp, #0x20]
    0x401994 <+128>: ldr    x19, [sp, #0x30]
(lldb) x/g $sp+8
0x7f7d0dd8d8c8: 0x0000000000000008
(lldb) x/g $sp+16
0x7f7d0dd8d8d0: 0x00007f7d0fc0d040
(lldb) x/g 0x00007f7d0fc0d040
0x7f7d0fc0d040: 0x0000000004030201
(lldb) x/g $sp+24
0x7f7d0dd8d8d8: 0x0000000000000003

The capacity is 8, the heap data is [1, 2, 3, 4] and the length is 3, as expected.
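
Reading the heap word as [1, 2, 3, 4] relies on AArch64 Linux being little-endian: the least significant byte sits at the lowest address. The same reasoning explains why the listing can store the first two elements with a single strh of 0x201. A short sketch (memory_bytes is a hypothetical helper):

```rust
/// Hypothetical helper: the bytes of a 64-bit word in memory order on a
/// little-endian target.
fn memory_bytes(word: u64) -> [u8; 8] {
    word.to_le_bytes()
}

fn main() {
    // The value printed by `x/g` for the heap buffer.
    println!("{:?}", memory_bytes(0x0000_0000_0403_0201));
    // The halfword written by `mov w8,#0x201` followed by `strh w8,[x0]`.
    println!("{:?}", 0x201u16.to_le_bytes());
}
```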

Option

Source

Initialize a new workspace with cargo init --lib.

#[unsafe(no_mangle)]
pub fn safe_divide(dividend: i32, divisor: i32) -> Option<i32> {
    if divisor == 0 {
        None
    } else {
        Some(dividend / divisor)
    }
}

#[unsafe(no_mangle)]
pub fn process_option(value: Option<i32>) -> i32 {
    match value {
        Some(x) => x * 2,
        None => 0,
    }
}

#[unsafe(no_mangle)]
pub fn process_str_option(value: Option<&str>) -> usize {
    match value {
        Some(s) => s.len(),
        None => 0,
    }
}

#[unsafe(no_mangle)]
pub fn process_box_option(value: Option<Box<i32>>) -> i32 {
    match value {
        Some(boxed) => *boxed,
        None => -1,
    }
}

The Option type is described in the official docs in detail.
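
Before reading any generated code, it can help to know how large each Option used in the sample functions is. The sketch below only prints sizes on the host. The exact number for Option<i32> is an observation about the current compiler, not a guarantee, whereas Option<Box<T>> is documented to have the same size as Box<T>.

```rust
use std::mem::size_of;

fn main() {
    // i32 has no spare bit pattern, so Option<i32> needs room for a discriminant.
    println!("i32:              {}", size_of::<i32>());
    println!("Option<i32>:      {}", size_of::<Option<i32>>());

    // A &str is a fat pointer whose data pointer is never null.
    println!("&str:             {}", size_of::<&str>());
    println!("Option<&str>:     {}", size_of::<Option<&str>>());

    // Box<i32> is a non-null thin pointer.
    println!("Box<i32>:         {}", size_of::<Box<i32>>());
    println!("Option<Box<i32>>: {}", size_of::<Option<Box<i32>>>());
}
```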

In some cases, simple functions such as process_option might be inlined by the compiler. Such functions are then not present in the .o file, only in the .rmeta file. The inlining is done based on the information (e.g. function signatures, type information and encoded MIR) available in the .rmeta file. Information about the .rmeta file format can be found here. We want to see the generated code for our sample functions in the .o file, so this optimization is undesirable for us. A possible solution is to use #[unsafe(no_mangle)], which has two effects:

  • Do not mangle the symbol name.
  • Export this symbol. #[unsafe(no_mangle)] implies that the function is intended to be called from outside of the current compilation unit (e.g. from C code or another Rust crate with a different LTO context). For this reason, it will be present in the .o file.

Without #[unsafe(no_mangle)]:

$ llvm-objdump --syms target/aarch64-unknown-linux-musl/release/deps/*.o

target/aarch64-unknown-linux-musl/release/deps/rust_lab-35c360f17fe9ba7d.o:     file format elf64-littleaarch64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 rust_lab.e234cd7f6b439d7e-cgu.0
0000000000000000 l    d  .text._ZN8rust_lab11safe_divide17h459bac753259da22E    0000000000000000 .text._ZN8rust_lab11safe_divide17h459bac753259da22E
0000000000000000 l       .text._ZN8rust_lab11safe_divide17h459bac753259da22E    0000000000000000 $x
0000000000000000 l    d  .text._ZN8rust_lab18process_box_option17h2a7f2c1809e960d7E     0000000000000000 .text._ZN8rust_lab18process_box_option17h2a7f2c1809e960d7E
0000000000000000 l       .text._ZN8rust_lab18process_box_option17h2a7f2c1809e960d7E     0000000000000000 $x
0000000000000000 l    d  .rodata..Lalloc_f5ffd2fd1476bab43ad89fb40c72d0c5       0000000000000000 .rodata..Lalloc_f5ffd2fd1476bab43ad89fb40c72d0c5
0000000000000000 l       .rodata..Lalloc_f5ffd2fd1476bab43ad89fb40c72d0c5       0000000000000000 $d
0000000000000000 l    d  .data.rel.ro..Lalloc_0ea055d83440e297c58eb113a9bcb2e2  0000000000000000 .data.rel.ro..Lalloc_0ea055d83440e297c58eb113a9bcb2e2
0000000000000000 l       .data.rel.ro..Lalloc_0ea055d83440e297c58eb113a9bcb2e2  0000000000000000 $d
0000000000000000 l       .comment       0000000000000000 $d
0000000000000000 l       .eh_frame      0000000000000000 $d
0000000000000000 g     F .text._ZN8rust_lab11safe_divide17h459bac753259da22E    0000000000000040 _ZN8rust_lab11safe_divide17h459bac753259da22E
0000000000000000         *UND*  0000000000000000 _ZN4core9panicking11panic_const24panic_const_div_overflow17h2ce15414ba9ec1bdE
0000000000000000 g     F .text._ZN8rust_lab18process_box_option17h2a7f2c1809e960d7E     0000000000000044 _ZN8rust_lab18process_box_option17h2a7f2c1809e960d7E
0000000000000000         *UND*  0000000000000000 _RNvCsdk9DaPZnL1i_7___rustc14___rust_dealloc

With #[unsafe(no_mangle)]:

$ llvm-objdump --syms target/aarch64-unknown-linux-musl/release/deps/*.o

target/aarch64-unknown-linux-musl/release/deps/rust_lab-35c360f17fe9ba7d.o:     file format elf64-littleaarch64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 rust_lab.e234cd7f6b439d7e-cgu.0
0000000000000000 l    d  .text.safe_divide      0000000000000000 .text.safe_divide
0000000000000000 l       .text.safe_divide      0000000000000000 $x
0000000000000000 l    d  .text.process_option   0000000000000000 .text.process_option
0000000000000000 l       .text.process_option   0000000000000000 $x
0000000000000000 l    d  .text.process_str_option       0000000000000000 .text.process_str_option
0000000000000000 l       .text.process_str_option       0000000000000000 $x
0000000000000000 l    d  .text.process_box_option       0000000000000000 .text.process_box_option
0000000000000000 l       .text.process_box_option       0000000000000000 $x
0000000000000000 l    d  .rodata..Lalloc_f5ffd2fd1476bab43ad89fb40c72d0c5       0000000000000000 .rodata..Lalloc_f5ffd2fd1476bab43ad89fb40c72d0c5
0000000000000000 l       .rodata..Lalloc_f5ffd2fd1476bab43ad89fb40c72d0c5       0000000000000000 $d
0000000000000000 l    d  .data.rel.ro..Lalloc_0ea055d83440e297c58eb113a9bcb2e2  0000000000000000 .data.rel.ro..Lalloc_0ea055d83440e297c58eb113a9bcb2e2
0000000000000000 l       .data.rel.ro..Lalloc_0ea055d83440e297c58eb113a9bcb2e2  0000000000000000 $d
0000000000000000 l       .comment       0000000000000000 $d
0000000000000000 l       .eh_frame      0000000000000000 $d
0000000000000000 g     F .text.safe_divide      0000000000000040 safe_divide
0000000000000000         *UND*  0000000000000000 _ZN4core9panicking11panic_const24panic_const_div_overflow17h2ce15414ba9ec1bdE
0000000000000000 g     F .text.process_option   0000000000000010 process_option
0000000000000000 g     F .text.process_str_option       000000000000000c process_str_option
0000000000000000 g     F .text.process_box_option       0000000000000044 process_box_option
0000000000000000         *UND*  0000000000000000 _RNvCsdk9DaPZnL1i_7___rustc14___rust_dealloc

Build

$ cargo rustc --release -- --emit obj,mir,llvm-ir

Ghidra

Load the .o file (located at target/aarch64-unknown-linux-musl/release/deps/) into Ghidra and auto-analyze it.

Layout

Option is an enum, which is conceptually a tagged union consisting of a discriminant and data. However, Rust often elides the discriminant. For common types like references and Box<T>, None is represented by an otherwise invalid bit pattern (such as a null pointer, see chapter Null pointer optimization) rather than by a separate discriminant field, which makes Option<T> the same size as T. When a separate discriminant is present and the value is None, the data value is undefined. We will see this in the generated code, but it is documented as well. The exact memory layout is unspecified without explicit #[repr] attributes.
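The size effect described above can be checked with std::mem::size_of. The following is a minimal sketch, not part of the chapter's sample crate; the helper name sizes is hypothetical. Because the exact size of Option<i32> is unspecified, it is only compared against i32 and not pinned to a value.

```rust
use std::mem::size_of;

/// Returns (size of T, size of Option<T>) in bytes.
fn sizes<T>() -> (usize, usize) {
    (size_of::<T>(), size_of::<Option<T>>())
}

fn main() {
    // Every i32 bit pattern is a valid value, so Option<i32> needs extra room
    // for a discriminant.
    let (plain, optional) = sizes::<i32>();
    println!("i32: {plain} bytes, Option<i32>: {optional} bytes");
    assert!(optional > plain);

    // References and Box<T> are never null, so None can reuse the null pattern.
    let (plain, optional) = sizes::<&str>();
    println!("&str: {plain} bytes, Option<&str>: {optional} bytes");
    assert_eq!(plain, optional);

    let (plain, optional) = sizes::<Box<i32>>();
    println!("Box<i32>: {plain} bytes, Option<Box<i32>>: {optional} bytes");
    assert_eq!(plain, optional);
}
```

This can be run on the host; no cross-compilation is needed to observe the difference.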

safe_divide

#[unsafe(no_mangle)]
pub fn safe_divide(dividend: i32, divisor: i32) -> Option<i32> {
    if divisor == 0 {
        None
    } else {
        Some(dividend / divisor)
    }
}

The generated assembly is straightforward. There is only one piece of background information we need to fully understand it: the compiler automatically generates a check that panics if the result would overflow. For i32, there is only one such scenario: dividend is i32::MIN and divisor is -1 (the mathematical result, 2147483648, is one larger than i32::MAX). This can be seen in the MIR already:

fn safe_divide(_1: i32, _2: i32) -> Option<i32> {
    debug dividend => _1;
    debug divisor => _2;
    let mut _0: std::option::Option<i32>;
    let mut _3: bool;
    let mut _4: i32;
    let mut _5: bool;
    let mut _6: bool;
    let mut _7: bool;

    bb0: {
        _3 = Eq(copy _2, const 0_i32);
        switchInt(move _2) -> [0: bb1, otherwise: bb2];
    }

    bb1: {
        _0 = const Option::<i32>::None;
        goto -> bb5;
    }

    bb2: {
        StorageLive(_4);
        assert(!copy _3, "attempt to divide `{}` by zero", copy _1) -> [success: bb3, unwind continue];
    }

    bb3: {
        _5 = Eq(copy _2, const -1_i32);
        _6 = Eq(copy _1, const i32::MIN);
        _7 = BitAnd(move _5, move _6);
        assert(!move _7, "attempt to compute `{} / {}`, which would overflow", copy _1, copy _2) -> [success: bb4, unwind continue];
    }

    bb4: {
        _4 = Div(copy _1, copy _2);
        _0 = Option::<i32>::Some(move _4);
        StorageDead(_4);
        goto -> bb5;
    }

    bb5: {
        return;
    }
}
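The two assertions in the MIR match the two cases in which the standard library's checked_div returns None. A minimal sketch that classifies inputs the same way is shown here; the function name division_case is hypothetical and only serves as an illustration.

```rust
/// Classifies a signed 32-bit division the way the two MIR assertions do.
fn division_case(dividend: i32, divisor: i32) -> &'static str {
    if divisor == 0 {
        "divide by zero" // first assert (bb2)
    } else if dividend == i32::MIN && divisor == -1 {
        "overflow" // second assert (bb3)
    } else {
        "ok" // the division in bb4 is safe
    }
}

fn main() {
    assert_eq!(division_case(10, 0), "divide by zero");
    assert_eq!(division_case(i32::MIN, -1), "overflow");
    assert_eq!(division_case(i32::MIN, 1), "ok");

    // checked_div reports both problem cases as None.
    assert_eq!(10_i32.checked_div(0), None);
    assert_eq!(i32::MIN.checked_div(-1), None);

    // The wrapped result of the overflowing case is i32::MIN itself.
    assert_eq!(i32::MIN.wrapping_div(-1), i32::MIN);
}
```

In safe_divide the zero case never reaches the first assertion, because the function returns None before dividing. Only the overflow check survives in the generated assembly.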

The result is returned using 2 registers: w0 stores the discriminant (None: 0, Some: 1) and w1 stores the data.

Listing:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined safe_divide()
             undefined         <UNASSIGNED>   <RETURN>
             undefined8        Stack[-0x10]:8 local_10                                XREF[1]:     0010002c(W)  
                             check if divisor is 0
                             safe_divide                                     XREF[4]:     Entry Point(*), 001000e4(*), 
                                                                                          _elfSectionHeaders::00000090(*), 
                                                                                          _elfSectionHeaders::000000d0(*)  
        00100000 21 01 00 34     cbz        w1,LAB_00100024
                             i32::MIN
        00100004 08 00 b0 52     mov        w8,#0x80000000
                             check if dividend is i32::MIN
        00100008 1f 00 08 6b     cmp        w0,w8
        0010000c 61 00 00 54     b.ne       LAB_00100018
                             check if divisor is -1
        00100010 3f 04 00 31     cmn        w1,#0x1
        00100014 c0 00 00 54     b.eq       LAB_0010002c
                             LAB_00100018                                    XREF[1]:     0010000c(j)  
        00100018 01 0c c1 1a     sdiv       w1,w0,w1
                             Some
        0010001c 20 00 80 52     mov        w0,#0x1
        00100020 c0 03 5f d6     ret
                             None
                             LAB_00100024                                    XREF[1]:     00100000(j)  
        00100024 e0 03 1f 2a     mov        w0,wzr
        00100028 c0 03 5f d6     ret
                             LAB_0010002c                                    XREF[1]:     00100014(j)  
        0010002c fd 7b bf a9     stp        x29,x30,[sp, #local_10]!
        00100030 fd 03 00 91     mov        x29,sp
        00100034 00 00 00 90     adrp       x0,0x100000
        00100038 00 c0 02 91     add        x0=>PTR_DAT_001000b0,x0,#0xb0                    = 001000a0
        0010003c f1 03 00 94     bl         <EXTERNAL>::core::panicking::panic_const::pani   undefined panic_const_div_overfl

process_option

#[unsafe(no_mangle)]
pub fn process_option(value: Option<i32>) -> i32 {
    match value {
        Some(x) => x * 2,
        None => 0,
    }
}

We can see the same pattern (w0: discriminant, w1: data) when processing an Option passed to our function.

Listing:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined process_option()
             undefined         <UNASSIGNED>   <RETURN>
                             multiply by 2
                             process_option                                  XREF[3]:     Entry Point(*), 00100100(*), 
                                                                                          _elfSectionHeaders::00000150(*)  
        00100040 28 78 1f 53     lsl        w8,w1,#0x1
                             check discriminant: Z flag = 1 if None, Z flag = 0 if Some
        00100044 1f 00 00 72     tst        w0,#0x1
                             if Z=0 (Some): return w8, if Z=1 (None): return wzr
        00100048 00 11 9f 1a     csel       w0,w8,wzr,ne
        0010004c c0 03 5f d6     ret
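The branchless sequence in the listing can be modelled in Rust. This is a sketch under stated assumptions: process_option_model is a hypothetical name, its two parameters stand for the registers w0 and w1, and the reference function uses wrapping_mul so that the comparison also holds in debug builds, where x * 2 would panic on overflow.

```rust
/// Models the listing: tag corresponds to w0, payload to w1.
fn process_option_model(tag: u32, payload: i32) -> i32 {
    let doubled = payload.wrapping_shl(1); // lsl w8, w1, #0x1
    // tst w0, #0x1 followed by csel w0, w8, wzr, ne
    if tag & 1 != 0 { doubled } else { 0 }
}

/// Reference implementation with explicit wrapping arithmetic.
fn process_option(value: Option<i32>) -> i32 {
    match value {
        Some(x) => x.wrapping_mul(2),
        None => 0,
    }
}

fn main() {
    for value in [None, Some(0), Some(21), Some(-7), Some(i32::MAX)] {
        let (tag, payload) = match value {
            Some(x) => (1, x),
            // The payload of None is undefined, so any value must work here.
            None => (0, 0x5555),
        };
        assert_eq!(process_option_model(tag, payload), process_option(value));
    }
}
```

The shift is computed unconditionally, before the discriminant is tested. This is safe because the result is simply discarded in the None case.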

Null pointer optimization

There are some cases where the discriminant is omitted due to optimizations. The general rule is that null pointer optimization can be used for types that can never be null. Examples include:

  • Option<&str>
  • Option<Box<i32>>

By the safety guarantees of safe Rust, a &str always points to a valid location and a Box<T> always points to a valid heap allocation. This enables the compiler to use further optimizations, for example dropping the discriminant field and using a null value to represent the None variant.
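The null representation of None can be observed directly. The sketch below is an illustration only; the helper as_raw is hypothetical. It relies on the documented guarantee that Option<&T> has the same size as a pointer and that None corresponds to the null pointer.

```rust
/// Reinterprets an Option<&i32> as a raw pointer without changing its bits.
fn as_raw(value: Option<&i32>) -> *const i32 {
    // Both types are one pointer wide, and None is stored as null.
    unsafe { std::mem::transmute::<Option<&i32>, *const i32>(value) }
}

fn main() {
    let x = 42;

    // None is the null pointer.
    assert!(as_raw(None).is_null());

    // Some(&x) is just the address of x; there is no separate discriminant.
    assert_eq!(as_raw(Some(&x)), &x as *const i32);
}
```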

While tracing the different compilation steps, we can see that the discriminant is still present in the MIR but no longer in the LLVM IR. This means the null pointer optimization becomes visible when MIR is lowered to LLVM IR.

process_str_option

#[unsafe(no_mangle)]
pub fn process_str_option(value: Option<&str>) -> usize {
    match value {
        Some(s) => s.len(),
        None => 0,
    }
}

Looking at the MIR, we can see that it extracts and checks the discriminant:

...
        _2 = discriminant(_1);
        switchInt(move _2) -> [0: bb2, 1: bb3, otherwise: bb1];
...

In the LLVM IR this has been simplified and replaced with a null check. If the pointer is null, 0 is returned. Otherwise, the length of the referenced string is returned. (A &str consists of 2 values: a pointer and a length. They are passed as two separate arguments, %0 and %1.)

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable
define noundef i64 @process_str_option(ptr noalias noundef readonly align 1 %0, i64 %1) unnamed_addr #1 {
start:
  %.not = icmp eq ptr %0, null
  %. = select i1 %.not, i64 0, i64 %1
  ret i64 %.
}
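The two components of a &str and the select from the LLVM IR can be sketched in Rust as follows. The names parts and str_option_model are hypothetical; the model takes the pointer and the length as separate parameters, in the same way the LLVM function takes %0 and %1.

```rust
/// Splits a &str into its two components: data pointer and byte length.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

/// Models the LLVM IR: a null pointer means None, otherwise return the length.
fn str_option_model(ptr: *const u8, len: usize) -> usize {
    if ptr.is_null() { 0 } else { len }
}

fn main() {
    // A &str is two machine words wide.
    assert_eq!(std::mem::size_of::<&str>(), 2 * std::mem::size_of::<usize>());

    let (ptr, len) = parts("hello");
    assert_eq!(len, 5);
    assert_eq!(str_option_model(ptr, len), 5);

    // For None the length argument is undefined and must be ignored.
    assert_eq!(str_option_model(std::ptr::null(), 99), 0);
}
```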

Listing:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined process_str_option()
             undefined         <UNASSIGNED>   <RETURN>
                             process_str_option                              XREF[3]:     Entry Point(*), 00100114(*), 
                                                                                          _elfSectionHeaders::00000190(*)  
        00100050 1f 00 00 f1     cmp        x0,#0x0
        00100054 e0 03 81 9a     csel       x0,xzr,x1,eq
        00100058 c0 03 5f d6     ret

Full MIR for reference:

fn process_str_option(_1: Option<&str>) -> usize {
    debug value => _1;
    let mut _0: usize;
    let mut _2: isize;
    let _3: &str;
    scope 1 {
        debug s => _3;
        scope 2 (inlined core::str::<impl str>::len) {
            let _4: &[u8];
            scope 3 (inlined core::str::<impl str>::as_bytes) {
            }
        }
    }

    bb0: {
        _2 = discriminant(_1);
        switchInt(move _2) -> [0: bb2, 1: bb3, otherwise: bb1];
    }

    bb1: {
        unreachable;
    }

    bb2: {
        _0 = const 0_usize;
        goto -> bb4;
    }

    bb3: {
        _3 = copy ((_1 as Some).0: &str);
        StorageLive(_4);
        _4 = copy _3 as &[u8] (Transmute);
        _0 = PtrMetadata(copy _4);
        StorageDead(_4);
        goto -> bb4;
    }

    bb4: {
        return;
    }
}

process_box_option

#[unsafe(no_mangle)]
pub fn process_box_option(value: Option<Box<i32>>) -> i32 {
    match value {
        Some(boxed) => *boxed,
        None => -1,
    }
}

A Box<i32> consists of a single pointer to a heap-allocated block and can never be null in safe code. Therefore, the discriminant check can be implemented as a null check. If the pointer is null, -1 is returned. Otherwise, the pointer is dereferenced, the heap block is deallocated (__rust_dealloc receives the pointer in x0, the size 4 in w1 and the alignment 4 in w2) and the value is returned.

Listing:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined process_box_option()
             undefined         <UNASSIGNED>   <RETURN>
             undefined8        Stack[-0x10]:8 local_10                                XREF[3]:     00100060(W), 
                                                                                                   00100080(R), 
                                                                                                   00100094(R)  
             undefined8        Stack[-0x20]:8 local_20                                XREF[3]:     0010005c(W), 
                                                                                                   00100084(*), 
                                                                                                   00100098(*)  
                             process_box_option                              XREF[3]:     Entry Point(*), 00100128(*), 
                                                                                          _elfSectionHeaders::000001d0(*)  
        0010005c fd 7b be a9     stp        x29,x30,[sp, #local_20]!
        00100060 f3 0b 00 f9     str        x19,[sp, #local_10]
        00100064 fd 03 00 91     mov        x29,sp
                             null check
        00100068 20 01 00 b4     cbz        x0,LAB_0010008c
                             dereference
        0010006c 13 00 40 b9     ldr        w19,[x0]
        00100070 81 00 80 52     mov        w1,#0x4
        00100074 82 00 80 52     mov        w2,#0x4
        00100078 e4 03 00 94     bl         <EXTERNAL>::__rustc[a3537046f032bc96]::__rust_   undefined __rust_dealloc()
        0010007c e0 03 13 2a     mov        w0,w19
        00100080 f3 0b 40 f9     ldr        x19,[sp, #local_10]
        00100084 fd 7b c2 a8     ldp        x29=>local_20,x30,[sp], #0x20
        00100088 c0 03 5f d6     ret
                             LAB_0010008c                                    XREF[1]:     00100068(j)  
        0010008c 13 00 80 12     mov        w19,#0xffffffff
        00100090 e0 03 13 2a     mov        w0,w19
        00100094 f3 0b 40 f9     ldr        x19,[sp, #local_10]
        00100098 fd 7b c2 a8     ldp        x29=>local_20,x30,[sp], #0x20
        0010009c c0 03 5f d6     ret
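The two constants loaded into w1 and w2 before the call to __rust_dealloc match the size and alignment of an i32. A short sketch using std::alloc::Layout shows where these values come from; the helper name dealloc_args is hypothetical.

```rust
use std::alloc::Layout;

/// Returns the (size, align) pair that describes the heap block of a Box<T>.
fn dealloc_args<T>() -> (usize, usize) {
    let layout = Layout::new::<T>();
    (layout.size(), layout.align())
}

fn main() {
    // Matches mov w1,#0x4 and mov w2,#0x4 in the listing.
    assert_eq!(dealloc_args::<i32>(), (4, 4));

    // For comparison: a Box<u8> would be freed with size 1 and alignment 1.
    assert_eq!(dealloc_args::<u8>(), (1, 1));
}
```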

Startup

This section shows how to locate the user-defined main function and trace the call chain that leads to its execution.

Source

Initialize a new workspace with cargo init.

The source code of chapter Hello, world! is reused here.

Build

$ cargo rustc --release

Ghidra

Load the binary into Ghidra and auto-analyze it.

Locating main

In an std environment (as opposed to no_std), the user-defined main function (here rust_lab::main) is called by lang_start_internal.

Call graph:

_start
    _start_c
        __libc_start_main
            main
                lang_start_internal
                    rust_lab::main

Decompiled code:

void main(int param_1,undefined8 param_2)

{
  code *pcStack_8;
  
  pcStack_8 = rust_lab::main;
  std::rt::lang_start_internal(&pcStack_8,&DAT_0046d6e8,(long)param_1,param_2,0);
  return;
}

lang_start_internal can be easily recognized, even if symbols are stripped: its first argument points to a stack slot that holds the address of rust_lab::main.

Listing:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined main()
             undefined         <UNASSIGNED>   <RETURN>
             undefined8        Stack[-0x10]:8 local_10                                XREF[1]:     00401b38(W)  
                             main                                            XREF[5]:     Entry Point(*), 
                                                                                          _start_c:004019f4(*), 00453c74, 
                                                                                          004648c4(*), 0046ff90(*)  
        00401b28 08 00 00 90     adrp       x8,0x401000
                             adrp + add loads the address of rust_lab::main
        00401b2c 08 c1 2b 91     add        x8,x8,#0xaf0
                             arg3: argv
        00401b30 e3 03 01 aa     mov        x3,x1
                             arg2: argc
        00401b34 02 7c 40 93     sxtw       x2,w0
                             store link register and address of rust_lab::main on the stack
        00401b38 fe 23 bf a9     stp        x30,x8=>rust_lab::main,[sp, #local_10]!
        00401b3c 61 03 00 90     adrp       x1,0x46d000
                             arg1: vtable pointer of the trait object
        00401b40 21 a0 1b 91     add        x1=>DAT_0046d6e8,x1,#0x6e8
                             arg0: data pointer of the trait object
        00401b44 e0 23 00 91     add        x0,sp,#0x8
                             arg4: 0
        00401b48 e4 03 1f 2a     mov        w4,wzr
        00401b4c ad 5b 00 94     bl         std::rt::lang_start_internal                     undefined lang_start_internal()
        00401b50 fe 07 41 f8     ldr        x30,[sp], #0x10
        00401b54 c0 03 5f d6     ret

Understanding the lang_start_internal arguments

If we look at the signature of lang_start_internal, we can see that it accepts 4 arguments, but the decompiled code above shows 5.

// To reduce the generated code of the new `lang_start`, this function is doing
// the real work.
#[cfg(not(test))]
fn lang_start_internal(
    main: &(dyn Fn() -> i32 + Sync + crate::panic::RefUnwindSafe),
    argc: isize,
    argv: *const *const u8,
    sigpipe: u8,
) -> isize {
...
}

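This is because the first argument is a closure that is converted to a trait object when lang_start calls lang_start_internal. A reference to a trait object is a fat pointer made of two machine words, so this single Rust argument occupies two registers (x0 and x1) and shows up as two parameters in the decompiled code.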

#[cfg(not(any(test, doctest)))]
#[lang = "start"]
fn lang_start<T: crate::process::Termination + 'static>(
    main: fn() -> T,
    argc: isize,
    argv: *const *const u8,
    sigpipe: u8,
) -> isize {
    lang_start_internal(
        &move || crate::sys::backtrace::__rust_begin_short_backtrace(main).report().to_i32(),
        argc,
        argv,
        sigpipe,
    )
}
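The shape of this call can be reproduced in a few lines. The following is a simplified sketch, not the standard library code: user_main, run and start are hypothetical stand-ins for rust_lab::main, lang_start_internal and lang_start. It also checks the sizes involved. The closure environment holding exactly one function pointer is what current compilers produce, not a language guarantee.

```rust
use std::mem::{size_of, size_of_val};

fn user_main() -> i32 {
    7
}

/// Stand-in for lang_start_internal: receives the closure as a trait object.
fn run(main: &dyn Fn() -> i32) -> i32 {
    main()
}

/// Stand-in for lang_start: wraps the function pointer in a closure.
fn start(main: fn() -> i32) -> i32 {
    run(&move || main())
}

fn main() {
    let f: fn() -> i32 = user_main;
    let closure = move || f();

    // The closure environment is just the captured function pointer.
    assert_eq!(size_of_val(&closure), size_of::<fn() -> i32>());

    // A trait object reference is two words: data pointer and vtable pointer.
    assert_eq!(size_of::<&dyn Fn() -> i32>(), 2 * size_of::<usize>());

    assert_eq!(start(user_main), 7);
}
```

In the listing of main, the closure environment is the stack slot at sp + 8 and the vtable pointer is the address DAT_0046d6e8.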

Within the closure body, __rust_begin_short_backtrace is called, which then calls rust_lab::main.

/// Fixed frame used to clean the backtrace with `RUST_BACKTRACE=1`. Note that
/// this is only inline(never) when backtraces in std are enabled, otherwise
/// it's fine to optimize away.
#[cfg_attr(feature = "backtrace", inline(never))]
pub fn __rust_begin_short_backtrace<F, T>(f: F) -> T
where
    F: FnOnce() -> T,
{
    let result = f();

    // prevent this frame from being tail-call optimised away
    crate::hint::black_box(());

    result
}

References to trait objects are represented by a data pointer (here: the address of the stack slot that holds the closure's only captured value, the address of rust_lab::main) and a vtable pointer (here: DAT_0046d6e8).

                             DAT_0046d6e8                                    XREF[1]:     main:00401b40(*)  
        0046d6e8 00              undefined1 00h
        0046d6e9 00              ??         00h
        ...
        0046d700 d8 1a 40        addr       core::ops::function::FnOnce::call_once{{vtable
                 00 00 00 
                 00 00
        0046d708 b0 1a 40        addr       std::rt::lang_start::_{{closure}}
                 00 00 00 
                 00 00
        0046d710 b0 1a 40        addr       std::rt::lang_start::_{{closure}}
                 00 00 00 
                 00 00

If we look at the decompiled code of lang_start_internal, we can see which vtable entry it uses to execute rust_lab::main:


/* WARNING: Globals starting with '_' overlap smaller symbols at the same address */
/* WARNING: Unknown calling convention: __rustcall */
/* std::rt::lang_start_internal */

long __rustcall
std::rt::lang_start_internal
          (undefined8 param_1,long param_2,undefined8 param_3,undefined8 param_4,byte param_5)

{
...
  (**(code **)(param_2 + 0x28))(param_1);
...

Using this offset and the vtable address, we can calculate the address of the vtable entry which contains the address of std::rt::lang_start::_{{closure}}:

0x0046d6e8 + 0x28 = 0x0046d710

Decompiled code:

/* WARNING: Unknown calling convention: __rustcall */
/* std::rt::lang_start::_{{closure}} */

undefined8 __rustcall std::rt::lang_start::_{{closure}}(undefined8 *param_1)

{
  sys::backtrace::__rust_begin_short_backtrace(*param_1);
  return 0;
}
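The offset arithmetic above can be sketched as follows. The constant VTABLE is taken from the listing. The interpretation of the first three slots as a header (drop function, size, alignment) is an assumption based on how current rustc versions lay out vtables; it is not a stable guarantee. The helper name slot_address is hypothetical.

```rust
/// Address of the vtable as shown in the listing (DAT_0046d6e8).
const VTABLE: u64 = 0x0046_d6e8;
/// Size of one vtable slot: a pointer on AArch64.
const WORD: u64 = 8;

/// Returns the address of the vtable slot with the given index.
fn slot_address(index: u64) -> u64 {
    VTABLE + index * WORD
}

fn main() {
    // Assumed header: slots 0, 1 and 2 hold drop function, size and alignment.
    // The three method slots match the addresses in the vtable dump.
    assert_eq!(slot_address(3), 0x0046_d700); // FnOnce::call_once shim
    assert_eq!(slot_address(4), 0x0046_d708); // closure
    assert_eq!(slot_address(5), 0x0046_d710); // closure, used by lang_start_internal

    // Slot 5 is the one at offset 0x28 in the decompiled call.
    assert_eq!(5 * WORD, 0x28);
}
```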

As we also saw earlier, __rust_begin_short_backtrace calls rust_lab::main in the end.

/* WARNING: Unknown calling convention: __rustcall */
/* std::sys::backtrace::__rust_begin_short_backtrace */

void __rustcall std::sys::backtrace::__rust_begin_short_backtrace(code *param_1)

{
  (*param_1)();
  return;
}

Panic: unwind vs abort

When a Rust program panics, it can handle the failure in two ways: unwind or abort. Unwind mode cleans up resources as the panic travels up the call stack, while abort mode immediately terminates the program. The chosen mode affects the generated code.
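The cleanup performed in unwind mode can be observed from safe code. The sketch below is an illustration and not part of the chapter's sample program; Guard, DROPPED and panic_with_guard are hypothetical names. It assumes the default panic=unwind setting. With panic=abort the process would terminate at the panic! and neither assertion would be reached.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Set to true when a Guard is dropped.
static DROPPED: AtomicBool = AtomicBool::new(false);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

/// Panics while a Guard is alive and returns true if the panic was caught.
fn panic_with_guard() -> bool {
    let result = panic::catch_unwind(|| {
        let _guard = Guard;
        panic!("boom");
    });
    result.is_err()
}

fn main() {
    // The panic message is still printed to stderr by the default panic hook.
    assert!(panic_with_guard());

    // The destructor ran while the panic travelled up the call stack.
    assert!(DROPPED.load(Ordering::SeqCst));
}
```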

Source

Initialize a new workspace with cargo init.

The source code of chapter Vec is reused here.

Build

$ cargo rustc --release

By default the option panic=unwind is used.

Ghidra

Load the binary into Ghidra and auto-analyze it.

rust_lab::main

While scrolling through the listing, we can see that Ghidra adds try-catch comments to the following parts, and we notice XREFs from addresses that are far away.

                             try { // try from 00401958 to 00401967 has its CatchHandler @
                             LAB_00401958                                    XREF[1]:     00453a88(*)  
        00401958 61 03 00 90     adrp       x1,0x46d000
                             arg1: source location for debugging
        0040195c 21 80 21 91     add        x1=>PTR_s_src/main.rs_0046d860,x1,#0x860         = 0044aea0
                             arg0: address of vec
        00401960 e0 23 00 91     add        x0,sp,#0x8
                             increase cap from 3 to 8
        00401964 fd 0d 01 94     bl         alloc::raw_vec::RawVec<T,A>::grow_one            undefined grow_one()
                             } // end try from 00401958 to 00401967
                             catch() { ... } // from try @ 00401958 with catch @ 004019ac
                             LAB_004019ac                                    XREF[1]:     00453a8a(*)  
        004019ac e1 07 40 f9     ldr        x1,[sp, #0x8]
        004019b0 f3 03 00 aa     mov        x19,x0
        004019b4 81 00 00 b4     cbz        x1,LAB_004019c4
        004019b8 e0 0b 40 f9     ldr        x0,[sp, #0x10]
        004019bc 22 00 80 52     mov        w2,#0x1
        004019c0 14 00 00 94     bl         __rustc::__rust_dealloc                          void __rust_dealloc(u8 * ptr, us
                             LAB_004019c4                                    XREF[1]:     004019b4(j)  
        004019c4 e0 03 13 aa     mov        x0,x19
        004019c8 30 c7 00 94     bl         _Unwind_Resume                                   undefined _Unwind_Resume()
                             -- Flow Override: CALL_RETURN (CALL_TERMINATOR)

If we follow the XREFs, we can see they are under the LSDA (Language-Specific Data Area) located in section .gcc_except_table.

                             //
                             // .gcc_except_table 
                             // SHT_PROGBITS  [0x453a84 - 0x454c67]
                             // ram:00453a84-ram:00454c67
                             //
                             **************************************************************
                             * Language-Specific Data Area                                *
                             **************************************************************
...
        00453a88 44              uleb128    LAB_00401958                                     (LSDA Call Site) IP Offset
        00453a89 10              uleb128    10h                                              (LSDA Call Site) IP Range Length
        00453a8a 98 01           uleb128    LAB_004019ac                                     (LSDA Call Site) Landing Pad Add
        00453a8c 00              uleb128    0h                                               (LSDA Call Site) Action Table Of
...

This means that if a panic occurs while execution is between 0x00401958 and 0x00401958 + 0x10 = 0x00401968, the code located at 0x004019ac (the landing pad) is executed during unwinding. If necessary (the vector capacity is not zero), it deallocates the vector's heap block, and then it continues the unwinding process by calling _Unwind_Resume.
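The call site record is stored as ULEB128 values relative to the start of the function. The following sketch decodes the five bytes shown in the LSDA listing and recomputes the addresses. The names read_uleb128 and decode_call_site are hypothetical. The function start is an assumption derived from the lldb frame in the Vec chapter (rust_lab::main + 116 at 0x401988), and the landing pad is assumed to be relative to the function start as well.

```rust
/// Decodes one ULEB128 value; returns (value, number of bytes consumed).
fn read_uleb128(bytes: &[u8]) -> (u64, usize) {
    let mut value = 0u64;
    let mut shift = 0u32;
    for (i, &byte) in bytes.iter().enumerate() {
        // The low 7 bits carry data, the high bit marks a continuation.
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return (value, i + 1);
        }
        shift += 7;
    }
    panic!("truncated ULEB128 value");
}

/// Decodes the 4 fields of a call site record:
/// [start offset, length, landing pad offset, action table offset].
fn decode_call_site(record: &[u8]) -> [u64; 4] {
    let mut fields = [0u64; 4];
    let mut pos = 0;
    for field in fields.iter_mut() {
        let (value, used) = read_uleb128(&record[pos..]);
        *field = value;
        pos += used;
    }
    fields
}

fn main() {
    // Bytes at 0x00453a88..=0x00453a8c from the listing.
    let record = [0x44, 0x10, 0x98, 0x01, 0x00];
    // Assumption: start of rust_lab::main, derived from "main + 116" at 0x401988.
    let func_start: u64 = 0x0040_1988 - 116;

    let [start, length, landing_pad, action] = decode_call_site(&record);
    assert_eq!(func_start + start, 0x0040_1958); // begin of the try range
    assert_eq!(func_start + start + length, 0x0040_1968); // end (exclusive)
    assert_eq!(func_start + landing_pad, 0x0040_19ac); // catch handler
    assert_eq!(action, 0); // no action table entry: cleanup only
}
```

The two-byte value 98 01 shows why the encoding matters: the low 7 bits of 0x98 (0x18) are combined with 0x01 shifted left by 7, which gives 0x98.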