$ Rust tutorials

Chapter 06

Getting some pointers

Rust is quite particular when it comes to pointers: it keeps a high-performance scheme while ensuring safety use patterns (no dangling pointers, etc.).

We will also talk about ownership and lifetimes in this chapter.

Introduction

First, let’s look for a second at how things are laid out in memory. There are two important memory pools, the stack and the heap.

Simply put

Functions are individual, reusable pieces of code. If we want to directly edit a variable from the originating (caller) namespace, we need to pass its memory address, i.e. a pointer to the said variable so that the function knows where to find it and access it.

Also, when passing huge structures of data you might want to pass a pointer to it to save on memory copies but that is a quite sparse use-case as we will see.

On stack, and heap

Each application uses a pool of memory called the stack, of fixed-size, allocated when a task starts up. There are various mechanisms occuring inside the stack but it is mostly hosting all local variables and parameters used along the execution. When a function is called, the existing registers are saved on memory and the program jumps to the function and creates a new stack frame for it. The stack also traces the order in which functions are called so that function returns occur correctly. The default stack size of a Rust task is 2 MiB.
So, we need to pass a memory address in order to be able to edit a variable from the caller’s namespace – that is the main reason why we use pointers.

A program can also dynamically request memory with the unused memory in the computer, managed by the OS: this is called the heap. All dynamically-sized types (DST) are stored on the heap with an OS allocation.
The heap can only be accessed via a pointer located on the stack since it is not the default memory pool, these pointers create a box.

So, an int has a fixed-size in memory and is stack-allocated. A dynamic-array ~[], a string ~str or anything ~ is allocated on the heap (that is where you would use malloc in C or new in C++).

Different kinds

Rust has two primary pointer types: the borrowed reference (using the & symbol) and the owned pointer (indicated by ~).

Referencing

Referencing is also called borrowing because creating a reference freezes the target variable, i.e. it cannot be freed nor transfered (so no use-after-free). When the reference gets out of scope and freed, it is available again to the caller. References use the & operator, just like C pointers; or &mut with mutability, which need to point to uniquely referenced, mutable data – everything is statically checked.

This would be the most basic example:

let x = 3;
{
    y = &x; // `x` is now frozen and cannot be modified
    // ...but it can be read since the borrow is immutable
}
// `x` can be accessed again (`y` is out of scope)

A borrow cannot outlive the variable it originates from: it has a lifetime, meaning that you will have to change your allocation pattern if a reference outlives its lifetime, like here:

let mut x = &3;
{
    let mut y = 4;
    x = &y;
} // `y` is freed here, but `x` still lives...

This pattern will be rejected, since y has a shorter lifetime than x.
The compiler enforces valid references and yields:

error: borrowed value does not live long enough

Note: there are a few cases through like when returning a reference passed to a function where you will need to add a lifetime annotation, so that it is inferred from the caller; more on that a bit later.

Referencing is the default choice as a pointer: all checks are performed at compile-time (by the borrow checker) so its footprint is that of a C pointer (which is also available in Rust as unsafe, * pointer).
The unary star operator * also serves for dereferencing, like in C.

Unique ownership

An owned pointer owns a certain (dynamically allocated) part of the heap, i.e. the owner is the only one who can access the data – unless it transfers ownership, at which point the compiler will free the memory automatically (pointer is copied, but not the value).

fn take<T>(ptr: ~T) { // works for any type, `T`
    ...
}

let m = ~"Hello!";
take(m);
take_again(m); // ERROR: `m` is moved

Note that owned pointers can, like most objects, be borrowed. You can also copy a unique pointer using .clone(), but this is an expensive operation.

Shared pointers

“Shared” may evoke Garbage Collection to you. Well, this is partly the case.

Using only owned and borrowed pointers, ownership forms a DAG, while shared pointers allow multiple pointers to the same object and cycles. There are few of them, either GC-managed (with immutability to prevent data races), or Reference counted with specific types that allow thread-sharing or mutability, for instance. Note that Rust also has, of course, mutable shared pointer types.

Note: As you can see, these pointers serve a particular purpose. You should not have to look at them until they are needed in one of your programs.

First and foremost, please note that the new tracing, task-local Garbage Collection algorithm that will be introduced into the standard library is being worked on right now.
So, there used to be an @ managed pointer type, but it has been phased out in favor of std::rc::Rc and std::gc::Gc; which are standard library types. Right now, Gc is just a wrapper around Rc, which manually counts references, meaning less overhead than a GC algorithm (which periodically checks for pointer references) but it has a few limitations, e.g. it will leak memory on cycles (recursive data structures and the likes).

Note: task-local (or per-thread) GC is an important part of the deal because it means that you can have a task which handles low-latency jobs and is manually managed and another that can just run GC; task local also means that you can’t pass these pointers between tasks, which can be desired in some cases.

Okay, let’s have a look at some of these “smart pointers”:

  • First, if we want mutability inside of our Gc/Rc types, we will have to use Cell or RefCell, depending on whether the contained type is Pod or not (Pod is whatever type can be copied with a memcpy instruction, i.e. bool, integers, floats, char, &/* and all types that are construction of these all): Pod can use Cell, everything else will use RefCell.
    You can see Pod as every type that has a fixed size and where you do not need to dereference it in order to access the content (that includes struct of Pod types, tuples, etc. but not dynamic vectors for instance).
  • We have said that Gc uses immutable, thread-local data. If we want to share data across threads, we would have to use a variant of Rc: Arc, i.e. Atomically Reference Counted. As the name suggests, it will make RefCount an atomic (insecable) operation (using Fetch-and-add on modern processors) so that it avoids data races where threads would access/modify the count at the same time.
  • What if we want cross-thread mutability? Arc<Cell<>> is not allowed since it would break atomicity, so there is a special type for that: MutexArc, internally using mutexes to prevent data races.
    There is a variant called RWArc that uses a Readers–writer lock, making it more efficient in the case where you have lots of readers.

The Edge-cases

References and lifetimes

Okay, let’s use a silly example involving a function return:

fn take(x: &int) -> &int {
    x
}

fn main() {
    let x = 4;
    println!("{}", *take(&x));
}

You are probably thinking that x outlives its lifetime; that’s where it is:

error: cannot infer an appropriate lifetime due to conflicting requirements

It doesn’t end there through, since we can pass a lifetime parameter from the caller to the function:

fn take<'a>(ptr: &'a int) -> &'a int {
    ptr
}

fn main() {
    let x = 4;
    println!("{}", *take(&x));
}

As you can see here, we define 'a as a parameter (the single quotation mark prefix denoting a lifetime), and annotate it to both the value being passed and the return value. In short, the return value will inherit the lifetime of the parameter.
Since x is still alive until the end of main() – the caller function, this pattern is valid and typechecks.

So generally speaking, if you want to return a borrowed value (eventually with a condition evaluation for instance), you will have to use that.
This is particularly useful if you want to modify a variable in-place (that is, without having to pass it as heap pointer), in which case you can take a mutable borrow &mut with a lifetime annotation.

You can also annotate lifetime parameters to several variables, in which case the compiler will pick the lowest. This is useful when your output depends on a few variables:

fn max<'a>(x: &'a int, y: &'a int) -> &'a int {
    if (*x >= *y) {
        x
    } else {
        y
    }
}

Lastly, there is a 'static lifetime, which you want to use outside of any brace scope and does not expire. As an example, here is how rust defines its bug report URL string:

static BUG_REPORT_URL: &'static str = "...url...";