Rust Atomics and Locks

WIP, No ETA provided

My notes on the tremendous book Rust Atomics and Locks by Mara Bos.

Cell, RefCell

Interior mutability: a design pattern in Rust that allows you to mutate data even when there are immutable references to that data.

In short, it lets fn foo(&self) change the data inside the struct, where we would normally need fn foo(&mut self).

Cell

  • take() out the data, modify it, and set() it back.

    Mutex is the concurrent version of Cell.

RefCell:

  • borrow_mut() will panic if the value is already borrowed (see the sketch below).

    RwLock is the concurrent version of RefCell, but it blocks until the existing borrows are released instead of panicking.
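
A minimal sketch of both, using nothing beyond std (the values are arbitrary):

use std::cell::{Cell, RefCell};

fn main() {
    // Cell: take() the value out, modify it, set() it back.
    let c = Cell::new(vec![1, 2]);
    let mut v = c.take(); // leaves an empty Vec behind
    v.push(3);
    c.set(v);

    // RefCell: the borrow rules are checked at runtime.
    let r = RefCell::new(vec![1, 2]);
    r.borrow_mut().push(3); // fine: no other borrows are active
    let b = r.borrow();
    // r.borrow_mut(); // would panic: `b` is still borrowing
    println!("{:?} {:?}", c.take(), b);
}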

Aside: UnsafeCell

Both of them are built with a core primitive called UnsafeCell.

This struct provides the basic primitive for interior mutability; however, you have to use unsafe to access the data.

Cell, RefCell, and all other types that allow interior mutability use UnsafeCell to wrap their data and provide safe APIs on top.

MutexGuard lifetime

if list.lock().unwrap().pop() == Some(1) { // <--- guard dropped here, before the body runs
    do_something();
}

// PERF: the temporary guard lives until the end of the if let, so we needlessly hold the lock while processing the item.
if let Some(item) = list.lock().unwrap().pop() {
    process_item(item);
} // <--- drop here

// FIXED:
let item = list.lock().unwrap().pop(); // <--- drop here
if let Some(item) = item {
    process_item(item);
}

Reason:

  • In a plain if, the condition is just a boolean expression; it cannot borrow from the temporary guard, so the guard is dropped as soon as the condition has been evaluated.
  • In an if let, temporaries in the scrutinee live until the end of the whole statement. Replace pop() with front() and the reason becomes clear: front() returns a reference into the queue, so the guard must stay alive for the body (see the sketch below).
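
Here is a small sketch of the front() case, with a made-up list just to show the borrow:

use std::collections::VecDeque;
use std::sync::Mutex;

fn main() {
    let list = Mutex::new(VecDeque::from([1, 2, 3]));

    if let Some(item) = list.lock().unwrap().front() {
        // `item` is a &i32 that borrows from the temporary MutexGuard,
        // so the guard has to stay alive until the end of this block.
        println!("first item: {item}");
    } // <--- the guard (and the borrow) are dropped here
}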

Parking

Consider this situation: we only want to process items when the list is not empty.
With just a Mutex, the consumer has to keep calling lock() and checking over and over, which wastes CPU.

use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

fn main() {
    let queue = Mutex::new(VecDeque::new());

    thread::scope(|s| {
        // Consuming thread
        let t = s.spawn(|| loop {
            let item = queue.lock().unwrap().pop_front();
            if let Some(item) = item {
                dbg!(item);
            } else {
                thread::park();
            }
        });

        // Producing thread
        for i in 0.. {
            queue.lock().unwrap().push_back(i);
            t.thread().unpark();
            thread::sleep(Duration::from_secs(1));
        }
    });
}

A call to unpark() before the thread parks itself does not get lost.

However, unpark requests don’t stack up.

Calling unpark() two times and then calling park() two times afterwards still results in the thread going to sleep.
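
A small runnable sketch of that rule (the sleeps only make the interleaving predictable, and park() is also allowed to wake up spuriously, so treat this as illustrative):

use std::thread;
use std::time::Duration;

fn main() {
    let t = thread::spawn(|| {
        // Give the main thread time to issue both unpark() calls first.
        thread::sleep(Duration::from_millis(100));
        thread::park(); // returns immediately: one token is pending
        println!("first park returned");
        thread::park(); // blocks: the second unpark() was not queued
        println!("second park returned");
    });
    t.thread().unpark();
    t.thread().unpark(); // does not stack on top of the first one
    thread::sleep(Duration::from_millis(500));
    t.thread().unpark(); // now let the thread finish
    t.join().unwrap();
}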

Cond Vars

The example above becomes inefficient when we want multiple consumers,
because the producer doesn't know which consumer to wake up.

fn main() {
    use std::sync::Condvar;
    use std::{collections::VecDeque, sync::Mutex, thread, time::Duration};

    let queue = Mutex::new(VecDeque::new());
    let not_empty = Condvar::new();
    let empty = Condvar::new();

    thread::scope(|s| {
        // not_empty handler
        s.spawn(|| loop {
            let mut q = queue.lock().unwrap();
            let item = loop {
                if let Some(item) = q.pop_front() {
                    break item;
                } else {
                    empty.notify_one(); // wake up the empty handler(s)
                    q = not_empty.wait(q).unwrap();
                }
            };
            drop(q);
            println!("Got item: {}", item);
        });

        // empty handler
        s.spawn(|| loop {
            let mut q = queue.lock().unwrap();
            loop {
                if q.is_empty() {
                    break;
                } else {
                    q = empty.wait(q).unwrap();
                }
            }
            drop(q);
            println!("Queue is empty");
            thread::sleep(Duration::from_millis(400));
        });

        // producer
        for i in 0.. {
            queue.lock().unwrap().push_back(i);
            not_empty.notify_one(); // wake up the not_empty handler(s)
            thread::sleep(Duration::from_secs(1));
        }
    });
}

Memory Ordering

Happens-Before

Before we dive into the specific memory orderings below, we need a solid understanding of the happens-before relationship.

happens-before does not mean that the first operation is literally executed before the second one. (If you want that, use Mutex, Condvar, channels, or other synchronization primitives to make the threads communicate.)

There are two points to remember:

  1. happens-before is all about the ‘instruction reordering’ by the compiler or the CPU.

  2. a happens-before relationship is only established once the result becomes observable (e.g. the acquire load actually sees the released value).

For example:


Thread A:                          Thread B:

some ops (A) ----------
...                   |
a.store(1, Release);  | visible
                      |
                      |     if a.load(Acquire) == 1 {
                      -> (B)    // do something
                            }

There’s no happens-before relationship until Thread B sees the value 1 in a.load(Acquire), and at that point, we can say that (A) happens before (B), or (A) is visible to (B).

Relaxed

While atomic operations using relaxed memory ordering do not provide any happens-before relationship,
they do guarantee a total modification order of each individual atomic variable.
This means that all modifications of the same atomic variable happen in an order that is the same from the perspective of every single thread.

Example:

static X: AtomicI32 = AtomicI32::new(0);

// thread 1
fn a() {
    X.fetch_add(5, Relaxed);
    X.fetch_add(10, Relaxed);
}

// thread 2
fn b() {
    let a = X.load(Relaxed);
    let b = X.load(Relaxed);
    let c = X.load(Relaxed);
    let d = X.load(Relaxed);
    println!("{a} {b} {c} {d}");
}

The output will never be 0 5 0 15 or 0 0 10 15, because the two fetch_add ops form a single modification order 0 → 5 → 15:

  • 0 5 0 15: once a read has observed 5, a later read can't go back to 0.
  • 0 0 10 15: the fetch_add(5) is ordered before the fetch_add(10), so X jumps from 5 straight to 15 and the value 10 is never observable.

Release / Acquire

Release corresponds to the store operation, and Acquire corresponds to the load operation.

use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::sync::atomic::{AtomicBool, AtomicU64};
use std::thread;
use std::time::Duration;

static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    thread::spawn(|| {
        DATA.store(123, Relaxed);
        READY.store(true, Release); // Everything from before this store ..
    });
    while !READY.load(Acquire) { // .. is visible after this loads `true`.
        thread::sleep(Duration::from_millis(100));
        println!("waiting...");
    }
    println!("{}", DATA.load(Relaxed));
}

Note that if we use Relaxed instead of Acquire for READY, we might read 0 instead of 123.
That’s because Relaxed only provides a total order for the same atomic variable, not across different variables.

Since Release and Acquire have a happens-before relationship, we can simply use u64 instead of AtomicU64:

static mut DATA: u64 = 0;
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    thread::spawn(|| {
        // Safety: Nothing else is accessing DATA,
        // because we haven't set the READY flag yet.
        unsafe { DATA = 123 };
        READY.store(true, Release); // Everything from before this store ..
    });
    while !READY.load(Acquire) {
        // .. is visible after this loads `true`.
        thread::sleep(Duration::from_millis(100));
        println!("waiting...");
    }
    // Safety: Nothing is mutating DATA, because READY is set.
    println!("{}", unsafe { DATA });
}

SeqCst

In short, all SeqCst operations fall into one global order that is consistent with the order of the code in each thread.
Here's an example of the difference between Acq/Rel and SeqCst:

static A: AtomicUsize = AtomicUsize::new(0);
static B: AtomicUsize = AtomicUsize::new(0);

fn thread1() {
    A.store(1, SeqCst);
    B.store(2, SeqCst);
}

fn thread2() {
    let b = B.load(SeqCst);
    let a = A.load(SeqCst);
    println!("a = {}, b = {}", a, b);
}

If b is 2 then a must be 1.

static A: AtomicUsize = AtomicUsize::new(0);
static B: AtomicUsize = AtomicUsize::new(0);

fn thread1() {
    A.store(1, Release);
    B.store(2, Relaxed);
}

fn thread2() {
    let b = B.load(Relaxed);
    let a = A.load(Acquire);
    println!("a = {}, b = {}", a, b);
}

If b is 2, we can’t guarantee that a is 1.


Fence

In short, a fence prevents memory operations from being reordered across it, by either the compiler or the CPU.

More specifically, a fence(Release) prevents preceding memory operations from being reordered after stores that come after the fence, and a fence(Acquire) prevents subsequent memory operations from being reordered before loads that come before the fence.

We’ve already known that Release and Acquire have a happens-before relationship, and we can use fence to substitute them like this:

before:

// Thread A
{
    DATA.store(42, Relaxed);
    READY.store(true, Release); // What happens-before the `Release`...
}

// Thread B
if READY.load(Acquire) {            // ... is visible after the `Acquire`.
    let value = DATA.load(Relaxed); // The observation has occurred (READY: false -> true), so the value must be 42.
    assert_eq!(value, 42);
}

after:

// Thread A
{
    DATA.store(42, Relaxed);

    fence(Release);             // fence before store
    READY.store(true, Relaxed); // just Relaxed
}

// Thread B
if READY.load(Relaxed) { // just Relaxed
    fence(Acquire);      // fence after load

    let value = DATA.load(Relaxed);
    assert_eq!(value, 42);
}

  1. fence after load:

Let’s say, if we move the fence to the front of the if block, then the compiler might reorder the load before the if check:

// Thread B
fence(Acquire);
let value = DATA.load(Relaxed); // what value will be?
if READY.load(Relaxed) {
    println!("DATA = {}", value);
}

So we’ll probably get DATA = 0 instead of DATA = 42.

  2. fence before store:

{
    DATA.store(42, Relaxed);
    fence(Release); // make sure DATA is stored
    READY.store(true, Relaxed);
}

Finally, here's another example that shows how a fence can buy better performance:

let p = PTR.load(Acquire);
if p.is_null() {
    println!("no data");
} else {
    println!("data = {}", unsafe { *p });
}

let p = PTR.load(Relaxed);
if p.is_null() {
    println!("no data");
} else {
    fence(Acquire);
    println!("data = {}", unsafe { *p });
}
  • Why is the performance better?
    Because the fence is only needed when p is not null, and load by Relaxed is lighter weight than Acquire.

  • How does it work?
    The fence(Acquire) keeps the dereference *p from being reordered before the if check, and we only pay for the acquire ordering in the non-null branch.

Anyway, if you find it hard to follow, keeping these two rules in mind might help:

a.store(1, Release);
// is roughly equivalent to (the fence form is slightly stronger):
fence(Release);
a.store(1, Relaxed);

let a = b.load(Acquire);
// is roughly equivalent to (the fence form is slightly stronger):
let a = b.load(Relaxed);
fence(Acquire);

If you mentally perform this substitution, you'll see that a fence doesn't actually depend on another fence; it can also synchronize with a plain store or load.

i.e. a fence(Acquire) doesn't have to be paired with a fence(Release); fences don't have to come in pairs.

And that's how the last example works.

PS: the 'substitution' doesn't have to be literal (like :s/ori/new). You can have other lines between the fence and the store / load, as long as that code doesn't depend on the memory ordering.


Misconceptions

  1. "I need strong memory ordering to make sure changes are immediately visible to other threads."
  • In reality, memory ordering is about things like instruction reordering, which happens at nanosecond scale.
    Stronger memory ordering does not make your data travel faster; it might even slow your program down.
  2. "Disabling optimization means I don't need to care about memory ordering."
  • Processor reordering still happens even if you disable compiler optimizations.
  3. "Sequentially consistent memory ordering can be used for a 'release-load' or an 'acquire-store'."
  • A release-store does not form a release-acquire relationship with a SeqCst-store.
    If you need them to be part of a globally consistent order, both operations have to use SeqCst.

Summary

Ordering  Guarantee
Relaxed   A total modification order for each individual atomic variable
Acq/Rel   Everything before the Release store is visible after the Acquire load that reads it
SeqCst    One global order of all SeqCst operations, consistent with each thread's program order
Fence     Operations after the fence are not reordered with operations before it

Aside: compare_exchange vs. compare_exchange_weak

In short, compare_exchange_weak may return Err even when the current value equals the expected one (a spurious failure).

That's because on some architectures the weak variant compiles down to LL/SC (Load-Linked / Store-Conditional) instructions, which can fail spuriously, for example due to cache contention.

Since we almost always call it in a retry loop anyway, compare_exchange_weak can be more efficient on such platforms, e.g. ARM.
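
For example, a typical retry loop looks like this (the function and its "even only" condition are made up for illustration):

use std::sync::atomic::{AtomicU32, Ordering::Relaxed};

// Only increment even values; a spurious failure from the weak CAS
// just causes one more loop iteration, so nothing is lost.
fn increment_if_even(x: &AtomicU32) -> bool {
    let mut current = x.load(Relaxed);
    loop {
        if current % 2 != 0 {
            return false;
        }
        match x.compare_exchange_weak(current, current + 1, Relaxed, Relaxed) {
            Ok(_) => return true,
            Err(actual) => current = actual, // spurious failure or a real change: retry
        }
    }
}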

What’s more, since the x86/x64 arch provides a strong memory named TSO (Total Store Order), thus store(Release) or load(Acquire) might simply be compiled into a MOV instruction;

However, on ARM, the memory model is less strict, so the compiler might generate a DMB (Data Memory Barrier) instruction for Release or Acquire.

Spin Lock

If a lock is only ever held for very brief moments and the threads locking it can run in parallel on different processor cores,
it might be better for the threads to repeatedly try to lock it without actually going to sleep.

Minimal Impl

use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::{Acquire, Release};

pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        while self.locked.swap(true, Acquire) {
            std::hint::spin_loop();
        }
    }

    pub fn unlock(&self) {
        self.locked.store(false, Release);
    }
}

Hold value

pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

impl<T> SpinLock<T> {
    pub fn new(value: T) -> Self {
        Self {
            locked: AtomicBool::new(false),
            value: UnsafeCell::new(value),
        }
    }

    // use this macro to suppress the clippy error
    #[allow(clippy::mut_from_ref)]
    pub fn lock(&self) -> &mut T {
        while self.locked.swap(true, Acquire) {
            std::hint::spin_loop();
        }
        unsafe { &mut *self.value.get() }
    }

    pub fn unlock(&self) {
        self.locked.store(false, Release);
    }
}

Use Guard to auto unlock

use std::ops::{Deref, DerefMut};

pub struct Guard<'a, T> {
    lock: &'a SpinLock<T>,
}

impl<T> Deref for Guard<'_, T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        unsafe { &*self.lock.value.get() }
    }
}

impl<T> DerefMut for Guard<'_, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        unsafe { &mut *self.lock.value.get() }
    }
}

impl<T> Drop for Guard<'_, T> {
    fn drop(&mut self) {
        self.lock.unlock();
    }
}

Now let’s change the lock method to return a Guard:

impl<T> SpinLock<T> {
    pub fn lock(&self) -> Guard<T> {
        while self.locked.swap(true, Acquire) {
            std::hint::spin_loop();
        }
        Guard { lock: self }
    }
}
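
A usage sketch along the lines of the book's test. Note that it additionally assumes the impl declares unsafe impl<T: Send> Sync for SpinLock<T> {}, as the full version in the book does:

use std::thread;

fn main() {
    let lock = SpinLock::new(Vec::new());
    thread::scope(|s| {
        s.spawn(|| lock.lock().push(1));
        s.spawn(|| {
            let mut g = lock.lock();
            g.push(2);
            g.push(3);
            // `g` is dropped at the end of this closure, which runs unlock() for us.
        });
    });
    let g = lock.lock();
    assert_eq!(g.len(), 3); // all three pushes happened
}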

Fence it!

Remember we’ve talked about using fence to substitute load or store in the previous section Fence? There happens to be an example of building a SpinLock with fence in the rust doc of fence itself:

pub struct Mutex {
    flag: AtomicBool,
}

impl Mutex {
    pub fn new() -> Mutex {
        Mutex {
            flag: AtomicBool::new(false),
        }
    }

    pub fn lock(&self) {
        // Wait until the old value is `false`.
        while self
            .flag
            .compare_exchange_weak(false, true, Ordering::Relaxed, Ordering::Relaxed)
            .is_err()
        {}
        // This fence synchronizes-with store in `unlock`.
        fence(Ordering::Acquire);
    }

    pub fn unlock(&self) {
        self.flag.store(false, Ordering::Release);
    }
}

A little confused by the single fence? Let's break it down.

First, to see things clearly, rewrite the while loop as an explicit loop:

pub fn lock(&self) {
    // Wait until the old value is `false`.
    loop {
        if self
            .flag
            .compare_exchange_weak(false, true, Ordering::Relaxed, Ordering::Relaxed)
            .is_ok()
        {
            break;
        }
    }
    // This fence synchronizes-with store in `unlock`.
    fence(Ordering::Acquire);
}

Then, replace compare_exchange_weak with a separate load and store. (Note that this is not valid in real code: splitting the CAS into an individual load and store breaks atomicity. It's only a thought experiment.)

pub fn lock(&self) {
    // Wait until the old value is `false`.
    loop {
        if self.flag.load(Ordering::Relaxed) == false {
            self.flag.store(true, Ordering::Relaxed);
            break;
        }
    }
    // This fence synchronizes-with store in `unlock`.
    fence(Ordering::Acquire);
}

Now move the fence inside the loop, right after the load that observes false:

pub fn lock(&self) {
    // Wait until the old value is `false`.
    loop {
        if self.flag.load(Ordering::Relaxed) == false {
            fence(Ordering::Acquire);
            self.flag.store(true, Ordering::Relaxed);
            break;
        }
    }
}

Now we can see the "fence after load" pattern! It's equivalent to:

pub fn lock(&self) {
    // Wait until the old value is `false`.
    loop {
        if self.flag.load(Ordering::Acquire) == false {
            self.flag.store(true, Ordering::Relaxed);
            break;
        }
    }
}

And that’s exactly how does the fence synchronizes-with the store in unlock.

It’s a little bit verbose, but it’s a good way to understand how fence works.

Chan

use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::thread::{self, Thread};

pub struct Channel<T> {
    message: UnsafeCell<MaybeUninit<T>>,
    ready: AtomicBool,
}

unsafe impl<T> Sync for Channel<T> where T: Send {}

pub struct Sender<'a, T> {
    channel: &'a Channel<T>,
    recv_thread: Thread,
}

pub struct Receiver<'a, T> {
    channel: &'a Channel<T>,
    _marker: std::marker::PhantomData<&'a T>,
}

impl<T> Channel<T> {
    pub const fn new() -> Self {
        Self {
            message: UnsafeCell::new(MaybeUninit::uninit()),
            ready: AtomicBool::new(false),
        }
    }

    pub fn split(&mut self) -> (Sender<T>, Receiver<T>) {
        *self = Self::new();
        (
            Sender {
                channel: self,
                recv_thread: thread::current(),
            },
            Receiver {
                channel: self,
                _marker: std::marker::PhantomData,
            },
        )
    }
}

impl<T> Sender<'_, T> {
    pub fn send(self, message: T) {
        unsafe { (*self.channel.message.get()).write(message) };
        self.channel.ready.store(true, Release);
        self.recv_thread.unpark();
    }
}

impl<T> Receiver<'_, T> {
    pub fn is_ready(&self) -> bool {
        self.channel.ready.load(Relaxed)
    }

    pub fn receive(self) -> T {
        while !self.channel.ready.swap(false, Acquire) {
            thread::park();
        }
        unsafe { (*self.channel.message.get()).assume_init_read() }
    }
}

impl<T> Drop for Channel<T> {
    fn drop(&mut self) {
        if *self.ready.get_mut() {
            unsafe { self.message.get_mut().assume_init_drop() }
        }
    }
}

Pay attention to these points (a usage sketch follows):

  1. Use MaybeUninit to save memory; the ready flag plays the role of the discriminant that Option<T> would need.
  2. Abstract Sender and Receiver to limit the user's access to the channel (think of the MutexGuard of a Mutex).
  3. Use the thread handle to park / unpark the receiving thread.
  4. Do not forget to drop the message in Channel's Drop, because MaybeUninit won't drop the inner data automatically.
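
A usage sketch, assuming the Channel above is in scope. Note that receive() should run on the thread that called split(), since that's the thread the Sender will unpark:

use std::thread;

fn main() {
    let mut channel = Channel::new();
    thread::scope(|s| {
        let (sender, receiver) = channel.split();
        s.spawn(move || {
            sender.send("hello world!");
        });
        // Parks until the sender has stored the message and unparked us.
        assert_eq!(receiver.receive(), "hello world!");
    });
}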

Arc

Minimal Impl

  • snippet 1:

    struct ArcData<T> {
        ref_count: AtomicUsize,
        data: T,
    }
    
    pub struct Arc<T> {
        ptr: NonNull<ArcData<T>>,
    }
    
    unsafe impl<T: Send + Sync> Send for Arc<T> {}
    unsafe impl<T: Send + Sync> Sync for Arc<T> {}
    • NonNull is almost the same as *mut T, but non-null and covariant.
    • Arc<T> will be shared between threads, so T must be Sync.
    • Arc<T> may be dropped by another thread (handing the T over to it), so T must be Send.
  • snippet 2:

    impl<T> Arc<T> {
        pub fn new(data: T) -> Arc<T> {
            Arc {
                ptr: NonNull::from(Box::leak(Box::new(ArcData {
                    ref_count: AtomicUsize::new(1),
                    data,
                }))),
            }
        }
    
        fn data(&self) -> &ArcData<T> {
            unsafe { self.ptr.as_ref() }
        }
    }
    
    impl<T> Deref for Arc<T> {
        type Target = T;
    
        fn deref(&self) -> &T {
            &self.data().data
        }
    }
    • Use Box::leak to give up the exclusive ownership of a Box allocation.
  • snippet 3:

    impl<T> Clone for Arc<T> {
        fn clone(&self) -> Self {
            if self.data().ref_count.fetch_add(1, Relaxed) > usize::MAX / 2 {
                std::process::abort();
            }
            Arc { ptr: self.ptr }
        }
    }
    
    impl<T> Drop for Arc<T> {
        fn drop(&mut self) {
            if self.data().ref_count.fetch_sub(1, Release) == 1 {
                fence(Acquire);
                unsafe {
                    drop(Box::from_raw(self.ptr.as_ptr()));
                }
            }
        }
    }
    • fetch_add and fetch_sub increment and decrement the reference count.
    • fetch_add can be Relaxed because cloning doesn't need to synchronize with anything, but the decrement is different; see the next point.
    • fetch_sub is Release, followed by a fence(Acquire) on the final decrement, to make sure nobody is still accessing the data when we drop it.
      Every earlier decrement (and the data accesses before it) must happen-before the final decrement that frees the allocation.
      We could use AcqRel for every decrement, but only the final one needs Acquire, so Release + fence(Acquire) is cheaper. (A short usage sketch follows.)
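
A short usage sketch of this minimal Arc (assuming the type above is the one in scope, not std's):

use std::thread;

fn main() {
    let a = Arc::new("hello".to_string());
    let b = a.clone();
    thread::spawn(move || {
        // Deref lets us use the inner String directly.
        assert_eq!(*b, "hello");
    })
    .join()
    .unwrap();
    assert_eq!(*a, "hello");
    // The last Arc is dropped at the end of main, freeing the ArcData allocation.
}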

Self-Reference

Furthermore, putting a self-referential struct into an Arc can cause a memory leak: the allocation is only freed when the reference count reaches zero, and a struct that holds an Arc to itself keeps its own count from ever reaching zero:

use std::sync::{Arc, Mutex};

#[derive(Debug)]
struct MyStruct {
    value: String,
    self_ref: Option<Arc<Mutex<MyStruct>>>,
}

impl MyStruct {
    fn new(value: String) -> Self {
        MyStruct {
            value,
            self_ref: None,
        }
    }

    fn set_reference(&mut self, other: Arc<Mutex<MyStruct>>) {
        self.self_ref = Some(other);
    }
}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("MyStruct is dropped");
    }
}

struct DetectDrop {}

impl Drop for DetectDrop {
    fn drop(&mut self) {
        println!("DetectDrop is dropped");
    }
}

fn main() {
    let a = Arc::new(Mutex::new(MyStruct::new("Hello".to_string())));
    let _b = Arc::new(DetectDrop {});

    {
        let mut a_ref = a.lock().unwrap();
        a_ref.set_reference(a.clone());
    }

    println!("Arc count: {}", Arc::strong_count(&a));
}

// OUTPUT:
// Arc count: 2
// DetectDrop is dropped

So, here comes the Weak.
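
Before building our own, here's a hedged sketch of how std's Weak breaks the cycle in the example above (Node is just an illustrative stand-in for MyStruct):

use std::sync::{Arc, Mutex, Weak};

struct Node {
    value: String,
    self_ref: Option<Weak<Mutex<Node>>>, // non-owning: doesn't keep the strong count up
}

impl Drop for Node {
    fn drop(&mut self) {
        println!("Node `{}` dropped (self_ref set: {})", self.value, self.self_ref.is_some());
    }
}

fn main() {
    let a = Arc::new(Mutex::new(Node { value: "Hello".to_string(), self_ref: None }));
    a.lock().unwrap().self_ref = Some(Arc::downgrade(&a));
    println!("strong: {}, weak: {}", Arc::strong_count(&a), Arc::weak_count(&a)); // 1 1
} // `a` drops here: the strong count hits zero, so Node's Drop actually runs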


Weak

Weak is a non-owning reference to the managed allocation, which means it won’t increase the reference count.

To achieve that, we split the reference counting in two: Arc and Weak share the same ArcData, which keeps separate counts:

pub struct Arc<T> {
    ptr: NonNull<ArcData<T>>,
}

unsafe impl<T: Sync + Send> Send for Arc<T> {}
unsafe impl<T: Sync + Send> Sync for Arc<T> {}

pub struct Weak<T> {
    ptr: NonNull<ArcData<T>>,
}

unsafe impl<T: Sync + Send> Send for Weak<T> {}
unsafe impl<T: Sync + Send> Sync for Weak<T> {}

struct ArcData<T> {
    /// Number of `Arc`s.
    data_ref_count: AtomicUsize,
    /// Number of `Weak`s, plus one if there are any `Arc`s.
    alloc_ref_count: AtomicUsize,
    /// The data. Dropped if there are only weak pointers left.
    data: UnsafeCell<ManuallyDrop<T>>,
}

We’re not going to copy the code from the book, let’s just summarize the key points:

  • Using a giant number (usize::MAX / 2) as the overflow limit is fine; in Arc's usage scenario no operating system can run anywhere near that many threads.
  • The ArcData allocation is still there even after all the Arcs are dropped, because the Weaks keep it alive (the T inside has already been dropped at that point). However, nobody outside the crate can reach it, since we only expose upgrade, which will then return None.
  • get_mut returns None unless this Arc is the only reference left (no other Arcs and no Weaks).

Have a look at this example:

#[cfg(test)]
mod tests {
    use std::thread::{self};

    use super::*;

    #[test]
    fn test_arc() {
        thread::scope(|s| {
            let mut arc = Arc::new(42);
            // only one reference, so we can get_mut
            assert!(Arc::get_mut(&mut arc).is_some());

            let weak1 = Arc::downgrade(&arc);
            // more than one reference, so we cannot get_mut
            assert!(Arc::get_mut(&mut arc).is_none());

            let t = s.spawn(move || {
                thread::park();

                // we cannot upgrade again
                assert!(weak1.upgrade().is_none());

                // but the data is still there, it's just not accessible outside of the crate
                let d = weak1.data();
                println!("arc refs: {:?}", d.data_ref_count.load(Relaxed));
                println!("weak refs: {:?}", d.alloc_ref_count.load(Relaxed));
                println!("data: {:?}", unsafe { &*d.data.get() });
            });

            // wait until the Arc is dropped
            s.spawn(move || {
                drop(arc);
            })
            .join()
            .unwrap();

            t.thread().unpark();
        })
    }
}

Extra - Primitives

We’ve talked about UnsafeCell in the first chapter Cell, RefCell, and we’ve seen more primitives in Chan and Arc. So it’s time to summarize them:

UnsafeCell

As we’ve mentioned before, UnsafeCell is the core primitive for interior mutability, and it’s the only safe way to have mutable data behind a shared reference.

E.g:

fn cannot_compile() {
    let x = UnsafeCell::new(1);
    let y = &x as *const UnsafeCell<i32> as *const i32 as *mut i32;
    unsafe {
        *y = 2;
    }
    println!("{}", unsafe { *x.get() });
}

fn ok() {
    let x = UnsafeCell::new(1);
    unsafe {
        let y = x.get();
        *y = 2;
    }
    println!("{}", unsafe { *x.get() });
}

Note that the complex casting in cannot_compile is copied from the body of UnsafeCell::get(), so the two snippets do exactly the same thing.

However, cannot_compile will not compile; rustc throws an error like this:

error: assigning to &T is undefined behavior, consider using an UnsafeCell

Nonetheless, you can wrap the conversion into a method of your own struct to 'make it work':

struct MyUnsafeCell<T: ?Sized> {
    value: T,
}

impl<T> MyUnsafeCell<T> {
    fn new(v: T) -> Self {
        MyUnsafeCell { value: v }
    }

    fn get(&self) -> *mut T {
        self as *const MyUnsafeCell<T> as *const T as *mut T
    }
}

pub fn ok() {
    let x = MyUnsafeCell::new(1);
    let y = x.get();
    unsafe {
        *y = 2;
    }
    println!("{}", x.value);
}

But this is not the right way, because:

Aside: lang item

There's some 'black magic' in Rust: if you copy the code of a built-in type such as UnsafeCell or ManuallyDrop and expect it to work just like the std type, you might be disappointed (like the cannot_compile example above).

It's even worse when you do the same with ManuallyDrop, because the compiler won't stop you from writing that code, so the unexpected behavior only shows up at runtime and is hard to detect.

So why does the std lib work and yours does not? The answer is lang item.

You can find this attribute on UnsafeCell's definition: #[lang = "unsafe_cell"].
It makes the compiler treat UnsafeCell as a special type, handled differently from normal types; what's more, this special handling is pluggable.

Check this for more detail: lang item

So, what if we put this attribute on our struct?

#[lang = "unsafe_cell"]
struct MyUnsafeCell<T: ?Sized> {
    value: T,
}

Bad news, it won’t work, because this attribute cannot be redefined:

error[E0152]: found duplicate lang item unsafe_cell

If you insist on doing so, you'll have to build a freestanding crate yourself.
Check these out: Error code E0152, #![no_std]


NonNull

In short, as the doc says: "*mut T but non-zero and covariant".

So we can use it whenever we're sure the pointer is never null; the compiler can then exploit the unused null value (for example, Option<NonNull<T>> is the same size as a plain pointer), and we keep covariance.

non-zero *mut T

use std::ptr;
use std::ptr::NonNull;

fn main() {
    // addr = 0x0
    let nptr: *mut i32 = ptr::null_mut();
    println!("{} {}", nptr.is_null(), nptr.is_aligned()); // true true

    // safe
    let nnull = NonNull::new(nptr);
    println!("{:?}", nnull); // None

    // with debug assertions enabled, this panics at runtime;
    // in release mode there's no check, and using the result is UB
    let force_create = unsafe { NonNull::new_unchecked(nptr) };
    let fptr = force_create.as_ptr();
    println!("{:?}", fptr); // 0x0
    println!("{}", unsafe { *fptr }); // runtime error
}

NonNull::new prevents you from creating a null NonNull. If you insist on it by calling new_unchecked, debug assertions will catch it at runtime in debug builds; in release builds there's no check and dereferencing it is undefined behavior.

Aside: assert_unsafe_precondition!

NonNull::new_unchecked uses this macro to check the null pointer.

As we know, xxx_unchecked is a common naming pattern in Rust, used to bypass a safety check in order to improve performance.

So why does NonNull still check the null pointer? The secret is:

“The check is enabled at runtime if debug assertions are enabled when the caller is monomorphized. In const-eval/Miri checks implemented with this macro for language UB are always ignored.”

So, this check is only enabled in debug mode.

BTW, do you know how to enable/disable debug assertions? There are three ways:

  • put RUSTFLAGS="-C debug-assertions=on" in front of cargo build
  • set debug-assertions = true in a profile section of Cargo.toml
  • by default, cargo build --release disables debug assertions (and a plain cargo build enables them)

covariant

This is a complex and subtle concept; I recommend reading this chapter of the Rustonomicon: 🔗Subtyping and Variance.

Anyway, let's have a quick look at variance in Rust:

fn debug<'a>(a: &'a str, b: &'a str) {
    println!("a = {a:?} b = {b:?}");
}

We can see that a and b share the same lifetime 'a, but does the debug function really require the same lifetime for a and b?

The answer is NO:

fn main() {
    let hello: &'static str = "hello";
    {
        let world = String::from("world");
        let world = &world; // 'world has a shorter lifetime than 'static
        debug(hello, world);
    }
}

Let’s break it down without complicated concepts:

  • assume there’re 2 lifetimes: 'long and 'short
  • 'long’s code scope completely covers 'short’s code scope

Then we can say that 'long can be downgraded to 'short.

Hence, debug(hello, world) is effectively:

debug<'world>('world hello, 'world world)

instead of

debug<'static>('static hello, 'static world).

Now let’s find the concepts back:

  • 'static is a SubType of 'world
  • 'world is a SuperType of 'static

It’s not hard to understand the naming - we can always cast a child class into a parent class in OO languages like Java.

So we can always cast a SubType into a SuperType in Rust.

Look back to Variance:

'Variance' describes how subtyping between parameters (such as lifetimes) carries over to the types built from them. It has three flavors:

  • Covariant: if 'a is a subtype of 'b, then Foo<'a> is a subtype of Foo<'b> (see the sketch below).
  • Contravariant: if 'a is a subtype of 'b, then Foo<'b> is a subtype of Foo<'a>.
  • Invariant: even if 'a is a subtype of 'b, Foo<'a> and Foo<'b> have no subtyping relationship.
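
A tiny sketch of covariance in action (the function is made up just for illustration):

// &'a str is covariant in 'a, so a &'static str can be handed out
// wherever a shorter lifetime 'a is expected.
fn shorten<'a>(s: &'static str) -> &'a str {
    s
}

fn main() {
    let s: &str = shorten("hello");
    println!("{s}");
}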

Let’s stop here, we can find more details in the book Rustonomicon.


ManuallyDrop

A zero-cost wrapper that inhibits the compiler from automatically calling T's destructor.

Take a look at the definition:

#[stable(feature = "manually_drop", since = "1.20.0")]
#[lang = "manually_drop"]
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
pub struct ManuallyDrop<T: ?Sized> {
    value: T,
}

This is a lang item and it's repr(transparent), so it's just a thin wrapper around T; its special behavior is implemented inside the compiler.

Using ManuallyDrop is quite simple:

use std::fmt::Debug;
use std::mem::ManuallyDrop;

struct DetectDrop<T: Debug> {
    value: T,
}

impl<T: Debug> Drop for DetectDrop<T> {
    fn drop(&mut self) {
        println!("Dropping DetectDrop with value: {:?}", self.value);
    }
}

fn main() {
    // immutable, use into_inner to take the value out and drop it
    let x = ManuallyDrop::new(DetectDrop { value: 52 });
    {
        let _ = ManuallyDrop::into_inner(x);
    }

    // mutable, use `drop` to drop the value
    let mut y = ManuallyDrop::new(DetectDrop { value: 99 });
    unsafe {
        ManuallyDrop::drop(&mut y);
    }
}

MaybeUninit

A wrapper representing a value that may be uninitialized.

#[stable(feature = "maybe_uninit", since = "1.36.0")]
#[lang = "maybe_uninit"]
#[derive(Copy)]
#[repr(transparent)]
pub union MaybeUninit<T> {
    uninit: (),
    value: ManuallyDrop<T>,
}

As we can see, it’s also a lang item and repr(transparent), meanwhile it’s a wrapper of ManuallyDrop<T>.

The design is easy to understand:

  • when you create an uninit value, the active field is just (), so there's nothing to drop
  • when you write a value into it, the contents live in a ManuallyDrop<T>, so the compiler never drops them automatically; you're responsible for calling assume_init (or assume_init_drop) at the right time

So what’s the purpose of MaybeUninit? Especially when we have ManuallyDrop?

A typical use is lazy initialization:

use std::mem::MaybeUninit;

fn main() {
    // An array of 10,000 elements, none of which is initialized yet.
    let mut arr: [MaybeUninit<i32>; 10000] = unsafe { MaybeUninit::uninit().assume_init() };

    for i in 0..100 {
        arr[i] = MaybeUninit::new(i as i32);
    }

    let mut sum = 0;
    for i in 0..100 {
        sum += unsafe { arr[i].assume_init() };
    }

    println!("{}", sum);
}

The allocation of the array itself can't be avoided, but we no longer pay to initialize every element.

The example above is quite simple; what if we replace i32 with a complex struct? Without MaybeUninit we'd have to call T::new() for every element, and those placeholder values are not free.

You may have noticed that we could also use Option in this example, so what's the difference?

  • Option<T> is an enum and usually needs extra space for its discriminant (unless a niche optimization applies), while MaybeUninit<T> has exactly the same size and layout as T (see the size check below).
  • Option can tell you whether the value is initialized, but MaybeUninit cannot.
    Which means that you, as the programmer, are responsible for keeping track of which elements are initialized and for never reading one that isn't.
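
A quick, hedged check of that layout difference (exact sizes depend on the target, but the relationships hold):

use std::mem::{size_of, MaybeUninit};

fn main() {
    // MaybeUninit<T> has exactly the same size and alignment as T.
    assert_eq!(size_of::<MaybeUninit<u64>>(), size_of::<u64>());
    // Option<u64> needs extra room for its discriminant (no niche to reuse).
    assert!(size_of::<Option<u64>>() > size_of::<u64>());
    // With a niche available (e.g. a reference), Option is pointer-sized again.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
    println!("all layout checks passed");
}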

PhantomData

This is a marker for static analysis, and it’s used to tell the compiler that you’re using a type parameter in a struct for a reason other than storing a value of that type.

For example, sometimes we want to tie the lifetime of a struct to the lifetime of the data it points to:

struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
}

Since the lifetime parameter is not used by any field, it would be unbounded (and rustc actually rejects an unused lifetime parameter). We can fix that with PhantomData like this:

use std::marker;

struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: marker::PhantomData<&'a T>,
}

Another typical usage of PhantomData is to make a struct that holds a raw pointer act as the owner of the pointed-to data:

use std::marker;

struct Vec<T> {
    data: *const T,
    len: usize,
    cap: usize,
    _marker: marker::PhantomData<T>,
}

Without the _marker, the compiler wouldn't treat this Vec as owning values of type T, which matters for drop checking and for auto traits such as Send and Sync.


Processors

A quick view of the differences between RISC and CISC:

Rust:

fn add_10(n: &mut i32) {
    *n += 10;
}

RISC(ARM):

add_10:
    ldr w8, [x0]    // load
    add w8, w8, #10 // modify
    str w8, [x0]    // store
    ret

CISC(x86-64, Intel, AMD):

add_10:
    add dword ptr [rdi], 10 // all in one
    ret

Simple Load / Store

For both the atomic and the non-atomic Rust code, the compiled assembly is the same:

pub fn store(x: &mut i32) {
    *x = 0;
}

pub fn atomic_store(x: &AtomicI32) {
    x.store(0, Relaxed);
}

x86_store:
    mov dword ptr [rdi], 0
    ret

arm_store:
    str wzr, [x0]
    ret

Because an aligned mov / str of a machine word is already atomic on these architectures.

x86 Lock Prefix

pub fn add(x: &AtomicI32) -> i32 {
    x.fetch_add(10, Relaxed)
}

add:
    mov eax, 10
    lock xadd dword ptr [rdi], eax
    ret

Note that not all the instructions can be prefixed with lock.

lock originally caused the processor to temporarily block all other cores from accessing memory for the duration of the instruction, but newer processors won’t stop other cores from operating on unrelated memory.

x86 compare and exchange

A fetch-and-modify operation can be implemented as a compare-and-exchange loop:

pub fn a(x: &AtomicI32) -> i32 {
    x.fetch_or(10, Relaxed)
}

// a is same as b

pub fn b(x: &AtomicI32) -> i32 {
    let mut current = x.load(Relaxed);
    loop {
        let new = current | 10;
        match x.compare_exchange(current, new, Relaxed, Relaxed) {
            Ok(v) => return v,
            Err(v) => current = v,
        }
    }
}

a:
    mov eax, dword ptr [rdi]
.L1: // loop until cas success
    mov ecx, eax
    or ecx, 10
    // note that [cmpxchg dest, src] compares the dest with eax
    lock cmpxchg dword ptr [rdi], ecx
    jne .L1
    ret

On x86-64, there is no difference between compare_exchange and compare_exchange_weak. Both compile down to a lock cmpxchg instruction.

