Rust Atomics and Locks
Last updated on February 27, 2025
WIP, no ETA provided
TL;DR: please refer to the table of contents.
My notes on the tremendous book Rust Atomics and Locks by Mara Bos.
Cell, RefCell
Interior mutability: a design pattern in Rust that allows you to mutate data even when there are immutable references to that data.
In short, you can use fn foo(&self) to change the data inside the struct; normally, that would require fn foo(&mut self).
Cell
take() out the data, modify it, and set() it back. Mutex is the concurrent version of Cell.
RefCell
borrow_mut() will panic if there's already an outstanding borrow. RwLock is the concurrent version of RefCell, but it blocks until the conflicting borrow is released instead of panicking.
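A quick illustration of both, using the std types (the values here are arbitrary):

use std::cell::{Cell, RefCell};

fn main() {
    // Cell: move the value out, modify it, put it back.
    let c = Cell::new(vec![1, 2, 3]);
    let mut v = c.take();       // leaves Vec::default() behind
    v.push(4);
    c.set(v);

    // RefCell: borrow checking happens at runtime.
    let r = RefCell::new(vec![1, 2, 3]);
    r.borrow_mut().push(4);     // panics if something is already borrowed
    println!("{:?} {:?}", c.take(), r.borrow());
}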
Aside: UnsafeCell
Both of them are built on a core primitive called UnsafeCell.
This struct provides the basic building block for interior mutability; however, you have to use unsafe code to access its contents.
Cell, RefCell, and every other type that allows interior mutability wrap their data in an UnsafeCell and provide safe APIs on top of it.
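As a rough sketch of that wrapping (this is not the real std implementation, just the idea), a Cell-like type for Copy values could look like this:

use std::cell::UnsafeCell;

struct MyCell<T> {
    value: UnsafeCell<T>,
}

impl<T: Copy> MyCell<T> {
    fn new(value: T) -> Self {
        Self { value: UnsafeCell::new(value) }
    }
    fn get(&self) -> T {
        // Safety: MyCell is !Sync (because UnsafeCell is !Sync), so there is no
        // concurrent writer, and T: Copy means we just copy the bytes out.
        unsafe { *self.value.get() }
    }
    fn set(&self, v: T) {
        // Mutation through &self: this is the interior-mutability part.
        unsafe { *self.value.get() = v };
    }
}

fn main() {
    let c = MyCell::new(1);
    c.set(2);
    assert_eq!(c.get(), 2);
}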
MutexGuard lifetime
if list.lock().unwrap().pop() == Some(1) { // <--- drop here
do_something();
}
// PERF: needlessly hold on to the lock while processing the item.
if let Some(item) = list.lock().unwrap().pop() {
process_item(item);
} // <--- drop here
// FIXED:
let item = list.lock().unwrap().pop(); // <--- drop here
if let Some(item) = item {
process_item(item);
}
Reason:
- The condition of a plain if is always a simple boolean expression, which cannot borrow anything, so the temporary MutexGuard is dropped as soon as the condition has been evaluated.
- In an if let, the pattern may borrow from the temporary (replace pop() with front() and this becomes clear), so the guard is kept alive until the end of the whole statement. Those are the temporary-lifetime rules the borrow checker applies.
Parking
Consider this situation: we only process items when the list is not empty.
If we only use a Mutex, the consumer has to keep calling lock() over and over, which is not efficient.
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;
use std::time::Duration;
fn main() {
let queue = Mutex::new(VecDeque::new());
thread::scope(|s| {
// Consuming thread
let t = s.spawn(|| loop {
let item = queue.lock().unwrap().pop_front();
if let Some(item) = item {
dbg!(item);
} else {
thread::park();
}
});
// Producing thread
for i in 0.. {
queue.lock().unwrap().push_back(i);
t.thread().unpark();
thread::sleep(Duration::from_secs(1));
}
});
}
A call to unpark() issued before the thread parks itself does not get lost.
However, unpark requests don’t stack up.
Calling unpark() two times and then calling park() two times afterwards still results in the thread going to sleep.
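A tiny sketch of that rule; the sleep and park_timeout are only there so the example terminates:

use std::thread;
use std::time::Duration;

fn main() {
    let t = thread::spawn(|| {
        // Give the main thread time to call unpark() twice first.
        thread::sleep(Duration::from_millis(100));
        thread::park(); // returns immediately: one "token" was stored
        println!("first park returned");
        // The second unpark() did not stack, so a plain park() would normally block;
        // park_timeout() lets the example finish.
        thread::park_timeout(Duration::from_millis(200));
        println!("second park returned (after the timeout)");
    });
    t.thread().unpark();
    t.thread().unpark(); // does not stack on top of the first token
    t.join().unwrap();
}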
Cond Vars
The example above becomes inefficient once we want multiple consumers.
Because the producer doesn’t know which consumer to wake up.
fn main() {
use std::sync::Condvar;
use std::{collections::VecDeque, sync::Mutex, thread, time::Duration};
let queue = Mutex::new(VecDeque::new());
let not_empty = Condvar::new();
let empty = Condvar::new();
thread::scope(|s| {
// not_empty handler
s.spawn(|| loop {
let mut q = queue.lock().unwrap();
let item = loop {
if let Some(item) = q.pop_front() {
break item;
} else {
empty.notify_one(); // wake up the empty handler(s)
q = not_empty.wait(q).unwrap();
}
};
drop(q);
println!("Got item: {}", item);
});
// empty handler
s.spawn(|| loop {
let mut q = queue.lock().unwrap();
loop {
if q.is_empty() {
break;
} else {
q = empty.wait(q).unwrap();
}
}
drop(q);
println!("Queue is empty");
thread::sleep(Duration::from_millis(400));
});
// producer
for i in 0.. {
queue.lock().unwrap().push_back(i);
not_empty.notify_one(); // wake up the not_empty handler(s)
thread::sleep(Duration::from_secs(1));
}
});
}
Memory Ordering
Happens-Before
Before we dive into the specific memory orderings below, we need a solid understanding of the happens-before relationship.
happens-before does not mean that the first operation is literally executed earlier in time than the second one. (If you need to force that, use a Mutex, Condvar, channel, or another synchronization primitive to make the threads communicate.)
There are two points to remember:
- happens-before is all about constraining the instruction reordering done by the compiler or the CPU.
- a happens-before relationship isn't established until the observable behavior actually occurs (e.g. the Acquire load actually sees the Released value).
For example:
Thread A: Thread B:
some ops (A) ----------
... |
a.store(1, Release); | visible
|
| if a.load(Acquire) == 1 {
-> (B) // do something
}
There's no happens-before relationship until Thread B sees the value 1 via a.load(Acquire); at that point we can say that (A) happens before (B), or that (A) is visible to (B).
Relaxed
While atomic operations using relaxed memory ordering do not provide any happens-before relationship,
they do guarantee a total modification order of each individual atomic variable.
This means that all modifications of the same atomic variable happen in an order that is the same from the perspective of every single thread.
Example:
use std::sync::atomic::{AtomicI32, Ordering::Relaxed};

static X: AtomicI32 = AtomicI32::new(0);
// thread 1
fn a() {
X.fetch_add(5, Relaxed);
X.fetch_add(10, Relaxed);
}
// thread 2
fn b() {
let a = X.load(Relaxed);
let b = X.load(Relaxed);
let c = X.load(Relaxed);
let d = X.load(Relaxed);
println!("{a} {b} {c} {d}");
}
The output will never be 0 5 0 15 or 0 0 10 15, because of the order of the two fetch_add operations:
- 0 5 0 15: once the value 5 has been written and observed, a later load can't go back to 0.
- 0 0 10 15: X is increased by 5 and then by 10, so its modification order is 0 → 5 → 15; the value 10 never exists on its own.
Release / Acquire
Release corresponds to the store operation, and Acquire corresponds to the load operation.
use std::sync::atomic::{AtomicBool, AtomicU64};
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::{thread, time::Duration};

static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);
fn main() {
thread::spawn(|| {
DATA.store(123, Relaxed);
READY.store(true, Release); // Everything from before this store ..
});
while !READY.load(Acquire) { // .. is visible after this loads `true`.
thread::sleep(Duration::from_millis(100));
println!("waiting...");
}
println!("{}", DATA.load(Relaxed));
}
Note that if we use Relaxed instead of Acquire for READY, we might read 0 instead of 123.
That’s because Relaxed only provides a total order for the same atomic variable, not across different variables.
Since the Release store and the Acquire load establish a happens-before relationship, we can even use a plain u64 instead of AtomicU64:
static mut DATA: u64 = 0;
static READY: AtomicBool = AtomicBool::new(false);
fn main() {
thread::spawn(|| {
// Safety: Nothing else is accessing DATA,
// because we haven't set the READY flag yet.
unsafe { DATA = 123 };
READY.store(true, Release); // Everything from before this store ..
});
while !READY.load(Acquire) {
// .. is visible after this loads `true`.
thread::sleep(Duration::from_millis(100));
println!("waiting...");
}
// Safety: Nothing is mutating DATA, because READY is set.
println!("{}", unsafe { DATA });
}
SeqCst
In short, the execution order looks just like the order the code is written in: all SeqCst operations across all threads form a single total order.
Here's an example of the difference between Acq/Rel and SeqCst:
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};

static A: AtomicUsize = AtomicUsize::new(0);
static B: AtomicUsize = AtomicUsize::new(0);
fn thread1() {
A.store(1, SeqCst);
B.store(2, SeqCst);
}
fn thread2() {
let b = B.load(SeqCst);
let a = A.load(SeqCst);
println!("a = {}, b = {}", a, b);
}
If b is 2, then a must be 1.
use std::sync::atomic::{AtomicUsize, Ordering::{Acquire, Relaxed, Release}};

static A: AtomicUsize = AtomicUsize::new(0);
static B: AtomicUsize = AtomicUsize::new(0);
fn thread1() {
A.store(1, Release);
B.store(2, Relaxed);
}
fn thread2() {
let b = B.load(Relaxed);
let a = A.load(Acquire);
println!("a = {}, b = {}", a, b);
}
If b is 2, we can't guarantee that a is 1.
Fence
In short, a fence prevents certain instructions from being reordered across it, whether by the compiler or by the CPU.
More specifically, a fence(Release) prevents the memory operations before it from being reordered with stores after it, and a fence(Acquire) prevents the memory operations after it from being reordered with loads before it.
We already know that Release and Acquire establish a happens-before relationship, and we can substitute fences for them like this:
before:
// Thread A
{
DATA.store(42, Relaxed);
READY.store(true, Release); // What happens-before the `Release`...
}
// Thread B
if READY.load(Acquire) { // ... is visible after the `Acquire`.
let value = DATA.load(Relaxed); // The observation has occurred (READY: false -> true), so the value must be 42.
assert_eq!(value, 42);
}
after:
// Thread A
{
DATA.store(42, Relaxed);
fence(Release); // fence before store
READY.store(true, Relaxed); // just Relaxed
}
// Thread B
if READY.load(Relaxed) { // just Relaxed
fence(Acquire); // fence after load
let value = DATA.load(Relaxed);
assert_eq!(value, 42);
}
fence after load:
Let's say we move the fence to the front of the if block; then nothing stops the compiler from reordering the DATA load before the if check, as written out explicitly here:
// Thread B
fence(Acquire);
let value = DATA.load(Relaxed); // what value will be?
if READY.load(Relaxed) {
println!("DATA = {}", value);
}
So we might get DATA = 0 instead of DATA = 42.
fence before store:
{
DATA.store(42, Relaxed);
fence(Release); // make sure DATA is stored
READY.store(true, Relaxed);
}
Finally, another example to drive the point home: using a fence for better performance. The first snippet always loads the pointer with Acquire; the second uses a Relaxed load and only issues the Acquire fence when the pointer is non-null:
let p = PTR.load(Acquire);
if p.is_null() {
println!("no data");
} else {
println!("data = {}", unsafe { *p });
}
let p = PTR.load(Relaxed);
if p.is_null() {
println!("no data");
} else {
fence(Acquire);
println!("data = {}", unsafe { *p });
}
Why is the performance better? Because the fence is only needed when p is not null, and a Relaxed load is lighter weight than an Acquire load.
How does it work? Because the fence stops *p from being reordered before the if check, and we can skip the Acquire synchronization entirely when p is null.
Anyway, if you find this hard to grasp, keeping these two rules in mind might help:
a.store(1, Release);
// roughly equivalent to:
fence(Release);
a.store(1, Relaxed);

let a = b.load(Acquire);
// roughly equivalent to:
let a = b.load(Relaxed);
fence(Acquire);

If you simply do this 'variable replacement', you'll see that a fence isn't actually dependent on another fence; it can also work together with a plain store or load.
i.e., a fence(Acquire) doesn't have to be paired with a fence(Release); we don't have to use fences in pairs.
And that’s how the last example works.
PS: the 'replacement' doesn't strictly require a textual substitution like :s/old/new. You can of course put some lines between the fence and the store/load, as long as the code you add doesn't care about the memory order.
Misconceptions
- I need strong memory ordering to make sure changes are immediately visible to other threads.
- In real life, memory ordering is about things like reordering instructions, which usually happen at nanosecond scale.
Stronger memory ordering does not make your data travel faster; it might even slow your program down.
- Disabling optimization means I don’t need to care about memory ordering.
- Processor optimizations still happen even if you disable compiler optimizations.
- Sequentially consistent memory ordering can be used for a “release-load” or an “acquire-store.”
- Release-store does not form any release-acquire relationship with a SeqCst-store.
If you need them to be part of a globally consistent order, both operations will have to use SeqCst.
Summary
| type | guarantee |
|---|---|
| Relaxed | Total modification order of each individual atomic variable |
| Acq/Rel | Changes made before the Release store are visible after the Acquire load |
| SeqCst | One global total order that matches the order of the code |
| Fence | Code after the fence is not reordered with code before the fence |
Aside: compare_exchange vs. compare_exchange_weak
In short, compare_exchange_weak may return Err even when the comparison would have succeeded (a spurious failure).
This is because on some architectures the weak variant compiles down to LL/SC (Load-Linked/Store-Conditional) instructions, which can fail spuriously, e.g. due to cache contention.
Since we usually call it inside a retry loop anyway, compare_exchange_weak can be more efficient on those platforms, such as ARM.
What's more, since x86/x86-64 provides a strong memory model called TSO (Total Store Order), a store(Release) or load(Acquire) may compile down to a plain MOV instruction;
however, on ARM the memory model is weaker, so the compiler may emit a DMB (Data Memory Barrier) instruction for Release or Acquire.
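A typical retry loop looks like this; a small sketch (the increment function is made up for illustration) that updates a counter with compare_exchange_weak instead of fetch_add:

use std::sync::atomic::{AtomicU32, Ordering::Relaxed};

// Increment the counter by one using a compare-and-exchange loop.
// A spurious failure of the weak variant just causes one extra iteration.
fn increment(counter: &AtomicU32) {
    let mut current = counter.load(Relaxed);
    loop {
        let new = current + 1;
        match counter.compare_exchange_weak(current, new, Relaxed, Relaxed) {
            Ok(_) => return,
            Err(actual) => current = actual, // someone else changed it, or a spurious failure
        }
    }
}

fn main() {
    let c = AtomicU32::new(0);
    increment(&c);
    assert_eq!(c.load(Relaxed), 1);
}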
Spin Lock
If a lock is only ever held for very brief moments and the threads locking it can run in parallel on different processor cores,
it might be better for the threads to repeatedly try to lock it without actually going to sleep.
Minimal Impl
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::{Acquire, Release};

pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
pub const fn new() -> Self {
Self { locked: AtomicBool::new(false) }
}
pub fn lock(&self) {
while self.locked.swap(true, Acquire) {
std::hint::spin_loop();
}
}
pub fn unlock(&self) {
self.locked.store(false, Release);
}
}
Hold value
use std::cell::UnsafeCell;
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::{Acquire, Release};

pub struct SpinLock<T> {
locked: AtomicBool,
value: UnsafeCell<T>,
}

// Needed so the lock itself can be shared between threads (as in the book):
unsafe impl<T> Sync for SpinLock<T> where T: Send {}
impl<T> SpinLock<T> {
pub fn new(value: T) -> Self {
Self {
locked: AtomicBool::new(false),
value: UnsafeCell::new(value),
}
}
// use this macro to suppress the clippy error
#[allow(clippy::mut_from_ref)]
pub fn lock(&self) -> &mut T {
while self.locked.swap(true, Acquire) {
std::hint::spin_loop();
}
unsafe { &mut *self.value.get() }
}
pub fn unlock(&self) {
self.locked.store(false, Release);
}
}
Use Guard to auto unlock
use std::ops::{Deref, DerefMut};

pub struct Guard<'a, T> {
lock: &'a SpinLock<T>,
}
impl<T> Deref for Guard<'_, T> {
type Target = T;
fn deref(&self) -> &Self::Target {
unsafe { &*self.lock.value.get() }
}
}
impl<T> DerefMut for Guard<'_, T> {
fn deref_mut(&mut self) -> &mut Self::Target {
unsafe { &mut *self.lock.value.get() }
}
}
impl<T> Drop for Guard<'_, T> {
fn drop(&mut self) {
self.lock.unlock();
}
}
Now let's change the lock method to return a Guard:
impl<T> SpinLock<T> {
pub fn lock(&self) -> Guard<T> {
while self.locked.swap(true, Acquire) {
std::hint::spin_loop();
}
Guard { lock: self }
}
}
Fence it!
Remember how we used a fence to substitute for an Acquire load in the Fence section above? The Rust documentation of fence itself happens to contain an example of building a spin lock with a fence:
pub struct Mutex {
flag: AtomicBool,
}
impl Mutex {
pub fn new() -> Mutex {
Mutex {
flag: AtomicBool::new(false),
}
}
pub fn lock(&self) {
// Wait until the old value is `false`.
while self
.flag
.compare_exchange_weak(false, true, Ordering::Relaxed, Ordering::Relaxed)
.is_err()
{}
// This fence synchronizes-with store in `unlock`.
fence(Ordering::Acquire);
}
pub fn unlock(&self) {
self.flag.store(false, Ordering::Release);
}
}
A little confused by the single fence? Let's break it down.
First, to see things clearly, rewrite the while loop as a loop:
pub fn lock(&self) {
// Wait until the old value is `false`.
loop {
if self
.flag
.compare_exchange_weak(false, true, Ordering::Relaxed, Ordering::Relaxed)
.is_ok()
{
break;
}
}
// This fence synchronizes-with store in `unlock`.
fence(Ordering::Acquire);
}
Then we replace compare_exchange_weak with a separate load and store. Note that this is not valid in real code, because splitting a CAS into an individual load and store breaks atomicity; it is only for illustration:
pub fn lock(&self) {
// Wait until the old value is `false`.
loop {
if self.flag.load(Ordering::Relaxed) == false {
self.flag.store(true, Ordering::Relaxed);
break;
}
}
// This fence synchronizes-with store in `unlock`.
fence(Ordering::Acquire);
}
Now do a little adjustment:
pub fn lock(&self) {
// Wait until the old value is `false`.
loop {
if self.flag.load(Ordering::Relaxed) == false {
fence(Ordering::Acquire);
self.flag.store(true, Ordering::Relaxed);
break;
}
}
}
Now we can see the 'fence after load' pattern! It's equivalent to:
pub fn lock(&self) {
// Wait until the old value is `false`.
loop {
if self.flag.load(Ordering::Acquire) == false {
self.flag.store(true, Ordering::Relaxed);
break;
}
}
}
And that's exactly how the fence synchronizes-with the store in unlock.
It's a bit verbose, but it's a good way to understand how fences work.
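To wrap up, here is a hedged usage sketch, assuming the struct, the Sync impl, the Guard, and the Guard-returning lock() above are assembled into one module:

fn main() {
    let counter = SpinLock::new(0);
    std::thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1_000 {
                    // lock() returns a Guard; DerefMut gives &mut i32,
                    // and Drop releases the lock at the end of the statement.
                    *counter.lock() += 1;
                }
            });
        }
    });
    assert_eq!(*counter.lock(), 4_000);
}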
Chan
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::thread::{self, Thread};

pub struct Channel<T> {
message: UnsafeCell<MaybeUninit<T>>,
ready: AtomicBool,
}
unsafe impl<T> Sync for Channel<T> where T: Send {}
pub struct Sender<'a, T> {
channel: &'a Channel<T>,
recv_thread: Thread,
}
pub struct Receiver<'a, T> {
channel: &'a Channel<T>,
_marker: std::marker::PhantomData<*const ()>, // *const () is !Send: the Receiver must stay on the thread captured by split()
}
impl<T> Channel<T> {
pub const fn new() -> Self {
Self {
message: UnsafeCell::new(MaybeUninit::uninit()),
ready: AtomicBool::new(false),
}
}
pub fn split(&mut self) -> (Sender<T>, Receiver<T>) {
*self = Self::new();
(
Sender {
channel: self,
recv_thread: thread::current(),
},
Receiver {
channel: self,
_marker: std::marker::PhantomData,
},
)
}
}
impl<T> Sender<'_, T> {
pub fn send(self, message: T) {
unsafe { (*self.channel.message.get()).write(message) };
self.channel.ready.store(true, Release);
self.recv_thread.unpark();
}
}
impl<T> Receiver<'_, T> {
pub fn is_ready(&self) -> bool {
self.channel.ready.load(Relaxed)
}
pub fn receive(self) -> T {
while !self.channel.ready.swap(false, Acquire) {
thread::park();
}
unsafe { (*self.channel.message.get()).assume_init_read() }
}
}
impl<T> Drop for Channel<T> {
fn drop(&mut self) {
if *self.ready.get_mut() {
unsafe { self.message.get_mut().assume_init_drop() }
}
}
}
Pay attention to these points:
- Use MaybeUninit to save memory.
- Abstract Sender and Receiver to limit the user's access to the channel (think of the MutexGuard in Mutex).
- Use the thread handle to park/unpark the receiving thread.
- Don't forget to drop the message in Drop, because MaybeUninit won't drop its inner data automatically.
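A hedged usage sketch, assuming the definitions and imports above are in scope. Note that split() is called on the receiving thread, because the Sender captures thread::current() as the thread to unpark:

fn main() {
    let mut channel = Channel::new();
    thread::scope(|s| {
        let (sender, receiver) = channel.split();
        s.spawn(move || {
            sender.send("hello world!");
        });
        // Parks until the message is ready, then takes it out.
        assert_eq!(receiver.receive(), "hello world!");
    });
}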
Arc
Minimal Impl
- snippet 1:
struct ArcData<T> {
    ref_count: AtomicUsize,
    data: T,
}

pub struct Arc<T> {
    ptr: NonNull<ArcData<T>>,
}

unsafe impl<T: Send + Sync> Send for Arc<T> {}
unsafe impl<T: Send + Sync> Sync for Arc<T> {}
  - NonNull is almost the same as *mut T, but non-null and covariant.
  - Arc<T> is meant to be shared between threads, so T must be Sync.
  - Arc<T> could be dropped by another thread, so T must be Send.
- snippet 2:
impl<T> Arc<T> {
    pub fn new(data: T) -> Arc<T> {
        Arc {
            ptr: NonNull::from(Box::leak(Box::new(ArcData {
                ref_count: AtomicUsize::new(1),
                data,
            }))),
        }
    }
    fn data(&self) -> &ArcData<T> {
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Deref for Arc<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.data().data
    }
}
  - Use Box::leak to give up exclusive ownership of a Box allocation.
- snippet 3:
impl<T> Clone for Arc<T> {
    fn clone(&self) -> Self {
        if self.data().ref_count.fetch_add(1, Relaxed) > usize::MAX / 2 {
            std::process::abort();
        }
        Arc { ptr: self.ptr }
    }
}

impl<T> Drop for Arc<T> {
    fn drop(&mut self) {
        if self.data().ref_count.fetch_sub(1, Release) == 1 {
            fence(Acquire);
            unsafe {
                drop(Box::from_raw(self.ptr.as_ptr()));
            }
        }
    }
}
  - fetch_add and fetch_sub increment and decrement the reference count.
  - fetch_add is Relaxed because cloning doesn't need to be ordered against anything, but pay attention to the next point.
  - fetch_sub is Release, followed by fence(Acquire), to make sure nobody is still accessing the data when we drop it. We need every earlier fetch_sub to happen before the final one; AcqRel would work, but only the final decrement actually needs Acquire, so Release plus a fence(Acquire) is used for efficiency.
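A hedged test sketch in the spirit of the book's, assuming the three snippets are assembled into one module with the necessary imports (NonNull, AtomicUsize, the orderings, fence, and Deref):

#[test]
fn test_arc() {
    use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

    static NUM_DROPS: AtomicUsize = AtomicUsize::new(0);

    struct DetectDrop;
    impl Drop for DetectDrop {
        fn drop(&mut self) {
            NUM_DROPS.fetch_add(1, Relaxed);
        }
    }

    // Two Arcs sharing one allocation that contains a string and a DetectDrop.
    let x = Arc::new(("hello", DetectDrop));
    let y = x.clone();

    // Send one Arc to another thread and use it there.
    let t = std::thread::spawn(move || {
        assert_eq!(x.0, "hello");
    });
    assert_eq!(y.0, "hello");
    t.join().unwrap();

    // One Arc (x) is gone, but the data must not be dropped yet.
    assert_eq!(NUM_DROPS.load(Relaxed), 0);
    drop(y);
    // The last Arc is gone, so the data is dropped exactly once.
    assert_eq!(NUM_DROPS.load(Relaxed), 1);
}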
Self-Reference
Furthermore, if we build a 'self-referencing' struct with Arc, it can leak memory: the allocation is only freed once the reference count reaches zero, and a self-referencing struct keeps its own count from ever reaching zero:
use std::sync::{Arc, Mutex};
#[derive(Debug)]
struct MyStruct {
value: String,
self_ref: Option<Arc<Mutex<MyStruct>>>,
}
impl MyStruct {
fn new(value: String) -> Self {
MyStruct {
value,
self_ref: None,
}
}
fn set_reference(&mut self, other: Arc<Mutex<MyStruct>>) {
self.self_ref = Some(other);
}
}
impl Drop for MyStruct {
fn drop(&mut self) {
println!("MyStruct is dropped");
}
}
struct DetectDrop {}
impl Drop for DetectDrop {
fn drop(&mut self) {
println!("DetectDrop is dropped");
}
}
fn main() {
let a = Arc::new(Mutex::new(MyStruct::new("Hello".to_string())));
let _b = Arc::new(DetectDrop {});
{
let mut a_ref = a.lock().unwrap();
a_ref.set_reference(a.clone());
}
println!("Arc count: {}", Arc::strong_count(&a));
}
// OUTPUT:
// Arc count: 2
// DetectDrop is dropped
So, here comes Weak.
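For contrast, a hedged sketch of how std::sync::Weak breaks such a cycle (same idea as above, but the self-reference no longer keeps the allocation alive):

use std::sync::{Arc, Mutex, Weak};

struct MyStruct {
    value: String,
    // A Weak reference does not contribute to the strong count.
    self_ref: Option<Weak<Mutex<MyStruct>>>,
}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("MyStruct({}) is dropped", self.value);
    }
}

fn main() {
    let a = Arc::new(Mutex::new(MyStruct {
        value: "Hello".to_string(),
        self_ref: None,
    }));
    a.lock().unwrap().self_ref = Some(Arc::downgrade(&a));
    println!("Arc count: {}", Arc::strong_count(&a)); // still 1
    // When `a` goes out of scope the strong count reaches zero,
    // so "MyStruct(Hello) is dropped" is printed this time.
}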
Weak
Weak is a non-owning reference to the managed allocation, which means it won’t increase the reference count.
To achieve that, we track two reference counts in a shared ArcData and build both Arc and Weak on top of it:
pub struct Arc<T> {
ptr: NonNull<ArcData<T>>,
}
unsafe impl<T: Sync + Send> Send for Arc<T> {}
unsafe impl<T: Sync + Send> Sync for Arc<T> {}
pub struct Weak<T> {
ptr: NonNull<ArcData<T>>,
}
unsafe impl<T: Sync + Send> Send for Weak<T> {}
unsafe impl<T: Sync + Send> Sync for Weak<T> {}
struct ArcData<T> {
/// Number of `Arc`s.
data_ref_count: AtomicUsize,
/// Number of `Weak`s, plus one if there are any `Arc`s.
alloc_ref_count: AtomicUsize,
/// The data. Dropped if there are only weak pointers left.
data: UnsafeCell<ManuallyDrop<T>>,
}
We're not going to copy all the code from the book; let's just summarize the key points:
- Using a giant number as the overflow limit is feasible: in Arc's usage scenario, no operating system can handle anywhere near usize::MAX / 2 threads.
- The inner data is still there even after all the Arc instances are dropped, because the Weak still holds the ArcData. However, nobody can reach it from outside the Arc crate, since the only entry point we provide is upgrade(), and by then it returns None.
- get_mut returns None unless the Arc is the only one holding the data.
Have a look at this example:
#[cfg(test)]
mod tests {
use std::thread::{self};
use super::*;
#[test]
fn test_arc() {
thread::scope(|s| {
let mut arc = Arc::new(42);
// only one reference, so we can get_mut
assert!(Arc::get_mut(&mut arc).is_some());
let weak1 = Arc::downgrade(&arc);
// more than one reference, so we cannot get_mut
assert!(Arc::get_mut(&mut arc).is_none());
let t = s.spawn(move || {
thread::park();
// we cannot upgrade again
assert!(weak1.upgrade().is_none());
// but the data is still there, it's just not accessible outside of the crate
let d = weak1.data();
println!("arc refs: {:?}", d.data_ref_count.load(Relaxed));
println!("weak refs: {:?}", d.alloc_ref_count.load(Relaxed));
println!("data: {:?}", unsafe { &*d.data.get() });
});
// wait until the Arc is dropped
s.spawn(move || {
drop(arc);
})
.join()
.unwrap();
t.thread().unpark();
})
}
}
Extra - Primitives
We’ve talked about UnsafeCell in the first chapter Cell, RefCell, and we’ve seen more primitives in Chan and Arc. So it’s time to summarize them:
UnsafeCell
As mentioned before, UnsafeCell is the core primitive for interior mutability, and it's the only legitimate way to mutate data behind a shared reference.
E.g.:
use std::cell::UnsafeCell;

fn cannot_compile() {
let x = UnsafeCell::new(1);
let y = &x as *const UnsafeCell<i32> as *const i32 as *mut i32;
unsafe {
*y = 2;
}
println!("{}", unsafe { *x.get() });
}

fn ok() {
let x = UnsafeCell::new(1);
unsafe {
let y = x.get();
*y = 2;
}
println!("{}", unsafe { *x.get() });
Note that the complex casting in cannot_compile is copied from UnsafeCell::get(), so the two snippets effectively do the same thing.
However, cannot_compile will not compile; rustc throws an error like this:
error: assigning to &T is undefined behavior, consider using an UnsafeCell
Nonetheless, you can wrap the conversion into a method of your own struct to 'make it work':
struct MyUnsafeCell<T: ?Sized> {
value: T,
}
impl<T> MyUnsafeCell<T> {
fn new(v: T) -> Self {
MyUnsafeCell { value: v }
}
fn get(&self) -> *mut T {
self as *const MyUnsafeCell<T> as *const T as *mut T
}
}
pub fn ok() {
let x = MyUnsafeCell::new(1);
let y = x.get();
unsafe {
*y = 2;
}
println!("{}", x.value);
}
But this is not the right way, because:
Aside: lang item
There's some 'black magic' in Rust: if you copy the code of some built-in types, such as UnsafeCell or ManuallyDrop, and expect it to work just like the std lib version, you may be disappointed (like the cannot_compile example above).
It's even worse when you do the same with ManuallyDrop, because the compiler won't stop you from writing that code, so the unexpected behavior shows up at runtime and is hard to detect.
So why does the std lib work and yours does not? The answer is lang item.
You can find this attribute above the definition of UnsafeCell: #[lang = "unsafe_cell"]
This attribute makes the compiler treat UnsafeCell as a special type, handled differently from normal types; what's more, this special handling is pluggable.
Check this for more detail: lang item
So, what if we put this attribute on our struct?
#[lang = "unsafe_cell"]
struct MyUnsafeCell<T: ?Sized> {
value: T,
}
Bad news, it won't work, because this attribute cannot be redefined:
error[E0152]: found duplicate lang item unsafe_cell
If you insist on doing so, you'll have to build a freestanding crate yourself.
Check this out: Error code E0152, #![no_std]
NonNull
To be short, as the doc says: “*mut T but non-zero and covariant”.
So we can use it whenever we're sure the pointer is never null, and the compiler can exploit that guarantee instead of treating it like an arbitrary *mut T.
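One concrete payoff of the non-null guarantee is the niche optimization: Option<NonNull<T>> can use the null bit pattern for None, so it stays pointer-sized. A quick check:

use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    assert_eq!(size_of::<NonNull<i32>>(), size_of::<*mut i32>());
    // None is represented by the (otherwise impossible) null pattern,
    // so the Option costs no extra space.
    assert_eq!(size_of::<Option<NonNull<i32>>>(), size_of::<*mut i32>());
    println!("both are one pointer wide");
}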
non-zero *mut T
use std::ptr::{self, NonNull};

fn main() {
// addr = 0x0
let nptr: *mut i32 = ptr::null_mut();
println!("{} {}", nptr.is_null(), nptr.is_aligned()); // true true
// safe
let nnull = NonNull::new(nptr);
println!("{:?}", nnull); // None
// panics at runtime in debug mode (new_unchecked's debug assertion catches the null)
// undefined behavior in release mode if you try to dereference it
let force_create = unsafe { NonNull::new_unchecked(nptr) };
let fptr = force_create.as_ptr();
println!("{:?}", fptr); // 0x0
println!("{}", unsafe { *fptr }); // runtime error
}
NonNull will prevent you from creating a null NonNull instance, but if you insist by calling new_unchecked, you'll get a runtime error when you try to dereference it (and even if no error shows up, it is still UB).
Aside: assert_unsafe_precondition!
NonNull::new_unchecked uses this macro to check the null pointer.
As we know, xxx_unchecked is a common pattern in Rust, used to bypass safety checks for performance.
So why does NonNull still check the null pointer? The secret is:
“The check is enabled at runtime if debug assertions are enabled when the caller is monomorphized. In const-eval/Miri checks implemented with this macro for language UB are always ignored.”
So, this check is only enabled in debug mode.
BTW, do you know how to enable/disable debug assertions? There are three ways (a sketch of checking the flag from code follows below):
- put RUSTFLAGS="-C debug-assertions=on" before cargo build
- add debug-assertions = true to a [profile.*] section in Cargo.toml
- by default, cargo build --release disables debug assertions (plain cargo build enables them)
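Your own code can branch on the same flag through cfg!(debug_assertions); a tiny sketch:

fn main() {
    if cfg!(debug_assertions) {
        println!("debug assertions are enabled (default for `cargo build`)");
    } else {
        println!("debug assertions are disabled (default for `cargo build --release`)");
    }
    debug_assert!(1 + 1 == 2); // compiled away when debug assertions are off
}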
covariant
This is a complex and subtle concept; I recommend reading this chapter of the Rustonomicon: 🔗Subtyping and Variance.
Anyway, let's take a quick look at variance in Rust:
fn debug<'a>(a: &'a str, b: &'a str) {
println!("a = {a:?} b = {b:?}");
}
We can see that a and b share the same lifetime 'a, but does the debug function really require a and b to have the same lifetime?
The answer is NO:
fn main() {
let hello: &'static str = "hello";
{
let world = String::from("world");
let world = &world; // 'world has a shorter lifetime than 'static
debug(hello, world);
}
}
Let's break it down without complicated concepts:
- assume there are two lifetimes, 'long and 'short
- 'long's scope completely covers 'short's scope
Then we can say that 'long can be downgraded to 'short.
Hence, debug(hello, world) is really:
debug<'world>('world hello, 'world world)
instead of
debug<'static>('static hello, 'static world).
Now let's bring the concepts back:
- 'static is a SubType of 'world
- 'world is a SuperType of 'static
It’s not hard to understand the naming - we can always cast a child class into a parent class in OO languages like Java.
So we can always cast a SubType into a SuperType in Rust.
Looking back at variance:
'Variance' describes how this subtyping relationship carries over to types built from lifetimes. It has three flavors (a small sketch follows the list):
- Covariant: if 'a is a subtype of 'b, then Foo<'a> is a subtype of Foo<'b>.
- Contravariant: if 'a is a subtype of 'b, then Foo<'b> is a subtype of Foo<'a>.
- Invariant: no subtyping relationship between Foo<'a> and Foo<'b> follows from the precondition.
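A small sketch of covariance in action; pick_first is a made-up helper. Vec<T> is covariant in T and &'a str is covariant in 'a, so a Vec<&'static str> is accepted where a shorter lifetime is expected:

fn pick_first<'a>(items: Vec<&'a str>, fallback: &'a str) -> &'a str {
    items.first().copied().unwrap_or(fallback)
}

fn main() {
    let static_items: Vec<&'static str> = vec!["hello", "world"];
    let local = String::from("local fallback");
    // Vec<&'static str> coerces to Vec<&'a str> for the shorter lifetime 'a
    // of `local`, because both Vec and &str are covariant.
    let s = pick_first(static_items, local.as_str());
    println!("{s}");
}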
Let’s stop here, we can find more details in the book Rustonomicon.
ManuallyDrop
A zero-cost wrapper that inhibits the compiler from automatically calling T's destructor.
Take a look at the definition:
#[stable(feature = "manually_drop", since = "1.20.0")]
#[lang = "manually_drop"]
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
pub struct ManuallyDrop<T: ?Sized> {
value: T,
}
This is a lang item and it's repr(transparent), so it's just a wrapper around T; the special behavior is plugged into the compiler.
Using ManuallyDrop is quite simple:
use std::fmt::Debug;
use std::mem::ManuallyDrop;

struct DetectDrop<T: Debug> {
value: T,
}
impl<T: Debug> Drop for DetectDrop<T> {
fn drop(&mut self) {
println!("Dropping DetectDrop with value: {:?}", self.value);
}
}
fn main() {
// immutable, use into_inner to take the value out and drop it
let x = ManuallyDrop::new(DetectDrop { value: 52 });
{
let _ = ManuallyDrop::into_inner(x);
}
// mutable, use `drop` to drop the value
let mut y = ManuallyDrop::new(DetectDrop { value: 99 });
unsafe {
ManuallyDrop::drop(&mut y);
}
}
MaybeUninit
A wrapper representing a value that may be uninitialized.
#[stable(feature = "maybe_uninit", since = "1.36.0")]
#[lang = "maybe_uninit"]
#[derive(Copy)]
#[repr(transparent)]
pub union MaybeUninit<T> {
uninit: (),
value: ManuallyDrop<T>,
}
As we can see, it's also a lang item and repr(transparent); under the hood it's a union of () and ManuallyDrop<T>.
The design pattern is easy to understand:
- when you create an uninit value, it's just the () variant, so it's trivially safe to drop
- but once you write a value into it, it becomes a ManuallyDrop<T>, so it won't be dropped automatically; you have to handle the drop yourself
So what’s the purpose of MaybeUninit? Especially when we have ManuallyDrop?
A typical use is lazy initialization:
fn main() {
let mut arr: [MaybeUninit<i32>; 10000] = unsafe { MaybeUninit::uninit().assume_init() };
for i in 0..100 {
arr[i] = MaybeUninit::new(i as i32);
}
let mut sum = 0;
for i in 0..100 {
sum += unsafe { arr[i].assume_init() };
}
println!("{}", sum);
}
The cost of the array allocation itself cannot be avoided, but the cost of initializing every element is.
The example above is quite simple; what if we replace i32 with a complex struct? Without MaybeUninit we would have to construct a placeholder T for each of the 10,000 elements up front, and those 'zero' values are not free.
You may have noticed that Option could replace MaybeUninit in this example, so what's the difference?
- Option is an enum and needs room for a discriminant (unless a niche is available), while MaybeUninit<T> is exactly the size of T.
- Option can tell you whether the value is initialized; MaybeUninit cannot.
This means that you, as the programmer, are responsible for keeping track of which elements are initialized and must never read one that isn't.
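Another common pattern is initializing a single MaybeUninit slot in place and only reading it afterwards; a small sketch:

use std::mem::MaybeUninit;

fn main() {
    let mut slot: MaybeUninit<String> = MaybeUninit::uninit();
    // Reading `slot` here would be UB; we must write a value first.
    slot.write(String::from("initialized later"));
    // Safety: we just wrote a valid value into the slot above.
    let s = unsafe { slot.assume_init() };
    println!("{s}");
}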
PhantomData
This is a marker for static analysis, and it’s used to tell the compiler that you’re using a type parameter in a struct for a reason other than storing a value of that type.
For example, sometimes we want to tie the lifetime of a struct to the lifetime of a reference it logically points into:
struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
}
Since the lifetime parameter is not used inside the struct, the compiler rejects it (error[E0392]: lifetime parameter 'a is never used) and the lifetime would otherwise be unbounded. We can fix that with PhantomData:
use std::marker;
struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
_marker: marker::PhantomData<&'a T>,
}
Another typical use of PhantomData is to make a struct holding a raw pointer act as the owner of the data:
use std::marker;
struct Vec<T> {
data: *const T,
len: usize,
cap: usize,
_marker: marker::PhantomData<T>,
}
Without the _marker, the compiler would not consider the Vec to own the T data, so the data would not be treated as dropped when the Vec is dropped.
Processors
A quick view of the differences between RISC and CISC:
Rust:
fn add_10(n: &mut i32) {
*n += 10;
}RISC(ARM):
add_10:
ldr w8, [x0] // load
add w8, w8, #10 // modify
str w8, [x0] // store
ret
CISC (x86-64, Intel, AMD):
add_10:
add dword ptr [rdi], 10 // all in one
ret
Simple Load / Store
For both the atomic and the non-atomic Rust code, the compiled assembly is the same:
pub fn store(x: &mut i32) {
*x = 0;
}
pub fn atomic_store(x: &AtomicI32) {
x.store(0, Relaxed);
}
x86_store:
mov dword ptr [rdi], 0
ret
arm_store:
str wzr, [x0]
ret
Because a plain mov / str is already atomic (for these aligned, register-sized accesses).
x86 Lock Prefix
pub fn add(x: &AtomicI32) -> i32 {
x.fetch_add(10, Relaxed)
}
add:
mov eax, 10
lock xadd dword ptr [rdi], eax
ret
Note that not all instructions can be prefixed with lock.
lock originally caused the processor to temporarily block all other cores from accessing memory for the duration of the instruction; newer processors only lock the relevant cache line and won't stop other cores from operating on unrelated memory.
x86 compare and exchange
Any fetch-and-modify operation can be implemented as a compare-and-exchange loop:
pub fn a(x: &AtomicI32) -> i32 {
x.fetch_or(10, Relaxed)
}
// a is same as b
pub fn b(x: &AtomicI32) -> i32 {
let mut current = x.load(Relaxed);
loop {
let new = current | 10;
match x.compare_exchange(current, new, Relaxed, Relaxed) {
Ok(v) => return v,
Err(v) => current = v,
}
}
}
a:
mov eax, dword ptr [rdi]
.L1: // loop until cas success
mov ecx, eax
or ecx, 10
// note that [cmpxchg dest, src] compares the dest with eax
lock cmpxchg dword ptr [rdi], ecx
jne .L1
ret
On x86-64, there is no difference between compare_exchange and compare_exchange_weak: both compile down to a lock cmpxchg instruction.