Rust Atomics and Locks
Last updated on February 27, 2025
WIP, no ETA provided
TL;DR - please refer to the ToC
My notes on the tremendous book Rust Atomics and Locks by Mara Bos.
Cell, RefCell
Interior mutability: a design pattern in Rust that allows you to mutate data even when there are immutable references to that data.
In short, you can use fn foo(&self) to change the data inside the struct. Normally, we would have to use fn foo(&mut self).
- Cell: take() the data out, modify it, and set() it back. Mutex is the concurrent version of Cell.
- RefCell: borrow_mut() will panic if there's already an active borrow. RwLock is the concurrent version of RefCell, but it blocks until the borrow is released instead of panicking.
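A minimal sketch of both APIs (my example, not from the book):

use std::cell::{Cell, RefCell};

fn main() {
    // Cell: copy the value out, modify it, set it back.
    let c = Cell::new(1);
    c.set(c.get() + 1); // mutation through a shared &Cell<i32>
    assert_eq!(c.get(), 2);

    // RefCell: borrows are checked at runtime instead of compile time.
    let r = RefCell::new(vec![1, 2]);
    r.borrow_mut().push(3); // would panic if another borrow were active
    assert_eq!(r.borrow().len(), 3);
}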
Aside: UnsafeCell
Both of them are built on a core primitive called UnsafeCell.
This struct provides the basic abstraction for interior mutability; however, you have to use unsafe to access it.
Cell, RefCell and all other types that allow interior mutability use UnsafeCell to wrap their data and provide safe APIs.
MutexGuard lifetime
if list.lock().unwrap().pop() == Some(1) { // <--- guard dropped here, before the body
    do_something();
}

// PERF: this needlessly holds the lock while processing the item.
if let Some(item) = list.lock().unwrap().pop() {
    process_item(item); // the lock is still held here
} // <--- guard dropped here

// FIXED: assign to a variable first, so the guard drops immediately.
let item = list.lock().unwrap().pop(); // <--- guard dropped here
if let Some(item) = item {
    process_item(item);
}
Reason:
- The condition of a plain if is a simple boolean expression, so the temporary MutexGuard can be dropped as soon as the condition is evaluated. With if let, the temporary lives until the end of the whole block.
- If we replace pop() with front(), things become clear: front() returns a reference into the list, so the guard must stay alive for the whole block. That's how the borrow checker works (sketch below).
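Here's a sketch of that front() case (peek_first is a hypothetical helper, not from the book):

use std::collections::VecDeque;
use std::sync::Mutex;

fn peek_first(list: &Mutex<VecDeque<i32>>) {
    // front() borrows from the guarded data, so the temporary guard
    // must live for the entire if-let body.
    if let Some(first) = list.lock().unwrap().front() {
        println!("first = {first}"); // the lock is still held here
    } // guard (and the borrow into it) dropped here
}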
Parking
Consider this situation: we only want to process items when the list is not empty.
If we only use a Mutex, the consumer has to keep calling lock() in a busy loop, which is not efficient.
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;
use std::time::Duration;
fn main() {
let queue = Mutex::new(VecDeque::new());
thread::scope(|s| {
// Consuming thread
let t = s.spawn(|| loop {
let item = queue.lock().unwrap().pop_front();
if let Some(item) = item {
dbg!(item);
} else {
thread::park();
}
});
// Producing thread
for i in 0.. {
queue.lock().unwrap().push_back(i);
t.thread().unpark();
thread::sleep(Duration::from_secs(1));
}
});
}
A call to unpark() before the thread parks itself does not get lost.
However, unpark requests don't stack up: calling unpark() two times and then calling park() two times afterwards still results in the thread going to sleep on the second park().
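A small sketch of the token behavior (my example, not from the book):

use std::thread;
use std::time::Duration;

fn main() {
    let t = thread::spawn(|| {
        thread::sleep(Duration::from_millis(100));
        // The token stored by the unpark() calls below is consumed here,
        // so this park() returns immediately instead of blocking.
        thread::park();
        println!("did not get stuck");
    });
    // unpark() before the thread parks: the token is saved, not lost.
    // Calling it twice still stores only a single token.
    t.thread().unpark();
    t.thread().unpark();
    t.join().unwrap();
}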
Cond Vars
The example above doesn't scale to multiple consumers, because the producer doesn't know which consumer to wake up. That's where condition variables come in.
fn main() {
use std::sync::Condvar;
use std::{collections::VecDeque, sync::Mutex, thread, time::Duration};
let queue = Mutex::new(VecDeque::new());
let not_empty = Condvar::new();
let empty = Condvar::new();
thread::scope(|s| {
// not_empty handler
s.spawn(|| loop {
let mut q = queue.lock().unwrap();
let item = loop {
if let Some(item) = q.pop_front() {
break item;
} else {
empty.notify_one(); // wake up the empty handler(s)
q = not_empty.wait(q).unwrap();
}
};
drop(q);
println!("Got item: {}", item);
});
// empty handler
s.spawn(|| loop {
let mut q = queue.lock().unwrap();
loop {
if q.is_empty() {
break;
} else {
q = empty.wait(q).unwrap();
}
}
drop(q);
println!("Queue is empty");
thread::sleep(Duration::from_millis(400));
});
// producer
for i in 0.. {
queue.lock().unwrap().push_back(i);
not_empty.notify_one(); // wake up the not_empty handler(s)
thread::sleep(Duration::from_secs(1));
}
});
}
Memory Ordering
Happens-Before
Before we dive into the specific memory orderings below, we need a solid understanding of the happens-before relationship.
happens-before does not mean that the first operation is literally executed before the second one. (If you want to guarantee that, use a Mutex, Condvar, channel, or another synchronization primitive to make threads communicate.)
There are 2 points to remember:
- happens-before is all about the instruction reordering done by the compiler or the CPU.
- a happens-before relationship isn't established until an observable behavior occurs.
For example:
Thread A: Thread B:
some ops (A) ----------
... |
a.store(1, Release); | visible
|
| if a.load(Acquire) == 1 {
-> (B) // do something
}
There's no happens-before relationship until Thread B actually sees the value 1 from a.load(Acquire); at that point we can say that (A) happens before (B), i.e. (A) is visible to (B).
Relaxed
While atomic operations using relaxed memory ordering do not provide any happens-before relationship,
they do guarantee a total modification order of each individual atomic variable.
This means that all modifications of the same atomic variable happen in an order that is the same from the perspective of every single thread.
Example:
use std::sync::atomic::{AtomicI32, Ordering::Relaxed};

static X: AtomicI32 = AtomicI32::new(0);
// thread 1
fn a() {
X.fetch_add(5, Relaxed);
X.fetch_add(10, Relaxed);
}
// thread 2
fn b() {
let a = X.load(Relaxed);
let b = X.load(Relaxed);
let c = X.load(Relaxed);
let d = X.load(Relaxed);
println!("{a} {b} {c} {d}");
}
The output will never be 0 5 0 15 or 0 0 10 15, because the modification order of X is 0 → 5 → 15:
- 0 5 0 15: once a thread has seen 5, a later load can't go back to the older value 0.
- 0 0 10 15: the +5 is ordered before the +10 in X's modification order, so the intermediate value is 5, never 10.
Release / Acquire
Release corresponds to the store operation, and Acquire corresponds to the load operation.
use std::sync::atomic::{AtomicBool, AtomicU64};
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::thread;
use std::time::Duration;
static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);
fn main() {
thread::spawn(|| {
DATA.store(123, Relaxed);
READY.store(true, Release); // Everything from before this store ..
});
while !READY.load(Acquire) { // .. is visible after this loads `true`.
thread::sleep(Duration::from_millis(100));
println!("waiting...");
}
println!("{}", DATA.load(Relaxed));
}
Note that if we used Relaxed instead of Acquire for READY, we might read 0 instead of 123.
That's because Relaxed only provides a total order for a single atomic variable - it creates no ordering across different variables.
Since Release and Acquire establish a happens-before relationship, we can even use a plain u64 instead of AtomicU64:
static mut DATA: u64 = 0;
static READY: AtomicBool = AtomicBool::new(false);
fn main() {
thread::spawn(|| {
// Safety: Nothing else is accessing DATA,
// because we haven't set the READY flag yet.
unsafe { DATA = 123 };
READY.store(true, Release); // Everything from before this store ..
});
while !READY.load(Acquire) {
// .. is visible after this loads `true`.
thread::sleep(Duration::from_millis(100));
println!("waiting...");
}
// Safety: Nothing is mutating DATA, because READY is set.
println!("{}", unsafe { DATA });
}
SeqCst
In short, all SeqCst operations form one single total order that every thread agrees on, consistent with the order the code appears in.
Here's an example of the difference between Acq/Rel and SeqCst:
static A: AtomicUsize = AtomicUsize::new(0);
static B: AtomicUsize = AtomicUsize::new(0);
fn thread1() {
A.store(1, SeqCst);
B.store(2, SeqCst);
}
fn thread2() {
let b = B.load(SeqCst);
let a = A.load(SeqCst);
println!("a = {}, b = {}", a, b);
}
If b is 2, then a must be 1.
static A: AtomicUsize = AtomicUsize::new(0);
static B: AtomicUsize = AtomicUsize::new(0);
fn thread1() {
A.store(1, Release);
B.store(2, Relaxed);
}
fn thread2() {
let b = B.load(Relaxed);
let a = A.load(Acquire);
println!("a = {}, b = {}", a, b);
}
If b is 2, we can't guarantee that a is 1.
Fence
In short, a fence stops the compiler and the CPU from reordering instructions across it.
More specifically, a fence(Release) prevents preceding store instructions from being reordered after the fence, and a fence(Acquire) prevents subsequent load instructions from being reordered before the fence.
We already know that Release and Acquire establish a happens-before relationship, and we can substitute fences for them like this:
before:
// Thread A
{
DATA.store(42, Relaxed);
READY.store(true, Release); // What happens-before the `Release`...
}
// Thread B
if READY.load(Acquire) { // ... is visible after the `Acquire`.
    let value = DATA.load(Relaxed); // The observation has occurred (READY false -> true), so value must be 42.
assert_eq!(value, 42);
}
after:
// Thread A
{
DATA.store(42, Relaxed);
fence(Release); // fence before store
READY.store(true, Relaxed); // just Relaxed
}
// Thread B
if READY.load(Relaxed) { // just Relaxed
fence(Acquire); // fence after load
let value = DATA.load(Relaxed);
assert_eq!(value, 42);
}
fence after load:
Let's say we move the fence in Thread B to the front of the if block; then the compiler might reorder the load before the if check:
// Thread B
fence(Acquire);
let value = DATA.load(Relaxed); // what value will be?
if READY.load(Relaxed) {
println!("DATA = {}", value);
}
So we might get DATA = 0 instead of DATA = 42.
fence before store:
{
DATA.store(42, Relaxed);
fence(Release); // make sure DATA is stored
READY.store(true, Relaxed);
}
Finally, check another example to make a deeper impression - using a fence to get better performance:
let p = PTR.load(Acquire);
if p.is_null() {
println!("no data");
} else {
println!("data = {}", unsafe { *p });
}
let p = PTR.load(Relaxed);
if p.is_null() {
println!("no data");
} else {
fence(Acquire);
println!("data = {}", unsafe { *p });
}
Why is the performance better?
Because the fence is only needed when p is not null, and a Relaxed load is lighter weight than an Acquire load.
How does it work?
Because the fence stops *p from being reordered before the if check, so we can skip the Acquire synchronization entirely when p is null.
Anyway, if you find it hard to understand, keeping these 2 rules in mind might help:
a.store(1, Release);
// can be substituted with
fence(Release);
a.store(1, Relaxed);

let a = b.load(Acquire);
// can be substituted with
let a = b.load(Relaxed);
fence(Acquire);
If you simply do this 'variable replacement', you'll see that a fence isn't actually tied to another fence; it can also pair with a plain store or load.
I.e. a fence(Acquire) doesn't have to synchronize with a fence(Release) - fences don't have to be used in pairs.
And that's how the last example works.
PS: the 'replacement' doesn't strictly require a literal :s/ori/new. Of course you can add some lines between the fence and the store / load, as long as the code you add doesn't care about the memory order.
Misconceptions
- "I need strong memory ordering to make sure changes are immediately visible to other threads."
  In real life, memory ordering is about things like instruction reordering, which usually happens at nanosecond scale. Stronger memory ordering does not make your data travel faster; it might even slow your program down.
- "Disabling optimization means I don't need to care about memory ordering."
  Processor reordering still happens even if you disable compiler optimizations.
- "Sequentially consistent memory ordering can be used for a 'release-load' or an 'acquire-store'."
  No such operations exist: a Release-store does not form any release-acquire relationship with a SeqCst-store. If you need them to be part of a globally consistent order, both operations have to use SeqCst.
Summary
type | guarantee
---|---
Relaxed | total modification order of each individual atomic variable
Acq/Rel | changes before the Release-store are visible after the Acquire-load
SeqCst | a single total order over all SeqCst operations, matching program order
Fence | code after the fence is not reordered with code before the fence
Aside: compare_exchange vs. compare_exchange_weak
In short, compare_exchange_weak may return Err even when the comparison actually succeeded - a spurious failure.
That's because it can compile down to LL/SC (Load-Linked/Store-Conditional) instructions, which may fail due to things like cache contention.
Since we need a retry loop anyway, compare_exchange_weak is more efficient on some platforms, such as ARM.
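Since the retry loop is mandatory anyway, here's a sketch of the canonical pattern (increment_up_to is a hypothetical helper, not from the book):

use std::sync::atomic::{AtomicU32, Ordering::Relaxed};

// Increment, but never past `max`. Returns the previous value on success.
fn increment_up_to(x: &AtomicU32, max: u32) -> Result<u32, u32> {
    let mut current = x.load(Relaxed);
    loop {
        if current >= max {
            return Err(current);
        }
        // A spurious failure just lands us in Err(v) with an unchanged
        // value and we loop again - so `weak` costs us nothing here.
        match x.compare_exchange_weak(current, current + 1, Relaxed, Relaxed) {
            Ok(v) => return Ok(v),
            Err(v) => current = v,
        }
    }
}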
What's more, since the x86/x64 architecture provides a strong memory model called TSO (Total Store Order), a store(Release) or load(Acquire) may simply be compiled into a MOV instruction;
on ARM, however, the memory model is weaker, so the compiler may emit a DMB (Data Memory Barrier) instruction for Release or Acquire.
Spin Lock
If a lock is only ever held for very brief moments and the threads locking it can run in parallel on different processor cores,
it might be better for the threads to repeatedly try to lock it without actually going to sleep.
Minimal Impl
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::{Acquire, Release};

pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }
pub fn lock(&self) {
while self.locked.swap(true, Acquire) {
std::hint::spin_loop();
}
}
pub fn unlock(&self) {
self.locked.store(false, Release);
}
}
Hold value
use std::cell::UnsafeCell;
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::{Acquire, Release};

pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// From the book: sharing the lock between threads only requires T: Send,
// because only one thread at a time can access the value.
unsafe impl<T> Sync for SpinLock<T> where T: Send {}
impl<T> SpinLock<T> {
pub fn new(value: T) -> Self {
Self {
locked: AtomicBool::new(false),
value: UnsafeCell::new(value),
}
}
// use this macro to suppress the clippy error
#[allow(clippy::mut_from_ref)]
pub fn lock(&self) -> &mut T {
while self.locked.swap(true, Acquire) {
std::hint::spin_loop();
}
unsafe { &mut *self.value.get() }
}
pub fn unlock(&self) {
self.locked.store(false, Release);
}
}
Use Guard to auto unlock
use std::ops::{Deref, DerefMut};

pub struct Guard<'a, T> {
    lock: &'a SpinLock<T>,
}

impl<T> Deref for Guard<'_, T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        // Safety: holding the guard guarantees exclusive access.
        unsafe { &*self.lock.value.get() }
    }
}

impl<T> DerefMut for Guard<'_, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        unsafe { &mut *self.lock.value.get() }
    }
}
impl<T> Drop for Guard<'_, T> {
fn drop(&mut self) {
self.lock.unlock();
}
}
Now let’s change the lock
method to return a Guard
:
impl<T> SpinLock<T> {
pub fn lock(&self) -> Guard<T> {
while self.locked.swap(true, Acquire) {
std::hint::spin_loop();
}
Guard { lock: self }
}
}
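A minimal usage sketch of the guard-based SpinLock above (loosely following the book's test):

fn main() {
    let lock = SpinLock::new(Vec::new());
    std::thread::scope(|s| {
        s.spawn(|| lock.lock().push(1));
        s.spawn(|| {
            // The guard is held across both pushes, so the 2s stay adjacent.
            let mut g = lock.lock();
            g.push(2);
            g.push(2);
        });
    });
    let g = lock.lock();
    assert!(*g == [1, 2, 2] || *g == [2, 2, 1]);
} // guards unlock automatically when dropped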
Fence it!
Remember we talked about using a fence to substitute for a load or store ordering in the previous section, Fence? There happens to be an example of building a spin lock with a fence in the Rust doc of fence itself:
use std::sync::atomic::{fence, AtomicBool, Ordering};

pub struct Mutex {
flag: AtomicBool,
}
impl Mutex {
pub fn new() -> Mutex {
Mutex {
flag: AtomicBool::new(false),
}
}
pub fn lock(&self) {
// Wait until the old value is `false`.
while self
.flag
.compare_exchange_weak(false, true, Ordering::Relaxed, Ordering::Relaxed)
.is_err()
{}
// This fence synchronizes-with store in `unlock`.
fence(Ordering::Acquire);
}
pub fn unlock(&self) {
self.flag.store(false, Ordering::Release);
}
}
A little confused by the single fence? Let's break it down:
First, to see things clearly, we rewrite the while loop as a loop:
pub fn lock(&self) {
// Wait until the old value is `false`.
loop {
if self
.flag
.compare_exchange_weak(false, true, Ordering::Relaxed, Ordering::Relaxed)
.is_ok()
{
break;
}
}
// This fence synchronizes-with store in `unlock`.
fence(Ordering::Acquire);
}
Then, we replace compare_exchange_weak with a separate load and store. Note that this is not valid in real code, because splitting the CAS into an individual load and store breaks atomicity - it's only for reasoning:
pub fn lock(&self) {
// Wait until the old value is `false`.
loop {
if self.flag.load(Ordering::Relaxed) == false {
self.flag.store(true, Ordering::Relaxed);
break;
}
}
// This fence synchronizes-with store in `unlock`.
fence(Ordering::Acquire);
}
Now do a little adjustment:
pub fn lock(&self) {
// Wait until the old value is `false`.
loop {
if self.flag.load(Ordering::Relaxed) == false {
fence(Ordering::Acquire);
self.flag.store(true, Ordering::Relaxed);
break;
}
}
}
Now we can see the fence-after-load pattern! It's equivalent to:
pub fn lock(&self) {
// Wait until the old value is `false`.
loop {
if self.flag.load(Ordering::Acquire) == false {
self.flag.store(true, Ordering::Relaxed);
break;
}
}
}
And that's exactly how the fence synchronizes-with the store in unlock.
It's a little verbose, but it's a good way to understand how fence works.
Chan
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::thread::{self, Thread};

pub struct Channel<T> {
    message: UnsafeCell<MaybeUninit<T>>,
    ready: AtomicBool,
}
unsafe impl<T> Sync for Channel<T> where T: Send {}
pub struct Sender<'a, T> {
channel: &'a Channel<T>,
recv_thread: Thread,
}
pub struct Receiver<'a, T> {
channel: &'a Channel<T>,
_marker: std::marker::PhantomData<&'a T>,
}
impl<T> Channel<T> {
pub const fn new() -> Self {
Self {
message: UnsafeCell::new(MaybeUninit::uninit()),
ready: AtomicBool::new(false),
}
}
pub fn split(&mut self) -> (Sender<T>, Receiver<T>) {
*self = Self::new();
(
Sender {
channel: self,
recv_thread: thread::current(),
},
Receiver {
channel: self,
_marker: std::marker::PhantomData,
},
)
}
}
impl<T> Sender<'_, T> {
pub fn send(self, message: T) {
unsafe { (*self.channel.message.get()).write(message) };
self.channel.ready.store(true, Release);
self.recv_thread.unpark();
}
}
impl<T> Receiver<'_, T> {
pub fn is_ready(&self) -> bool {
self.channel.ready.load(Relaxed)
}
pub fn receive(self) -> T {
while !self.channel.ready.swap(false, Acquire) {
thread::park();
}
unsafe { (*self.channel.message.get()).assume_init_read() }
}
}
impl<T> Drop for Channel<T> {
fn drop(&mut self) {
if *self.ready.get_mut() {
unsafe { self.message.get_mut().assume_init_drop() }
}
}
}
Pay attention to these points:
- Use MaybeUninit to save memory (no Option discriminant needed).
- Abstract Sender and Receiver to limit the user's access to the channel (think of the MutexGuard in Mutex).
- Use the thread handle to park / unpark the receiver thread.
- Do not forget to drop the MaybeUninit contents manually, because the struct won't drop the inner data automatically.
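Here's a minimal usage sketch of this channel (essentially the book's usage):

use std::thread;

fn main() {
    let mut channel = Channel::new();
    thread::scope(|s| {
        // split() captures the current thread as the receiver thread.
        let (sender, receiver) = channel.split();
        s.spawn(move || {
            sender.send("hello world!");
        });
        // receive() parks until send() unparks us.
        assert_eq!(receiver.receive(), "hello world!");
    });
}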
Arc
Minimal Impl
- snippet 1:

struct ArcData<T> {
    ref_count: AtomicUsize,
    data: T,
}

pub struct Arc<T> {
    ptr: NonNull<ArcData<T>>,
}

unsafe impl<T: Send + Sync> Send for Arc<T> {}
unsafe impl<T: Send + Sync> Sync for Arc<T> {}

  - NonNull is almost the same as *mut T, but non-null and covariant.
  - Arc<T> can be shared between threads, so T should be Sync.
  - Arc<T> can be dropped by another thread, so T should be Send.
- snippet 2:

impl<T> Arc<T> {
    pub fn new(data: T) -> Arc<T> {
        Arc {
            ptr: NonNull::from(Box::leak(Box::new(ArcData {
                ref_count: AtomicUsize::new(1),
                data,
            }))),
        }
    }

    fn data(&self) -> &ArcData<T> {
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Deref for Arc<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.data().data
    }
}

  - Use Box::leak to give up exclusive ownership of the Box allocation.
- snippet 3:

impl<T> Clone for Arc<T> {
    fn clone(&self) -> Self {
        if self.data().ref_count.fetch_add(1, Relaxed) > usize::MAX / 2 {
            std::process::abort();
        }
        Arc { ptr: self.ptr }
    }
}

impl<T> Drop for Arc<T> {
    fn drop(&mut self) {
        if self.data().ref_count.fetch_sub(1, Release) == 1 {
            fence(Acquire);
            unsafe {
                drop(Box::from_raw(self.ptr.as_ptr()));
            }
        }
    }
}

  - fetch_add and fetch_sub increment and decrement the reference count.
  - fetch_add is Relaxed, because incrementing imposes no ordering requirements - but note the next point.
  - fetch_sub is Release, followed by a fence(Acquire), to make sure nobody is still accessing the data when we drop it: every earlier fetch_sub must happen-before the final one.
  - We could use AcqRel for that; however, only the final decrement needs Acquire, so we use Release + fence(Acquire) for efficiency.
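A quick sanity check of this minimal Arc (a sketch, loosely following the book's test):

fn main() {
    let x = Arc::new(("hello", 42));
    let y = x.clone(); // ref_count: 1 -> 2
    let t = std::thread::spawn(move || {
        assert_eq!(x.0, "hello"); // x moved into the thread, dropped there
    });
    assert_eq!(y.1, 42);
    t.join().unwrap();
} // the last Arc drops here and frees the allocation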
Self-Reference
Furthermore, if we build a 'self-referencing' struct with Arc, it can leak memory: the allocation is only freed when the reference count reaches zero, and a self-reference keeps the count above zero forever:
use std::sync::{Arc, Mutex};
#[derive(Debug)]
struct MyStruct {
value: String,
self_ref: Option<Arc<Mutex<MyStruct>>>,
}
impl MyStruct {
fn new(value: String) -> Self {
MyStruct {
value,
self_ref: None,
}
}
fn set_reference(&mut self, other: Arc<Mutex<MyStruct>>) {
self.self_ref = Some(other);
}
}
impl Drop for MyStruct {
fn drop(&mut self) {
println!("MyStruct is dropped");
}
}
struct DetectDrop {}
impl Drop for DetectDrop {
fn drop(&mut self) {
println!("DetectDrop is dropped");
}
}
fn main() {
let a = Arc::new(Mutex::new(MyStruct::new("Hello".to_string())));
let _b = Arc::new(DetectDrop {});
{
let mut a_ref = a.lock().unwrap();
a_ref.set_reference(a.clone());
}
println!("Arc count: {}", Arc::strong_count(&a));
}
// OUTPUT:
// Arc count: 2
// DetectDrop is dropped
So, here comes the Weak
.
Weak
Weak is a non-owning reference to the managed allocation, which means it won't increase the data reference count.
To achieve that, we split the reference counting of Arc in two, shared by Weak and Arc:
pub struct Arc<T> {
ptr: NonNull<ArcData<T>>,
}
unsafe impl<T: Sync + Send> Send for Arc<T> {}
unsafe impl<T: Sync + Send> Sync for Arc<T> {}
pub struct Weak<T> {
ptr: NonNull<ArcData<T>>,
}
unsafe impl<T: Sync + Send> Send for Weak<T> {}
unsafe impl<T: Sync + Send> Sync for Weak<T> {}
struct ArcData<T> {
/// Number of `Arc`s.
data_ref_count: AtomicUsize,
/// Number of `Weak`s, plus one if there are any `Arc`s.
alloc_ref_count: AtomicUsize,
/// The data. Dropped if there are only weak pointers left.
data: UnsafeCell<ManuallyDrop<T>>,
}
We're not going to copy all the code from the book; let's just summarize the key points:
- Using a giant number as the counter limit is feasible: in the usage scenario of Arc, no operating system can handle usize::MAX / 2 threads.
- The ArcData allocation stays alive even after all the Arc instances are dropped, because the Weak is still holding it. However, nobody can access the data from outside the Arc crate, since the only way in is the upgrade method, and it returns None.
- get_mut returns None unless the Arc is the only one holding the data.
Have a look at this example:
#[cfg(test)]
mod tests {
use std::thread::{self};
use super::*;
#[test]
fn test_arc() {
thread::scope(|s| {
let mut arc = Arc::new(42);
// only one reference, so we can get_mut
assert!(Arc::get_mut(&mut arc).is_some());
let weak1 = Arc::downgrade(&arc);
// more than one reference, so we cannot get_mut
assert!(Arc::get_mut(&mut arc).is_none());
let t = s.spawn(move || {
thread::park();
// we cannot upgrade again
assert!(weak1.upgrade().is_none());
// but the data is still there, it's just not accessible outside of the crate
let d = weak1.data();
println!("arc refs: {:?}", d.data_ref_count.load(Relaxed));
println!("weak refs: {:?}", d.alloc_ref_count.load(Relaxed));
println!("data: {:?}", unsafe { &*d.data.get() });
});
// wait until the Arc is dropped
s.spawn(move || {
drop(arc);
})
.join()
.unwrap();
t.thread().unpark();
})
}
}
Extra - Primitives
We’ve talked about UnsafeCell
in the first chapter Cell, RefCell, and we’ve seen more primitives in Chan and Arc. So it’s time to summarize them:
UnsafeCell
As we've mentioned before, UnsafeCell is the core primitive for interior mutability, and it's the only legal way to have mutable data behind a shared reference.
E.g:
fn cannot_compile() {
let x = UnsafeCell::new(1);
let y = &x as *const UnsafeCell<i32> as *const i32 as *mut i32;
unsafe {
*y = 2;
}
println!("{}", unsafe { *x.get() });
}
fn ok() {
let x = UnsafeCell::new(1);
unsafe {
let y = x.get();
*y = 2;
}
println!("{}", unsafe { *x.get() });
}
Note that the complex casting in cannot_compile is copied from UnsafeCell::get(), so the two snippets are actually the same.
However, the first snippet will not compile; rustc throws an error like this:
error: assigning to &T is undefined behavior, consider using an UnsafeCell
Nonetheless, you can wrap the conversion into a method of a struct to 'make it work':
struct MyUnsafeCell<T: ?Sized> {
value: T,
}
impl<T> MyUnsafeCell<T> {
fn new(v: T) -> Self {
MyUnsafeCell { value: v }
}
fn get(&self) -> *mut T {
self as *const MyUnsafeCell<T> as *const T as *mut T
}
}
pub fn ok() {
let x = MyUnsafeCell::new(1);
let y = x.get();
unsafe {
*y = 2;
}
println!("{}", x.value);
}
But this is not the right way, because:
Aside: lang item
There's some 'black magic' in Rust - if you copy the code of a builtin type such as UnsafeCell or ManuallyDrop and expect it to work just like the std lib version, you'll be disappointed (like the code above that doesn't compile).
It's even worse with ManuallyDrop, because the compiler won't stop you from writing that code, so the unexpected behavior shows up at runtime, where it's hard to detect.
So why does the std lib version work while yours does not? The answer is: lang item.
You can find this attribute on UnsafeCell: #[lang = "unsafe_cell"]
It makes the compiler treat UnsafeCell as a special type, handled differently from normal types - and this special handling is pluggable.
Check this for more detail: lang item
So, what if we put this attribute on our struct?
#[lang = "unsafe_cell"]
struct MyUnsafeCell<T: ?Sized> {
value: T,
}
Bad news: it won't work, because a lang item cannot be defined twice:
error[E0152]: found duplicate lang item unsafe_cell
If you insist on doing so, you'll have to build a freestanding crate yourself.
Check these out: Error code E0152, #![no_std]
NonNull
In short, as the doc says: "*mut T but non-zero and covariant".
So we can use it when we're sure the pointer is never null; that lets us skip the null checks a *mut T would need and (as shown below) enables layout optimizations.
non-zero *mut T
use std::ptr::{self, NonNull};

fn main() {
    // addr = 0x0
    let nptr: *mut i32 = ptr::null_mut();
    println!("{} {}", nptr.is_null(), nptr.is_aligned()); // true true

    // safe constructor: returns None for a null pointer
    let nnull = NonNull::new(nptr);
    println!("{:?}", nnull); // None

    // debug mode: panics at runtime (debug assertion)
    // release mode: UB if you dereference it
    let force_create = unsafe { NonNull::new_unchecked(nptr) };
    let fptr = force_create.as_ptr();
    println!("{:?}", fptr); // 0x0
    println!("{}", unsafe { *fptr }); // runtime error
}
NonNull::new prevents you from creating a null NonNull instance. If you insist by calling new_unchecked, you get a runtime panic in debug builds, and undefined behavior when you dereference it in release builds (even if nothing visibly goes wrong, it's still UB).
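One easy-to-verify benefit worth noting: because the null value is forbidden, Option<NonNull<T>> reuses it to encode None and needs no extra space (the niche optimization):

use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // None is encoded as the forbidden null value, so the Option is free.
    assert_eq!(size_of::<Option<NonNull<i32>>>(), size_of::<*mut i32>());
    // The same trick applies to references, which are also non-null.
    assert_eq!(size_of::<Option<&i32>>(), size_of::<*const i32>());
}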
Aside: assert_unsafe_precondition!
NonNull::new_unchecked uses this macro to check for a null pointer.
As we know, xxx_unchecked is a common pattern in Rust, used to bypass safety checks for performance.
So why does NonNull still check for null? The secret is:
"The check is enabled at runtime if debug assertions are enabled when the caller is monomorphized. In const-eval/Miri checks implemented with this macro for language UB are always ignored."
So this check is only enabled in debug mode.
BTW, do you know how to enable/disable debug assertions? There are 3 ways:
- put RUSTFLAGS="-C debug-assertions=on" before cargo build
- set debug-assertions = true in the corresponding [profile] section of Cargo.toml
- by default, cargo build --release disables debug assertions
covariant
This is a quite complex and subtle concept; I recommend reading this article in the Rustonomicon: 🔗Subtyping and Variance.
Anyway, let's have a quick look at variance in Rust:
fn debug<'a>(a: &'a str, b: &'a str) {
println!("a = {a:?} b = {b:?}");
}
We can see that a and b are declared with the same lifetime 'a - but does the debug function really require the same lifetime for a and b?
The answer is NO:
fn main() {
let hello: &'static str = "hello";
{
let world = String::from("world");
let world = &world; // 'world has a shorter lifetime than 'static
debug(hello, world);
}
}
Let's break it down without the complicated concepts:
- assume there are 2 lifetimes: 'long and 'short
- 'long's code scope completely covers 'short's code scope
Then we can say that 'long can be downgraded to 'short.
Hence, debug(hello, world) is effectively debug<'world>(hello, world) instead of debug<'static>(hello, world).
Now let's bring the concepts back:
- 'static is a SubType of 'world
- 'world is a SuperType of 'static
The naming is not hard to understand - just like we can always cast a child class to a parent class in OO languages like Java, we can always cast a SubType into a SuperType in Rust.
Look back at Variance:
'Variance' is another concept for describing relationships between lifetimes. It has 3 sub-concepts:
- Covariant: if 'a is a subtype of 'b, then Foo<'a> is a subtype of Foo<'b>.
- Contravariant: if 'a is a subtype of 'b, then Foo<'b> is a subtype of Foo<'a>.
- Invariant: even if 'a is a subtype of 'b, there's no subtype relationship between Foo<'a> and Foo<'b> (see the sketch below).
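For invariance, the Rustonomicon's classic example shows why &mut T must be invariant in T - if it were covariant, this would compile and hello would dangle:

fn assign<T>(input: &mut T, val: T) {
    *input = val;
}

fn main() {
    let mut hello: &'static str = "hello";
    {
        let world = String::from("world");
        // ERROR: `world` does not live long enough.
        // &mut &'static str must not accept a &'short str,
        // otherwise the short-lived borrow would escape.
        assign(&mut hello, &world);
    }
    println!("{hello}");
}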
Let’s stop here, we can find more details in the book Rustonomicon.
ManuallyDrop
A wrapper that inhibits the compiler from automatically calling T's destructor, at zero cost.
Take a look at the definition:
#[stable(feature = "manually_drop", since = "1.20.0")]
#[lang = "manually_drop"]
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
pub struct ManuallyDrop<T: ?Sized> {
value: T,
}
This is a lang item and it's repr(transparent), so it's just a wrapper around T - the actual 'don't drop automatically' behavior is plugged into the compiler.
Using ManuallyDrop
is quite simple:
use std::fmt::Debug;
use std::mem::ManuallyDrop;

struct DetectDrop<T: Debug> {
    value: T,
}
impl<T: Debug> Drop for DetectDrop<T> {
fn drop(&mut self) {
println!("Dropping DetectDrop with value: {:?}", self.value);
}
}
fn main() {
// immutable, use into_inner to take the value out and drop it
let x = ManuallyDrop::new(DetectDrop { value: 52 });
{
let _ = ManuallyDrop::into_inner(x);
}
// mutable, use `drop` to drop the value
let mut y = ManuallyDrop::new(DetectDrop { value: 99 });
unsafe {
ManuallyDrop::drop(&mut y);
}
}
MaybeUninit
A wrapper representing a value that may be uninitialized.
#[stable(feature = "maybe_uninit", since = "1.36.0")]
#[lang = "maybe_uninit"]
#[derive(Copy)]
#[repr(transparent)]
pub union MaybeUninit<T> {
uninit: (),
value: ManuallyDrop<T>,
}
As we can see, it's also a lang item and repr(transparent); meanwhile, it's a union over a ManuallyDrop<T>.
The design pattern is easy to understand:
- when you create an uninit value, it's just the () variant, which is trivially safe to drop
- once you assign a value to it, it becomes a ManuallyDrop<T>, so it's no longer dropped automatically and you have to handle the drop yourself
So what’s the purpose of MaybeUninit
? Especially when we have ManuallyDrop
?
A typical use is lazy initialization:
use std::mem::MaybeUninit;

fn main() {
let mut arr: [MaybeUninit<i32>; 10000] = unsafe { MaybeUninit::uninit().assume_init() };
for i in 0..100 {
arr[i] = MaybeUninit::new(i as i32);
}
let mut sum = 0;
for i in 0..100 {
sum += unsafe { arr[i].assume_init() };
}
println!("{}", sum);
}
The cost of the array allocation itself can't be reduced, but the cost of initializing each element is avoided.
The example above is quite simple - what if we replace i32 with a complex struct? We'd have to call T::new() for every element, and those 'zero' values wouldn't be zero-cost.
You may have noticed that we could replace MaybeUninit with Option in this example, so what's the difference?
- Option is an enum, so (barring niche optimizations) it needs extra space for its discriminant, while MaybeUninit is zero-cost.
- Option can tell you whether the value is initialized; MaybeUninit cannot,
which indicates that you, as a programmer, are responsible for keeping track of it and never reading it while it's uninitialized.
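A minimal sketch of that responsibility with a single value (my example, not from the book):

use std::mem::MaybeUninit;

fn main() {
    let mut slot: MaybeUninit<String> = MaybeUninit::uninit();
    // write() initializes the slot without trying to drop the old garbage.
    slot.write(String::from("initialized"));
    // SAFETY: we just initialized it; calling assume_init on an
    // uninitialized slot would be UB.
    let s = unsafe { slot.assume_init() };
    println!("{s}");
}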
PhantomData
This is a marker for static analysis, and it’s used to tell the compiler that you’re using a type parameter in a struct for a reason other than storing a value of that type.
For example, sometimes we want to correlate the lifetime of a struct with the lifetime of the data it borrows:
struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
}
Since the lifetime parameter is not used inside the struct, the lifetime is effectively unbounded (modern rustc even rejects the definition with E0152-style error E0392, "parameter 'a is never used"). We can fix that issue with PhantomData like this:
use std::marker;
struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
_marker: marker::PhantomData<&'a T>,
}
Another typical usage of PhantomData is to make a raw pointer own the data it points to:

use std::marker;
struct Vec<T> {
data: *const T,
len: usize,
cap: usize,
_marker: marker::PhantomData<T>,
}
Without the _marker, the Vec would not own the data, so the data would not be dropped when the Vec is dropped.
Processors
Quick view of differences between RISC and CISC
Rust:
fn add_10(n: &mut i32) {
*n += 10;
}
RISC(ARM):
add_10:
ldr w8, [x0] // load
add w8, w8, #10 // modify
str w8, [x0] // store
ret
CISC(x86-64, Intel, AMD):
add_10:
add dword ptr [rdi], 10 // all in one
ret
Simple Load / Store
For both the atomic and non-atomic Rust code, the compiled assembly is the same:
pub fn store(x: &mut i32) {
*x = 0;
}
pub fn atomic_store(x: &AtomicI32) {
x.store(0, Relaxed);
}
x86_store:
mov dword ptr [rdi], 0
ret
arm_store:
str wzr, [x0]
ret
Because mov and str are already atomic here.
x86 Lock Prefix
pub fn add(x: &AtomicI32) -> i32 {
x.fetch_add(10, Relaxed)
}
add:
mov eax, 10
lock xadd dword ptr [rdi], eax
ret
Note that not all instructions can be prefixed with lock.
lock originally caused the processor to temporarily block all other cores from accessing memory for the duration of the instruction; newer processors only hold off the relevant cache line and won't stop other cores from operating on unrelated memory.
x86 compare and exchange
A fetch-and-modify operation can be implemented as a compare-and-exchange loop:
pub fn a(x: &AtomicI32) -> i32 {
x.fetch_or(10, Relaxed)
}
// a is same as b
pub fn b(x: &AtomicI32) -> i32 {
let mut current = x.load(Relaxed);
loop {
let new = current | 10;
match x.compare_exchange(current, new, Relaxed, Relaxed) {
Ok(v) => return v,
Err(v) => current = v,
}
}
}
a:
mov eax, dword ptr [rdi]
.L1: // loop until cas success
mov ecx, eax
or ecx, 10
// note that [cmpxchg dest, src] compares the dest with eax
lock cmpxchg dword ptr [rdi], ecx
jne .L1
ret
On x86-64, there is no difference between compare_exchange and compare_exchange_weak. Both compile down to a lock cmpxchg instruction.