A stack-less Rust coroutine library under 100 LoC

Posted January 25, 2020 ‐ 4 min read

As of stable Rust 1.39.0, it is possible to implement a very basic and safe coroutine library using Rust's async/await support, and in under 100 lines of code. The implementation depends solely on std and is stack-less (meaning, not depending on a separate CPU architecture stack).

A very basic simple coroutine library contains only an event-less 'yield' primitive, which stops execution of the current coroutine so that other coroutines can run. This is the kind of library that I chose to demonstrate in this post to provide the most concise example.

Yielder

To the coroutine we pass a Fib struct that only contains a simple binary state. This Fib struct has a waiter method that creates a Future that the coroutine can use in order to be awaited upon.

use std::future::Future;
use std::pin::Pin;
use std::task::{Poll, Context};

enum State {
    Halted,
    Running,
}

struct Fib {
    state: State,
}

impl Fib {
    fn waiter<'a>(&'a mut self) -> Waiter<'a> {
        Waiter { fib: self }
    }
}

struct Waiter<'a> {
    fib: &'a mut Fib,
}

impl<'a> Future for Waiter<'a> {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Self::Output> {
        match self.fib.state {
            State::Halted => {
                self.fib.state = State::Running;
                Poll::Ready(())
            }
            State::Running => {
                self.fib.state = State::Halted;
                Poll::Pending
            }
        }
    }
}

Executor

Our executor keeps a vector of uncompleted futures, where the state of each future is located on the heap. As a very basic executor, it only supports adding futures to it before actual execution takes place and not afterward. The push method adds a closure to the list of futures, and the run method performs interlaced execution of futures until all of them complete.

use std::collections::VecDeque;

struct Executor {
    fibs: VecDeque<Pin<Box<dyn Future<Output=()>>>>,
}

impl Executor {
    fn new() -> Self {
        Executor {
            fibs: VecDeque::new(),
        }
    }

    fn push<C, F>(&mut self, closure: C)
    where
        F: Future<Output=()> + 'static,
        C: FnOnce(Fib) -> F,
    {
        let fib = Fib { state: State::Running };
        self.fibs.push_back(Box::pin(closure(fib)));
    }

    fn run(&mut self) {
        let waker = waker::create();
        let mut context = Context::from_waker(&waker);

        while let Some(mut fib) = self.fibs.pop_front() {
            match fib.as_mut().poll(&mut context) {
                Poll::Pending => {
                    self.fibs.push_back(fib);
                },
                Poll::Ready(()) => {},
            }
        }
    }
}

Null Waker

For the executor implementation above, we need a null Waker, similar to the one used in genawaiter (link).

use std::task::{RawWaker, RawWakerVTable, Waker},

pub fn create() -> Waker {
    // Safety: The waker points to a vtable with functions that do nothing. Doing
    // nothing is memory-safe.
    unsafe { Waker::from_raw(RAW_WAKER) }
}

const RAW_WAKER: RawWaker = RawWaker::new(std::ptr::null(), &VTABLE);
const VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake_by_ref, drop);

unsafe fn clone(_: *const ()) -> RawWaker { RAW_WAKER }
unsafe fn wake(_: *const ()) { }
unsafe fn wake_by_ref(_: *const ()) { }
unsafe fn drop(_: *const ()) { }

Giving it a go

We can test the library using a program such as the following:

pub fn main() {
    let mut exec = Executor::new();

    for instance in 1..=3 {
        exec.push(move |mut fib| async move {
            println!("{} A", instance);
            fib.waiter().await;
            println!("{} B", instance);
            fib.waiter().await;
            println!("{} C", instance);
            fib.waiter().await;
            println!("{} D", instance);
        });
    }

    println!("Running");
    exec.run();
    println!("Done");
}

The output is as follows:

Running
1 A
2 A
3 A
1 B
2 B
3 B
1 C
2 C
3 C
1 D
2 D
3 D
Done

Performance

Timing the following program compiled with lto = true, I have seen that it takes about 5 nanoseconds for each iteration of the internal loop, on an Intel i7-7820X CPU.

pub fn bench() {
    let mut exec = Executor::new();

    for _ in 1..=2 {
        exec.push(move |mut fib| async move {
            for _ in 0..100_000_000 {
                fib.waiter().await;
            }
        });
    }

    println!("Running");
    exec.run();
    println!("Done");
}

End notes

One of the nice things about the async/await support in the Rust compiler is that it does not depend on any specific run-time. Thus, if you commit to certain run-time, you are free to implement your own executor.

Independency of run-time has its downsides. For example, the library presented here is not compatible with other run-times such as async-std. And in fact, the implementation violates the interface intended for the Future's poll function by assuming that the Future will always be Ready after it was Pending.

Combined uses of several run-times in a single program is possible but requires extra care (see Reddit discussion).