I really enjoyed this talk and not only because the speaker is an old friend on mine. I met Alberto back in the Ubisoft days and he was and still is one of the best C++ programmers I know and for sure the only friend who submitted a proposal to the standard committee that got approved. On the other hand I have no friends who submitted a proposal to the standard committee that was refused. So all my friends who submitted… well I’m digressing.
This talk was in Italian, that is welcome, but… weird. Usually all talks are in English even for Italian conference. Anyway here’s my summary, as usual errors and mistakes are mine. You can find Alberto’s slides here.
Coming Soon to a Compiler Near You – Coroutines
Alberto is a C++ enthusiast programmer since 1990. In this hour he’s going to talk about coroutines, their history and their relation with C++ and C++ Technical Specifications (TS).
A suspend and resume point is a point in the instruction sequence where the execution can be suspended, other stuff is executed and then the execution is resumed from there.
The most common form of suspend and resume point is a subroutine call. The execution is transferred from the caller to the callee and when the callee is done, the execution is transferred back to the caller.
Coroutines are a generalized form of suspension and resume points, that allow you to manage where you want to suspend, transfer control and resume, without be constrained in the caller/callee format.
When is this pattern useful?
Scenario #1 – generators
Generators are sort of functions that produce sequence of items. Usually item number n in a sequence depends in from some state that evolves from an item to the next one. So could move the state into an explicit data type and pass it back and forth to the generator, or use a coroutine so that the generator keep its state as it please.
Subroutines are based on two conceptual instructions – call and return, coroutines are based on two conceptual instructions as well: await and yield.
await transfer the control to another coroutine suspending the execution of the code where the await is.
yield gives the control back to the former coroutine. The difference between return and yield is that return destroys the function state when proceeding back to the caller; while yield preserves the current state and can be resumed later.
The execution pattern looks like this:
where the bar (|) represents active execution, while colon (:) represents the suspended state where the coroutine waits for resuming execution.
Scenario#2 – Cooperative Multitasking
When programming videogames, a common need is to update a large number of objects frequently – possibly once per frame. In order to save resources, always scarce in these applications, programmers traditionally avoid using threads and calls each object’s tick() function to give a timeslice of computation to make that object evolve.
The execution can be summed up in the picture below.
At the first execution the scheduler gives computation time to the first actor that does something and then passes the execution to the second. When done it yields control to the scheduler.
Scheduler Actor1 Actor2
| : :
: | :
: : |
At the next frame, again the Scheduler gives control to Actor 1 that resumes from the point were it arrived.
This approach is much easier with respect to the finite state machine (FSM). In a FSM you have to explicitly manage the state, so that when you give the control back, the state is saved and then restored when executed again.
With coroutines, you leave the state where it is and you find it again when resuming the execution.
The tick() approach is fine, but it is awkward when you have to implement an action that spans over several ticks().
[NdM. I spent a lot of time thinking about this, so let me add some details here. Suppose you have a character and you want to move it, one pixel at time from point (x0,y0) to (x1,y0). Using FSM, you first check if you are arrived, then add one to horizontal position and return. With coroutines, you just write a for loop with a yield in the middle.]
Each actor may suspend itself when waiting for some event (typically next frame start or some action timeout) or for communicating with other actors.
Regardless of the approach, be it FSM or coroutines, it is mandatory that no actor keeps the CPU too long, otherwise all the actors freeze.
Scenario #3 – asynchronous calls
Asynchronous calls allows the caller to proceed with execution while waiting for a long operation to complete. Operations involving I/O, are ideal candidates. There are many techniques for dealing with async – callbacks, futures, continuations. Coroutines do not save you from one of such techniques, but greatly simplify their use.
Scenario #4 – Coroutine termination
You can stop the execution of a coroutine, just by not giving back control to it. In this way the suspension point becomes a termination point. This technique may be used to cancel asynchronous operation or to simplify some functional patterns.
In order to resume the execution of some code after a suspension the execution context is needed. When you call a subroutine, execution context is saved on the stack. This is very convenient since the context is automatically saved, restored and destroyed according to the execution. You go down the call chain and the stack grows, when returning through the call chain, the stack shrinks.
Coroutines do not work the same way, because you must be sure that the execution context is never destroyed on suspension. There are many approaches to implement context saving, them all split in two main categories – stackful and stackless.
Stackful employs the stack in a smart way, while stackless stores the context elsewhere. Unfortunately it is easy to start flame wars on this subject.
C++ Past, Present and Future
The first proposal to add coroutines to C++ has been presented by Microsoft in 2012. They were called resumable functions, but the semantic is the same. Next year boost.coroutine appears. This is a library only approach to coroutines that is presented to the ISO committee for inclusion in C++ standard. This doesn’t reach a consensus and it is not accepted.
Many different proposals appear in the next years. The reason is that the concept is quite wide and many implementation approaches exist to comply with or exploit different application domains (embedded systems, with or without excetions, distributed …).
In 2014 Gor Nishanov (Microsoft) presents a revision of the aforementioned “resumable functions” using a stackless approach. Also he modifies Visual Studio (he is a Visual Studio developer) to implement his proposal.
C++14 committee decides to create a coroutine TS in order to gather feedbacks. Relating to this choice, Stroustrup commented: “I am disappointed that stackless coroutines are being put into a TS rather than directly into the standard itself. I think they are ready and important for a few critical use cases (pipeline and genreatores).”
This TS was published in July 2017, too late to be included in C++17 standard.
Gor Nishanov doesn’t give up and presents an implementation for the clang compiler.
Now the committee starts working on C++20. The path seems straightforward – there is a solid proposal, there are two major implementations – what can go wrong?
In the first votes in May and June 2018 the proposal keeps being refused. Mainly because Google first presented some criticism over Gor’s work, then presented a completely different alternative proposal: Core Coroutines.
Alberto considers Google’s proposal very late. Anyway a number of changes can be made to the TS to improve Gor’s coroutines and to integrate good ideas from Core Coroutines.
Next standard committee will be held in November 2018 if the vote on coroutines will be negative yet another time, then this feature won’t make it into C++20 and will be re-evaluated again for C++23.
TS for coutoutines is not always easy to read. Extension Point concept plays an important role in the TS. To better understand this concept let’s dissect the range for statement introduced in C++11:
for( type value : range )
The compiler removes the syntactic sugar and turns the above code into:
auto&& __range = range;
auto __begin = begin(__range);
auto __end = end( __range );
for ( ; __begin != __end ; ++__begin )
type value = * __begin;
The extension points are marked with red. Those begin and end are not compiler built-ins, but they are call to functions provided by the programmer. The code works as long the programmer provides suitable begin and end functions. So extension points are a way for the language to provide programmer-defined extensions, so that programmer defined data and functions can be seamlessly integrated in the language constructs.
To get maximum flexibility, coroutines require 15 extension points. If Google requests are accepted then extension points count could raise to 18.
Luckily for programmers, the complexity is mostly contained in the library – using coroutines does not require such complexity.
Let’s start with a generator. Everyone knows about Fibonacci and his Series. A generator for Fibonacci may look like this:
int i0 =1;
int i1 = 1;
int i2 = i0+i1;
i0 = i1;
i1 = i2;
First you will notice that there is no end-condition. The reason is that when the caller is done, it will call the generator no longer and the execution will stop forever.
Next, the return type is specialized for the generator kind of coroutine and more precisely as a generator of ints.
The first time the coroutine is executed it starts from the beginning, initializing i0 and i1. When the co_yield instruction is reached, the coroutine is suspended and the value of i0 is returned to the caller. When the caller calls the generator again, the execution resumes from co_yield, so that the code computes the next terms for the series and loops back to co_yield, returning the control to the caller with the next value in the sequence.
Async Network Input
Asynchronous I/O is another domain where coroutines would greatly simplify the code. Let’s consider a coroutine to receive data from a network connection:
task<vector<byte>> read_async(Tcp::addr_t address )
auto connection = co_await Tcp::Connect(addressa);
auto byetesRead = co_await connection.read( buf, sizeof(buf));
if( bytesRead == 0 ) break;
This function gives the control back to the caller (or to a scheduler) when waiting for network data (Tcp::Connect(addressa) and connection.read( buf, sizeof(buf))). Eventually data is returned via co_return .
So, what is a coroutine? A coroutine is a function that contains one of the following three keywords:
(the co_ prefix is needed to avoid conflicts with existing code base). As you may have noted, function signature has no indication or marks whether the function be a coroutine or not.
TS coroutine implementation is based on the promise object concept. The name has been chosen because it is very close to the std::promise from <future> standard library.
When a coroutine is detected the compiler first determines the type of the promise from the function signature. For example if the coroutine function is:
my_coroutine coro(my_data data )
Coroutine type will be:
More precisely, if R is the function return type and P1, P2…Pn are the function parameters, then the trait
Is a template that the programmer need to specialize in order to let the compiler do the proper magic. The trait may provide a number of methods that will be used:
- execution context storage (heap/stack)
- suspension points before and/or after the functions (sometimes it can be useful that the function does not start immediately).
- resulting object creation.
- mode to return values to the caller
- exception handling for exceptions that can escape from the coroutine body.
By default execution context is saved using a new statement. It is possible to redefine the new operator for the promise so that different storage location can be used. This is one of the criticized points since sometimes (e.g. for generators) it is convenient to use the stack to save the context.
Some cases are handled by the Heap Allocation eLision Optimization (HALO). But this is an optimization that is not under user control.
According to the promise selected at compile time, when a coroutine is invoked:
- memory to save the execution context is allocated. Within this memory an instance of the promise is constructed;
- The promise produces the return value that will be returned to the caller.
- The coroutine is executed up to the suspension point.
By using the returned object the caller may manipulate the coroutine execution and retrieve values from it.
Il compilatore sceglie il tipo della promise.
Control transfer is managed by the two expressions:
In the first case, expr must provide an awaitable type object. The expression is passed to the promise that has a chance to transform the type on which co_await is invoked upon.
If e is an awaitable object, then
if( e.await_ready() == false )
// suspend the coroutine and select another one for execution
coroutine_handle<> hNew = e.await_suspend(hCurrent);
// when the coroutine resume, it will start from here
result = e.await_resume()
Finally std::coroutine_handle<> is a template to manage the low level stuff of coroutines. Using the type erasure technique this template can handle any coroutine type.
Summing it up
Coroutine TS is complex because of the high number of extension points. But this complexity mainly impacts the library writers. TS is solid and versatile.
As of today there are two comform implementations.
Although there are still some hope to have coroutines in C++20, their form could change. If they will be in C++23 they could be completely different.