Comprehensive For in C++

Switching back and forth between Scala and C++ tends to shuffle ideas in my brain. Some ideas from Scala and the functional world are so good that is just a pity that can’t be applied directly and simply in C++.

If you carefully look into idiomatic C code for error handling, it is easy to find a lousy approach even in industrial code. I’ve seen a lot of code where not every return value was examined for an error, or not everything was thoroughly validated, or some code was executed even when it was not needed because an error occurred but was too cumbersome to provide the escape path.

In C++ things are a bit better, thanks to the exception mechanism, but exceptions are such a pain in the neck to get right and to handle properly that error management could soon turn into a nightmare.

Error handling in C and C++ is so boring and problematic that the examples in most textbooks just skip it.

(featured image by Nadine Shaabana)

Without going too deep in the functional programming realm, in Scala, there is a handy concept for error handling (well, I’m oversimplifying, but just to give you the idea): the Either class.

Either is a way to express either a value or an error in the same type. A function returning an Either is capable of returning the complete result if everything went right or an arbitrarily complex description of what went wrong.

As I wrote in this blog (and in the code in my flic library) you can code an Either template in C++ with pretty much the same functionalities of the Scala version. Writing lambdas in C++ is more cumbersome, but you can do it.

The most C++ idiomatic code that you can use to deal with an Either is something like:

void f( Either const& e )
{
  if( e.isRight() )
  {
    success( e.getRight() );
  }
  else
  {
    error( e.getLeft() );
  }
}

I agree this doesn’t look very exciting. Besides, it doesn’t combine well with other Eithers. Suppose you need to parse three strings containing year, month, and day of the month e combining them to produce a date.
Let’s see first in traditional C –

Error parseDate( char const* year, char const* month, char const* day, Date* result )
{
  char const* end;
  unsigned nYear = strtoul( year, &end, 10 );
  if( end != year+strlen( year ))
  {
    return Error_INVALID_NUMBER_YEAR;
  }
  unsigned nMonth = strtoul( month, &end, 10 );
  if( end != month+strlen( month ))
  {
    return Error_INVALID_NUMBER_MONTH;
  } 
  unsigned nDay = strtoul( day, &end, 10 );
  if( end != day+strlen( day ))
  {
    return Error_INVALID_NUMBER_DAY;
  }

  return makeDate( nYear, nMonth, nDay, date );
}

This is simple and straightforward, a bit verbose, maybe, but simple. There are two things to note. First, this is a happy case where we can quite the function without the need for freeing allocated resources (otherwise the function would be much more complex).

Second, error reporting is a bit vague, you don’t know why string to number conversion failed, but just that failed. Strings may have been empty, containing invalid characters, holding numbers too large to be represented in the integral type provided. Whatever, you’ll always get the same error.

In idiomatic C++, the same function would be written as follows.

Date parseDate( std::string const& year, std::string const& month, std::string const& day )
{
  unsigned uYear = std::stoul( year );
  unsigned uMonth = std::stoul( month );
  unsigned uDay = std::stoul( day );
  return Date( uYear, uMonth, uDay );
}

Here it looks like the error management is not present at all. That’s partly true, but it is sort of hiding the dirt under the carpet. Each one of the functions called in parseDate may throw an exception disrupting the execution flow, but what is worse is that once the exception is thrown, if it is not caught early, it brings no correlation between the error and what caused it.

I mean, if there is a catch around the call to parseDate (and each call to parseDate), then you are sure that parseDate found an error, but if your catch is farther away in the call stack, then you are no longer sure whether the exception has been thrown by parseDate or by another function.

Enter Either. Either is not C++ idiomatic, but it is inspired by functional code. You can define an Either as a std::variant with two slots, namely left and right, used to hold either a value (right) or an error (left).

Either per se is not that’s interesting, but it becomes much more interesting when you look at its transformation methods – map and flatMap. In C++ parlance a map is a std::transform – you keep the container but replace the contained type with something produced with the original value (say, transform a vector of strings into a vector of ints containing the length of the strings in the first vector).

This mechanism allows you to chain together operations that may result in an error. Let’s rephrase the above code with Either and maps:

template<typename T>
Either<Error,T> to<T>( std::string const& s ) noexcept;

Either<Error,Date> Date::from( unsinged year, unsigned month, unsigned day ) noexcept;

Either<Error,Date> parseDate( std::string const& year, std::string const& month, std::string const& day )
{
  return to<unsigned>(year).flatMap(
    [&]( auto&& uYear ){ to<unsigned>(month).flatMap(
        [&]( auto&& uMonth ){ day.flatMap(
            [&]( auto&& uDay ){  Date::from( uYear, uMonth, uDay ); }
          ); }
      ); }
  );
}

First, we need to insert this fragment in a framework that supports Either return value, so I introduced the template to<> that parses a string into another type or produces an error about what prevented to<> from properly parsing.

Then I have to create a factory for Date so to avoid the lack of return value from the class constructor.

These are really no issues. The code is pretty robust because each error is caught with no specific attention to be paid to error management. The error and the result flow together, like inside a Schrödinger box, until you decide to look inside. The magic of monads, prevents further operations to be performed when the error is detected.

The drawback is that syntax is extremely bitter… in the dire need for some sugar to sweeten it. That’s why in Scala there something named for comprehension that looks like:

def parseDate( year: String, month: String, day: String ) : Either[Error,Date] = {
  for {
    nYear <- toInt( year )
    nMonth <- toInt( month )
    nDay <- toInt( day )
    date <- Date::from( year, month, day )
  }
  yield {
    date
  }
}

(this is mostly true – toInt function does not exists in Scala, but it is trivial to implement them – as it is in any other language).

Now this way of writing is very clear and easy to reason about. Only the logical operation in the happy path is described. In fact, there is just one minor annoyance, errors from toInt are not converted into errors specific for the day, month, or year.

As a side note, the same construct can be used to sweep through nested loops, but this is meat for another post.

I decided I need some syntactic sugar like this for my C++ code.

I tried to employ an operator overload, but I didn’t succeed, in fact, the Scala <- is a ternary operator – it has a destination variable, a source expression, and a lambda function composed by what follows the line. The inner for is a simple map and it must be produced as such to avoid an Either of Either (that’s why in Scala, you have date alone in the result).

Also on the left of <- operator, there is the declaration of a name, rather than an operand. I found no way to allow for this semantic without resorting to preprocessor macros.

On the other hand, preprocessor macros are very rudimental and cannot relate to each other or be applied recursively. So I ruled out having two macros for for and yield (even if other people have tried this approach with a solution similar to my C RAII).

Eventually, I resorted to a single do-everything macro, all arguments but the last one are alternating variable name and monad to flatMap, with the last one being the yield expression. In other words, I want to be able to write something like:

Either<Error,Date> parseDate( std::string const& year, std::string const& month, std::string const& day )
{
  return FOR_COMPREHENSION(
    nYear, to<unsigned>( year ),
    nMonth, to<unsigned>( month ),
    nDay, to<unsigned>( day ),
    date, Date::from( year, month, day ),
    date 
  );
}

Easier said than done since I had to handle a variadic macro that should recursively call itself with less and less argument until all arguments but the last is left.

I started giving up the chance to automatically invoke the right macro given the number of argument, and prepared some macros with the number of arguments –

#define MONADIC_FLATMAP( x__, mX__, E__ )    
    (mX__).flatMap( [&]( auto&& x__ ){ return E__; } )

#define MONADIC_MAP( x__, mX__, E__ )    
    (mX__).map( [&]( auto&& x__ ){ return E__; } )

#define FOR_COMPREHENSION1( x1__, mX1__, E__ ) 
    MONADIC_MAP( x1__, mX1__, E__ )

#define FOR_COMPREHENSION2(  x1__, mX1__, x2__, mX2__, E__ ) 
    FLATMAP( x1__, mX1__, FOR_COMPREHENSION1( x2__, mX2__, E__ ))

#define FOR_COMPREHENSION3(  x1__, mX1__, x2__, mX2__, x3__, mX3__, E__ ) 
    MONADIC_FLATMAP( x1__, mX1__, FOR_COMPREHENSION2( x2__, mX2__, x3__, mX3__, E__ ))

#define FOR_COMPREHENSION4(  x1__, mX1__, x2__, mX2__, x3__, mX3__, x4__, mX4__, E__ ) 
    MONADIC_FLATMAP( x1__, mX1__, FOR_COMPREHENSION3( x2__, mX2__, x3__, mX3__, x4, mX4__, E__ ))

They are a bit boring to write, but once you get the pattern, you can write the macro body invoking the macro for the lesser level.

Still, I wasn’t very happing, so I searched how to automatically pick up the right macro. After some internet milking, I found an interesting technique for this purpose. Let’s say that we are satisfied with at most 10 variables, so we need to be able to handle a macro with 10×2+1 arguments. But not every number of arguments works, in fact, the number should be nx2+1 with n between 1 and 10 (both included).

Let’s write a macro that selects its (10×2+1)th arguments like this:

#define ELEVENTH_TRIPLET_ARGUMENT(   
    a00, a01,   
    a10, a11,   
    a20, a21,   
    a30, a31,   
    a40, a41,   
    a50, a51,   
    a60, a61,   
    a70, a71,   
    a80, a81,   
    a90, a91, a92,  
    a100, ...)   a100

Now we have to write a macro so that when given the parameters for the for comprehension the nx2+1 argument will be the name of the for comprehension macro –

#define FOR_COMPREHENSION_NAME( ... )  
    ELEVENTH_TRIPLET_ARGUMENT(   
        __VA_ARGS__,                     
        FOR_COMPREHENSION10, Error, 
        FOR_COMPREHENSION9, Error,  
        FOR_COMPREHENSION8, Error,  
        FOR_COMPREHENSION7, Error,  
        FOR_COMPREHENSION6, Error,  
        FOR_COMPREHENSION5, Error,  
        FOR_COMPREHENSION4, Error,  
        FOR_COMPREHENSION3, Error,  
        FOR_COMPREHENSION2, Error,  
        FOR_COMPREHENSION1, Error, Error  
    )

Error is just a placeholder that is likely to cause the compiler to garble up if the wrong number of arguments is used.

At this point, everything is in place to write my FOR_COMPREHENSION macro –

#define FOR_COMPREHENSION( ... )   
    FOR_COMPREHENSION_NAME( __VA_ARGS__ ) ( __VA_ARGS__ )

The nice part is that you can use it for everything that provides a flatMap method, be it Either, Option, Future, or any container.

In truth, there are a couple of limitations. First, the most obvious, you can have more than 10 variables. This is a quite reasonable limitation since larger comprehension can (and should) be reduced via refactoring.

The other limitation is that Scala for is actually more expressive, allowing for filtering items by embedding if statements. I have not explored if and how this could be possible.

The last limitation is bound to the preprocessor nature that originated well before C++ was even a blink in the eye of Bjarne. For this reason < and > are considered operators and not parts of the template syntax. Therefore if you write:

MACRO( f<A,B>(3) );

Where MACRO is … well, a preprocessor macro, the preprocessor interpreter looks at macro arguments and finds a comma outside any round parenthesis enclosing. “Ehy!”, he exclaims in his simple and primitive mind: “There’s a comma, these must be two arguments: f<A and B>(3)“. This means that sometimes you need to enclose your macro argument with an extra pair of () just to avoid erroneous interpretations of the argument.

On a different level are my closing thoughts. More and more I feel the need for sweetening C++ syntax. It is not that C++ can’t handle functional paradigm and functional concepts, it is that is really programmer expensive to work in this way.

Excessive typing is a problem not because of typing time, but because the more you write, the more bugs you write, the longer it takes to read and understand.

Consider lambda syntax for C++; the lack of language (and library) support for functional concepts such as Option, Either, map, flatMap; the lack of signature-based function types; the annoyance of marking every method with noexcept; the lack of algebraic data types and pattern matching. You can overcome most of these shortcomings, but it is tedious, frustrating, and error-prone, not different from writing OOP code in C.

When the need for syntactic sweetener is so strong, it means that either the language is becoming obsolete, or the language is not suitable for the task. It is quite some time that rust looks promising for system software development and its wider adoption is possibly limited only by the lack of chip vendor tools (compilers and libraries). Maybe the time has cometh to give rust a chance.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.