A Type with Character

So, you know everything about C++ integral types, don’t you?

I thought I did, until when I enabled clang-tidy on my project. It all started with the rather innocent-looking warning:

warning: use of a signed integer operand with a binary bitwise operator

It looked somewhat less innocent when, examining the line, I saw no evidence of signs.

But, let’s start from the comfort zone (at least, my comfort zone):

unsigned x = 32;

The type of ~x (bitwise negation) is still unsigned. No surprise here, really obvious. The diligent programmer finds that the data fits in a smaller integer and writes:

   uint8_t s = 42;

Can you guess the type of ~s? Give it a try, really. Ready? Well, the type of ~s is… int. What? A quick check of other expressions involving uint8_t yields the same … fascinating result. Apparently these expressions are all converted into int.

In other words (and with a bit of syntax bending) uint8_t+uint8_t -> int, uint8_t<<uint8_t -> int, uint8_t+1 -> int. Let me rephrase that, in every expression an uint8_t type is converted to int.

Time for some Internet Duckduckgoing :-).

Back to our uint8_t (that is nothing but an unsigned char in disguise). When a char value, be it signed, unsigned or plain is encountered in an expression (in C++ standard jargon, a prvalue) it is promoted to int pretty much on every common CPU. On exotic architectures, char and int could have the same size, so int could not hold every possible value of char and therefore it is turned into an unsigned. From a strictly rigorous point of view, the signedness of the type of uint8_t in expression is machine-dependent. Keep this in mind if you aim to write really portable code 😉 (*)

You can find a very good (and somewhat frightening) explanation here.

But I’d like to add something beyond the standard technicalities and look at the reasons for why this is like this and what we can take home.

First, it is important to fix that in C (and C++ by extension) the int/unsigned type is mapped to the most efficient word of the target processor. The language provides that, as long as you use int/unsigned (without mixing them) in an expression, you get the most performing code.

Also, the language mandates that an int be at least 16 bits wide and at most the same size of a long.

What if you need to do math on 8bits data on a 32bits architecture? Well, you need assembly code to insert mask operation for cutting away the excess data in order to reach the right result.

So, the C language opts for the best performance turning everything into int, avoiding the extra assembly for masking and let the programmer arrange the expressions with cast here and there if anything different is desired.

Unexpected type promotion, sign change, and performance penalty should be three good reasons to avoid using anything different from int and unsigned in expressions (or long long when needed) and keep intXX_t and uintXX_t for storage.

Note that this applies also to function arguments. It happens quite frequently to read APIs where types for integer arguments are defined as the smallest integer capable of holding the largest value for a given parameter. That may seem a good idea at first since the API embeds in the interface the suggestion to the user for a proper range.

In fact, this has to be balanced against the aforementioned problems and don’t really enforce the constraints for two reasons – first, you can actually pass any integral type and get not even a warning and second, possibly your accepted range is a subset of all the possible value of the type, therefore the user is still required to read the documentation.

Finally, when in doubt, use the compiler 🙂 Finding types of expressions via the compiler could be not the most intuitive task. Below you find the code I used for my tests. Happy type-checking!

/* Linux only, sorry */
#include <iostream>
#include <cstdint>
#include <cxxabi.h>

int main()
{
    uint8_t a = 42;
    std::cout << abi::__cxa_demangle(typeid( a ).name(), nullptr, nullptr, nullptr) << "
";
    std::cout << abi::__cxa_demangle(typeid( ~a ).name(), nullptr, nullptr, nullptr) << "
";
    std::cout << abi::__cxa_demangle(typeid( 42u ).name(), nullptr, nullptr, nullptr) << "
";
    std::cout << abi::__cxa_demangle(typeid( ~42u ).name(), nullptr, nullptr, nullptr) << "
";
    return 0;
}

(*) Also keep in mind, if you strive to write a portable code, that uintXX_t types may not exist for exotic architectures. In fact, the standard mandates that if the target CPU has no 8 bits type then uint8_t be not defined for such target. You should use uint_least8_t (meaning a data type that has at least 8 bits) and uint_fast8_t (a data type that is the most efficient for operations with 8 bits).

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.