So, you know everything about C++ integral types, don’t you?
I thought I did, until I enabled clang-tidy on my project. It all started with a rather innocent-looking warning:
warning: use of a signed integer operand with a binary bitwise operator
It looked somewhat less innocent when, examining the line, I saw no evidence of signs.
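To make the warning concrete, here is a small function of my own invention (the name is mine) that reproduces it; I believe the check involved is clang-tidy's hicpp-signed-bitwise:

```cpp
#include <cstdint>

uint8_t clear_low_nibble(uint8_t v)
{
    // No signed type in sight, yet clang-tidy complains: uint8_t(0x0F)
    // is promoted to int before ~ is applied, so ~ yields a signed int,
    // and & then operates on a signed operand.
    return v & ~uint8_t(0x0F);
}
```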
But, let’s start from the comfort zone (at least, my comfort zone):
unsigned x = 32;
The type of ~x (bitwise negation) is still unsigned. No surprise here, really obvious. The diligent programmer finds that the data fits in a smaller integer and writes:
uint8_t s = 42;
Can you guess the type of ~s? Give it a try, really. Ready? Well, the type of ~s is int. What? A quick check of other expressions involving uint8_t yields the same … fascinating result. Apparently these expressions are all converted into int.
In other words (and with a bit of syntax bending):
uint8_t + uint8_t -> int
uint8_t << uint8_t -> int
uint8_t + 1 -> int
Let me rephrase that: in every expression, a uint8_t value is converted to int.
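These promotions can be verified at compile time with a few static_asserts; this is a sketch that assumes a common platform where int is wider than 8 bits:

```cpp
#include <cstdint>
#include <type_traits>

unsigned x = 32;
uint8_t  s = 42;

// Bitwise NOT on unsigned keeps the type unchanged...
static_assert(std::is_same<decltype(~x), unsigned>::value, "~x is unsigned");
// ...while any operand narrower than int is promoted first.
static_assert(std::is_same<decltype(~s), int>::value, "~s is int");
static_assert(std::is_same<decltype(s + s), int>::value, "uint8_t + uint8_t -> int");
static_assert(std::is_same<decltype(s << s), int>::value, "uint8_t << uint8_t -> int");
static_assert(std::is_same<decltype(s + 1), int>::value, "uint8_t + 1 -> int");
```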
Time for some Internet Duckduckgoing :-).
Back to our uint8_t (which is nothing but an unsigned char in disguise). When a char value, be it signed, unsigned, or plain, is encountered in an expression (in C++ standard jargon, as a prvalue), it is promoted to int. This is what happens on pretty much every common CPU. On exotic architectures, however, char could have the same size as int; in that case int could not hold every possible value of unsigned char, and the value is promoted to unsigned int instead. From a strictly rigorous point of view, then, the signedness that a uint8_t takes on in an expression is machine-dependent. Keep this in mind if you aim to write really portable code 😉 (*)
You can find a very good (and somewhat frightening) explanation here.
But I’d like to go beyond the standard technicalities, look at the reasons why things are this way, and see what we can take home.
First, it is important to note that in C (and in C++ by extension) the int type is mapped to the most efficient word of the target processor. The language provides that, as long as you use int and unsigned (without mixing them) in an expression, you get the best-performing code.
Also, the language mandates that an int be at least 16 bits wide and at most the same size as a long. What if you need to do math on 8-bit data on a 32-bit architecture? Well, the compiler would have to insert extra masking operations to cut away the excess bits in order to reach the right result.
So, the C language opts for the best performance: it turns everything into int, avoids the extra masking code, and lets the programmer arrange the expressions with a cast here and there if anything different is desired.
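As a consequence, narrowing the result back to 8 bits must be spelled out by the programmer. A minimal sketch (the function name is mine):

```cpp
#include <cstdint>

// a + b is computed as int; the cast makes the intended wrap-around
// to 8 bits explicit (and keeps warnings like -Wconversion quiet).
uint8_t add_wrap(uint8_t a, uint8_t b)
{
    return static_cast<uint8_t>(a + b);
}
```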
Unexpected type promotions, sign changes, and performance penalties are three good reasons to avoid using anything other than int and unsigned (or long long when needed) in expressions, and to keep the uintXX_t types for storage.
Note that this also applies to function arguments. It happens quite frequently to read APIs where the type of an integer argument is defined as the smallest integer capable of holding the largest value for that parameter. That may seem a good idea at first, since the API embeds in the interface a suggestion to the user about the proper range.
In fact, this has to be balanced against the aforementioned problems, and it doesn’t really enforce the constraint, for two reasons: first, you can actually pass any integral type without getting so much as a warning; second, the accepted range is possibly a subset of all the possible values of the type, so the user is still required to read the documentation.
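For instance, consider this hypothetical API (the names and the documented range are invented for illustration):

```cpp
#include <cstdint>

uint8_t g_volume = 0;

// Documentation says: level must be in the range 0..100.
// The uint8_t parameter does not enforce that: any integral argument
// converts silently, wrapping modulo 256.
void set_volume(uint8_t level) { g_volume = level; }

// Calling set_volume(300) compiles without error;
// g_volume ends up as 44 (300 mod 256), not 300 and not a diagnostic.
```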
Finally, when in doubt, ask the compiler 🙂 Finding the type of an expression via the compiler may not be the most intuitive task; below you find the code I used for my tests. Happy type-checking!
/* Linux/GCC only, sorry */
#include <cxxabi.h>
#include <cstdint>
#include <iostream>
#include <typeinfo>
int main()
{
    uint8_t a = 42;
    std::cout << abi::__cxa_demangle(typeid( a ).name(), nullptr, nullptr, nullptr) << "\n";
    std::cout << abi::__cxa_demangle(typeid( ~a ).name(), nullptr, nullptr, nullptr) << "\n";
    std::cout << abi::__cxa_demangle(typeid( 42u ).name(), nullptr, nullptr, nullptr) << "\n";
    std::cout << abi::__cxa_demangle(typeid( ~42u ).name(), nullptr, nullptr, nullptr) << "\n";
}
(*) Also keep in mind, if you strive to write portable code, that the uintXX_t types may not exist on exotic architectures. In fact, the standard mandates that if the target CPU has no 8-bit type, then uint8_t not be defined for that target. You should then use uint_least8_t (the smallest type with at least 8 bits) or uint_fast8_t (the type that is most efficient for operations on at least 8 bits).
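A sketch of those portable alternatives; the exact sizes printed are implementation-defined:

```cpp
#include <cstdint>
#include <iostream>

// Unlike uint8_t, these two are guaranteed to exist on every
// conforming implementation.
void print_widths()
{
    std::uint_least8_t smallest = 0; // smallest type with at least 8 bits
    std::uint_fast8_t  fastest  = 0; // "fastest" type with at least 8 bits
    std::cout << sizeof(smallest) << ' ' << sizeof(fastest) << '\n';
}
```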