Java vs. C++, last call

Well, I dragged this on for too long. Now I have to pick the language for the next project. My language of choice is C++, but there are some serious motivations for using Java: one is that all of the junior programmers involved have some experience with Java, another is that middle management is biased toward Java. So, I can still choose C++, but I have to put forward strong and sound engineering reasons. Also because this can become a little … political. I mean – should the project be late for a “natural” development reason, or whatever other reason, I don’t want to be blamed because I decided on a language perceived as complicated and rough.
The application isn’t going to need machoFlops or be resource intensive, therefore performance-wise there is no reason to avoid Java.
Java has a thorough and complete library, quite homogeneous even if not always well designed.
I consider C++ to be more flexible and expressive, but is there a more demonstrable software engineering property of C++ that I can use in my argument in favor of this language?
(Or, conversely, is there any demonstrable advantage of Java that could convince me to switch from my language of choice?)

C++ lambda functions, my first time

Some days ago I wrote my first lambda expression in C++.

    int n=0;
    std::for_each( begin, end, [&n]( std::uint8_t& a ){ a = n++; } );

If you are stuck in the C++85 age, this could look like any other alien language from a Star Trek episode. If you are updated to the more recent C++98, the syntax would look funny at the very least.
At least, that is what it looked like to me before I started gathering some information about the new standard (you can find a brief yet comprehensive recap of what changed in C++ on this specific page of Bjarne’s website).
You should read the code above as follows. The square brackets define the intro of the lambda function. Within the square brackets you put what the lambda function should capture from the environment.
In the code, I stated that n should be captured by reference (&). I could have dropped the ampersand had I wanted the capture to happen by value. Or I could have written just [&] (or [=]) to capture everything in scope by reference (or by value); empty brackets capture nothing at all.
Next come the arguments, no different from a standard function, and finally the function body.
Once you get this quick start, you will easily decode the line above as a way to fill a range with increasing integers. Take a breath, have another look at the code; ok, now it should make sense.
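
For the record, here is a minimal, self-contained version of the snippet above (the vector and its size are my own choices for illustration; it needs a C++0x-capable compiler):

    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    int main()
    {
        std::vector<std::uint8_t> v( 10 );
        int n = 0;
        // n is captured by reference, so every call advances the same counter.
        std::for_each( v.begin(), v.end(), [&n]( std::uint8_t& a ){ a = n++; } );
        for( std::size_t i = 0; i < v.size(); ++i )
        {
            std::cout << int( v[i] ) << ' ';    // prints 0 1 2 ... 9
        }
        std::cout << '\n';
        return 0;
    }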
Well, I’m quite with Stroustrup when he says he has mixed feelings about lambda functions (I am looking for the quote, but I don’t have it handy). For simple functions, lambdas are a godsend. On the other hand, lambdas have a terrific potential for hiding mechanics and can cause major headaches should they escape your control.
If you compare the line above with the code you had to write previously, it is obvious that lambda expressions are a giant leap forward.
In the C++98 days you had to write:

class DisposableNameForClass
{
public:
    DisposableNameForClass() : n( 0 ) {}
    void operator()( std::uint8_t& a ) { a = n++; }
private:
    int n;
};

//..
DisposableNameForClass disposableNameForInstance;
std::for_each( begin, end, disposableNameForInstance );

And that is very uncomfortable and inconvenient. Looking at code like this, it is easy to question whether it makes sense to use std::for_each at all rather than roll your own loop.
But look at the traditional code:

    int n=0;
    while( begin != end )
    {
        *begin++ = n++;
    }

This code is pretty clear to anyone with a minimal familiarity with C or one of its derived languages (yes, there is the dereference operator, which involves pointers, but it shouldn’t pose a real obstacle to comprehension).
Is it error prone? Possibly, as is any code longer than 0 bytes. std::for_each saves you from at least two errors – messing up the loop termination condition (e.g. forgetting the ! in the comparison, or comparing for equality rather than inequality) and missing the increment of the iterator (this happened to me at least once).
These reasons may not be enough to recommend std::for_each in C++98, but it is hard to argue against it once you can use lambda functions.

Simple I/O Messing Up

Basic I/O should be simple. I guess you agree. And I guess that’s why many C++ or Java programmers look back at the humble printf with some nostalgia. In fact it is hard to beat the conciseness and clarity of something like:

printf("%03d", x );

When it comes to creating formatted output, C’s printf is usually one of the best tools available.
Unfortunately things are not so simple. One of the first limitations acknowledged for this little gem is that it lacks robustness, or, put from a different perspective, it doesn’t type check.
What happens if ‘x’ in the above example is a float? Or a pointer? Or, worse, if the format string specifies a string pointer and an integer is passed?
This problem is mostly overcome in the GNU compiler via a custom extension that allows the compiler to check for consistency between the format string and the arguments.
The mechanism is flexible enough to be applied to user-defined functions. Suppose you have a logging function that behaves like printf, something like:

void log( LogLevel level, char const* message, ... );

That’s handy, so you don’t have to perform string processing to build your message when you just want to log. If you use gcc and declare the function like:

void log( LogLevel level, char const* message, ... ) __attribute__((format(printf,2,3)));

Then the compiler will kindly check all the invocations of the log function in your code to ensure that specifiers and arguments match.
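
Here is a minimal sketch of the whole thing put together (the LogLevel enum and the function body are mine, just for illustration):

#include <cstdarg>
#include <cstdio>

enum LogLevel { Info, Error };

// Format string is argument 2, the variable arguments start at argument 3.
void log( LogLevel level, char const* message, ... ) __attribute__((format(printf,2,3)));

void log( LogLevel level, char const* message, ... )
{
    std::va_list args;
    va_start( args, message );
    std::vfprintf( stderr, message, args );
    va_end( args );
}

int main()
{
    log( Info, "%d items\n", 42 );      // fine
    // log( Info, "%s items\n", 42 );   // gcc -Wformat warns: '%s' expects a char*
    return 0;
}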
So far so good, but enter C99. In the old days there was one integer type (int) with two available modifiers (short and long). That was reflected in the printf specifiers/modifiers: %d is for straightforward ints, %hd for shorts and %ld for longs.
And this is fine as long as you work with a single compiler and platform. If your code needs to be portable, then some complications are waiting for you.
The latest standard (C99) mandates a header, namely stdint.h, where a number of typedefs provide a wealth of integer types, grouped by size and by efficiency: you have (if I count correctly) some 30 types.
On one side this is jolly good, since it puts an end to the criticism of C for not having integer types with a declared bit size valid for all platforms (like Java has).
Unfortunately, on the other side, printf is not able to autodetect types, and thus you have to write a different format string depending on whether your int32_t is defined as long int or just int.
To leave this nightmare behind, C99 mandates another header file – inttypes.h – that provides the proper specifier for each one of those 30 integer types. For example, if you want to print an int32_t, you have to write:

printf( "here's an int32_t: %" PRId32 " and that's alln", x );

As you can see, it relies on the compiler merging adjacent string literals into one.
That does the job, but, IMO, some of the simplicity of the original idea is lost.
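
A minimal complete example may look like this (it compiles as C99; with a g++ of this era you may need to define __STDC_FORMAT_MACROS before the include):

#include <inttypes.h>
#include <stdio.h>

int main( void )
{
    int32_t x = 42;
    int64_t y = 1234567890123LL;
    /* PRId32/PRId64 expand to the right specifier for the current platform,
       and the adjacent string literals are then merged into one. */
    printf( "here's an int32_t: %" PRId32 " and an int64_t: %" PRId64 "\n", x, y );
    return 0;
}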

C++0x and auto

According to the acronym C++0x, we shouldn’t be waiting for the next language standard for more than a year and a half. According to the uneasiness I read in the messages from the committee board, the wait could last somewhat longer. Anyway, the next C++ is about to debut. By reading what is going to pop up in the next standard, I got the strong impression that this is another step away from the original language. Let’s face it, C++98 (ISO/IEC 14882:1998) was a first step away from the original, non-standard language that included templates as an afterthought. It took years for compiler vendors to reach compliance, leaving the developer community in a dangerous interregnum, a no man’s land where portability and maintainability concerns were stronger than writing proper code. Also, the standardization process left a number of lame aspects in the language – the iostream library, inconsistencies between the string class and the other containers, mind-boggling i18n support, no complete replacement for the C standard library – just to name the first that come to mind.
The next standard seems to take a number of actions to plug the holes left by the previous one, and a number of actions to define a new and different language. For example, there will be a new way to declare functions that will make the transition from K&R style to ANSI C look pale and inoffensive. What today is declared as:

int f( char* a, int b )

can also be declared as:

auto f( char* a, int b ) -> int

I understand there’s a reason for this, it’s not just out of madness; nonetheless, this is going to puzzle the Joe Average developer. Once the astonishment at the new notation has worn off, how is he supposed to declare functions?
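
For what it’s worth, the usual rationale quoted for the trailing form is that the return type may depend on the arguments, which are not in scope before the function name. A minimal sketch (my own example, not taken from the draft standard):

// The return type depends on the parameters, so it can only be spelled
// after them, with decltype.
template <typename T, typename U>
auto add( T a, U b ) -> decltype( a + b )
{
    return a + b;
}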
The impression that C++0x is going to be a different language has also been reinforced by a sort of backpedaling on the “implicit” stuff.
C++ has a lot of stuff going on under the hood. Good or bad, you choose; nonetheless you get by default a number of implicit things, e.g. a set of default methods (default constructor, destructor, copy constructor and assignment operator), and a number of implicit behaviors, such as using constructors with a single argument as conversion constructors.
Now this has been deemed no longer apt for the language, so modifiers to get rid of all this implicitness have been introduced. E.g. conversion operators may be declared “explicit”, meaning that they will not be used implicitly when an object is evaluated in a suitable context. In a class, each single default method can be either disabled:

class Foo
{
    public:
        Foo() = delete;
};

Or explicitly defined as the default behaviour:

class Foo
{
    public:
        Foo() = default;
};
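
And, to show the explicit side of the same cleanup, here is a minimal sketch of an explicit conversion operator (Flag is a hypothetical class of mine; it needs a C++0x compiler):

class Flag
{
public:
    explicit operator bool() const { return set; }  // no silent conversions
private:
    bool set = true;
};

void test()
{
    Flag f;
    if( f ) { /* fine: contextual conversion to bool is still allowed */ }
    bool b = static_cast<bool>( f );    // fine: explicitly requested
    // int i = f;                       // error: no implicit conversion
}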


Again, I see the rationale behind this, but I find that changing the rules of the language some 30 years after its inception is going to surprise many a developer.
One of the most welcome additions in the new standard, at least in my humble opinion, is the new semantics of the auto keyword. If you use the STL part of the standard on a daily basis, I’m quite sure you are going to agree. Let’s take for example something like:

std::vector< std::pair<bool,std::string> > bar;

After some manipulation, say you want to sweep through the vector with the iterator idiom. You can wear out your keyboard a bit by writing this short poem:

for( std::vector< std::pair<bool,std::string> >::iterator i=bar.begin(); i != bar.end(); ++i ) ...


I usually go for a couple of typedefs so that the iterator type can be written more succinctly. The new standard allows the programmer to take a shortcut: since the type of the iterator is defined by the return type of bar.begin(), it can be deduced by the compiler and used to declare i. That turns out as:

for( auto i=bar.begin(); i != bar.end(); ++i ) ...

As you see, this is far more readable (and writable, for that matter).
Well, well, well, too bad we have to wait at least one year for the standard and an uncountable number of years before vendors update their compilers. But if you use GNU C++, then you may not be helpless.
GNU C++ implements the typeof extension. Just like sizeof evaluates to the size of the type resulting from the expression it is applied to, typeof evaluates to that type itself. E.g.:

int i;
typeof( &i ) p = &i;    // p is declared with type int*

(This works much like the decltype keyword of the next C++ standard.) Given that the expression typeof is applied to is not evaluated, no side effects can happen. And this calls for a handy preprocessor macro:

#define AUTO(V__,E__) typeof(E__) V__ = (E__)

Now this macro does much of what the yet-to-come auto keyword does:

for( AUTO(i,bar.begin()); i != bar.end(); ++i ) ...

Note that typeof doesn’t do well with references, so in some cases it could behave unexpectedly. In the example below, AUTO ends up declaring a as a plain std::string (a copy), while b is a true reference, so after foo is reassigned the two variables print different values:

#include <iostream>
#include <string>

std::string foo( "abc" );

std::string const& f()
{
    return foo;
}

int main()
{
    AUTO( a, f() );                  // typeof loses the reference: a is a copy
    std::string const& b = f();      // b is a true reference to foo
    std::cout << "a=" << a << "\n";
    std::cout << "b=" << b << "\n";
    foo = "cde";
    std::cout << "a=" << a << "\n";  // still "abc"
    std::cout << "b=" << b << "\n";  // now "cde"

    return 0;
}

If your compiler of choice doesn’t support typeof (or decltype), then you have to wait for it to become C++0x compliant.

Friends and namespaces

There are C++ behaviors that may leave you a bit astonished, staring at the lines on the monitor and wondering why the code isn’t compiling, or doesn’t work as expected. I just stumbled into one of these cases.
I usually follow these steps to recover from the puzzled face. First I write a minimal example that reproduces the behavior. It should be a bunch of lines in a single file. Sometimes this can be a daunting task, but I have found it is always worth it to grasp the problem.
In fact, once you have the minimal code, you can easily experiment, changing and twiddling bits to see how the behavior changes.
Then you have two options – you can ask your local C++ guru about the problem (if you have one), or you can google the Internet with a clever selection of keywords that describe your problem.
So what happened today?
I decided to move some code I had developed into a namespace-constrained library. Everything compiled happily outside the namespace, but failed to do so inside the namespace. After some head-scratching, I started cutting and shaping a minimal file with the same odd behavior. Here you are:

/** prova.cc
 *
 * @author Massimiliano Pagani
 * @version 1.0
 * @date 24/04/2007
 *
 * @notes
 * @history
 *
*/

#if defined( USE_NAMESPACE )
namespace NS
{
#endif

    class A
    {
        public:
        private:
            struct B { int x; };
            friend bool fn( B const& b );
    };

#if defined( USE_NAMESPACE )
}

using namespace NS;

#endif

namespace NS
{

    bool fn( A::B const& b )
    {
        return b.x != 0;
    }
}

Now, if you compile it defining the symbol USE_NAMESPACE (e.g. via g++ -Wall -DUSE_NAMESPACE -c prova.cc), then you get this odd-looking error:

prova.cc: In function 'bool fn(const NS::A::B&)':
prova.cc:21: error: 'struct NS::A::B' is private
prova.cc:31: error: within this context

While if you compile without the namespace, everything works as expected. Since the error was quite meaningless to me, I started investigating friends and namespaces. After some mailing-list browsing, I figured it out. And it was simpler than it appeared – just a case of a misleading error message.
In fact, the friend statement declares a function fn somewhere in the NS namespace, while in my original code fn was actually defined in the global namespace – there was just a using statement. To fix the problem, just move the fn function into the NS namespace.
And I figured it out alone, without needing to call my uber-C++-guru friend Alberto.
On a completely unrelated topic, today is the 25th anniversary of the marvelous ZX Spectrum. Happy Birthday Dear Speccy.

The Design and Evolution of C++

Despite the powerful mechanisms it supports, C++ is not a language for the faint-hearted. Two forces drive its peculiar concept of friendliness (it is not unfriendly, just very selective): backward compatibility with C and the effort not to get in the way of performance. This book, written by the father of the language, presents and analyzes the language history and the design decisions. And, given the writer, the perspective you get from reading the book is very interesting and more than once helps shed some light on the dark corners of the language.
The history is very interesting, since it details how the language genesis and marketing went from the AT&T labs to academia and industry.
C++ design principles are presented, and the most notable is ease of teachability. Several times, proposed or existing features were modified or dropped entirely because they were not easy to teach.
Another very interesting principle is “you don’t pay for what you don’t use”, meaning that the features added to the C language in order to define C++ were designed so that the programmer would not incur any penalty when not using them. That’s why, if a class has no virtual methods, then the pointer to the virtual method table is not included, saving that pointer’s space in the memory footprint of each class instance.
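
A quick way to see the principle at work (the exact numbers are implementation dependent, but on a typical compiler the virtual variant grows by one pointer):

#include <iostream>

struct Plain   { int x; };                          // no vtable pointer needed
struct Virtual { int x; virtual ~Virtual() {} };    // vtable pointer included

int main()
{
    std::cout << "sizeof(Plain)   = " << sizeof( Plain )   << '\n';
    std::cout << "sizeof(Virtual) = " << sizeof( Virtual ) << '\n';
    return 0;
}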
Aside from answering many questions, the book opens up a bunch of new ones. For example, the very first implementation of C++ was developed practically around a threading library. Now, more than 30 years later, in a world with an increasing presence of multi-core machines, the C++ standard still lacks a multithreading/multiprocessing facility.
Also, Stroustrup asserts more than once that a garbage-collected way of managing memory could be added by a specific implementation, but he fails to explain how this non-deterministic way of ending the life of dynamic memory could deal with the deterministic needs of destructors. Likely I’m just too dumb to figure it out myself.
The big thing I found missing in the book is a comparison with the Java language, basically one of the great contenders for the title of most widely used programming language. Java, on its side, takes some interesting approaches to language design that conflict with those of C++ (e.g. on the C compatibility issue). Therefore it would have been nice to hear Bjarne’s thoughts about it. In his defense, it has to be noted that by the date this book hit the streets, the Java hype had just started.
My last complaint about the book is the lack of conclusions. The book seems cut a couple of chapters before the real end. Aside from the stylistic point of view, some words about future evolution and perspectives would have been in their place at the end of the book.

Adventure in cross compiling land

If you have a C background you may find it correct and appropriate that filenames are case sensitive. After all, ‘A’ is different from ‘a’ as much as 65 is different from 97. If you have a BASIC background you may equally find it correct and appropriate that filenames are case insensitive. It doesn’t matter how you write it, it is always the same character.
Each approach has its strengths and good arguments. The problem arises when one of the parties blindly ignores the rest of the world.
Well, this may sound like gibberish until you try to compile the Linux kernel on Windows. In this impressive set of sources there are a bunch of files in the same directory whose names differ only by case. Moreover, if you untar the sources you get no warning sign, since the tar command silently overwrites existing files. You get some hint of the problem when you copy the sources with the Windows GUI from a Samba share to your local disk.
The problem is not so hard indeed, but if you don’t want to tamper with the source file structure it is better to rely on a case sensitive filesystem. That means you cannot use Windows.
So I switched to Linux. After all, Freescale gives you a ready-made cross compiler for Linux, so it is a clear sign of the way to go.
At the workplace I installed the recently released Fedora Core 5. (Yes, I had just completed a satisfactory configuration of FC4 on my laptop 🙁 ). I must admit that Fedora Core, at last, has an original look. And a good look. Apparently they stopped following Redmond and Cupertino and started to set their own trend. Well done, I like it.
On the other hand, having FC5 seamlessly working in a Windows environment is not something for the faint-hearted. Be prepared for a good amount of swearing before you can resolve names, browse Windows shares and access the files contained there. Once done, it works very well.
Back to Linux kernel compilation. Despite what you may think from a look at my desk, I like cleanliness and order, at least in files. So I want to store in the versioning system nothing that can be derived, just primary sources. With the kernel sources this is a little fuzzy. In fact you download sources that are ‘distclean’ed, i.e. no kernel configuration is set. First you configure the kernel, either by issuing one of ‘make menuconfig’, ‘make oldconfig’ or ‘make xconfig’, or by picking up one of the precanned configurations by typing something like ‘make xyz_config’.
Any one of these performs some operations – a file ‘.config’ is placed in the kernel source root, and some symlinks and files are created. So far so good. A file named ‘autoconf.h’ is required in order to build the kernel. This file is just the C version of the ‘.config’ file. In other words, and simplifying a bit, in ‘.config’ you find something like ‘VALUE=y’ and in ‘autoconf.h’ you find ‘#define VALUE 1’.
Now, I would expect ‘autoconf.h’ to be created from ‘.config’ somewhere in the Makefile. This is not true. The only way to create ‘autoconf.h’ (that I have found so far) is to interact with the system using one of the ‘make *config’ commands. This is bad for me, since it prevents a fully automatic build.
On the other hand, it is true that the configuration changes quite seldom, so you don’t have to ‘make distclean’ often, and the ‘.config’ file you generate will be with you for a long time.
Maybe I didn’t search enough, but I have some actual work to do and results to produce besides googling for answers and trying to bend the universe to my personal view.

Cross compiling

Have you ever tried to build a cross compiler under cygwin? Today I tried to do exactly that. Well, I quite underestimated the complexity. After all, many Unix programs can be happily compiled without much effort under cygwin; why should gcc be different? A cross compiler is somewhat convoluted: first you have to build a bootstrapping version of the compiler in order to compile the standard C library. Before compiling the compiler you need the binary tools compiled for the intended target. The library also needs some information from the operating system, and since my goal was to produce a cross compiler for embedded Linux running on an ARM9, I had to do something with the Linux kernel.
Once you have successfully compiled the bootstrap compiler and the library, you have to compile the final compiler.
Luckily, today there are the Internet and Google ready to help. I found many pages of advice, but I haven’t achieved my goal yet. Moreover, it is not clear to me whether the cross compiler supports by default all the instruction modes that an ARM920 makes available – ARM (native), Thumb and Jazelle. More on this in the coming days.

90% of Research

According to a recent TV ad, 90% of research is funded by pharmaceutical corporations. Maybe this figure is a little overestimated; nonetheless it is still astonishing. In other words, research into health and cures is privately funded. I wonder to what extent this is good. I mean, the logic driving research into health preservation and recovery is the logic of return on investment, the dividend-and-share logic. The same logic that led one of the German pharmaceutical giants during WWII to produce (and sell to the government) the infamous Zyklon B gas used to kill millions of people in concentration camps.
If you google around for these terms you’ll find more than one reason to be concerned. Even trusting human nature, it is impossible not to doubt whether such research could ever find inexpensive treatments for any disease. What if something could simply be let pass, or just a spoonful of water and sugar would do?
Leaving aside these gloomy thoughts, the link of the day is to Sibling Rivalry: C and C++, a PDF paper from B. Stroustrup detailing the source of incompatibilities between C99 and C++98 and the parallel evolution of these languages that aimed to be one.