Comparisons

Recently I had to spend some time adapting my imperative/OO background to a piece of code I needed to write in the functional paradigm. The problem is simple enough to describe briefly: you have a collection of pairs, each holding an id and a quantity. When a new pair arrives, you add its quantity to the pair with the same id in the collection, if one exists; otherwise you add the pair to the collection.

Functional programming, as seen by an OO programmer, is based on three principles: variables are never modified once assigned; no side effects (or at least as few as possible); no loops – you work on collections with aggregate functions. Maybe I missed something like monad composition, but that's enough for this post.

Thanks to a coworker I wrote Scala code that follows all the aforementioned principles and is quite elegant as well. It relies on the "partition" function, which transforms (in a functional fashion) a collection into two collections, partitioning the elements according to a given criterion. The criterion here is equality of the id, so I get the element with the same id if it exists, or just an empty collection if it doesn't.

Here’s the code:

Yes, I could have written it more concisely, but that would have been too write-only for me to be comfortable with.

Once the pleasant feeling of elegance wore off a bit, I wondered about the cost of this approach. Each time you invoke merge the collection is rebuilt and, unless the compiler optimizer is very clever, each list item is cloned and the old one goes to garbage collection.

Partitioning scans and rebuilds, and since I'm using an immutable collection, even adding an item to an existing list causes a new list to be generated.

Performance matters in some 20% of your code, so it can be acceptable to sacrifice performance in order to get a higher abstraction level and thus a higher coding speed. But then, what about premature pessimization? Premature pessimization, at least in the context where I read the term, means the widespread adoption of idioms that lead to worse performance (the case in point was the C++ choice between pre- and post-increment operators). Premature pessimization may cause the application to run generally slower and makes it harder to spot and optimize the cause.

This triggered the question – how does a language's idiomatic approach impact performance?

To answer the question I started coding the same problem in different languages.

I started from my language of choice – C++. In this language you would likely approach a similar problem using std::vector, the preferred and recommended collection. The source looks like this:
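I can't reproduce the original listing here, so below is a sketch of how such a merge typically looks with std::vector (names are mine; I use a C++11 lambda for brevity where the original likely had a hand-written predicate):

```cpp
#include <algorithm>
#include <vector>

// the id/quantity pair (hypothetical name)
struct Data
{
    int id;
    int quantity;
};

// add item.quantity to the element with the same id, or append item
void merge(std::vector<Data>& collection, Data const& item)
{
    std::vector<Data>::iterator scan =
        std::find_if(collection.begin(), collection.end(),
                     [&item](Data const& d) { return d.id == item.id; });
    if (scan != collection.end())
    {
        scan->quantity += item.quantity;
    }
    else
    {
        collection.push_back(item);
    }
}
```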

The code is slightly longer (consider that in C++ I prefer the opening brace on a line of its own, while in Scala "they" forced me to put opening braces at the end of the statement line). Having mutable collections doesn't require you to warp your mind around the data to find which aggregate function could transform your input into the desired output – you just find what you are looking for and change it. It seems simpler to explain to a child.

Then I turned to Java. I'm not so fond of this language, but it is quite popular and has a comprehensive set of standard libraries that really allow you to tackle every problem confidently. Not sure what a Java programmer would consider idiomatic, I stayed traditional and went for a generic List. The code follows:

I'm not sure why the inner class Data needs to be declared static, but it seems that otherwise each instance keeps a reference to the outer class instance. Anyway, the code is decidedly more complex. There is no function similar to C++ find_if nor to Scala partition. The loop is simple, but it offers some chances to add bugs to your code. Still, explaining the code is straightforward once the iterator concept is clear.

Eventually I wrote a version in C. This language is hampered by the lack of a basic standard library – besides some functions on strings and files you have nothing. This could have been fine in the 70s, but today it is a serious problem. Yes, there are non-standard libraries providing all the needed functionality – plenty of them, gazillions of them, all incompatible. Once you choose one you are locked in… C clearly shows the signs of age. So I wrote my own singly linked list implementation:
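The listing is not reproduced here; the C merge over a hand-rolled singly linked list was plausibly along these lines (my reconstruction, written C-style, though it compiles as C++ too):

```cpp
#include <stdlib.h>

/* a node of the hand-made singly linked list */
struct Pair
{
    int id;
    int quantity;
    struct Pair* next;
};

/* add quantity to the node with the same id, or prepend a new node;
   returns the (possibly new) head of the list */
struct Pair* merge(struct Pair* head, int id, int quantity)
{
    struct Pair* scan;
    for (scan = head; scan != NULL; scan = scan->next)
    {
        if (scan->id == id)
        {
            scan->quantity += quantity;
            return head;
        }
    }
    scan = (struct Pair*)malloc(sizeof(struct Pair));
    scan->id = id;
    scan->quantity = quantity;
    scan->next = head;
    return scan;
}
```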

Note that once cleaned of braces, the merge function is shorter in C than in Java! This is a hint that Java is possibly verbose.

I reported here just the merge function. The rest of the source is not relevant for this post; it basically consists of parsing the command line (string to int conversion), getting some random numbers and getting the time. The simplest frameworks for these operations are those based on the JVM. The most complex is C++ – it allows a wide range of configuration (random and time), but I had to look up on the internet how to do it and… I am afraid I wouldn't know how to exploit the greater range of options. In my career as a programmer (some 30+ years counting from when I started coding) I don't think I ever needed a specific random number generator, or a clock different from a "SystemTimeMillis" or wall clock time. I don't mean that no one should ask for greater flexibility, but I find it daunting that every programmer should pay this price because there is a case for using a non-default RNG.

Anyway, back to my test. The following table reports the results.

           C++ (vector)   Scala      Java      C        C++ (list)
time (ms)  170.75         11562.45   2230.05   718.75   710.90
lines      81             35         69        112      81

Times were taken performing 100000 insertions with a max id of 10000. The test was repeated 20 times and the results averaged in the table. The difference in timing between C++ and Scala is dramatic – the former is about 70 times faster than the latter. Wildly extrapolating, you could say that if you code in C++ you need 1/70 of the hardware you need to run Scala… no surprise (still guessing wildly) that IBM backs this one.

Java is about 5 times faster than Scala. I'm told this is more or less expected, and it may be a price you are willing to pay for the higher level.

In the last column I reported the results for a version of the C++ code employing std::list, for a fairer comparison (all the other implementations use a list, after all). What I didn't expect was that C++ is (slightly) faster than C despite using the same data structure. It is likely because of some template magic.

The other interesting value in the table is the number of lines (total, not just the merge function) of each implementation. From my studies (now quite aged) I remember research reporting that the speed of software development (writing, testing and debugging), stated as lines of code per unit of time, is the same regardless of the language. I'm starting to have some doubts, because my productivity in Scala is quite low compared with other languages, but… ipse dixit.

Let's say that you spend 1 for the Scala program; then you would pay 2.31 for C++, 1.97 for Java and 3.20 for C.

Wildly extrapolating again, you could draw a formula to decide whether it is better to code in C++ or in Scala. Let H be the cost of the CPU and hardware needed to comfortably run the C++ program. Let C be the cost of writing the program in Scala. Then the total cost of the project is:

(C++) H+C×2.31

(Scala) 68×H+C

(C++) > (Scala) ⇒ H+C×2.31 > 68×H+C ⇒ C×1.31 > 67×H ⇒ C > 51.14×H

That is, you'd better use Scala when the cost of Scala development exceeds the cost of the hardware by a factor of 50 or more; if hardware weighs more than that, you'd better use C++.

Besides being a wild guess, this also assumes that there is no hardware constraint and that you can easily scale the hardware of the platform.

(Thanks to Andrea for pointing out my mistake in inequality)

Ugly Code

Unfortunately I'm a sort of purist when it comes to coding. Code that is not properly indented, global and local scopes garbled up, obscure naming, counter-intuitive interfaces… all of it conspires against my ability to read a source and causes headache, acid stomach and buboes. "Unfortunately", I wrote, meaning it is most unfortunate for the poor souls that have to work with me, to whom I must appear as a sort of source code taliban.
Recently my unholy-code alarm triggered when a colleague – trying unsuccessfully to compile an application produced by a contractor – asked me for advice.
The more I delved into the code, the more my programmer survival instinct screamed. The code was supposedly C++ and the problem was related to a class that I will call – to save the privacy and dignity of the unknown author – SepticTank. This class interface was defined inside a .cpp and then again in a .h. Many methods were inlined by implementing them in the class interface (and this was possibly part of the problem).
After resolving some differences, the code refused to link because there was a third implementation of the SepticTank destructor in a linked library. I claimed that such code couldn't possibly work (even after disabling the dreaded precompiled headers – I have never seen a Visual Studio project working fine with precompiled headers); even if we managed to get it compiled and linked, the mess was so widespread that nothing good could come of it.
My colleague tried to save the day by removing the implementation of the SepticTank destructor so as to leave the one found in the linked library.
In the end he had to give up, because the code was broken beyond repair: even when it compiled and linked, it crashed on launch (not really surprising).

What struck me most, basically because it caused a slight fit of dizziness, was the sight of the mysterious operator below –
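I can't paste the original (and wouldn't), but in essence it was something like this (class name changed to protect the guilty):

```cpp
class SepticTank
{
public:
    // implicit conversion: an instance silently turns into a pointer
    // to itself wherever a SepticTank* is expected
    operator SepticTank*()
    {
        return this;
    }
};

// a function with pointer parameters, callable with plain instances
bool isSame(SepticTank* p, SepticTank* q)
{
    return p == q;
}
```

With this in place you can write isSame(a, b) with plain instances, no '&' anywhere.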

My brain had some hard moments trying to decode the signals from my eyes. Then it figured out that the coder was redefining the implicit conversion to class pointer, so as to use instances and references where pointers were expected… why on Earth would one want something like that?!?
Implicit conversions, if not handled correctly, are a sure way to kick yourself in the nose, and this alone is a good enough reason to stay away. But… trying to enter the (criminal) mind that wrote that code, what's the purpose? Just to avoid the need for the extra '&'? Or is it a Javism? Maybe it is better to stay out of some minds…

I’d like to introduce…

Say you have slightly more than one hour to talk about C++ to an audience of programmers ranging from the self-taught C-only programmer to the yes-I-once-programmed-in-C++-then-switched-to-a-modern-language type. What would you focus your presentation on? I started composing slides thinking "C++ for C programmers", but it is a huge task and surely won't fit into a single week.
Also I must resist the temptation to teach, since a single hour is not enough to learn anything.
So I am planning to reshape my presentation in the form of wow-slides. I mean, every slide (no more than 10-15 of them) should show a C++ idiom / technique / mechanism that would cause a wow reaction in a C programmer or in a C++80 programmer.
Advice is highly appreciated.

Java vs. C++, last call

Well, I have dragged this on for too long. Now I have to pick the language for the next project. My language of choice is C++, but there are some serious motivations for using Java: one is that all of the junior programmers involved have some experience with Java; another is that middle management is biased toward Java. So, I can still choose C++, but I have to put forward strong and sound engineering reasons. Also because this can become a little… political. I mean – should the project be late for "natural" development reasons, or whatever other reason, I don't want to be blamed because I decided on a language perceived as complicated and rough.
The application isn’t going to need machoFlops or be resource intensive, therefore performance-wise there is no reason to avoid Java.
Java has a thorough and complete library, quite homogeneous even if not always well designed.
I consider C++ more flexible and expressive, but is there a more demonstrable software engineering property of C++ that I can use in my arguments to favor the choice of this language?
(Or, conversely, is there any demonstrable advantage of Java that could convince me to switch my language of choice?)

C++ lambda functions, my first time

Some days ago I wrote my first lambda expression in C++.
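The actual line is not shown here, but it was more or less the following (a minimal reconstruction, wrapped in a function for completeness): fill a vector with increasing integers via std::for_each and a lambda.

```cpp
#include <algorithm>
#include <vector>

// fill the range with 0, 1, 2, ...: the lambda captures n by
// reference and bumps it once per element
void fill_increasing(std::vector<int>& v)
{
    int n = 0;
    std::for_each(v.begin(), v.end(), [&n](int& x) { x = n++; });
}
```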

If you are stuck back in the C++85 age, this could look like any other alien language from a Star Trek episode. If you are updated to the more recent C++98, the syntax would look funny at a minimum.
At least, that is what it looked like to me before I started gathering some information about the new standard (you can find a brief yet comprehensive recap of what changed in C++ on this specific page of Bjarne's website).
You should read the code above as follows. The square brackets define the intro of the lambda function: within them you put what the lambda should capture from the environment.
In the code, I stated that n should be captured by reference (&). I could have dropped the ampersand had I wanted the capture to happen by value. And had I wanted the lambda to capture everything, I could have written just '&' (or '=' for by-value capture) inside the brackets – an empty pair of brackets captures nothing.
Next come the arguments, no different from a standard function's. Finally, the function body.
Once you get this quick start, you will easily decode the expression as a way to fill a range with increasing integers. Take a breath, have another look at the code; ok, now it should make sense.
Well, I'm quite with Stroustrup when he says that he has mixed feelings about lambda functions (I am looking for the quote, but I don't have it handy). For simple functions, lambdas are a godsend. On the other hand, lambdas have a terrific potential for hiding mechanisms and causing major headaches should they escape your control.
If you compare the line above with the code you had to write previously, it is obvious that lambda expressions are a giant leap forward.
In the C++98 days you ought to write –
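Something like the following functor dance (my sketch of the C++98 equivalent, not the original code):

```cpp
#include <algorithm>
#include <vector>

// C++98: a hand-written function object doing what the one-line
// lambda does
class Increaser
{
public:
    explicit Increaser(int& n) : m_n(n) {}
    void operator()(int& x) const
    {
        x = m_n++;  // legal: const applies to the reference member, not the referent
    }
private:
    int& m_n;
};

void fill_increasing(std::vector<int>& v)
{
    int n = 0;
    std::for_each(v.begin(), v.end(), Increaser(n));
}
```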

And that is very uncomfortable and inconvenient. Looking at code like this, it is easy to question whether it makes sense to use std::for_each at all rather than rolling your own loop.
But look at the traditional code –
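That is, the plain explicit loop, presumably along these lines:

```cpp
#include <vector>

// the traditional iterator-based loop
void fill_increasing(std::vector<int>& v)
{
    int n = 0;
    for (std::vector<int>::iterator i = v.begin(); i != v.end(); ++i)
    {
        *i = n++;
    }
}
```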

This code is pretty clear to anyone with a minimum familiarity with C or one of its derived languages (yes, there is the dereference operator, which involves pointers, but that shouldn't pose a real obstacle to comprehension).
Is it error prone? Possibly, as is any code longer than 0 bytes. std::for_each saves you from at least two errors – messing up the loop termination condition (e.g. forgetting the ! in the comparison, or comparing for equality rather than inequality) and missing the increment of the iterator (this happened to me at least once).
These reasons may not be enough to recommend std::for_each in C++98, but it is hard to argue against it once you can use lambda functions.

Simple I/O Messing Up

Basic I/O should be simple. I guess you agree. And I guess that's why many C++ or Java programmers look back at the humble printf with some nostalgia. In fact it is hard to beat the conciseness and clarity of something like:
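For instance (my own example; the original snippet isn't shown):

```cpp
#include <cstdio>

// formatted output, the C way: concise and readable
void show(int x)
{
    printf("value of x = %d\n", x);
}
```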

When it comes to creating formatted output, C's printf is usually one of the best tools available.
Unfortunately things are not so simple. One of the first limitations acknowledged for this little gem is that it lacks robustness or, put from a different perspective, it doesn't type check.
What happens if 'x' in the above example is a float? Or a pointer? Or, worse, if the format string specifies a string pointer and an integer is passed?
This problem is mostly overcome in the GNU compiler via a custom extension that allows the compiler to check for consistency between format string and arguments.
The mechanism is flexible enough to be applied to user-defined functions. Suppose you have a logging function that behaves like printf, something like:

That's handy: you don't have to perform string processing to build your message when you just want to log. If you use gcc and declare the function like:
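The gcc-specific declaration looks about like this (I use log_msg rather than log to avoid clashing with std::log; the 1 and 2 tell the compiler which argument is the format string and where the values start):

```cpp
#include <cstdarg>
#include <cstdio>

// printf-like logging, checked at compile time by gcc thanks to
// the format attribute
void log_msg(char const* fmt, ...) __attribute__((format(printf, 1, 2)));

void log_msg(char const* fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
}
```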

Then the compiler will kindly check all the invocations of log in your code to ensure that specifiers and arguments match.
So far so good, but enter C99. In the old days there was one integer type (int) with two available modifiers (short and long). That was reflected in the printf specifiers: %d for straightforward ints, %hd for shorts and %ld for longs.
And this is fine as long as you work with the same compiler and platform. If your code needs to be portable, some complications are ready for you.
The latest standard (C99) mandates a header, stdint.h, where a number of typedefs provide a wealth of integer types: grouped by size and by efficiency, you have (if I count correctly) some 30 types.
On one side this is jolly good, since it puts an end to the criticism that C lacks an integer type with a declared bit size valid for all platforms (as Java has).
Unfortunately, on the other side, printf is not able to autodetect types, so you have to write a different format string depending on whether your int32_t is defined as long int or just int.
To leave the nightmare behind, C99 mandates another header – inttypes.h – that provides the proper specifier for each of those 30 integer types. For example, if you want to print an int32_t, you have to write:
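Something like this (hedging on the exact surrounding code):

```cpp
#include <cinttypes>
#include <cstdio>

// PRId32 expands to the right conversion specifier for int32_t
// on the current platform ("d", "ld", ...)
void show(int32_t x)
{
    printf("x = %" PRId32 "\n", x);
}
```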

As you can see, it relies on the C preprocessor merging consecutive string literals into one.
That does the job but, IMO, some of the simplicity of the original idea is lost.

c++0x and auto

According to the acronym C++0x, we shouldn't have to wait for the next language standard more than a year and a half. According to the uneasiness I read in the messages from the committee board, the wait could last slightly longer. Anyway, the next C++ is about to debut. Reading about what is going to pop up in the next standard, I got the strong impression that this is another step away from the original language. Let's face it, C++98 (ISO/IEC 14882:1998) was a first step away from the original, non-standard language that included templates as an afterthought. It took years for compiler vendors to reach compliance, leaving the developer community in a dangerous interregnum, a no man's land where portability and maintainability concerns were stronger than writing proper code. The standardization process also left a number of lame aspects in the language – the iostream library, inconsistencies between the string class and the other containers, mind-boggling i18n support, no complete replacement for the C standard library, just to name the first that come to mind.
The next standard seems to take a number of actions to plug the holes left by the previous one, and a number of actions to define a new and different language. For example, there will be a new way to declare functions that will make the transition from K&R style to ANSI C seem pale and inoffensive. What today is declared as:

is going to be declared also as:
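The two snippets are not reproduced here; this is my reconstruction of the two spellings side by side:

```cpp
// today's declaration
int f(int x);

// the C++0x alternative: auto plus a trailing return type,
// declaring the very same function
auto f(int x) -> int;

int f(int x)
{
    return x + 1;
}
```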

I understand there's a reason for this, it's not just out of madness; nonetheless, this is going to puzzle the average developer. Once the astonishment for the new notation wears off, how is he supposed to declare functions?
The impression that C++0x is going to be a different language has also been reinforced by a sort of backpedaling on the "implicit" stuff.
C++ has a lot going on under the hood. Good or bad, you choose; nonetheless you get by default a number of implicit things, e.g. a set of default methods (default constructor, destructor, copy constructor and assignment operator), and a number of implicit behaviors, such as using constructors with a single argument as conversion constructors.
Now this has been deemed no longer apt for the language, so modifiers to get rid of all this implicitness have been introduced. E.g. conversion operators may be declared "explicit", meaning that they will not be used implicitly when an object is evaluated in a suitable context. In a class, each single default method can be either disabled:
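Disabling reads more or less like this (class and members are my own example):

```cpp
class NonCopyable
{
public:
    NonCopyable() {}
    // the compiler-generated copy operations are disabled
    NonCopyable(NonCopyable const&) = delete;
    NonCopyable& operator=(NonCopyable const&) = delete;
};
```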

Or explicitly defined as the default behaviour:
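And asking explicitly for the compiler-generated version (again, my own sketch):

```cpp
class Trivial
{
public:
    // explicitly request the default implementations
    Trivial() = default;
    Trivial(Trivial const&) = default;
    Trivial& operator=(Trivial const&) = default;
};
```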

 

 

Again, I see the rationale behind this, but I find that changing the rules of the language 30 years after its inception is going to surprise many a developer.
One of the most welcome additions in the new standard, at least in my humble opinion, is the new semantics of the auto keyword. If you use the STL part of the standard on a daily basis, I'm quite sure you are going to agree. Let's take for example something like:

After some manipulation, say you want to sweep through the vector with the iterator idiom. You can wear your keyboard a bit by writing the short poem:
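Assuming, as an example of mine, that bar is a vector of string/int pairs, the poem goes:

```cpp
#include <string>
#include <utility>
#include <vector>

// note how much of the line is spent just spelling the iterator type
void bump(std::vector<std::pair<std::string, int> >& bar)
{
    for (std::vector<std::pair<std::string, int> >::iterator i = bar.begin();
         i != bar.end();
         ++i)
    {
        i->second += 1;
    }
}
```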

 

 

I usually go for a couple of typedefs so that the iterator type can be written more succinctly. The new standard allows the programmer to take a shortcut: since the type of the iterator is defined by the return type of bar.begin(), it can be caught internally and used to declare i. That turns out as:
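With the same hypothetical bar type (a vector of string/int pairs), it shrinks to:

```cpp
#include <string>
#include <utility>
#include <vector>

// the iterator type is deduced from the return type of bar.begin()
void bump(std::vector<std::pair<std::string, int> >& bar)
{
    for (auto i = bar.begin(); i != bar.end(); ++i)
    {
        i->second += 1;
    }
}
```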

As you can see, this is far more readable (and writable, altogether).
Well, well, well, too bad we have to wait at least one year for the standard and an uncountable number of years before vendors update their compilers. But if you use GNU C++ you may not be helpless.
GNU C++ implements the typeof extension. Just as sizeof evaluates to the size of the type resulting from the expression to which it is applied, typeof evaluates to that type itself. E.g.:
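For instance (using the __typeof__ spelling, which gcc accepts in any dialect; plain typeof needs the GNU one):

```cpp
// b gets the very same type as a
int a = 0;
__typeof__(a) b = 1;
// the expression is only typed, never evaluated
__typeof__(a + 1) c = 2;
```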

(this works much like the decltype keyword of the next C++ standard). Given that the expression to which typeof is applied is not evaluated, no side effects can happen. And this calls for a handy preprocessor macro:
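Something like (my sketch, __typeof__ spelling again):

```cpp
#include <vector>

// poor man's auto: C++98 plus the GNU typeof extension
#define AUTO(var, expr) __typeof__(expr) var = (expr)

int sum(std::vector<int> const& bar)
{
    int total = 0;
    for (AUTO(i, bar.begin()); i != bar.end(); ++i)
    {
        total += *i;
    }
    return total;
}
```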

Now this macro does much of what the yet-to-come auto keyword does:

Note that typeof doesn't play well with references, so in some cases (such as the example below) it could behave unexpectedly:

If your compiler of choice doesn’t support typeof (or decltype), then you have to wait for it to become C++0x compliant.

Friends and namespaces

There are C++ behaviors that may leave you a bit astonished, staring at the lines on the monitor and wondering why the code isn't compiling or doesn't work as expected. I just stumbled into one of these cases.
I usually follow these steps to recover from the puzzled face. First I write a minimal example that reproduces the behavior. It should be a bunch of lines in a single file. Sometimes this can be a daunting task, but I have found that it is always worth it to grasp the problem.
In fact, once you have the minimal code, you can easily experiment, changing and twiddling bits to see how the behavior changes.
Then you have two options – you can ask your local C++ guru about the problem (if you have one), or you can google the Internet for a clever selection of keywords that describe your problem.
So what happened today?
I decided to move some code I had developed into a namespace-constrained library. Everything compiled happily outside the namespace, but failed to do so inside it. After some head-scratching, I started cutting and shaping a minimal file with the same odd behavior. Here you are:
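I no longer have the original prova.cc at hand, but its shape was like this (a reconstruction; the names are mine):

```cpp
#ifdef USE_NAMESPACE
namespace NS
{
#endif

class A
{
public:
    A() : x(0) {}
    int get() const { return x; }
    // inside NS this declares NS::fn, not the global fn below
    friend void fn(A& a);
private:
    int x;
};

#ifdef USE_NAMESPACE
}
using NS::A;
#endif

// global namespace: this is the friend only when USE_NAMESPACE is off
void fn(A& a)
{
    a.x = 42;
}
```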

Now, if you compile it defining the symbol USE_NAMESPACE (e.g. via g++ -Wall -DUSE_NAMESPACE -c prova.cc), you get an odd-looking error:

If you compile without the namespace, everything works as expected. Since the error was quite meaningless to me, I started investigating friends and namespaces. After browsing some mailing lists, I figured it out. And it was simpler than it appeared – just a case of a misleading error.
The friend statement declares a function fn somewhere in the NS namespace, while fn is actually defined in the global namespace – there is just a using statement. To fix the problem, move the fn function into the NS namespace.
Well, and I figured it out alone, without the need to call my uber-C++-guru friend Alberto.
On a completely unrelated topic, today is the 25th anniversary of the marvelous ZX Spectrum. Happy Birthday Dear Speccy.

The Design and Evolution of C++

The C++ language, despite its powerful mechanisms, is not a language for the faint-hearted. Two forces drive its peculiar concept of friendliness (it is not unfriendly, just very selective): backward compatibility with C and the effort not to get in the way of performance. This book, written by the father of the language, presents and analyzes the language's history and design decisions. And, given the writer, the perspective you get from the book is very interesting and more than once helps to shed some light on the dark corners of the language.
The history is very interesting, since it details how the language's genesis and marketing went from the AT&T labs to academia and industry.
C++ design principles are presented, and the most notable is ease of teachability. Several times, proposed or existing features were modified or dropped entirely because they were not easy to teach.
Another very interesting principle is "you don't pay for what you don't use", meaning that the features added to C in order to define C++ were designed so that the programmer would not incur any penalty for not using them. That's why, if a class has no virtual methods, the pointer to the virtual method table is not included, saving the pointer space in the class instance memory footprint.
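The effect is easy to observe (on typical implementations; the standard doesn't mandate how virtual dispatch is done):

```cpp
#include <cstdio>

struct Plain
{
    int x;
    void f() {}
};

struct WithVirtual
{
    int x;
    virtual void f() {}
};

// the virtual variant pays for a hidden pointer to the vtable
void report()
{
    printf("plain: %zu, with virtual: %zu\n",
           sizeof(Plain), sizeof(WithVirtual));
}
```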
Aside from answering many questions, the book opens up a bunch of new ones. For example, the very first implementation of C++ was developed practically around a threading library. Now, more than 30 years later, in a world with an increasing presence of multi-core machines, the C++ standard still lacks a multithreading / multiprocessing facility.
Stroustrup also asserts more than once that a garbage-collected way of managing memory could be added by a specific implementation, but he fails to explain how this non-deterministic way of ending dynamic memory life could deal with the deterministic needs of destructors. Likely I'm just too dumb to figure it out myself.
The big miss I found in the book is a comparison with the Java language, basically one of the great contenders for the title of most widely used programming language. Java, on its side, takes some interesting approaches to language design that conflict with those of C++ (e.g. the C compatibility issue). Therefore it would have been nice to hear Bjarne voice his thoughts about it. In his defense, it has to be noted that by the time this book hit the streets the Java hype had just started.
A last complaint about the book is the lack of conclusions. The book seems to be cut a couple of chapters before the real end. Aside from the stylistic point of view, some words about the future evolution and perspective of the language would have fit well at the end.