
What are we missing? Part 2

It took quite a while to edit the second part, but I hope it is worth the wait.

Optional Semicolons

Once upon a time, BASIC didn’t need any instruction-termination symbol. If you wanted to stick two or more instructions on the same line, you had to separate them with a colon (yes, this was before semicolons). Then came Pascal and C, and the termination/separation character made its appearance (well, maybe history didn’t unfold exactly like this, but this is, more or less, how my relationship with instruction termination evolved).

Scala, Python, and other languages do not need semicolons, or make their use optional in most contexts. This isn’t a huge saving, but it does make me wonder why we need semicolons in C++: isn’t the “missing semicolon” one of the most frequent syntax errors? And if the compiler can tell that a semicolon is missing, couldn’t it just put it there for me?

Well, I guess the problem is backward compatibility. A semicolon-free parser would give a different meaning to existing code. Consider, for example, expressions that are split over multiple lines. In C++, it is perfectly fine to evaluate an expression and throw the result away, so introducing a new statement-separation syntax would be a mess – code that used to work could silently change meaning, producing subtle problems that are hard to spot in debugging and code reviews.

Nonetheless, coding without semicolons is somewhat liberating, and putting that character back at the end of every line is a habit that takes me a while to recover when switching from Scala back to C++.

Garbage collection

C++ has a strange relationship with garbage collection. This may come as a surprise to many, but in the first C++ book, The C++ Programming Language, Stroustrup wrote that C++ could optionally support garbage collection. Microsoft, in the early years of .NET, introduced a C++ extension (managed C++, then C++/CLI) to handle managed pointers – a different class of pointers for garbage-collected objects.

C++ even had minimal support for GC, leveraged by some libraries such as the Boehm-Demers-Weiser collector. So, C++ is no stranger to garbage collection, but this automatic way of deallocating objects never caught on. In C++23, the minimal GC support was removed altogether.

The common way for modern C++ to manage memory is via automatic objects and smart pointers. Automatic objects are allocated on the stack and automatically destroyed when execution leaves the scope where they were declared. Smart pointers are defined by the standard library: std::unique_ptr models exclusive ownership, while std::shared_ptr provides reference counting and disposes of the pointed-to object when the last reference goes away. By properly using them, memory-management headaches are mostly gone.

Many languages went the other way, having garbage-collected objects as the default way to handle memory, with an optional way to allocate and manually free a bunch of memory.

So, what are the advantages of garbage collection? Well, there are three main advantages:

  1. no reference-counting penalty (paid each time you copy or assign a shared pointer);
  2. thread safety (starting from C++20, there is a std::atomic partial specialization for std::shared_ptr, i.e. std::atomic<std::shared_ptr<T>>, but – of course – you still pay extra for the atomic reference-count updates);
  3. GC works fine with reference cycles – such as circular lists – while reference counting has trouble with these data structures.

Garbage collection lets the object exist with no additional space overhead, and the time overhead is incurred periodically, during a memory scan that finds unreachable objects and disposes of them.

There are two main problems with GC:

  1. Periodic execution of the collector may degrade application performance. GCs have indeed made huge advances in this area; still, for real-time applications, it may be an issue to keep under control.
  2. Object disposal happens some time after the object’s last use, but you don’t control when. C++’s predictable destruction time is what allows C++ programmers to implement the RAII idiom.

So there are pros and cons. What I like about GC is that you don’t have to care about dynamic memory. In C++ I have to think about whether the object is referenced in only one place (unique_ptr) or may be accessed from several parts of the code (shared_ptr); then maybe there are naked pointers around I should take care of, and maybe I have to convert one smart pointer into another. As you can see, it is not as straightforward as allocating the object and letting the GC do the work.

Lazy Values

This one is a bit unusual for the C++ programmer, but it definitely makes sense. Consider a variable with an expensive initialization:

class Foo
{
  val bar = f()
}

In this code, the call to f() happens every time an instance of Foo is created. Now suppose that, depending on the execution context, the bar variable is never used. That’s a pity: the code is unnecessarily performing a computationally heavy task.

The lazy attribute can be used like this:

class Foo
{
  lazy val bar = f()
}

This means that the function f() will be called at the first reference to the variable bar. Should we want to rewrite this in C++, it would look something like:

class Foo {
  T getBar() const {
    if( bar == std::nullopt ) {
      bar = f();
    }
    return *bar;
  }
  mutable std::optional<T> bar = std::nullopt;
};

Ugly and not very readable; the mutable specifier is the flashing warning sign that something fishy is going on.

The lazy tool is also useful for creating infinite data structures or processing a subset of a large amount of data without the need to compute or retrieve all the data of the superset.

Of course, there’s more to making this work properly in a multithreaded environment, with shared resources and an initialization order defined by access. The only “undefined behaviour” is recursive initialization (i.e., to initialize a you need b, but to initialize b you need a).

Object

The C++ language has no native notion of a singleton, so it is typically implemented as:

class TheTypeIWantJustOneInstance {
  public:
    static TheTypeIWantJustOneInstance& get() {
      static TheTypeIWantJustOneInstance instance;
      return instance;
    }
    ...
};

Before C++11, this was not thread safe: if the method get() was concurrently called by two threads, instance could end up initialized twice (at the same address… not good). Since C++11, the initialization of a function-local static is guaranteed to happen exactly once, even under concurrent calls. Still, the reader has to decode a pattern of code to identify this as a singleton.

Scala offers the singleton construct natively. It is called “object”, and it looks like this –

object InstanceOfTheTypeIWantJustOneInstance {
  ...
}

The object construct offers a different perspective on class data. In C++, you can define a member variable or a member function to be static so that it is shared among all the instances of a class. In Scala, there is no such concept, but you can use the companion object idiom.

A companion object is an object that has the same name as an existing class. Methods and variables of the class have no special access to the companion object – they still need to import the symbols to access them. But from the user’s point of view, you can use the Class.member notation to access a member of the companion object. This gives quite a precise feeling of accessing something that is related to the class and not to the instance.

This example is from my solutions to the Advent of Code:

object Range {
  final val Universe = Range( 1, 4000 )
}


case class Range( start: Int, count: Int ) {
  def end = start+count
  def lastValue = end-1

  //...
  def complement : List[Range] =
    import Range.Universe
    assert( start >= Universe.start )
    assert( end < Universe.end )
    val firstStart = Universe.start
    val firstCount = start-Universe.start
    val secondStart = end
    val secondCount = Universe.end-end
    List( Range( firstStart, firstCount), Range(secondStart, secondCount ))
      .filter( _.isNonEmpty )

}

In this example, the class Range defines a numerical range (first value, count). The companion object contains a constant (Universe). The complement operation needs Universe to compute the complement of a range. As you can see, to use the Universe symbol, the Range class needs to import it from its companion object.

Another interesting application is using the companion object to provide additional constructors for the class. Using the apply method (which works like C++’s operator()), you can create a factory:

object SimpleGrid
{
  def apply[A: ClassTag]( width: Int, height: Int, emptyValue: A ) : SimpleGrid[A] =
    val theGrid: Array[Array[A]] = Array.ofDim[A](height, width)
    theGrid.indices.foreach(
      y => theGrid(y).indices.foreach(
        x => theGrid(y)(x) = emptyValue
      )
    )
    new SimpleGrid(theGrid)

  def apply[A: ClassTag]( data: List[String], convert: Char => A ) : SimpleGrid[A] =
    val theGrid: Array[Array[A]] = Array.ofDim[A](data.length, data.head.length)
    data.indices.foreach(
      y => data(y).indices.foreach(
        x => theGrid(y)(x) = convert(data(y)(x))
      )
    )
    new SimpleGrid(theGrid)
}

Here, the companion object of the SimpleGrid class provides two alternate constructors. The first accepts the grid width and height, plus the default content for a cell. The second accepts a list of strings and a function to convert each character into a cell value.

I find this approach interesting because it provides a native singleton concept and, at the same time, simplifies the class construct by removing the burden of static methods and fields.

Conclusions

In this post, we have explored several key concepts and constructs that distinguish C++ and Scala. Some are just syntactic sugar, like lazy vals and objects. You can argue that you could implement them in a C++ library (with CRTP or similar machinery), but having them in the language sets the standard way of using these constructs – it defines the vocabulary, if you want.

Other concepts are more drastically different – memory management (alongside the principle that everything structured is accessed by reference) being the most evident. I am not a big fan of GC, having delved more than once into optimizing memory usage to keep garbage collection from spoiling the game (literally, a game). But beyond relieving the programmer of low-level memory-management chores, garbage collection allows for better handling of objects.

In the next installment, we’ll go into the more advanced functional direction.

What are we missing? Part 1

I remember when I was (really) young, the excitement of the discovery when learning a programming language. It was BASIC first. That amazing feeling of being able to instruct a machine to execute your instructions and produce a visible result! Then came the mysterious Z80 assembly, with the incredible power of speed at the cost of long, tedious hours of hand-writing machine codes and the void sensation when the program just crashed with nothing but your brain to debug it.

A few years later, I was introduced to C. A shiny new world, where the promise of speed paired up with the ease of use (well, compared to hand-written assembly, it was easy indeed). And later on, C++. Up to this point, it seemed like a positive progression; each step had definitive advantages over the previous one, no regret or hesitation in jumping onto the new cart.

Continue reading “What are we missing? Part 1”

Nothing lasts forever, but Cobol, Fortran, and C++

If you have kept up to date with the latest developments of C++, you have surely noticed how convoluted and byzantine its constructs and semantics have become.

The original idea of a C with Classes at the root of the language is long lost, and the fabled smaller and cleaner language hidden inside C++ is ever more difficult to spot and use.

The combination of the backward-compatibility latch and the committee-driven approval/refusal of proposals makes the language’s evolution spin in circles. Missing or late additions to the language are sitting ducks, and the lack of networking in the standard library, for a language that is 40 years old, is enough to tell how poorly the evolution of the language is handled.

Continue reading “Nothing lasts forever, but Cobol, Fortran, and C++”

As Smart as a Smart Type – Theory

Recently I listened to a “Happy Path Programming” podcast episode about Smart Types. And that inspired me for this double post. The first part (this one) is about what a smart type is and why you should employ smart types in your code. The second part (yet to come, hopefully soon) is about the troublesome way I implemented an arithmetic smart type template in C++.

Continue reading “As Smart as a Smart Type – Theory”

Is C++ Ready for Functional Programming? – Wrap Up

Wrap Up

Recently I had the questionable pleasure of watching “Cosmic Sin” courtesy of Netflix. The movie is a sci-fi show starring Bruce Willis. I was lured into wasting my time on it by a trailer promising a space-opera-ish feat, and I took the presence of such a star in the movie as a warranty of quality. I couldn’t have been farther from the truth.

The movie plays out confusingly, lacking a coherent script and motivated actors, pushing the watcher into an undefined state (not UB, luckily). The helpless watcher, astonished by how bad a film can be, hopes until the very end for something interesting and entertaining to happen, until the mixed relief and disbelief of the closing titles puts an end to the suffering.

When thinking about C++ and functional programming I have some of the feelings I had watching the movie. Before being persecuted by my friends from the C++ community I have to make clear that C++ is not that bad, although there are some similarities.

Continue reading “Is C++ Ready for Functional Programming? – Wrap Up”

Is C++ Ready for Functional Programming? – Types

Types


Although functional programming can be done in a dynamically typed language (Lisp, the forerunner of functional programming languages, had no static types), strong static typing is a very natural complement to functional programming.

From a mathematical point of view, a function establishes a relationship between the set of the function’s arguments (the domain, in math speak) and the set of corresponding results (the co-domain). Algebra studies exactly these relationships and may help the programmer prove certain properties of the functions composing a program.

Algebraic Data Types (ADTs) are aggregated types in the algebraic sense. Two ADTs are of particular interest – sum types and product types – with “sum” and “product” referring to the resulting number of possible values (i.e., the cardinality of the sets).

Continue reading “Is C++ Ready for Functional Programming? – Types”

Our Father’s Faults – Wrapping it up

Well, I’m running out of anti-patterns and oddly looking code from the legacy of my job-ancestors. I think there are a few that are worth mentioning but don’t build up to a stand-alone post, and then there is space for questions and discussion on whether or not a way out exists.

Continue reading “Our Father’s Faults – Wrapping it up”

Our Fathers’ Faults – Actors and Concurrency

When I started this job and faced the joyful world of Scala and Akka, I remember being told that thanks to the Actor model you don’t have to worry about concurrency, since every issue is handled by the actor magic.

Some months later we discovered, to our dismay, that this wasn’t true. Or rather, it is true most of the time if you behave properly, but there are notable exceptions.

Continue reading “Our Fathers’ Faults – Actors and Concurrency”

Our Fathers’ Faults – Actors – Explicit State

This post is not really specific to Scala/Akka, since I’ve seen Finite-State Machine (AKA FSM – not this FSM) abuse in every code base regardless of the language. I’ll try to stick with the specificities of my code base, but considerations and thoughts are quite general.

An FSM is an elegant and concise formal construct that helps in designing, encoding, and understanding simple computational agents.

Continue reading “Our Fathers’ Faults – Actors – Explicit State”