Our Father’s Faults – Wrapping it up


Well, I’m running out of anti-patterns and odd-looking code from the legacy of my job-ancestors. I think there are a few more that are worth mentioning but don’t add up to a stand-alone post. After those, there is space for questions and for discussing whether or not a way out exists.

The procrastination anti-pattern

I found this anti-pattern on some occasions when a message was received by an actor during its warm-up stage. This is not an infrequent condition, since in an actor system actors can restart at any time while the rest of the system continues to evolve.

Our fathers’ idea was to let the actor wait a while with a timer and use the timer facility to re-send the message to the actor itself.

That is, the actor receives a valid message but is not yet able to act on it, so the message is pushed back for a while, in the hope that when the time comes the actor will know what to do with it.
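To make the shape of the anti-pattern recognizable, here is a minimal sketch of what it tends to look like with Akka classic actors. The message names (Initialized, Job, JobDone), the sink parameter and the 500 ms delay are all invented for illustration and do not come from the original code base.

```scala
import akka.actor.{Actor, ActorRef}
import scala.concurrent.duration._

// Hypothetical protocol, invented for this sketch.
case object Initialized                // warm-up finished
final case class Job(id: Int)          // work request
final case class JobDone(id: Int)      // notification sent to `sink`

// ANTI-PATTERN: do not copy. `sink` is a hypothetical downstream actor.
class ProcrastinatingActor(sink: ActorRef) extends Actor {
  import context.dispatcher            // execution context used by the scheduler

  private var ready = false

  def receive: Receive = {
    case Initialized =>
      ready = true
    case job: Job if !ready =>
      // Not ready yet: re-send the message to ourselves after an arbitrary delay.
      // The re-sent message arrives with this actor as sender, not the original one,
      // and it is queued behind whatever arrived in the meantime.
      context.system.scheduler.scheduleOnce(500.millis, self, job)
    case Job(id) =>
      sink ! JobDone(id)
  }
}
```

If Initialized never arrives, the Job keeps bouncing between the scheduler and the mailbox every 500 ms, which is the "forever" case in the list of problems.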

This pattern is so wrong that I don’t even know where to start listing the problems.

  • If the message is some form of request, the sender may time out and then get a valid reply when it is too late.
  • The time when the actor knows how to process the message may never come, and the message will be kept forever in the actor’s input queue.
  • The moment the timer expires could be an even worse time than now to process the message.
  • The ordering of the messages in the receive queue is messed up.

I’m calling this the procrastination anti-pattern because it assumes that your future self will be more capable than your present self of managing something. (I was tempted to name it the short fuse anti-pattern, because it also resembles the situation where you receive something that is about to explode and you try to pass it to someone else, who in turn will do the same.)

A better way to deal with this is to use the stash() function to stow the message away and to call unstashAll() when your actor is operational. This is not the ultimate solution, since things can still go wrong, but at least you make your intent clear to the maintainer and avoid depending on an arbitrary delay.

stash() places the message currently being processed in an ordered storage; unstashAll() then moves the content of that storage back to the receive queue, preserving the original order of reception (the correct order is maintained even if other messages are received before the switch to the new state completes).
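A minimal sketch of the stash-based alternative, assuming Akka classic actors and the Stash trait. The protocol (Ready, Work, Processed) and the sink parameter are hypothetical names chosen for the example.

```scala
import akka.actor.{Actor, ActorRef, Stash}

// Hypothetical protocol, invented for this sketch.
case object Ready                      // warm-up finished
final case class Work(id: Int)         // fire-and-forget request
final case class Processed(id: Int)    // notification sent to `sink`

// `sink` is a hypothetical downstream actor that receives the results.
class Worker(sink: ActorRef) extends Actor with Stash {
  // The actor starts in the warm-up behaviour.
  def receive: Receive = warmingUp

  private def warmingUp: Receive = {
    case Ready =>
      unstashAll()                     // stashed messages go back to the mailbox, oldest first
      context.become(operational)
    case _ =>
      stash()                          // keep the message for later; no timer, no arbitrary delay
  }

  private def operational: Receive = {
    case Work(id) => sink ! Processed(id)
  }
}
```

Sending Work(1), Work(2), Ready, Work(3) to such an actor should result in the three jobs being processed in that order, since the stashed messages are put back ahead of anything that arrived after them.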

If the stashed message is of the fire-and-forget kind, this solution is fine. It doesn’t work when the received message expects a reply: usually the querying actor will have a timeout to prevent endless waiting. Neither stashing nor procrastinating can solve this problem; you need to resort to design. Maybe one of the most general and apt solutions is to provide the caller with an I’m-working/Busy/Waiting kind of message, so that it is up to the caller to decide whether to retry later or not.
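The Busy reply could look roughly like the following sketch, again assuming Akka classic actors. Query, Answer, Busy and WarmedUp are hypothetical message names, and the "lookup" itself is a placeholder.

```scala
import akka.actor.Actor

// Hypothetical protocol, invented for this sketch.
final case class Query(key: String)
final case class Answer(value: String)
case object Busy                       // explicit "not ready yet" reply
case object WarmedUp                   // warm-up finished

class LookupActor extends Actor {
  def receive: Receive = warmingUp

  private def warmingUp: Receive = {
    case WarmedUp =>
      context.become(operational)
    case Query(_) =>
      // Reply immediately instead of letting the caller run into its timeout.
      sender() ! Busy
  }

  private def operational: Receive = {
    case Query(key) =>
      sender() ! Answer(key.toUpperCase) // placeholder for the real lookup
  }
}
```

The caller always gets a prompt answer and can decide for itself whether to retry, give up, or fall back to something else.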

Oh my god, it’s full of stars

Actually, stars of the Hollywood kind :-). When I first received the source code, there were hundreds of actor types that turned into thousands of instances at run time. The actor is a lightweight abstraction, so workload was not really a problem (even though we were using an entry-level industrial PC), but interaction was. Actors are untyped, and untyped languages are suitable for small projects.

Lua’s author stated that his language was intended for programs that could be written in hundreds of lines. That makes sense, because without the support of types it is difficult to ensure that components are coupled correctly, as intended by the component developer. Untyped languages offer some ways to detect a mismatch, but detection occurs at the worst possible time: run-time.

Actor systems suffer from the same problem: once you have hundreds of different types of actors talking to each other in a type-blind manner, it is so easy to get things wrong.
The best approach is to enclose actors in subsystems with a typed interface (as already learned). Futures can be used where asynchronous behavior is needed.
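As an illustration of a typed interface around an actor subsystem, here is a minimal sketch that assumes Akka classic actors and the ask pattern. The Inventory protocol and the InventoryService trait are hypothetical names; the point is only that callers see a typed method returning a Future, never the untyped ActorRef.

```scala
import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Future

// Hypothetical protocol, private to the subsystem.
object Inventory {
  final case class GetStock(item: String)
  final case class Stock(item: String, quantity: Int)
}

// The only thing the rest of the application sees: a typed, asynchronous API.
trait InventoryService {
  def stock(item: String): Future[Inventory.Stock]
}

// Adapter that hides the actor behind the typed interface.
final class ActorInventoryService(inventory: ActorRef)(implicit timeout: Timeout)
    extends InventoryService {

  def stock(item: String): Future[Inventory.Stock] =
    // `ask` yields a Future[Any]; `mapTo` fails the Future if the reply has an unexpected type.
    (inventory ? Inventory.GetStock(item)).mapTo[Inventory.Stock]
}
```

A wrong message type sent to the subsystem now becomes a compile error at the call site, and a wrong reply type becomes a failed Future in one well-known place.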

Yes, I’m talking to you

The problem with a huge number of actors and the lack of a well-designed architecture is that everyone talks with whoever is needed at any time. References to actors were passed both in the form of ActorRef and recovered via actor path.

An actor path is to a reference what untyped is to a type system. Attempting to access an actor via its path may fail for several reasons: the effect is that the actor is not where you expect it, but the cause is unknown.

It may be a transient error (Actor is restarting either because it is part of a restarting hierarchy or because it is restarting on its own), or a permanent error (path names changed). Even if the actor is found, it may not be the actor of the type you were looking for.
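A small sketch of what a path lookup gives you, assuming Akka classic and its actorSelection / resolveOne calls. The PathLookup object, the lookup helper and the 2-second timeout are made up for the example.

```scala
import akka.actor.{ActorNotFound, ActorRef, ActorSystem}
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._

object PathLookup {
  // Hypothetical helper: resolves a path to an ActorRef, or None if nothing answers in time.
  def lookup(system: ActorSystem, path: String)(implicit ec: ExecutionContext): Future[Option[ActorRef]] =
    system
      .actorSelection(path)
      .resolveOne(2.seconds)           // fails with ActorNotFound if no actor identifies itself
      .map(Option(_))
      .recover { case _: ActorNotFound => None }
}
```

A None result says only that nobody answered at that path, not whether the actor is restarting or the path was renamed; a Some(ref) says nothing about what kind of actor is behind the reference.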

Passing ActorRefs around is a somewhat better approach (even if typing is still an unsolved problem), but the net result is a hairy web of dependencies.

There is hardly a substitute for good design. Dependency injection can be useful, but it has to be kept under control to avoid the “global variable” effect. Also, when modifying code you must seriously consider the technical debt you add by taking what you need wherever you find it, regardless of sound dependencies or abstraction levels.

Radioactive RAD

Under pressure, it may be natural to look for ready-made components to solve your problems. It saves time and provides a scapegoat in case something goes wrong (it’s not my fault!). Unfortunately, not every problem has a good impedance match with a ready-made component or library.

The application needed to persist messages in an outbound queue, so that a network outage didn’t cause any trouble even in case of a power cycle or service restart. The use case was pretty simple: a strictly ordered queue fed by messages and emptied by network message delivery. Any uni student with some background in concurrent programming could code it in a few days.

Then you need quite some time (a month, or months?) to get rid of every bug, especially those pesky ones that happen under stress or in mysterious circumstances.
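To give a sense of scale, the "few days" version might start from something like the naive sketch below. It is not what the application used; the class name, file names and on-disk format are invented here, and it assumes Scala 2.13 or later. It deliberately ignores the hard parts (partial writes, log compaction, corrupted files), which is exactly where the months of debugging would go.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}
import java.nio.file.StandardOpenOption.{APPEND, CREATE, SYNC, TRUNCATE_EXISTING, WRITE}
import java.util.Base64
import scala.collection.mutable
import scala.jdk.CollectionConverters._

// Naive persistent FIFO: an append-only log plus a counter of delivered messages.
final class OutboundQueue(dir: Path) {
  Files.createDirectories(dir)
  private val logFile   = dir.resolve("outbound.log")    // one Base64-encoded message per line
  private val ackedFile = dir.resolve("outbound.acked")  // number of messages already delivered

  // Recover the delivered count (a half-written file would make this throw; not handled here).
  private var acked: Int =
    if (Files.exists(ackedFile)) new String(Files.readAllBytes(ackedFile), UTF_8).trim.toInt else 0

  // Rebuild the in-memory queue from the log, skipping what was already delivered.
  private val pending: mutable.Queue[String] = {
    val lines = if (Files.exists(logFile)) Files.readAllLines(logFile, UTF_8).asScala.toList else Nil
    mutable.Queue.from(lines.drop(acked).map(decode))
  }

  private def encode(msg: String): String = Base64.getEncoder.encodeToString(msg.getBytes(UTF_8))
  private def decode(line: String): String = new String(Base64.getDecoder.decode(line), UTF_8)

  // Append to the log (synchronous write) before exposing the message in memory.
  def enqueue(msg: String): Unit = synchronized {
    Files.write(logFile, (encode(msg) + "\n").getBytes(UTF_8), CREATE, APPEND, SYNC)
    pending.enqueue(msg)
  }

  // Oldest undelivered message, if any; it stays queued until ack() is called.
  def peek: Option[String] = synchronized(pending.headOption)

  // Mark the oldest message as delivered. The log is never compacted in this sketch.
  def ack(): Unit = synchronized {
    if (pending.nonEmpty) {
      pending.dequeue()
      acked += 1
      Files.write(ackedFile, acked.toString.getBytes(UTF_8), CREATE, TRUNCATE_EXISTING, WRITE, SYNC)
    }
  }
}
```

The sender would peek, try to deliver, and call ack() only on success, so a message that was enqueued but not acknowledged is found again after a restart.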

Resorting to a 3rd party component may be a reasonable choice to save a few weeks while being sure that queue errors have already been weeded out.

The choice of adding ActiveMQ (as a library) to the application was made with good intentions, but in hindsight it was not a happy one. First, ActiveMQ is a multiprotocol, multiuser, client-server queuing system. It has so many options that it hurts. When ActiveMQ starts, the ground trembles and the air is filled with the sound of heavenly trumpets.

By analogy, it is like using an aircraft carrier to cross a small stream. It does the job, but it is a bit oversized.

Given that the application runs on a low-end industrial PC, having such a leviathan is not very healthy. Inside the program, countless threads and hefty chunks of memory were taken away from the rest of the application. Outside, the disk usage was quite mysterious, and “erasing the queues before restarting the service” had become a sort of mantra.

Lessons Learned

  • Design your system and resist the temptation of free, unrestricted access from everywhere to everywhere. If you are coming from OOP, think of it as having everything public in your classes.
  • Don’t rush into getting 3rd party components/libraries to solve your problem. Sometimes it is the right thing to do; other times it is just wrong, and what you save will be lost shortly.
  • Consider timing, and give your actor specific responses to send while it is warming up.

Wrap-up

I hope you enjoyed reading this series; I surely enjoyed writing it. In any case, I’d like to receive your feedback and comments. I hope none of my code-fathers got too upset reading these posts. I am fully aware that in the same situation it is likely that I would have committed my fair share of … technical sins.

It’s time for some meta-lessons-learned. I think the most important lesson is that there is no magic in new technologies: they may provide more powerful and comfortable tools, but you always need to do your homework. You need to understand the dependencies among components and design them so that the system is maintainable. You need to understand your tools and what they do behind the curtain.

RAD and prototyping are your best friends today, when the boss is asking you to show something, but they will be your worst enemies a couple of months later, when the boss starts selling the product and problems start coming back from the field.

Scala is a great language, and functional programming is an exciting and interesting paradigm. I fully agree that the way to more robust and resilient software is paved with functional code. Also, as of today Scala would be my language of choice for writing serious software applications.

But… Scala was born in academia, and by looking closely at it you will notice that its main purpose was to simplify access, rather than to enforce software engineering practices for industrial projects.

This doesn’t prevent Scala from being used proficiently and successfully for industrial applications, but programmers need some self-imposed moderation to keep the codebase in good and maintainable shape.
