Our Fathers’ Faults – Actors – Explicit State

State Machine

This post is not really specific to Scala/Akka, since I’ve seen Finite-State Machine (AKA FSM – not this FSM) abuse in every code base regardless of the language. I’ll try to stick with the specificities of my code base, but considerations and thoughts are quite general.

FSM is an elegant and concise formal construct that helps in designing and encoding and understanding simple computational agents.

Akka Actors and FSM

Let’s start from the Akka Actor, I wrote that it is a reactive computation agent, meaning that it receives messages and computes something as a reaction. If the same message triggers different actions according to the past sequence of received messages then the actor has an internal state (or uses an RNG 🙂 ). In this case, it is a de-facto state machine. Akka provides a way to switch the receive function (via the context.become call). Switching the receive function is, by all means, a state change since the accepted set of stimuli and the specific reactions could be different.

Akka FSM

Akka provides an FSM layer over actors. I find it somewhat cumbersome – which could be personal – and quite limited – which is rather objective. I think it is cumbersome because the handling of the state data needs distinct management as if the switching and the state (state data is state) are two distinct and orthogonal concepts. It is limited because it misses some important concepts like state entry and exit actions.

In several cases, I found that FSM coded by my ancestors had problems with timekeeping. Apparently the code looked correct (well, it’s my fathers’ code, you may never say for sure) so maybe there’s a bug somewhere in the library or there are some design problems in the API that hide misuse (anyway, never my fault :-D).

The trouble with State Data

A problem common to many state machines coded by freshmen is that too much of the state is encoded in data and too little in the explicit state.

Puzzled by the above sentence? I would be, so I’m going to try explaining.

From a formal point of view, a finite state machine has a set of states. Events trigger transitions that that update the machine to a (possibly) different state. This model is very elegant but is barely applicable in most common cases. Suppose you are receiving a user name, one character at a time. So if the user name is “max”, the first state is ‘m’, the second state is ‘ma’ and the third is ‘max’. You can imagine that modeling the state machine in this way you would need an astronomical number of states: one for each first character (26), one for each pair of beginning characters (26×26) and one for each triplet of characters (26x26x26). This stupid example of accepting 3 letter user names would require almost one million states.

This is far from practical (and extremely error-prone if coded by hand), so programmers resort to multiplexing states by adding variables (in the example a single state with a string that accumulates received characters). These variables are named state data since their existence and meaning are confined inside the state.

Back to the section opening, where is the problem? The problem is that using state data correctly is hard, especially when some data overflows the state boundaries and become shared among different states. Countless times I’ve seen bugs, even in trivial FSM, related to misuse of state data. Just add the needed explicit states and you are pointing in the right direction for a flawless software.

An extra real state is often clearer and safer – the code has less branches, gets more readable, is more reliable, and easier to test. To prove these claims, just consider a boolean condition added to a state – you will need to check here and there and potentially you may forget a needed check, or oversee the condition since the variable is declared somewhere else.

An apparent anti-pattern

In the early day most Scala code had an exotic and strange charm to me. And some constructs of the language left me with the look of the cow watching a train pass by. Back then I was expecially puzzled when I saw something like:

def stateX : Receive = {
  def x = {
    case A => something
    case B => somethingElse
    // ...
  }
  someAction
  x
}

Of course, I am oversimplifying – the original function was a several hundred-lines monster, with inferred types everywhere and single-letter variable names.

Now it is cumbersome if you try to explain it by recursion, but after you consider it for a while, you notice that this construct does what the onEntry hook is supposed to do in the FSM model. That is, someAction is evaluated just once when someone performs a context.become( stateX ).

This is, in fact, a precious pattern, that once cleared and tidied up may provide very helpful in modeling state machine with actors.

Toward a better approach

In order to properly refactor this mess, I needed a way to define for each state a pair of handlers (onEntry and onExit) beside the state receive function. Taking inspiration from the “apparent anti-pattern” above, I preferred to define a distinct function for the onEntry hook. Also the onExit handler has been buried in the state switching function. Naming conventions also are a convenient way to help the reader to make sense of the code. So I used the pattern stateX_entry, stateX_receive, stateX_switchTo.

Here is how a state is encoded –

def stateX_entry : Receive = {
  log.debug( "[stateX] entry" )
  // stateX initialization code. This code is executed each time a transition
  // activates this state.
  stateX_receive  // the last line always contains the state receive function.
}

def stateX_switchTo( target: => Receive ) : Unit = {
  log.debug( "[stateX] exit" )
  // stateX onExit code. Typically you clear timers and release resources.
  context.become( target )
}

def stateX_receive : Receive = {
  case Message1 =>
    log.debug( "[stateX] Message1 received" )
    // handling for Message1
    // when you need to switch to stateY you do:
    stateX_switchTo( stateY )
  case Message2 =>
    log.debug( "[stateX] Message2 received" )
    // and so on
}

Now this is not perfect, and requires some discipline to apply correctly (quite easy to use stateY_switchTo instead of stateX_switchTo inside stateX), but greatly simplifies and clearly organizes state machine implementation inside actors.

Also note that setting the log level to debug, you get a pretty good picture of the evolution of the state machine.

One last note – the argument type of the stateX_switchTo is => Receive so that the evaluation of the target state is performed inside the function rather than at the calling point.

It would be nice to move this pattern into a trait to use along with Actors, but I still need some practice to reach this goal in a satisfying way. Suggestions are welcome.

Lesson Learned There are a couple of takeaways –

  1. If your actor implementation is a state machine, make it explicitly so.
  2. Do not use Akka FSMs, they lack what you need and are possibly flawed (or makes it easy to write buggy software).
  3. Encode as much state as it is practical in the explicit state (functions of Receive type).
  4. Use a proper convention (such as the one presented above) to encode state entry and state exit hooks.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.