This post assumes a basic understanding of UML statechart behavior related to entry/exit-state handling as well as the associated order of processing such events. If needed, please review the references at the end of this post before reading further.
It has been nearly 20 years since I first studied UML statecharts. Since that initial exposure, I have applied event-driven, active object statechart designs to numerous projects. Nothing has abated my preference for this pattern in my firmware and embedded software projects. Through the years I have taken note of a handful of common challenges and flaws when creating statechart-based designs. This post, focused on a possible exit-state handler design issue, is the first entry in a series of design tips regarding UML statecharts.
Be wary of exit-state driven behavior
To understand the concern underlying this guidance, we will first examine two trivial examples demonstrating appropriate exit-state handler behavior.
In Example 1, the State-of-TransmittingReport will, upon entry, create a report object and start a transmit process. Upon exit, the same state destroys the report object. This is an example of a state’s exit handler performing a clean-up operation, much as a C++ object would be required to do in its destructor.
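As a rough sketch of Example 1’s handlers, the following assumes a simple state class with onEntry()/onExit() hooks; Report, startTransmit(), and the member names are illustrative assumptions, not taken from any particular statechart framework.

```cpp
#include <memory>

struct Report {
    // report contents would go here
};

// Begin an asynchronous transmit of the given report (stubbed for this sketch).
void startTransmit(Report&) {}

class TransmittingReportState {
public:
    void onEntry() {
        report_ = std::make_unique<Report>();  // create the report object
        startTransmit(*report_);               // start the transmit process
    }
    void onExit() {
        report_.reset();  // clean-up: destroy the report object, much like
                          // a C++ destructor releasing an owned resource
    }
    bool hasReport() const { return report_ != nullptr; }

private:
    std::unique_ptr<Report> report_;
};
```

The key point is symmetry: the exit handler only undoes what the entry handler of the same state did.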
Another example of appropriate exit-state handler behavior may be seen in Example 2.
In Example 2, the State-of-FlashingLED starts a timer upon entry and upon exit stops the same timer. Much like Example 1, this exit state handler is performing an appropriate change to a shared resource previously modified by the state’s entry handler. In effect the State-of-FlashingLED has taken ownership of the shared timer resource. Shared resource usage and associated state-machine driven behavior should be detailed in the software’s design and/or architectural requirements such that all engineers contributing to the software follow similar patterns of behavior, especially with respect to shared resources. In this example, the architecture document may specifically state: “All states that start any timer resource must always stop the same timer resource upon exit.”
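A minimal sketch of Example 2’s ownership pattern, again assuming hypothetical onEntry()/onExit() hooks; the Timer class below is an illustrative stand-in for whatever shared timer resource the system actually provides.

```cpp
// Stand-in for a shared timer resource.
class Timer {
public:
    void start() { running_ = true; }
    void stop()  { running_ = false; }
    bool isRunning() const { return running_; }

private:
    bool running_ = false;
};

class FlashingLEDState {
public:
    explicit FlashingLEDState(Timer& timer) : timer_(timer) {}

    // The state takes ownership of the shared timer upon entry...
    void onEntry() { timer_.start(); }

    // ...and, per the architectural rule "all states that start any timer
    // resource must always stop the same timer resource upon exit", releases
    // it upon exit.
    void onExit() { timer_.stop(); }

private:
    Timer& timer_;
};
```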
Now that two reasonable and appropriate exit-state behaviors have been reviewed, we move to our first example that violates the guidance of this post.
Note-1: The following examples assume all methods shown induce asynchronous system behavior.
Note-2: Although these examples are contrived, I have personally fixed nearly identical flaws in multiple legacy state-machine designs.
At first glance, Example 3a appears reasonable: the state sets a shared resource (LEDPattern) back to a known value after having modified it upon entry. However, this design choice is prone to bugs and glitches in device behavior, especially as the system’s state-machine design expands. Why? Let us examine a more complete example in the following figure.
Upon examination of Example 3b, I can almost see the incoming bug report from QA: “Brief random LED glitches when Button-B follows Button-A.” To understand this prediction, follow the order of events below, remembering to assume asynchronous behavior:
- The device under test is in the State-of-NormalOperations
- Button A is pressed
- State-of-NormalOperations-exit: no operations defined
- State-of-BehaviorFoo-entry: sets LED pattern to ‘A’, then executes StartFoo();
- Button B is pressed (before “Foo” has completed)
- State-of-BehaviorFoo-exit: executes StopFoo(), then sets LED pattern to ‘normal’
- State-of-BehaviorBar-entry: sets LED pattern to ‘B’, then executes StartBar();
As can be seen in the above sequence of events, the device briefly transitions its active LED pattern to ‘normal’ before settling on the desired pattern ‘B’. Depending upon the nature of the SetLEDPattern(…) method and the associated hardware design, the end user may or may not notice a glitch in the LED output. At a minimum, the state machine is executing unnecessary code. In a worst-case scenario, a sub-system could be left in an unexpected and incorrect state.
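The flawed sequence can be sketched in code by recording each LED pattern write. The handler and pattern names below are illustrative assumptions, and the asynchronous event dispatch is collapsed into direct calls for clarity.

```cpp
#include <string>
#include <vector>

// Records every pattern written to the shared LED resource, so the transient
// glitch is visible in the log.
std::vector<std::string> g_ledLog;

void setLEDPattern(const std::string& pattern) { g_ledLog.push_back(pattern); }
void startFoo() { /* begin asynchronous "Foo" behavior */ }
void stopFoo()  { /* halt "Foo" */ }
void startBar() { /* begin asynchronous "Bar" behavior */ }

// Flawed design: the exit handler drives shared-resource behavior.
void behaviorFooEntry() { setLEDPattern("A"); startFoo(); }
void behaviorFooExit()  { stopFoo(); setLEDPattern("normal"); }  // glitch source
void behaviorBarEntry() { setLEDPattern("B"); startBar(); }

void pressButtonAThenB() {
    behaviorFooEntry();  // Button A: enter State-of-BehaviorFoo
    behaviorFooExit();   // Button B: exit BehaviorFoo before "Foo" completes
    behaviorBarEntry();  //           then enter State-of-BehaviorBar
}
// After pressButtonAThenB(), g_ledLog holds {"A", "normal", "B"}: the brief
// 'normal' write between 'A' and 'B' is the glitch QA would report.
```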
How would we fix this situation? We would require all states to modify shared resources only in their entry-state handlers, never in their exit-state handlers, unless explicitly required by the firmware design guidance (see Example 2 above). Example 3c below details our solution.
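A sketch of this entry-only approach follows; as before, the handler and pattern names are illustrative assumptions and asynchronous dispatch is collapsed into direct calls. Note that the exit handler no longer touches the LED resource at all.

```cpp
#include <string>
#include <vector>

std::vector<std::string> g_ledLog;  // records every LED pattern write

void setLEDPattern(const std::string& pattern) { g_ledLog.push_back(pattern); }
void startFoo() { /* begin asynchronous "Foo" behavior */ }
void stopFoo()  { /* halt "Foo" */ }
void startBar() { /* begin asynchronous "Bar" behavior */ }

// Fixed design: the shared LED resource is modified only in entry handlers.
void normalOperationsEntry() { setLEDPattern("normal"); }
void behaviorFooEntry()      { setLEDPattern("A"); startFoo(); }
void behaviorFooExit()       { stopFoo(); }   // no LED write on exit
void behaviorBarEntry()      { setLEDPattern("B"); startBar(); }

void scenario() {
    normalOperationsEntry();  // device starts in State-of-NormalOperations
    behaviorFooEntry();       // Button A: enter State-of-BehaviorFoo
    behaviorFooExit();        // Button B: exit BehaviorFoo (no LED write)
    behaviorBarEntry();       //           then enter State-of-BehaviorBar
}
// The LED log is now {"normal", "A", "B"}: no transient write to 'normal'
// between 'A' and 'B', so the glitch is gone.
```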
And with that simple change to avoid driving behavior in exit-state handlers, we now have a correct and more maintainable state-machine design.
If you have any questions or concerns, please feel free to leave a comment.
More UML statechart tips:
-  The UML spec: https://www.omg.org/spec/UML/2.5.1
-  https://www.w3.org/TR/scxml/
-  Related presentation: https://covemountainsoftware.com/wp-content/uploads/2016/08/event-driven-software-and-statecharts.pdf
-  GPL or commercial solution: https://www.state-machine.com/
-  Qt’s statemachine: https://doc.qt.io/qt-5/statemachine-api.html
-  “Practical UML Statecharts in C/C++”, 2nd Edition, by Miro Samek, https://amzn.to/2uaSFH7
This fix presupposes you have a constraint that all states will actually adjust the shared resource to the appropriate value for the new state on entry. In large systems projects, that feels like a hard constraint to impose especially as multiple states may be implemented by different people. It feels safer for someone developing the behavior of a particular state to always assume they are required to “clean up” their use of any shared resources.
Am I right in thinking that the implementation solution to this issue is to have some inherited behavior for all states that sets the resource to a “normal” state and have any state that needs to do something different than “normal” on entry override this behavior?
I think even in a large system I would personally prefer to avoid the unnecessary CPU overhead of modifying a common resource in an exit-state handler, especially when it is likely (and required in this example) that the next state’s entry-state handler will modify the same resource again. This would negatively impact CPU usage and code size, attributes of particular concern in the firmware and embedded software domains.
That being said, I can certainly imagine situations where the system must guarantee all clean-up behavior for safety or absolute correctness reasons. This would then be dependent on the particulars of the system. Furthermore, I could imagine a system where the example bug I describe could be considered, in and of itself, a safety flaw (i.e. if LED indicators briefly flashed “normal” as the system was transitioning to an error state, this would likely be considered a serious flaw in a safety environment).
To your second question: Yes, if we wanted to guarantee this clean-up behavior (in an OO environment), a particular approach would be to create an appropriate base class with the desired guaranteed behavior, and then direct the software engineering team to derive from that class (when appropriate).
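To illustrate that base-class approach, here is a minimal sketch; the class names, the virtual onEntry() hook, and setLEDPattern() are hypothetical, not drawn from any specific framework.

```cpp
#include <string>

std::string g_ledPattern;  // stand-in for the shared LED resource
void setLEDPattern(const std::string& pattern) { g_ledPattern = pattern; }

// Base state guarantees the "reset to normal on entry" behavior by default.
class BaseState {
public:
    virtual ~BaseState() = default;
    virtual void onEntry() { setLEDPattern("normal"); }
};

// A state that needs something other than 'normal' overrides the default.
class BehaviorFooState : public BaseState {
public:
    void onEntry() override { setLEDPattern("A"); }
};
```

With this structure, any state that does nothing special inherits the safe default, and only states with different needs must opt out explicitly.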
Thank you for the question, it certainly demonstrates why I said to “be wary of”, rather than declaring it an absolute rule.
I would perhaps define ‘large’ as a system where a single state-machine (active object, HSM) is being modified and maintained by more than 5 people, which probably implies at least 50 to 100 states. Systems this large often use formal UML modeling tools and associated code generation, which would perhaps further alleviate the concern (while creating new concerns…).
This approach works in an OO environment where each state is also a class; an example is Qt’s QState: https://doc.qt.io/qt-5/qstate.html. In other, more “C-like” state-machine environments, common clean-up utility functions may be required, which would not give us the same guarantee, as developers would need to remember to add that common code to their exit-state handlers.