eriksfunhouse.com

Where the fun never stops!

Using State Machines to Orchestrate Multi-Agent Systems

Part 1 of 4
February 23, 2025

Since I last wrote down a few thoughts on agents (here and here), the buzz around them has only increased. Everyone now seems to be talking about agentic systems, and in particular multi-agent systems.

Also, Anthropic recently wrote a nice piece, which I really liked, on how they view agentic systems, with some practical advice on when and how to implement them.

Off the back of that, I thought it would be interesting to dig into some of the details of how we have recently gone about implementing a multi-agent system using state machines.

I should note here that this was supposed to be a short article but ended up becoming a 4-part series!

So in this first part I'll cover the basics of modelling multi-agent systems using state machines, and then in the next parts cover how to implement more advanced features within the framework.

Part 1: The state machine

From simple single agents to complex multi-agent systems

Single agent systems are easy to imagine from a systems perspective. Like chain-of-thought (CoT) flows, they form linear flows, alternating between executing an action and deciding on the next action to take based on the output of the previous action.

Single agent flow
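This decide/act loop can be sketched in a few lines of Python. The function and parameter names here are illustrative, not from any particular framework, and the toy decide/execute stand-ins replace what would be an LLM call and tool execution:

```python
def run_single_agent(decide, execute, state, max_steps=10):
    # Alternate between deciding on an action and executing it,
    # feeding each result back into the next decision.
    for _ in range(max_steps):
        action = decide(state)
        if action is None:  # nothing left to do: the linear flow ends
            return state
        state = execute(action, state)
    return state

# Toy stand-ins for the model and the tools: increment a counter until it reaches 3
final = run_single_agent(
    decide=lambda s: "inc" if s < 3 else None,
    execute=lambda action, s: s + 1,
    state=0,
)
# final == 3
```

Running several of these loops side by side, each with its own state, gives the "multiple single agent flows" picture below.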

Multiple single agent flows are likewise easy to think of: just like single agent flows they run linearly, only in parallel.

Multiple single agent flows

Fixed multi-agent flows, like ReAct or Reflexion, are more complex but still relatively easy to build and reason about, since the flow of multi-agent interactions is fixed: the agents interact in a pre-determined fashion.

Reflexion

Most tend to think of these multi-agent systems as processes, i.e. each agent becomes an asynchronous process that communicates with other processes through a communication channel.

The problem then, as always, arises when agents (or processes, if you will) execute and communicate asynchronously, since you now need to manage the inter-agent communication and their intertwined states.

This is relatively easy to visualize and implement but in practice incredibly difficult to manage when operating at scale, and at times almost impossible to debug - if you've ever built and deployed a large microservice architecture, you probably know exactly what I mean! 🙈

Enter state machines

Another approach is to model the system as a state machine - and this is the approach that we have taken. It is significantly easier to reason about than a full multi-process architecture, and I have yet to see a real practical problem that cannot be solved with this approach instead. In many ways it is very similar to, and heavily inspired by, Sir Tony Hoare's approach to process communication - if you're interested in that sort of thing. 😊

In the following sections I will dig into how we model multi-agent flows with state machines and the benefits of it. I think the approach we have taken is sufficiently interesting and (perhaps) novel that it is worth sharing in more detail to hopefully inspire/help others currently rolling their own multi-agent infrastructure.

First, let's define the problem space and a few requirements of any potential solution, and then explain the specific solution that we have adopted.

In our setting we always have a root agent: a single agent from which all other agents are spawned. This agent is essentially the owner of the state machine, and it dictates the clock of all other agents. For each root agent we have an execution environment, called a session, within which all agents spawned by the root are executed. This is important, since it means that within a single session all agents share the same clock and environment, which enables the agents to synchronize their communication.
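A minimal sketch of the session/root-agent relationship might look as follows. The class and field names are my own illustration of the description above, not the actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    interaction_stack: list = field(default_factory=list)  # history of states, top = current

@dataclass
class Session:
    # Execution environment owned by the root agent; every agent spawned
    # within it shares the same clock and environment.
    root: "Agent"
    agents: list = field(default_factory=list)  # in creation order, root first
    clock: int = 0

    def spawn(self, name):
        # New agents join the session and therefore the shared clock.
        agent = Agent(name)
        self.agents.append(agent)
        return agent

root = Agent("root")
session = Session(root=root, agents=[root])
helper = session.spawn("researcher")  # spawned agent ticks in lockstep with the root
```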

To further set the scene, here are a few initial requirements that we had for the system:

  • An agent can choose to pass off specific tasks to other agents
  • We can mix both rule-based and agentic paths
  • We can roll an agent back to a previous state and re-run from there - great for debugging!
  • A finished agent can be restarted based on new user input
  • The history and flow of an agent can be branched into multiple, parallel flows - allowing for parallel independent reasoning traces
  • We can dynamically render the historic context of an agent depending on its state
  • The agent can ask the user questions, if needed
  • We can dynamically change the availability of tools depending on an agent's state

Next we will define the state machine and then explain how it addresses each of these requirements as well as give us a number of additional features.

Defining the state machine

A state machine is defined by a set of states, a set of transition rules and a set of initial and final states.

Let's define what these are in our context.

The states

We split the states into four categories, each a pair of states, plus one terminal state.

The four categories represent the four types of interactions that an agent can have:

  • LLM interactions - using an LLM to choose the next action
  • Tool interactions - calling and executing a tool to get a result
  • Agent interactions - instructing and waiting for another agent to respond
  • User interactions - getting input from the user

State Machine States
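As a sketch, the full state set could be written as an enum. Only UserMessage, AssistantMessage, ToolCall, and Finished are named in this article; the remaining member names are hypothetical placeholders for the second state in each pair:

```python
from enum import Enum

class State(Enum):
    # LLM interaction pair
    USER_MESSAGE = "UserMessage"        # initial state: the user's instruction
    ASSISTANT_MESSAGE = "AssistantMessage"
    # Tool interaction pair
    TOOL_CALL = "ToolCall"
    TOOL_RESULT = "ToolResult"          # hypothetical name
    # Agent interaction pair
    AGENT_CALL = "AgentCall"            # hypothetical name
    AGENT_RESPONSE = "AgentResponse"    # hypothetical name
    # User interaction pair
    USER_QUESTION = "UserQuestion"      # hypothetical name
    USER_ANSWER = "UserAnswer"          # hypothetical name
    # Terminal state
    FINISHED = "Finished"
```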

The initial and terminal states

The initial state is always a UserMessage. Any agent task must start with a UserMessage - the very first instruction given to the agent. This is how the user gives the agent its task.

The terminal state of the system is always the Finished state. There are many ways for an agent to transition to this state, but to finish its flow it must always eventually move to it.

The transition rules

The transition rules are now pretty simple: within each group of interactions, one state moves to the other. For example, the UserMessage always transitions to the AssistantMessage via an external LLM call.

The question then becomes how to move from one group to another. In the case of the LLM interactions, the AssistantMessage can only transition to a ToolCall. At the next step of the state machine the ToolCall is interpreted and executed, at which point the state can transition to a number of new states, depending on the tool implementation.

Since it is quite simple we can illustrate all the interactions and transition rules in a single diagram:

State Machine Transitions
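One way to encode such a diagram is a table mapping each state to its allowed successors. Only the transitions named above (UserMessage to AssistantMessage, AssistantMessage to ToolCall, and ToolCall fanning out depending on the tool) come from the article; the other entries are my plausible completions of the diagram, not the actual rules:

```python
# Allowed transitions, keyed by current state.
TRANSITIONS = {
    "UserMessage": {"AssistantMessage"},      # via an external LLM call
    "AssistantMessage": {"ToolCall"},         # the LLM output is interpreted as a tool call
    "ToolCall": {"ToolResult", "AgentCall",   # executing the tool decides the next group,
                 "UserQuestion", "Finished"}, # depending on the tool implementation
    "ToolResult": {"AssistantMessage"},       # hypothetical: feed the result back to the LLM
    "AgentCall": {"AgentResponse"},           # hypothetical: wait for the other agent
    "AgentResponse": {"AssistantMessage"},    # hypothetical
    "UserQuestion": {"UserMessage"},          # hypothetical: the user's answer restarts the flow
    "Finished": set(),                        # terminal: no outgoing transitions
}

def is_valid(current, nxt):
    # Check whether a transition is permitted by the table.
    return nxt in TRANSITIONS.get(current, set())
```

Representing the rules as data rather than code makes them easy to inspect, validate, and change.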

The interaction stack and transition stepping

To manage the history and current state of each agent we model the history of states as a stack, called the interaction stack, where the top of the stack is the current state of the agent.

If the top state on the stack is of type Finished, then the agent is done and it won't execute any more steps. Otherwise, for each step of the state machine, the transition rule of the top state is executed and a new state is pushed onto the top of the stack as the output of the transition; we refer to this as a transition step.

Transition Step
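A minimal interaction stack could look like this. The class is my sketch of the structure described above; the `rewind` helper is hypothetical, included to show why the stack makes the roll-back requirement cheap:

```python
class InteractionStack:
    # History of an agent's states; the top of the stack is the current state.
    def __init__(self, initial):
        self._states = [initial]

    def top(self):
        return self._states[-1]

    def push(self, state):
        # Each transition step pushes the transition's output onto the stack.
        self._states.append(state)

    def rewind(self, n=1):
        # Hypothetical helper: drop the top n states to roll the agent back
        # to a previous state, from which it can be re-run.
        del self._states[-n:]

stack = InteractionStack("UserMessage")
stack.push("AssistantMessage")  # one transition step
stack.push("ToolCall")          # another transition step
stack.rewind()                  # roll back the last step; top is AssistantMessage again
```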

All agents within a session run on a synchronized clock: on each tick, every agent executes its next interaction and moves along its state machine by pushing the resulting state onto its interaction stack. The order in which agents execute their transition steps within a tick does not matter, and so is not part of the spec; in our case it is simply the order in which they were created.

This approach makes it incredibly easy to manage the history and current state of each agent and it allows for a number of advanced features out of the box, like rewinding and re-running from previous states, branches, dynamic tool availability and context rendering, etc. All of which we will dig into in the following sections. It also allows for a very simple and easy to understand implementation, that is easy to debug and extend.

In pseudo-code terms it could look like this:

def agent_step(agent):
    current_state = agent.interaction_stack.top() # get the current state
    if current_state.type == "Finished": # the agent is done, nothing to execute
        return True # Agent is finished
    next_state = current_state.transition_step() # execute the transition step
    agent.interaction_stack.push(next_state) # push the next state onto the stack
    return next_state.type == "Finished" # True if the agent just finished

def session_step(session):
    finished = True # Assume all agents are finished
    for agent in session.agents: # For each agent in the session
        # Step the agent first, so an already-finished flag never
        # short-circuits the call and skips an agent's turn
        finished = agent_step(agent) and finished
    return finished # Return True if all agents are finished

while not session_step(session): # Tick until all agents are finished
    sleep(1) # Sleep a bit before ticking again

So the event loop that drives the state machine is really simple and easy to understand.

End of Part 1

That is it for now. In this part we covered the setup of the state machine and how it works.

In the next part we will dig into how we can use this framework to better steer agents.

In Part 3 and 4 we will then cover other powerful features of the state machine approach as well as advanced extensions to enable Monte Carlo Tree Search, Roll Outs and Reinforcement Learning for better reasoning.

Part 2: Steering agents »