
Using State Machines to Orchestrate Multi-Agent Systems

Part 2 of 4
February 24, 2025

Part 2: Steering Agents

In the first part we described the state machine framework and how it works.

In this part, we will dig into some of the techniques for steering agents and how we implement them in the state machine framework.

Techniques for steering agents

One of the biggest challenges with more complex agentic flows is managing them as they evolve.

In the case of simple agentic flows, say a customer support or sales enablement agent, the agent flow is often deterministic and pre-defined. For good reason. As with a human worker, you don't actually want the agent to have too much autonomy. Instead, you want it to follow a defined process and only use its autonomy when it comes to interpreting human inputs and generating replies - like reading and replying to emails, sending LinkedIn messages, etc.

However, when dealing with more complex tasks, you need the agent to have more autonomy so it can find its own path to solving the problem, since there isn't an existing process to follow. At the same time, you often still want it to remain within certain bounds: steered in a particular direction and prevented from trying any odd way of solving the problem.

The question, then, is how best to steer it. There are a number of techniques for doing this, by either instructing the agent, changing its environment, or changing the agent itself.

Some of the common techniques are:

  • Prompting (Prompt Engineering) - the roughest and least predictable, but also the most common, way to steer the agent. In this case, you try to steer the agent by fine-tuning the prompt to instruct the agent on a particular way to reason about and solve the problem.
  • Sub-agents - having the main agent delegate tasks to specialised sub-agents that work within certain strict boundaries to solve very particular tasks.
  • Planning - having one part of the agent flow plan out a sequence of steps to solve the problem. This reduces the variance of the flow and typically leads to better reasoning, especially if the plan is reviewed and revised as the agent progresses. Planning can be done in the main context, in a separate branch, by a sub-agent, or even as a non-LLM-based reasoning step.
  • Branching (or sidebarring) - allowing thinking/reasoning in separate branches of the context, so the agent can solve particular sub-tasks in parallel without changing the main context or affecting other parts of the reasoning flow.
  • Changing the historic context - rendering the historic context of the agent differently depending on the state of the agent. This allows you to forget or emphasise certain parts of the agent's history, thereby changing its behavior dynamically.
  • Changing the system prompt - similar to changing the historic context, you dynamically change the system prompt depending on the state of the agent. In a way, you are dynamically changing the "nature" of the agent depending on its state.
  • Changing tool availability - again, similar to changing the historic context, you dynamically change the tools available to the agent depending on its state. This is a really useful technique that allows you to dynamically change the action space of the agent.
  • Changing LLM parameters - changing the parameters of the LLM call dynamically depending on the state of the agent. This can be anything from the temperature to the model itself (see the sketch after this list).
  • Reflection - having either the agent or a separate sub-agent reflect on the agent's output, given its state and context, and feed back to the agent so it can redo and improve its output.
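
To make the last few of these concrete, here is a minimal sketch of state-dependent LLM parameters. All the names (LLMParams, PARAMS_BY_MODE, agent.mode) are illustrative assumptions, not part of the framework:

from dataclasses import dataclass

@dataclass
class LLMParams:
    model: str
    temperature: float

# Hypothetical mapping from agent mode to LLM parameters:
# exploratory planning runs hot, deterministic execution runs cold
PARAMS_BY_MODE = {
    "planning": LLMParams(model="big-reasoning-model", temperature=0.8),
    "executing": LLMParams(model="small-fast-model", temperature=0.0),
}

def llm_params(agent):
    # Fall back to conservative defaults for unknown modes
    return PARAMS_BY_MODE.get(agent.mode, LLMParams("small-fast-model", 0.2))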

For simpler use cases a static setup, where you only rely on prompt engineering and a fixed set of tools to steer the agent, is sufficient. But as you expand into more complex flows that require longer reasoning traces and more complex tool usage, you often end up using several of the above techniques to steer the agent.

In our case, we often use several techniques at the same time. For example, we might have a sub-agent that is given a task. Its first step is to decide on the type of task it needs to solve, and for that it has one set of tools. Once it has identified the type of problem, the tool set is changed, the agent's system prompt is refined, and the historic context is dynamically rendered differently, allowing it to fully focus on the task at hand.

All of the above techniques, and a few others, are available in the state machine framework that we have described. Below we will dig into how the state machine setup enables some of these features.

Steering an agent with our state machine

Now that we have described some of the common techniques for steering agents, let's dig into some of the details of how these techniques can be implemented with our state machine approach.

Sub-agents

Sub-agents are started when the state machine is in the AgentCall state, meaning the top state on the interaction stack is of type AgentCall.

When the clock ticks and the state machine executes this state, a new agent is either launched or an existing one is restarted - all depending on the parameters of the AgentCall. The AgentCall state also defines the type of agent to launch, which in turn defines the tools, the system prompt, the context rendering, the LLM parameters etc.

As mentioned earlier, each agent has its own interaction stack, so the sub-agent will either start with an empty interaction stack (if a new agent is launched) or pick up where it previously left off (if an existing agent is restarted).

[Figure: Sub-agents]

In the case of a new agent, once it is started, the parent agent will push a UserMessage state onto the sub-agent's interaction stack, containing the instructions given by the parent. The parent agent then waits for the sub-agent to finish.

At the next tick of the state machine, the UserMessage will result in an LLM call, leading to an AssistantMessage state and so on. Once the sub-agent is finished it will transition to the Finished state and the parent agent will be notified of the result through an AgentResult state being pushed onto its interaction stack. At this point the parent agent wakes up and can continue its flow.
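
As a rough sketch of how this might look, run synchronously for simplicity (spawn_agent, resume_agent and MAIN_BRANCH are assumed helpers, not the exact implementation):

class AgentCall(State):
    def transition_step(self):
        if self.agent_id is None:
            # Launch a new agent of the configured type, which fixes its
            # tools, system prompt, context rendering and LLM parameters
            sub_agent = spawn_agent(self.agent_type)
            # The parent's instructions become the sub-agent's first state
            sub_agent.interaction_stack(MAIN_BRANCH).push(UserMessage(self.instructions))
        else:
            # Restart an existing agent where it previously left off
            sub_agent = resume_agent(self.agent_id)
        # Tick the sub-agent's state machine until it reaches Finished
        while not agent_step(sub_agent):
            pass
        # Wake the parent up by handing it the result
        return AgentResult(sub_agent.output)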

Continuing conversations

We described how to start a new sub-agent, but how do we continue a conversation with a sub-agent? We use the UserResponse state and insert it at the top of the stack, right on top of the Finished state. The UserResponse state is meant to follow a UserInput state, but it can be used on its own to transition into a new UserMessage state, from which the agent can continue its flow based on the new input.

This not only allows us to restart an existing sub-agent but also lets users continue a conversation with the main agent.
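
In pseudo-code, continuing a finished conversation could look like this (the helper name and the MAIN_BRANCH default are our own):

def continue_conversation(agent, new_input, branch=MAIN_BRANCH):
    stack = agent.interaction_stack(branch)
    assert stack.top().type == "Finished"  # only finished flows can be continued
    # UserResponse sits on top of Finished and, at the next tick,
    # transitions into a UserMessage from which the flow resumes
    stack.push(UserResponse(new_input))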

[Figure: Restarting a conversation]

Side conversations via branches

Often, after having run the initial agent flow, you want to continue in multiple directions, in parallel, without affecting the main flow. This is where branches come in. You can see it as branching off the main path of reasoning to explore multiple alternative paths - all from the same starting point.

This is particularly useful when the agent is interacting with users. First you kick off the main agent with a large piece of work - imagine a piece of data analysis taking 10-20 minutes to complete. Once it finishes, the user can explore alternative paths of shorter reasoning via branches instead of having to kick off a whole new piece of analysis.

[Figure: Branching]

In the state machine setting, we handle this by tagging each state with a branch id, and at each step of the state machine we iterate over every currently live branch. Revisiting our earlier pseudo-code of the state machine event loop, it now becomes:

def agent_step(agent):
    if agent.state == Finished: # check if the agent as a whole is finished
        return True # agent is finished
    finished = True # assume all branches are finished
    for branch in agent.branches: # iterate over every currently live branch
        if branch.finished:
            continue # skip finished branches
        finished = False # at least one branch is not finished
        current_state = agent.interaction_stack(branch).top() # get the current state
        next_state = current_state.transition_step() # execute the transition step
        agent.interaction_stack(branch).push(next_state) # push the next state onto the stack
        if next_state.type == "Finished":
            branch.finished = True # mark branch as finished
    return finished # return whether the agent is finished
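
The loop above leaves open how a branch comes into existence in the first place. A minimal sketch, assuming Branch, up_to and init_from helpers that share the stack prefix between parent and child:

def fork_branch(agent, parent_branch, at_state):
    # The new branch shares history with its parent up to `at_state`
    # and evolves independently from there
    new_branch = Branch(parent=parent_branch, finished=False)
    shared_prefix = agent.interaction_stack(parent_branch).up_to(at_state)
    agent.interaction_stack(new_branch).init_from(shared_prefix)
    agent.branches.append(new_branch)
    return new_branch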

Dynamic prompt rendering

Typically, the context of an LLM-based agent is a list of all previous messages between the agent and the LLM. This is loaded into the LLM context whenever the agent calls the LLM. The history is therefore static - once a message is added to the LLM context, it remains there unless it is removed or compressed/summarized to optimise the usage of the context window.

In the state machine setting, we think of it a bit differently. We view the context as the entire interaction stack up to that point in time, and calls to an LLM are just one of many types of interactions that can happen.

In this case, the context that is loaded into the LLM is completely re-rendered at every LLM call. This means the context can change dynamically based on the state of the agent: it is rendered by iterating over all interactions in the stack and rendering each one individually, given the current state of the agent and its environment. This allows every interaction to render dynamically.

In terms of pseudo-code it could look like this:

def render_context(interaction_stack):
    llm_context = [] # The LLM context is a list of rendered interactions
    for interaction in interaction_stack: # Iterate over all interactions in the stack
        # Only render UserMessage and AssistantMessage states
        if interaction.type in ("UserMessage", "AssistantMessage"):
            llm_context.append(interaction.render()) # Render the interaction
    return llm_context # Return the LLM context

def call_llm(llm, interaction_stack):
    llm_context = render_context(interaction_stack) # Render the context
    llm_response = llm.call(llm_context) # Call the LLM
    return llm_response # Return the LLM response

Not only does this allow us to change each message in the LLM context dynamically, it also allows us to be quite smart about pruning and compressing the context to avoid blowing up the LLM context window. If we also allow the responses sent back to the LLM, i.e. the UserMessage states, to be dynamically rendered, we can be even smarter about it. This can be implemented using Jinja templates or a similar templating engine, rendering the messages, both UserMessage and AssistantMessage, at the time of the LLM call.

For example, for each tool definition we can now specify how a call to that tool and the resulting response are rendered into the context, and we can base that on the state of the agent. We can then decide that certain tool calls should be removed from the context beyond a certain horizon, or that the response should be compressed. Since the UserMessage (the response sent back to the LLM) is also dynamically rendered, we can compress responses in a number of ways - for example, by summarizing or dummying out things like large artifacts or other information that is no longer relevant to the LLM.
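
A sketch of what such a tool-result template might look like with Jinja (the horizon logic and the field names on the interaction are assumptions, not the framework's actual schema):

from jinja2 import Template

# Beyond a given horizon the full result is dropped from the context
# and only a short summary is kept
TOOL_RESULT_TEMPLATE = Template(
    "{% if age < horizon %}{{ result }}"
    "{% else %}[{{ tool_name }} output elided: {{ summary }}]{% endif %}"
)

def render_tool_result(interaction, current_step, horizon=10):
    return TOOL_RESULT_TEMPLATE.render(
        age=current_step - interaction.step,  # how far back the call happened
        horizon=horizon,
        result=interaction.result,
        tool_name=interaction.tool_name,
        summary=interaction.summary,
    )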

In general, dynamic rendering of the context gives us a huge amount of flexibility to steer the agent and to improve its performance in terms of token usage, execution time, and consistency.

Planning and dynamically changing an agent's definition

Planning isn't explicitly part of the state machine setup, but it can easily be implemented within the framework, either by delegating the planning to a sub-agent, as described above, or by using the ability to change the definition of an agent dynamically depending on its state.

One way to change the agent definition - which can include changing the system prompt, tool availability, or LLM parameters - is to make the changes within the execution of a tool call. For example, in one mode the agent can be tasked with coming up with a plan, and at its disposal it has a tool to report this plan. Once it reports its plan, the mode changes: it now has a different system prompt and different tools, and is tasked with executing the plan. Because of the dynamic context rendering, the context can be rendered accordingly, meaning parts used in the previous mode can be removed or compressed.

This way of changing the configuration of the agent makes both planning and plan revision easy to implement. It also allows for a lot of flexibility when it comes to planning, including pre-defined templates for the agent to fill out or completely pre-defined plans for the agent to choose from.
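
A sketch of such a mode switch inside a tool call (the attribute names and the prompt/tool constants are illustrative):

def report_plan(agent, plan):
    # Hypothetical tool body: reporting the plan flips the agent from
    # planning mode into execution mode
    agent.plan = plan
    agent.system_prompt = EXECUTION_SYSTEM_PROMPT  # change the "nature" of the agent
    agent.tools = EXECUTION_TOOLS                  # change its action space
    agent.mode = "executing"                       # rendering can now compress planning states
    return "Plan recorded - switching to execution."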

Self-reflection

Self-reflection refers to the agent being asked, or asking itself, whether its previous output was good enough or whether there is something it could do to improve it.

In the case of our framework we typically do this in two ways:

  1. The agent is asked at the end of its flow, i.e. when it reaches the Finished state, whether it is satisfied with its output and whether it wants to improve it. It can either decide the output is good enough or go back to work, restarting its reasoning flow and continuing from where it left off until it is satisfied with the output.
  2. The agent is asked to reflect on the output of a single tool call. In this case the agent can decide to redo the tool call. This does not start a new reasoning flow; instead, only the single tool call is redone.

[Figure: Self-reflection]

So how do we handle this in the state machine?

In the first case, called full self-reflection, it is configured on the agent, and within the step function of the Finished state we check whether the agent has self-reflection enabled. If it does, we push a new UserMessage state onto the stack with a message asking the agent to reflect on its output.

Similarly, the second case is configured on the tool definition. Within the step function of the ToolResult state we check whether the tool has self-reflection enabled. If it does, we push a new UserMessage state onto the stack asking the agent to reflect on the output of the previous tool call.
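
In pseudo-code, both checks are small guards inside the respective step functions (the reflected flag, which prevents an endless reflection loop, is our addition):

class Finished(State):
    def transition_step(self):
        if self.agent.self_reflection and not self.reflected:
            self.reflected = True  # reflect at most once
            return UserMessage("Review your output. If it can be improved, "
                               "continue working; otherwise confirm you are done.")
        return None  # terminal state, nothing more to push

class ToolResult(State):
    def transition_step(self):
        if self.tool.self_reflection:
            return UserMessage(f"Reflect on the output of {self.tool.name}. "
                               "Redo the call if the result is not good enough.")
        ...  # otherwise continue the normal flow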

Very simple, but very powerful.

Sub-agent reflection

Sub-agent reflection is very similar to self-reflection, but in this case a separate sub-agent reflects on the output of the main agent and then feeds back to the main agent whether it should redo and improve its output.

The way we handle sub-agent reflection is therefore very similar to self-reflection:

  1. In the case of full reflection, we now instead push a new AgentCall state onto the stack from the Finished state. Once the sub-agent is finished, an AgentResult state, holding the sub-agent's feedback on the main agent's output, is pushed onto the stack of the main agent. This state then transitions into a UserMessage state, which is passed to the LLM to select the next action.

  2. In the case of tool reflection, we push a new AgentCall state onto the stack from the ToolResult state. Once the sub-agent is finished, an AgentResult state, holding the feedback on the previous tool call, is pushed onto the stack of the main agent. This feedback then triggers a new UserMessage state, which is passed to the LLM to select the next action based on the feedback.
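
The only change relative to self-reflection is which state gets pushed - an AgentCall instead of a UserMessage - and how the feedback re-enters the flow. A sketch of the latter (the feedback field is an assumption):

class AgentResult(State):
    def transition_step(self):
        # The reviewer's feedback re-enters the main flow as a user message,
        # so the LLM can decide whether to redo or accept the output
        return UserMessage(f"Reviewer feedback:\n{self.feedback}")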

[Figure: Sub-agent reflection]

End of Part 2

That is it for this part. In the next part, we cover some other nice features of the state machine framework, such as artifacts and memory, rewinding and rerunning, and debugging and monitoring.

Finally, in the fourth part, we will cover how we use the state machine framework to enable more complex agentic flows, such as MCTS, rollouts, and reinforcement learning for better reasoning.

« Part 1: The state machine | Part 3: Extending the agent »