eriksfunhouse.com

Where the fun never stops!

Using State Machines to Orchestrate Multi-Agent Systems

Part 3 of 4
February 25, 2025

Part 3: Extending the agent

In the last two parts (1, 2) we described the state machine framework, how it works, and how we can steer agents within it.

In this third part, we will describe several other desirable extensions of any agent framework and how we can implement them with our state machine setup:

Artifacts and memory

For agents to save their work, each agent and each session holds a store of artifacts. This allows not only the main agent but also sub-agents and interaction branches to save work to the shared memory of the session. Often sub-agents will save their work to the artifact store and then notify the parent agent of the new artifacts, instead of sending back the entire piece of work.

Artifacts can come in many shapes: text, XML, images, code, PDFs, etc.

The artifacts are not directly accessible to the LLM, or created by it, but instead created and accessed through tool implementations. This way we expose artifacts to the LLM through lookup tools and allow it to save artifacts through other tool calls. This also allows us to save artifacts completely outside of the context or action space of the LLM, like a PDF rendering of a text reply or an image generated from an XML reply.

Artifacts
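As a rough sketch of what this tool-mediated access might look like (the names Artifact, ArtifactStore, save_artifact_tool and lookup_artifact_tool are illustrative assumptions, not part of the framework):

```python
from dataclasses import dataclass, field

# Illustrative sketch of tool-mediated artifact access: the LLM never
# touches the store directly, it only sees the tool functions below.
@dataclass
class Artifact:
    artifact_id: str
    kind: str      # e.g. "text", "xml", "image", "pdf"
    content: str

@dataclass
class ArtifactStore:
    artifacts: dict = field(default_factory=dict)

    def save(self, artifact_id, kind, content):
        self.artifacts[artifact_id] = Artifact(artifact_id, kind, content)

    def lookup(self, artifact_id):
        return self.artifacts[artifact_id]

# Tool implementations exposed to the LLM as tool calls
def save_artifact_tool(store, artifact_id, kind, content):
    store.save(artifact_id, kind, content)
    return f"saved:{artifact_id}"

def lookup_artifact_tool(store, artifact_id):
    return store.lookup(artifact_id).content
```

Anything created outside the LLM's action space, like a PDF rendering of a reply, would be written to the same store by ordinary code rather than via a tool call.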

Because artifacts are saved through tool calls, the agent can save results as it progresses and emit intermediate results to the user. For example, when running a large piece of data analysis, the agent can output intermediate analysis that the user can see and interact with before it produces the final analysis. This allows the user to monitor the agent's progress and see the intermediate reasoning steps, providing a much more interactive and engaging experience than waiting for the final result. It also allows the user to pause and redirect the agent to a new path of reasoning along the way, if it looks like it is not going in the right direction.

Artifacts also represent a form of longer-term memory for the agent, not just a way to save results. For memory use, agents can perform RAG-type searches across previous artifacts to look up previous work. This is much more efficient and reliable than trying to keep previous interactions in the LLM context or doing look-ups across previous interaction stacks, and it allows the agent to reuse previous results as well as learn from previous work.
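A toy version of such a lookup, using keyword overlap as a stand-in for a real embedding-based RAG search (the function name and the id-to-text artifact representation are illustrative assumptions):

```python
# Rank stored artifacts by how many query terms they share, returning
# the ids of the best matches. A real system would use embeddings.
def search_artifacts(artifacts, query, top_k=2):
    query_terms = set(query.lower().split())
    scored = [
        (len(query_terms & set(text.lower().split())), artifact_id)
        for artifact_id, text in artifacts.items()
    ]
    scored.sort(reverse=True)
    return [artifact_id for score, artifact_id in scored[:top_k] if score > 0]
```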

In a way, you can view artifacts as an augmentation to the existing knowledge base of the agent. Once an artifact has been created it is a new piece of knowledge added to the existing data that the agent has access to. Borrowing from knowledge graphs, you can view the original data as the hard facts and the artifacts as soft facts, derived from the interpretations and reflections of the agent on the original data.

Single-questions outside of context

Sometimes you just want to ask the agent a single question and get a single answer, without any further actions taken by the agent, and most often you don't want this to be part of the future reasoning trace.

In this case you can create a new branch with only a single tool available to the agent, which it can use to answer the question. The branch is created with a UserResponse state that asks the agent the question. The UserResponse state pushes a UserMessage state onto the stack, triggering an LLM call followed by an AssistantMessage state. The single tool call then transitions into a Finished state and the branch is done. At this point you have your answer as part of the tool call.

Single question
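A minimal sketch of this branch as a sequence of stack transitions. The state names (UserResponse, UserMessage, AssistantMessage, ToolCall, Finished) follow the article; the stack layout and the stubbed-out LLM call are illustrative assumptions:

```python
# Run a single-question branch: push the states in order, call the
# LLM once, and capture the answer from the single tool call.
def run_single_question_branch(question, llm, stack):
    stack.append(("UserResponse", question))
    # UserResponse pushes a UserMessage, which triggers the LLM call
    stack.append(("UserMessage", question))
    answer = llm(question)
    stack.append(("AssistantMessage", answer))
    # The agent answers via the single available tool...
    stack.append(("ToolCall", "answer", answer))
    # ...whose execution transitions the branch into Finished
    stack.append(("Finished", None))
    return answer
```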

For example, a user might want to clarify a certain aspect of the agent's reply or ask a simple follow-up question. You can also use this for simple tasks, like generating a title for the agent's analysis, a brief summary of what the agent has done, or asking the agent to extract highlights of its work.

Rewinding and re-running from previous states

Because the entire history of an agent is defined by its interaction stack, it is easy to rewind and re-run from previous states. We simply pop the stack down to the state we want to return to and restart the agent stepping from there.


Rewinding the interaction stack to a previous state
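The rewind itself is just a stack truncation. A minimal sketch, assuming an illustrative list-of-states representation of the interaction stack:

```python
# Remove every state above the one we want to return to; the agent
# then simply steps again from the state left on top of the stack.
def rewind(stack, keep_index):
    """Pop all states above stack[keep_index] and return them."""
    removed = stack[keep_index + 1:]
    del stack[keep_index + 1:]
    return removed
```

After a rewind you would tweak the offending tool or agent configuration and let the agent step again, as described above.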

This is really useful when debugging, fixing or extending the agent, its tools and its sub-agents, since it makes iterating on fixes and improvements quick and easy.

For example, a reasoning trace might be 100 steps long and have ended up in a bad place. To rerun from scratch would 1) take a long time and 2) not necessarily expose the same issue again. But with the rewind mechanism and interaction stack, we can now easily analyse and diagnose what happened by looking through the interaction stack. And then once we have found an issue, or a place where we think the agent could have done better, we can tweak the particular tool, or agent configuration, and re-run from the point of failure to test the fix or improvement.

Debugging, Monitoring and Extensibility

As mentioned above, one of the nicest features of the state machine framework is how easy it is to understand, debug and monitor.

When you run the agent, you can easily monitor its state by just monitoring each agentā€™s interaction stack. And when you are testing and debugging, it is easy to pause the agent, inspect its interaction stack and rewind and re-run after a fix.

You can also see the messages sent to the LLM at any point in time and how they are rendered, which allows you to easily change and optimise the LLM interactions - something that is often quite difficult in a complex agent setting.

In my own case, I created a console tool to inspect agents and sub-agents, view artifacts, rewind and re-run, and pause and resume agents. Because of the simplicity of the state machine, the console app was simple to implement but incredibly powerful, both for developing and debugging the agents and for monitoring them when deployed.


My little console app in action. Here looking at the interaction stack of an agent running a piece of data analysis. As you can see, it is already in the 100s of steps.

Another important feature of the state machine framework is its extensibility. The way the types of states are defined makes it easy to extend agents with new abilities, either via new tools or new types of sub-agents.

To extend the capabilities of the agent, you simply have to add a new tool definition to be executed by the ToolCall state or add a new type of agent to be launched by the AgentCall state.

To make this easy to do, you can implement tool and agent definitions using either metaprogramming techniques or a simple DSL. I have done both with a simple yaml-based DSL, and that works really well.
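A sketch of what such a declarative tool registry might look like. The definition is shown as a plain dict, i.e. what yaml.safe_load would return for a yaml tool definition; all field names and helper names here are illustrative assumptions, not the article's actual DSL:

```python
# A declarative tool definition, as it might be parsed from yaml
word_count_definition = {
    "name": "word_count",
    "description": "Count the words in a piece of text",
    "parameters": {"text": "string"},
}

TOOL_REGISTRY = {}

def register_tool(definition, impl):
    """Pair a declarative definition with its implementation."""
    TOOL_REGISTRY[definition["name"]] = {"definition": definition, "impl": impl}

def execute_tool_call(name, **kwargs):
    # Roughly what the ToolCall state would dispatch to
    return TOOL_REGISTRY[name]["impl"](**kwargs)

register_tool(word_count_definition, lambda text: len(text.split()))
```

Adding a new capability is then just a new definition plus an implementation; the ToolCall (or AgentCall) state never needs to change.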

End of Part 3

That is the end of Part 3. There is a lot more to be said about the state machine framework, especially in terms of how easy it is to build, maintain and extend with new types of reasoning capabilities, some of which we will cover in Part 4 - in particular, advanced features to improve reasoning through test-time compute.

« Part 2: Steering agents | Part 4: Improving test-time reasoning (Coming Soon) »