More Thoughts on Agents

January 29, 2025

Recently Yoav Goldberg asked a series of interesting questions on Bluesky (1, 2, 3).

The theme running through them was: how do you define an LLM agent?

It made me think about what an LLM agent is and why agents might be different this time around from previous agent frameworks. After all, agents have been around for a long time, so why all the hype now?

So here are some of my musings on the topic - all of which will probably look incredibly dated very soon, but nevertheless - here goes!

Just to illustrate the point above about how long agents have been around in computing, here is a nice technical report from 1996 giving an overview of software agents - “Software Agents: An Overview” - which is the source of the following great quote:

We have as much chance of agreeing on a consensus definition for the word agent as AI researchers have of arriving at one for artificial intelligence itself - nil!

What is an agentic LLM system?

At a high level, I would say that it is an autonomous system with the agency to act (the agent part) that uses LLMs (though not exclusively) to decide on the next action.

How is that different from a rule-based agent?

At a local level, the obvious difference is that it uses an LLM for selecting the next action and for evaluating the response of an action.

At a more macro level, the difference is that the agent’s objective is defined semantically, rather than through an explicitly defined objective and reward function.
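
To make the local difference concrete, here is a minimal sketch of such a loop in Python. Everything in it is hypothetical - llm() stands in for a call to whatever chat model you use, and the tools are toy placeholders - but it shows the shape of the thing: the next action is chosen by the model, not by a hard-coded flow.

```python
# Minimal sketch of an LLM-driven agent loop - not a real framework.
# llm() is a hypothetical helper that sends a prompt to some chat model
# and returns its text reply; the tools are toy placeholders.

def llm(prompt: str) -> str:
    raise NotImplementedError("call your chat model of choice here")

TOOLS = {
    "search_docs": lambda query: f"top results for {query!r}",
    "run_code": lambda code: "stdout/stderr from executing the code",
}

def run_agent(objective: str, max_steps: int = 10) -> str:
    history = []
    for _ in range(max_steps):
        # A rule-based agent would pick the next step from a fixed flow
        # (if X then tool A, else tool B); here the LLM decides instead.
        decision = llm(
            f"Objective: {objective}\n"
            f"History so far: {history}\n"
            f"Available tools: {list(TOOLS)}\n"
            "Reply with 'tool_name: argument', or 'DONE: final answer'."
        )
        if decision.startswith("DONE:"):
            return decision[len("DONE:"):].strip()
        name, _, argument = decision.partition(":")
        result = TOOLS[name.strip()](argument.strip())
        # The LLM evaluates the tool's response on the next iteration,
        # simply by seeing it in the history - no hand-written reward.
        history.append((decision, result))
    return "gave up"
```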

When is that an advantage?

I would say, if the objective is ill-defined, i.e. difficult to explicitly define, if the response is unstructured, or at least difficult to interpret, and if the action space is infinite and generative in nature, then it is a good fit.

All in all, the interesting differences to me from other ways of building agents are:

  1. The overall goal that the LLM is optimising for when picking the next action is given semantically through user instructions rather than as an explicit mathematical definition
  2. The reward function is also not explicitly defined but is interpreted by the LLM from the user’s instructions (see the sketch after this list)
  3. The generative element of the LLM enables it to handle an infinite action space really well
  4. The pre-training of the LLM means it has learned a lot about how to problem-solve many common human tasks
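
Points 1 and 2 are perhaps easiest to see side by side in a small, purely hypothetical sketch (llm is the same stand-in for a chat-model call as above):

```python
# Hypothetical comparison of points 1 and 2.

# Rule-based / RL-style agent: the objective and reward are explicit code.
def explicit_reward(state: dict) -> float:
    return 1.0 if state.get("ticket_resolved") else 0.0

# LLM agent: the objective is the user's instructions, and the "reward"
# is the model's own judgement of how well an outcome satisfies them.
def llm_judged_reward(instructions: str, outcome: str, llm) -> float:
    verdict = llm(
        f"Instructions: {instructions}\n"
        f"Outcome: {outcome}\n"
        "Score from 0 to 1 how well the outcome satisfies the instructions. "
        "Reply with only the number."
    )
    return float(verdict.strip())
```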

To me this makes them great at solving tasks where the outcome is a bit wishy-washy, so there is no real way to create an explicitly defined objective or reward function.

An example of this would be coding agents (a rough sketch in code follows the list):

  1. The user provides a high-level description of the program, or code, they want the agent to write
  2. The agent interprets this semantic definition of its goal to select the next action, evaluate the output of each action and determine the overall plan and flow of its actions
  3. Each action involves the agent generating an output from an infinite space (all possible code)
  4. The LLM is trained on large amounts of relevant data and is therefore really good at solving tasks that involve writing code
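
Here is a rough sketch of that loop, with every name hypothetical - llm is the same chat-model stand-in as before, and run_tests() is a toy stand-in for whatever execution or test harness the agent has access to. It compresses the evaluation in step 2 down to “does it run”, which is of course a simplification:

```python
import subprocess
import sys
import tempfile

def run_tests(code: str) -> tuple[bool, str]:
    """Toy stand-in for the agent's execution/test harness."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def coding_agent(description: str, llm, attempts: int = 5) -> str:
    feedback = ""
    code = ""
    for _ in range(attempts):
        # Each action is a draw from an effectively infinite space: all code.
        code = llm(f"Write a Python program that does: {description}\n{feedback}")
        ok, output = run_tests(code)
        if ok:
            return code
        # The failure itself becomes the input to the next action.
        feedback = f"The previous attempt failed with:\n{output}\nPlease fix it."
    return code
```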

Dichotomy of agents

I think there are two very distinct types of agentic work. One that improves on current technology and will (most likely) lead to productivity gains. And another that may transform not just how we do knowledge work but how we do science in general - and which I think will be transformative for all of us through the work it makes possible across the sciences.

Type 1: The automation of process workflows

This is the type of agentic work that is currently getting most of the hype, which makes sense - it is the most obvious use case and well suited for the first iteration of agents. In this group we find agents for automating customer support, email messaging, data entry, navigating websites, information retrieval, summarisation and the like. These are the kinds of tasks sometimes referred to as System 1 thinking - I call it “can it be performed by your average graduate given a simple set of instructions”.

These are the kinds of tasks that have been possible to automate with computers for 10+ years using RPA platforms, web scrapers and the like, but that have now become significantly easier to set up and perform. These typically have pre-defined, semi-static flows: do 1, then 2, then if X do 3 or 4, etc. As with human work, you need the agent to follow the process and not deviate too much or show too much creativity or initiative.

The main inflection point, or unlock, for this type of agentic work has been the ability to more easily parse and understand unstructured inputs and produce generative outputs.
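
A hypothetical example of what such a semi-static flow might look like in practice: the steps themselves stay fixed, and the LLM is only used where the old RPA-style tooling struggled - turning the unstructured input into something structured. (The crm object and its methods are made up for illustration.)

```python
import json

# Hypothetical Type 1 flow: the steps are fixed and rule-like; the LLM is
# only used to turn the unstructured email into structured fields.
def handle_support_email(email_text: str, llm, crm) -> str:
    raw = llm(
        "Extract JSON with keys 'customer_id', 'intent' (refund|question|other) "
        f"and 'order_id' from this email:\n{email_text}"
    )
    ticket = json.loads(raw)
    crm.create_ticket(ticket)                 # do 1: always log the ticket
    if ticket["intent"] == "refund":          # then if X do 3, else do 4
        return crm.start_refund(ticket["order_id"])
    return crm.route_to_human(ticket)
```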

Type 2: Knowledge work

Knowledge work includes work that requires human reasoning and creativity, combined with research and analysis. For example, advanced data analysis, work performed by management consultants, complex software engineering, etc.

This is what is sometimes referred to as System 2 thinking - tasks that could previously not be done by computers: tasks that involve higher-level reasoning, dynamic flows, and the ability to handle uncertainty and partial information.

The inflection point for this type of work was the ability to handle semantically given objectives and to reason through a flow of actions to an output.

Type 1 vs Type 2

The distinction between Type 1 and Type 2 isn’t just a matter of degrees of difficulty from a human perspective but of more fundamental properties - properties that I think are worth considering for any automation of human work:

Are errors okay?

In knowledge work errors are not just okay but encouraged - you want to try many variations, different solutions, and then pick the best one. Variability and creativity are desired. In process-driven tasks, however, it is the opposite - you can’t try many times and pick the best one; instead you have to be good every time, with low variability and error rates. For example, the cost of an error can be high if you are dealing with customer support requests.

Is speed important?

This is as much a volume question as a speed question. Knowledge work is often low-volume and you are happy for it to take hours, days, weeks, months if the quality of the output is worth it. In data processing tasks, however, it is all about speed: 1000s of emails need to be sent, 100s of customers need to be serviced and so on - so in this case volume > quality. This, again, points to the need for less creativity and more rule-following for this type of agent.

Is the baseline cheap, fast and readily available?

This point is related to the previous one. For Type 1 work the human baseline is relatively cheap, okay at the job, fast enough and easy to hire and fire. For a digital solution to replace the human in this case it must be better at the job, cheaper, faster and easier to hire, maintain and fire.

An example where this equation hasn’t worked out so far is kitchen automation in restaurants. Although robots can do a much better and more consistent job than humans, they are also much more expensive, difficult to hire, maintain and fire, impossible to retrain, etc. Here the robots are the expensive luxury choice over humans - humans are the cheaper, sloppier alternative that in aggregate is still the preferred solution.

An example where the equation for Type 1 agents seems to work out is digital process tasks, like AI SDRs or customer support agents. Here the human baseline is readily available and relatively cheap, but because of the purely digital nature of the task, the speed and scale of the agentic solution massively outweigh the human baseline. You can easily scale the number of agents to meet demand.

With Type 2 work, the equation almost always works in favour of the digital solution - if it is available. In most cases the human baseline is scarce, expensive and difficult to scale. The problem is often not that the human baseline is a better option, but that the agentic solution is not available.

Consequences of automating knowledge work

I think there are three main consequences of automating knowledge work that will lead to a fundamental shift in how it is done:

Enables infinite scale

We will no longer be constrained by human resources; we can create and test as many hypotheses and run as many pieces of analysis as we like

Wasted work is okay

Scale and low cost mean you can try any odd piece of analysis, and if it doesn’t work out, that is okay

Everything becomes backend - UX changes completely

With agents performing most, if not all, of the work, the role of the frontend and the human interface changes completely. It is no longer about doing the work, but about presenting the results

What makes a good agent

Finally, a random collection of thoughts on some of the conditions that make an agent work well. These are all just little notes from recent experience :)

What makes a good agent:

  • Less prompting, more tooling - rely less on generative outputs and more on reasoning and understanding
  • Prefer tools for information retrieval over context - let the agent decide what information is important
  • The more well-defined, well-behaved and robust the tools are, the more stable and convergent the global agent behaviour is
  • In general, applying good software practices when building tools reduces variability and instability in the agent!
  • Have an objective scoring function for agent actions. Aim for statistical measures on models combined with some heuristics. In the RL sense, the agent should have a well-defined reward function. This scoring allows the agent to efficiently search for a converging solution through chained actions - locally unstable, but globally stable (a rough sketch follows this list).
  • The generative components of the action space should be from a well-learned domain, like coding or data analysis.
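
On the scoring point, here is a hypothetical example of what such a function might look like for a coding agent - a statistical measure (test pass rate) combined with a couple of simple heuristics, and no LLM judgement involved. All the numbers and weights are made up for illustration.

```python
# Hypothetical scoring function for a coding agent's actions: a statistical
# measure (test pass rate) combined with simple heuristics (lint errors,
# size of the change). Higher is better.
def score_action(tests_passed: int, tests_total: int,
                 lint_errors: int, lines_changed: int) -> float:
    pass_rate = tests_passed / max(tests_total, 1)
    lint_penalty = 0.02 * lint_errors
    churn_penalty = 0.001 * lines_changed     # heuristic: prefer small diffs
    return pass_rate - lint_penalty - churn_penalty

# The agent can then search over chained actions and keep the candidate
# with the highest score - locally unstable, but globally convergent.
candidates = [
    {"tests_passed": 8, "tests_total": 10, "lint_errors": 3, "lines_changed": 120},
    {"tests_passed": 7, "tests_total": 10, "lint_errors": 0, "lines_changed": 15},
]
best = max(candidates, key=lambda c: score_action(**c))
```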

That is it for these ramblings - hopefully the next post about agents will be more coherent!