Autonomous agents can be thought of as language model-powered bots that can break down complex problems and iteratively solve them, taking action on users’ behalf.
A simple example illustrates what is possible with just an LLM, with an agent, and with an autonomous agent.
- With just an LLM, we can look up the best restaurants in a given city.
- With an agent, we can tell it to look up the highest rated restaurant with a table available and book the table for two.
- With an autonomous agent, we can ask it to find the best restaurant that fits our schedule and preferences, then book a table for us and our best friend. Autonomous agents can do this by breaking a task down into subtasks and carrying memory between steps to guide their actions.
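
The decomposition-plus-memory idea in the last bullet can be sketched in a few lines. This is only an illustration under stated assumptions: `run_agent`, `toy_plan`, and `toy_execute` are hypothetical names, and the two toy functions are hard-coded stand-ins for what would be LLM or tool calls in a real system.

```python
from typing import Callable, List


def run_agent(
    goal: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str, List[str]], str],
) -> List[str]:
    """Break a goal into subtasks and run them in order, carrying memory forward."""
    memory: List[str] = []  # results of earlier steps, visible to later steps
    for subtask in plan(goal):
        result = execute(subtask, memory)
        memory.append(f"{subtask} -> {result}")
    return memory


# Hypothetical stand-ins for LLM calls, hard-coded for illustration.
def toy_plan(goal: str) -> List[str]:
    return ["check calendar", "find restaurants", "book table"]


def toy_execute(subtask: str, memory: List[str]) -> str:
    # A real executor would read the memory to decide what to do next.
    return f"done (saw {len(memory)} earlier results)"


memory = run_agent("dinner with best friend", toy_plan, toy_execute)
for entry in memory:
    print(entry)
```

Each step receives everything recorded so far, which is what lets a later subtask (booking) depend on an earlier one (checking the calendar).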
### Hurdles to overcome for large scale adoption
1. **Logical reasoning != good execution:** In principle, GPT-4 is capable of chain-of-thought reasoning and of decomposing tasks into multi-step processes. In practice, agents often struggle to execute their own sub-tasks. They have trouble knowing when to “take a step back,” so they get stuck repeating the same task in a loop, or they hallucinate a step and stall because there is little external feedback.
2. **Compute costs:** The architecture of these applications relies on [[recursive loops]], which can lead to many repeated calls to your LLM. The cost per call is relatively low today with tools like OpenAI’s APIs (though you may run into API limits), but with in-house models the cost equation may be quite different.
3. **Learning:** Because autonomous agents are spun up for a task and not subsequently reused, they do not learn from earlier prompts or prior attempts, and they learn little from their mistakes. Services that help agents persist are on the horizon, though, which should make managing them easier.
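
One simple way to blunt the first two hurdles is to wrap the agent loop in guards: a cap on total calls and a check for repeated actions. The sketch below is a minimal illustration, not a description of any particular framework. `run_with_guards`, `stuck_agent`, the `"DONE"` sentinel, and the default limits are all assumptions made for the example.

```python
from typing import Callable, List


def run_with_guards(
    next_action: Callable[[List[str]], str],
    max_calls: int = 10,   # hard budget on model calls (assumed value)
    max_repeats: int = 2,  # how often the same action may recur (assumed value)
) -> List[str]:
    """Run an agent loop, stopping on completion, repetition, or budget exhaustion."""
    history: List[str] = []
    for _ in range(max_calls):
        action = next_action(history)  # stands in for one LLM call
        if action == "DONE":
            break
        if history.count(action) >= max_repeats:
            history.append("STOPPED: repeated action " + action)
            break
        history.append(action)
    return history


# Hypothetical agent that never makes progress.
def stuck_agent(history: List[str]) -> str:
    return "search restaurants"


trace = run_with_guards(stuck_agent)
print(trace)
```

Here the stuck agent is halted on its third identical action rather than consuming the full call budget. A real system would likely need a fuzzier notion of "same action" than exact string equality.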
If we can solve some of these challenges, we can imagine a future of **“agent to agent” interactions**. Specialized agents could be created for common tasks. Instead of spinning up a new agent for every task, you might “outsource” some of the steps and rely on pre-trained agents to fulfill tasks that you pay for per output, and incorporate those outputs as input into your next steps. In other words, your AI could hire or outsource to another AI. “Core” tasks could be covered by different agents, and a new layer of tooling could emerge as the “glue” stitching the entire process together.
### Next generation of autonomous agents
- **Compute aware:** minimizing resource usage as an objective function
- **Data aware:** finding and connecting to the right model or data source for the task
- **Agent aware:** finding, reusing and communicating with ecosystems of agents
- **Safety aware:** checking outputs and sandboxing code are the first steps; more serious controls will be needed to prevent abuse
- **User aware:** learning from user behavior and preferences to optimize performance