Architecture Overview
What it is: A single LLM agent equipped with a set of callable tools (functions). The agent receives a user request, reasons about which tool to invoke and with what arguments, executes the tool, observes the result, and repeats this loop until it has enough information to produce a final answer. This is the ReAct (Reasoning + Acting) pattern applied with modern function calling APIs.
When to use it: When a user query cannot be answered by the LLM alone and requires real-world interaction — looking up live data, performing calculations, executing code, querying databases, or calling external APIs. The agent dynamically decides which tools to use and in what order, adapting its plan based on intermediate results.
Why it matters: This architecture transforms an LLM from a passive text generator into an autonomous actor. Instead of returning stale or hallucinated information, the agent retrieves real data and takes verifiable actions. It is also the building block for more complex agent architectures (multi-agent, orchestration, production platforms).
Complexity: Intermediate. Requires understanding of function calling schemas, the agent loop pattern, error handling for tool failures, and iteration budgets to prevent runaway costs.
Architecture Diagram
Components Deep Dive
Function Calling Schema
Tools are defined as JSON schemas that describe the function name, description, and parameter types. The LLM uses these schemas to decide which tool to invoke and generates structured JSON arguments. High-quality descriptions are critical — they determine whether the model selects the right tool.
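Because the model generates the arguments, it can help to check them against the schema before dispatching. The following is a minimal sketch under stated assumptions: `validate_args`, `WEB_SEARCH_SCHEMA`, and the `max_results` field are hypothetical names introduced for illustration, and the check covers only `required` fields and primitive types. A production system would more likely rely on a full JSON Schema validator.

```python
# Hypothetical schema for illustration; `max_results` is not part of the original tool.
WEB_SEARCH_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "description": "Search query"},
        "max_results": {"type": "integer", "description": "Result cap"},
    },
    "required": ["query"],
}

# Map JSON Schema primitive names to Python types.
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_args(schema: dict, args: dict) -> list:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    for field in schema.get("required", []):
        if field not in args:
            problems.append(f"missing required field: {field}")
    for field, value in args.items():
        spec = schema.get("properties", {}).get(field)
        if spec is None:
            problems.append(f"unexpected field: {field}")
            continue
        expected = _JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        is_stray_bool = isinstance(value, bool) and spec.get("type") != "boolean"
        if expected and (not isinstance(value, expected) or is_stray_bool):
            problems.append(f"{field}: expected {spec['type']}")
    return problems
```

Returning the problem list to the model as the tool result, rather than raising, gives it a chance to correct its own arguments on the next iteration.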
Tool Registry
A central mapping of tool names to their implementations. When the LLM requests a tool call, the registry dispatches to the correct function. Production registries include versioning, access control, and health checks for each tool.
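One common way to build such a mapping is a decorator that records each function under its tool name. This is a sketch, not a prescribed design: `TOOL_REGISTRY`, `register_tool`, `dispatch`, and the two toy tools are hypothetical names, and versioning, access control, and health checks are deliberately left out.

```python
from typing import Callable, Dict

# Hypothetical registry: maps tool names to plain Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {}


def register_tool(name: str):
    """Decorator that records a function under a tool name."""
    def decorator(fn):
        if name in TOOL_REGISTRY:
            raise ValueError(f"duplicate tool name: {name}")
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


@register_tool("echo")
def echo(text: str) -> str:
    return text


@register_tool("add")
def add(a: float, b: float) -> str:
    return str(a + b)


def dispatch(name: str, args: dict) -> str:
    """Look up the tool and call it with the model's arguments as keywords."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return f"Unknown tool: {name}"
    return fn(**args)
```

Keeping registration next to each implementation means adding a tool does not require editing a central if/elif chain.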
Agent Loop (ReAct)
The core observe-think-act cycle. The LLM receives the conversation history (including prior tool results), reasons about the next step, and either calls a tool or produces a final answer. The loop continues until no more tool calls are needed or the iteration limit is reached.
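The cycle can be sketched independently of any provider by stubbing out the model. In the sketch below, `scripted_model` is a hypothetical stand-in for an LLM call (it requests one tool, then answers), and the message shape is simplified for illustration rather than matching any specific API.

```python
def scripted_model(history: list) -> dict:
    """Stand-in for an LLM: ask for a lookup first, answer once a result exists."""
    tool_results = [m for m in history if m["role"] == "tool"]
    if not tool_results:
        return {"role": "assistant",
                "tool_call": {"name": "lookup", "args": {"key": "pi"}}}
    return {"role": "assistant",
            "content": f"The value is {tool_results[-1]['content']}."}


def lookup(key: str) -> str:
    return {"pi": "3.14159"}.get(key, "not found")


def react_loop(question: str, model, tools: dict, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = model(history)               # think: the model sees the full history
        history.append(reply)
        call = reply.get("tool_call")
        if call is None:                     # no tool requested: this is the final answer
            return reply["content"]
        observation = tools[call["name"]](**call["args"])          # act
        history.append({"role": "tool", "content": observation})   # observe
    return "Stopped: iteration limit reached."
```

Calling `react_loop("What is pi?", scripted_model, {"lookup": lookup})` runs two model turns: one that requests the tool and one that produces the answer from the observation.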
Error Handling
Tools can fail — APIs return errors, calculations overflow, searches find nothing. The agent must gracefully handle failures: retry with modified parameters, try an alternative tool, or report the failure in its response. Unhandled errors should never crash the loop.
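A small wrapper can implement the retry-then-fall-back policy described above. This is a sketch under assumptions: `call_with_recovery`, `flaky_primary`, and `fallback` are hypothetical names, and the retry count and backoff are illustrative defaults rather than recommended values.

```python
import time
from typing import Callable, Sequence


def call_with_recovery(
    attempts: Sequence[Callable[[], str]],
    retries_per_tool: int = 2,
    backoff_seconds: float = 0.0,
) -> str:
    """Try each candidate in order, retrying each a few times; never raise."""
    errors = []
    for attempt in attempts:
        for try_index in range(retries_per_tool):
            try:
                return attempt()
            except Exception as exc:  # deliberately broad: any tool failure is recoverable
                errors.append(f"{type(exc).__name__}: {exc}")
                time.sleep(backoff_seconds * (try_index + 1))
    # Every option failed: hand the model a readable summary instead of crashing.
    return "Error: all tools failed (" + "; ".join(errors) + ")"


def flaky_primary() -> str:
    raise TimeoutError("search API timed out")


def fallback() -> str:
    return "cached result"
```

With these toy tools, `call_with_recovery([flaky_primary, fallback])` exhausts the primary's retries and then returns the fallback's result; if every candidate fails, the returned string carries the error details so the agent can report them.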
Sandboxing
Code execution and database query tools require sandboxing to prevent security vulnerabilities. Use containerized execution environments, read-only database connections, and restricted system calls. Never pass raw user input directly to eval() or shell commands.
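For a calculator-style tool, one way to avoid eval() altogether is to parse the expression and walk its syntax tree, allowing only whitelisted node types. The sketch below uses Python's standard `ast` module; `safe_calc` and the chosen operator and function whitelists are assumptions made for illustration, and exponentiation is intentionally omitted because very large powers can exhaust CPU and memory.

```python
import ast
import math
import operator

# Whitelisted operators and functions; anything else is rejected.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv,
            ast.Mod: operator.mod}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
_FUNCS = {"sqrt": math.sqrt, "log": math.log, "sin": math.sin, "cos": math.cos}


def safe_calc(expression: str) -> float:
    """Evaluate arithmetic by walking the AST; eval() is never called."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*[walk(arg) for arg in node.args])
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))
```

Attribute access, names, subscripts, and calls to anything outside `_FUNCS` all raise `ValueError`, so an input such as `__import__('os').system('ls')` is rejected before anything executes. This protects only the calculator; arbitrary code execution still needs process-level or container-level isolation.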
Iteration Budget
Set a maximum number of loop iterations (typically 5-15) and a token budget to prevent runaway costs. If the agent cannot solve the task within the budget, return the best partial answer with a disclaimer rather than looping indefinitely.
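A budget can be tracked as a small object that the loop charges on every iteration. The sketch below is illustrative only: `Budget`, `run_with_budget`, the `step_fn` contract, and the default limits are all assumptions, not values taken from any particular platform.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Tracks loop iterations and tokens; the default limits are illustrative."""
    max_steps: int = 10
    max_tokens: int = 20_000
    steps_used: int = 0
    tokens_used: int = 0

    def charge(self, tokens: int) -> None:
        """Record one loop iteration and the tokens it consumed."""
        self.steps_used += 1
        self.tokens_used += tokens

    @property
    def exhausted(self) -> bool:
        return (self.steps_used >= self.max_steps
                or self.tokens_used >= self.max_tokens)


def run_with_budget(step_fn, budget: Budget) -> str:
    """step_fn() returns (answer_or_None, best_partial_answer, tokens_spent)."""
    partial = ""
    while not budget.exhausted:
        answer, partial, tokens = step_fn()
        budget.charge(tokens)
        if answer is not None:
            return answer
    # Out of budget: return the best partial answer with a disclaimer.
    return f"[Partial answer, budget exhausted] {partial}"
```

Whichever limit is hit first stops the loop, so a few very long iterations are bounded just as firmly as many short ones.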
Implementation
A minimal agentic tool-use implementation with three tools (the web search is stubbed), function calling, and a bounded execution loop:
import json
import math

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
# ── Tool Definitions (JSON Schema for function calling) ──
tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate a mathematical expression safely",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Execute a Python code snippet and return stdout",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string"}
                },
                "required": ["code"]
            }
        }
    }
]
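# ── Tool Implementations ──
import contextlib
import io

def execute_tool(name: str, args: dict) -> str:
    """Dispatch a tool call by name; always return a string, never raise."""
    try:
        if name == "web_search":
            # Stub: replace with a call to a real search API
            return f"Results for '{args['query']}': [simulated search data]"
        elif name == "calculator":
            # Restricted eval: only `math` is in scope and builtins are removed.
            # This narrows the attack surface but is NOT a real sandbox.
            allowed = {"math": math}
            result = eval(args["expression"], {"__builtins__": {}}, allowed)
            return str(result)
        elif name == "run_python":
            # Demo only. In production: use a sandboxed container.
            # print must be whitelisted, otherwise snippets cannot write to stdout.
            safe_builtins = {"print": print, "range": range, "len": len}
            buf = io.StringIO()
            with contextlib.redirect_stdout(buf):
                exec(args["code"], {"__builtins__": safe_builtins})
            return buf.getvalue() or "(no output)"
        return f"Unknown tool: {name}"
    except Exception as e:
        # Return the error as text so the model can see it and adapt
        return f"Error: {type(e).__name__}: {e}"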
# ── Agent Loop ──
def run_agent(user_msg: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are an assistant with tools. "
                                      "Use them when needed. Think step by step."},
        {"role": "user", "content": user_msg}
    ]
    for step in range(max_steps):
        resp = client.chat.completions.create(
            model="gpt-4o", messages=messages,
            tools=tools, tool_choice="auto"
        )
        msg = resp.choices[0].message
        messages.append(msg)  # keep the assistant turn (and its tool calls) in history
        # No tool calls = final answer
        if not msg.tool_calls:
            return msg.content or ""
        # Execute each tool and feed results back
        for tc in msg.tool_calls:
            try:
                args = json.loads(tc.function.arguments)
                result = execute_tool(tc.function.name, args)
            except json.JSONDecodeError as e:
                # Malformed arguments: report back instead of crashing the loop
                result = f"Error: invalid JSON arguments: {e}"
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result
            })
    return "Max steps reached without a final answer."
# Usage
answer = run_agent("What is 15% of the US population?")
print(answer)
Data Flow
- User sends a request: "What is 15% of the current US population?" enters the agent.
- Agent reasons: The LLM receives the system prompt, tool definitions, and user message. It decides it needs to search for the current US population.
- Tool call emitted: The LLM returns a structured function call: web_search({"query": "current US population 2024"}).
- Tool executed: The dispatcher routes to the web_search implementation, which returns "~335 million."
- Result fed back: The tool result is appended to the conversation history as a tool message.
- Agent reasons again: Now the LLM has the population. It decides to call the calculator: calculator({"expression": "335000000 * 0.15"}).
- Second tool executed: Returns "50250000.0".
- Final answer: The LLM now has all the information it needs. It produces a natural language response: "15% of the US population is approximately 50.25 million."
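The conversation history at the end of this flow might look roughly like the list below. This is an illustrative trace only: the call IDs are made up, and the message shape is simplified (real function calling APIs typically nest the name and arguments and encode the arguments as a JSON string).

```python
# Illustrative history after the flow above; IDs and exact wording are made up.
history = [
    {"role": "system", "content": "You are an assistant with tools."},
    {"role": "user", "content": "What is 15% of the current US population?"},
    {"role": "assistant", "tool_calls": [
        {"id": "call_1", "name": "web_search",
         "arguments": {"query": "current US population 2024"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "~335 million"},
    {"role": "assistant", "tool_calls": [
        {"id": "call_2", "name": "calculator",
         "arguments": {"expression": "335000000 * 0.15"}}]},
    {"role": "tool", "tool_call_id": "call_2", "content": str(335000000 * 0.15)},
    {"role": "assistant",
     "content": "15% of the US population is approximately 50.25 million."},
]

# Every tool result should answer a tool call that appears in the history.
requested = {c["id"] for m in history for c in m.get("tool_calls", [])}
answered = {m["tool_call_id"] for m in history if m["role"] == "tool"}
# Each assistant message corresponds to one LLM call.
llm_calls = sum(1 for m in history if m["role"] == "assistant")
```

Counting the assistant messages shows that this single request took three LLM calls, which is where the multi-call cost of the loop comes from.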
Trade-offs
| Dimension | Pros | Cons |
|---|---|---|
| Flexibility | Agent dynamically selects tools and plans steps at runtime — no hardcoded pipeline | Non-deterministic execution path makes testing and debugging harder |
| Cost | Only calls tools when needed, avoiding unnecessary API usage | Multiple LLM calls per request (3-10x more expensive than a single call) |
| Latency | Can parallelize independent tool calls | Sequential loop adds latency: each iteration = LLM call + tool execution |
| Reliability | Can retry failed tools and adapt its approach | LLM may hallucinate tool arguments or enter infinite loops without proper guardrails |
| Complexity | Simple to implement with modern function calling APIs | Tool sandboxing, error recovery, and budget management add production complexity |
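The latency row notes that independent tool calls can run in parallel. A minimal sketch of that idea, assuming I/O-bound tools, uses a thread pool from the standard library; `slow_tool` and `run_parallel` are hypothetical names, and the sleep stands in for a network round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_tool(name: str, delay: float = 0.2) -> str:
    """Hypothetical I/O-bound tool; the sleep stands in for a network call."""
    time.sleep(delay)
    return f"{name}: ok"


def run_parallel(calls: list) -> list:
    """Run independent tool calls concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        return list(pool.map(slow_tool, calls))


start = time.perf_counter()
results = run_parallel(["weather", "stocks", "news"])
elapsed = time.perf_counter() - start  # close to one delay, not the sum of three
```

This only helps when the calls do not depend on each other's results; a calculation that needs a search result still has to wait for the next loop iteration.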
When to use this architecture: When user queries require dynamic, multi-step interaction with external data sources or computation. Not recommended for simple, single-turn Q&A where a direct LLM call or RAG retrieval suffices.