The ReAct Pattern
Plain Language
The ReAct pattern (Reasoning + Acting) is the foundational framework for how LLM agents operate. It is inspired by how humans solve problems: we think about what to do, take an action, observe the result, think about what to do next, and repeat until the problem is solved. In the ReAct loop, the LLM alternates between reasoning (thinking through the problem and planning the next step) and acting (executing a tool or taking an action). After each action, the agent observes the result and uses that observation to inform its next reasoning step.
Consider a user asking: "What was the weather like in the city where Apple was founded, on the day the iPhone was announced?" A standard LLM would try to answer from memory and likely get details wrong. A ReAct agent thinks: "I need to know where Apple was founded. Let me search for that." It calls a search tool and learns "Cupertino, California." Then it thinks: "I need the date the iPhone was announced." It searches again and finds "January 9, 2007." Finally, it thinks: "Now I need historical weather for Cupertino on January 9, 2007." It calls a weather API and gets the answer. Each step builds on the previous result, and the agent dynamically plans its path rather than following a fixed script.
What makes ReAct powerful is its flexibility. Unlike traditional programming where you specify every step in advance, a ReAct agent figures out the steps at runtime based on what it discovers along the way. If the first search returns ambiguous results, the agent can rephrase and search again. If an API call fails, it can try an alternative source. If the problem turns out to be simpler than expected, it can skip unnecessary steps. This adaptive behavior makes agents much more robust for real-world tasks where the path to the answer is not known in advance.
The ReAct loop typically runs for 3 to 10 iterations for most tasks. Each iteration involves one LLM call (for reasoning and deciding the next action) and one tool execution. The maximum number of iterations is usually capped to prevent runaway loops — if an agent cannot solve a task in N steps, it is better to return a partial result or ask the user for clarification than to loop indefinitely burning tokens and API costs.
Deep Dive
Here is a minimal ReAct agent built from scratch using the OpenAI function calling API. This implementation shows the core loop without any framework dependencies:
```python
from openai import OpenAI
import json

client = OpenAI()

# --- Tool Definitions ---
tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform mathematical calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "Math expression to evaluate"}
                },
                "required": ["expression"]
            }
        }
    }
]

# --- Tool Implementations ---
def execute_tool(name: str, args: dict) -> str:
    if name == "web_search":
        # In production, use a real search API (Tavily, Serper, etc.)
        return f"Search results for '{args['query']}': [simulated results]"
    elif name == "calculator":
        try:
            result = eval(args["expression"])  # Use a safe evaluator in production!
            return str(result)
        except Exception as e:
            return f"Error: {e}"
    return "Unknown tool"

# --- ReAct Agent Loop ---
def run_agent(user_message: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": """You are a helpful assistant with access to tools.
Think step by step. Use tools when you need information you don't have.
When you have enough information, provide a final answer to the user."""},
        {"role": "user", "content": user_message}
    ]
    for step in range(max_steps):
        # Reasoning: LLM decides what to do
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        msg = response.choices[0].message
        messages.append(msg)
        # If no tool calls, the agent is done
        if not msg.tool_calls:
            return msg.content
        # Acting: Execute each tool call
        for tool_call in msg.tool_calls:
            args = json.loads(tool_call.function.arguments)
            print(f"  → {tool_call.function.name}({args})")
            result = execute_tool(tool_call.function.name, args)
            # Observation: Feed result back to the LLM
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
    return "Agent reached maximum steps without completing the task."

# Usage
answer = run_agent("What is 15% of the current US population?")
print(answer)
```
Tool Use & Function Calling
Plain Language
Tools are the arms and legs of an LLM agent. Without tools, an agent can only think and write text. With tools, it can search the web, query databases, call APIs, execute code, read files, send emails, and interact with virtually any software system. The mechanism that enables this is function calling — a feature where the LLM does not just generate text, but generates structured JSON that specifies which function to call and with what arguments. Your application code then executes that function and feeds the result back to the LLM.
The function calling flow works like this. You provide the LLM with a list of available tools, each described as a JSON schema with a name, description, and parameter definitions. When the LLM decides it needs to use a tool, instead of generating a text response, it generates a special tool call message containing the function name and arguments as JSON. Your code extracts these arguments, calls the actual function, and sends the result back to the LLM as a tool response message. The LLM then continues its reasoning with this new information. Both OpenAI and Anthropic support this pattern natively in their APIs.
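Concretely, one round trip of that flow produces two messages with the following shape. Field names follow the OpenAI chat format; the ID and values are illustrative:

```python
# The assistant's turn: instead of text, the model returns a tool call.
assistant_tool_call_msg = {
    "role": "assistant",
    "content": None,                     # no text — the model chose a tool instead
    "tool_calls": [{
        "id": "call_abc123",             # illustrative ID
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": '{"city": "Tokyo"}'  # arguments arrive as a JSON *string*
        }
    }]
}

# Your code executes get_weather and replies with a tool-role message.
tool_result_msg = {
    "role": "tool",
    "tool_call_id": "call_abc123",       # must match the id above
    "content": '{"temp_c": 18, "conditions": "cloudy"}'
}
```

Note that `arguments` is a JSON string, not a parsed object, so your code must `json.loads` it before calling the function, and the `tool_call_id` in the result must match the ID the model generated.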
The quality of tool descriptions is critical. The LLM decides which tool to use based on the description you provide, and it constructs arguments based on the parameter schema. Vague descriptions like "does stuff with data" will confuse the model. Precise descriptions like "Queries a PostgreSQL database with a SQL SELECT statement and returns results as JSON. Use this when the user asks questions about customer data, orders, or product inventory" give the model clear guidance on when and how to use the tool. Including examples of valid arguments in the description further improves reliability.
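As a side-by-side sketch, here is the same tool described both ways. Both schemas are hypothetical, written in the Anthropic-style `input_schema` format:

```python
# A vague schema: the model cannot tell when to call this or what to pass.
vague_tool = {
    "name": "db_tool",
    "description": "does stuff with data",
    "input_schema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["q"],
    },
}

# A precise schema: states what it does, when to use it, and shows a valid input.
precise_tool = {
    "name": "query_orders",
    "description": (
        "Look up a customer's orders in the orders database. "
        "Use when the user asks about order status, shipping, or order history. "
        'Example input: {"customer_email": "jane@example.com"}'
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_email": {
                "type": "string",
                "description": "The customer's email address",
            }
        },
        "required": ["customer_email"],
    },
}
```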
Most production agents use between 3 and 10 tools. Too few tools limit what the agent can do; too many tools confuse the model about which one to choose. When you need many capabilities, consider organizing tools into categories and having the agent first select a category, then select a specific tool within that category. This two-stage selection reduces the decision space at each step and improves tool selection accuracy.
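A minimal sketch of the two-stage idea, with hypothetical category and tool names:

```python
# Tools grouped into categories so the model never chooses among all of them at once.
TOOL_CATALOG = {
    "data": ["query_database", "export_csv", "run_report"],
    "communication": ["send_email", "post_slack_message"],
    "files": ["read_file", "write_file", "list_directory"],
}

def category_choices() -> list[str]:
    """Stage 1: the model picks from 3 category names instead of 8 tools."""
    return list(TOOL_CATALOG)

def tools_for_category(category: str) -> list[str]:
    """Stage 2: only the chosen category's tool schemas are exposed."""
    return TOOL_CATALOG.get(category, [])
```

In practice each stage is its own LLM call: the first call sees only `category_choices()`, and the second call receives just the schemas returned by `tools_for_category()`.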
Deep Dive
Here is a production-quality tool setup using the Anthropic SDK, which demonstrates the slightly different function calling format compared to OpenAI:
```python
import anthropic

client = anthropic.Anthropic()

# Tool definitions for Anthropic (note "input_schema" instead of "parameters")
tools = [
    {
        "name": "query_database",
        "description": """Execute a read-only SQL query against the application database.
Returns results as a JSON array of objects. Only SELECT queries are allowed.
Use for: customer lookups, order history, product searches, analytics.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "sql": {
                    "type": "string",
                    "description": "SQL SELECT query to execute"
                }
            },
            "required": ["sql"]
        }
    },
    {
        "name": "send_email",
        "description": """Send an email to a specified recipient. Requires explicit user
confirmation before sending. Use for: notifications, reports, follow-ups.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "to": {"type": "string", "description": "Recipient email"},
                "subject": {"type": "string"},
                "body": {"type": "string"}
            },
            "required": ["to", "subject", "body"]
        }
    }
]

# Anthropic agent loop (execute_tool is the dispatcher defined earlier)
def run_claude_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            system="You are a helpful assistant. Use tools when needed.",
            tools=tools,
            messages=messages,
        )
        # Check if Claude wants to use tools
        if response.stop_reason == "tool_use":
            # Process tool calls
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Agent is done — extract text response
            return "".join(
                b.text for b in response.content if hasattr(b, "text")
            )
    return "Max steps reached."
```
A good tool description includes: (1) what the tool does, (2) when to use it, (3) parameter constraints. Example: "Searches the company knowledge base for relevant documents. Use when the user asks about internal policies, procedures, or company-specific information. The query parameter should be a natural language search phrase, not a question."
LangGraph Workflows
Plain Language
While the basic ReAct loop works well for straightforward tasks, complex workflows need more structure. Consider a customer support agent that must: (1) identify the customer, (2) look up their order history, (3) determine the issue type, (4) route to the appropriate resolution workflow (refund, replacement, technical support), and (5) execute the resolution with approval. This is not a simple loop — it is a directed graph with conditional branches, parallel paths, and human approval gates.
LangGraph is a library (from the LangChain team) for building these complex agent workflows as state machines. You define nodes (functions that do work), edges (connections between nodes), and conditional edges (branches based on state). The state flows through the graph, getting modified at each node, with the graph engine handling execution order, error recovery, and checkpointing. Think of it like a flowchart that actually executes — each box in the flowchart is a function, and the arrows between boxes include conditions that determine the path.
LangGraph's key advantage over simple chains is cycles — the ability to loop back to earlier nodes. In the self-corrective RAG example from Module 08, the graph loops from "grade documents" back to "retrieve" when quality is insufficient. This is impossible with simple sequential chains but natural with LangGraph's graph-based execution model. Combined with state persistence (checkpointing), LangGraph can pause execution, wait for external input (like human approval), and resume exactly where it left off — even if the server restarts.
Deep Dive
Here is a complete LangGraph agent that implements a research assistant with conditional routing:
```python
import json
from operator import add
from typing import TypedDict, Annotated

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from openai import OpenAI

client = OpenAI()

class AgentState(TypedDict):
    task: str
    plan: list[str]
    results: Annotated[list[str], add]  # Accumulates across nodes
    current_step: int
    final_answer: str

def planner(state: AgentState) -> AgentState:
    """Break task into sub-steps using LLM."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"""Break this task into 2-4 research steps:
Task: {state['task']}
Return as JSON array under key "steps"."""
        }],
        response_format={"type": "json_object"}
    )
    steps = json.loads(response.choices[0].message.content)["steps"]
    return {"plan": steps, "current_step": 0}

def researcher(state: AgentState) -> AgentState:
    """Execute current research step."""
    step = state["plan"][state["current_step"]]
    # In production: call search API, RAG, database, etc.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Research: {step}"}]
    )
    result = response.choices[0].message.content
    return {"results": [result], "current_step": state["current_step"] + 1}

def should_continue(state: AgentState) -> str:
    """Check if more research steps remain."""
    if state["current_step"] < len(state["plan"]):
        return "research"
    return "synthesize"

def synthesizer(state: AgentState) -> AgentState:
    """Combine all research results into a final answer."""
    all_results = "\n\n".join(state["results"])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"""Synthesize these research findings into a comprehensive answer.
Task: {state['task']}
Findings:\n{all_results}"""
        }]
    )
    return {"final_answer": response.choices[0].message.content}

# Build the graph
graph = StateGraph(AgentState)
graph.add_node("planner", planner)
graph.add_node("researcher", researcher)
graph.add_node("synthesizer", synthesizer)
graph.set_entry_point("planner")
graph.add_edge("planner", "researcher")
graph.add_conditional_edges("researcher", should_continue, {
    "research": "researcher",     # Loop back for more steps
    "synthesize": "synthesizer"   # Move to synthesis
})
graph.add_edge("synthesizer", END)

# Compile with checkpointing
memory = MemorySaver()
app = graph.compile(checkpointer=memory)

# Run
result = app.invoke(
    {"task": "Compare vLLM and TGI for production deployment",
     "plan": [], "results": [], "current_step": 0, "final_answer": ""},
    config={"configurable": {"thread_id": "research-1"}}
)
print(result["final_answer"])
```
Multi-Agent Systems
Plain Language
A single agent with many tools becomes unreliable as complexity grows — the LLM must simultaneously understand all tools, maintain context about the overall task, and make good decisions at each step. Multi-agent systems solve this by splitting responsibilities across specialized agents that each excel at one thing. Think of it like a company: instead of one person doing sales, engineering, and accounting, you have specialists in each area who collaborate to run the business.
The most common multi-agent patterns are supervisor (one agent delegates to specialists), sequential (agents pass work in a pipeline), and collaborative (agents discuss and refine work together). In the supervisor pattern, a "manager" agent receives the user's request, breaks it down, and delegates sub-tasks to specialist agents — a researcher, a coder, a writer, an analyst. Each specialist has its own tools and system prompt optimized for its domain. The supervisor collects results and synthesizes the final output.
The sequential pattern works like an assembly line. A "planner" agent creates a task breakdown, a "researcher" agent gathers information, a "writer" agent drafts content, and a "reviewer" agent checks quality. Each agent's output becomes the next agent's input. If the reviewer finds issues, it can send work back to the writer for revision. This pattern is excellent for content creation, report generation, and any workflow with clear stages.
The collaborative pattern (inspired by the AutoGen framework) has agents engage in a conversation, debating approaches and refining ideas. A "proposer" agent suggests a solution, a "critic" agent identifies weaknesses, and the proposer revises based on the critique. This back-and-forth continues until the solution passes the critic's review. This pattern is powerful for code generation, where a "coder" writes code and a "tester" writes and runs tests, iterating until all tests pass.
Deep Dive
Here is a multi-agent system using the supervisor pattern with LangGraph:
```python
from typing import TypedDict

from langgraph.graph import StateGraph, END
from openai import OpenAI

client = OpenAI()

class TeamState(TypedDict):
    task: str
    messages: list[dict]
    next_agent: str
    research: str
    code: str
    review: str
    final_output: str

def supervisor(state: TeamState) -> TeamState:
    """Supervisor decides which agent should work next."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": """You are a team supervisor. Based on the current task state,
decide which agent should work next:
- "researcher": Needs information gathering
- "coder": Needs code written or modified
- "reviewer": Code is ready for quality review
- "FINISH": Task is complete
Return only the agent name."""
        }, {
            "role": "user",
            "content": f"""Task: {state['task']}
Research: {state.get('research', 'Not done')}
Code: {state.get('code', 'Not written')}
Review: {state.get('review', 'Not reviewed')}"""
        }],
        temperature=0.0
    )
    next_agent = response.choices[0].message.content.strip().lower()
    return {"next_agent": next_agent}

def researcher_agent(state: TeamState) -> TeamState:
    """Specialist: researches technical topics."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": "You are a technical researcher. Provide thorough research findings."
        }, {
            "role": "user",
            "content": f"Research this topic: {state['task']}"
        }]
    )
    return {"research": response.choices[0].message.content}

def coder_agent(state: TeamState) -> TeamState:
    """Specialist: writes production code."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": "You are an expert Python developer. Write clean, production-ready code."
        }, {
            "role": "user",
            "content": f"Task: {state['task']}\nResearch: {state['research']}\nWrite the code."
        }]
    )
    return {"code": response.choices[0].message.content}

def reviewer_agent(state: TeamState) -> TeamState:
    """Specialist: reviews code for quality and correctness."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": "You are a senior code reviewer. Check for bugs, security issues, and suggest improvements."
        }, {
            "role": "user",
            "content": f"Review this code:\n{state['code']}"
        }]
    )
    return {"review": response.choices[0].message.content}

def route(state: TeamState) -> str:
    agent = state["next_agent"]
    if agent == "finish":
        return END
    return agent

# Build multi-agent graph
workflow = StateGraph(TeamState)
workflow.add_node("supervisor", supervisor)
workflow.add_node("researcher", researcher_agent)
workflow.add_node("coder", coder_agent)
workflow.add_node("reviewer", reviewer_agent)
workflow.set_entry_point("supervisor")
workflow.add_conditional_edges("supervisor", route, {
    "researcher": "researcher",
    "coder": "coder",
    "reviewer": "reviewer",
    END: END,
})

# All agents report back to supervisor
for agent in ["researcher", "coder", "reviewer"]:
    workflow.add_edge(agent, "supervisor")

team = workflow.compile()
```
Human-in-the-Loop
Plain Language
Fully autonomous agents are powerful but risky. An agent that can send emails, modify databases, or deploy code could cause significant damage if it makes a wrong decision. Human-in-the-loop (HITL) patterns add checkpoints where the agent pauses and asks for human approval before taking high-impact actions. Think of it like a pilot and autopilot — the autopilot handles routine operations, but the pilot takes over for critical decisions like landing.
The key design decision is where to insert approval gates. Not every action needs human approval — that would make the agent painfully slow. The principle is to gate actions based on their reversibility and blast radius. Reading data? No approval needed. Writing a draft email? Auto-approve or light review. Sending that email to 10,000 customers? Definitely needs approval. Modifying production database records? Hard stop for human review. The goal is to maintain agent speed for safe operations while providing safety rails for dangerous ones.
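One way to encode this principle is a small policy table keyed on reversibility and blast radius. The action names and their classifications below are illustrative, not a standard:

```python
# Gate level as a function of (reversibility, blast radius).
APPROVAL_POLICY = {
    ("reversible",   "low"):  "auto",            # reads, drafts, searches
    ("reversible",   "high"): "light_review",    # bulk changes that can be undone
    ("irreversible", "low"):  "light_review",    # one-off, contained actions
    ("irreversible", "high"): "human_approval",  # mass email, refunds, prod writes
}

# Hypothetical classification of this system's actions.
IRREVERSIBLE = {"send_email", "issue_refund", "delete_record"}
HIGH_BLAST = {"send_email", "issue_refund", "bulk_update"}

def gate_for(action: str) -> str:
    """Return the approval gate level required before executing an action."""
    rev = "irreversible" if action in IRREVERSIBLE else "reversible"
    blast = "high" if action in HIGH_BLAST else "low"
    return APPROVAL_POLICY[(rev, blast)]
```

The agent loop consults `gate_for` before each tool call: `"auto"` executes immediately, while `"human_approval"` pauses execution until a reviewer signs off.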
LangGraph supports HITL natively through its interrupt mechanism. When a graph reaches an interrupt point, it saves its complete state (all variables, the current position in the graph, pending actions) to persistent storage and pauses. An external system (a web UI, Slack bot, or CLI) presents the pending action to a human reviewer. The human can approve (resume the graph), reject (terminate or redirect), or modify (change the pending action). The graph then continues from exactly where it paused, with the human's decision incorporated into its state.
Deep Dive
```python
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

class ApprovalState(TypedDict):
    task: str
    proposed_action: str
    approved: bool
    result: str

def plan_action(state: ApprovalState) -> ApprovalState:
    """Agent plans an action that requires approval."""
    # LLM decides what action to take
    return {"proposed_action": f"Send email to customer about: {state['task']}"}

def execute_action(state: ApprovalState) -> ApprovalState:
    """Execute the approved action."""
    if state["approved"]:
        return {"result": f"Executed: {state['proposed_action']}"}
    return {"result": "Action was rejected by human reviewer."}

graph = StateGraph(ApprovalState)
graph.add_node("plan", plan_action)
graph.add_node("execute", execute_action)
graph.set_entry_point("plan")
graph.add_edge("plan", "execute")
graph.add_edge("execute", END)

# Compile with interrupt BEFORE the execute node
checkpointer = MemorySaver()
app = graph.compile(
    checkpointer=checkpointer,
    interrupt_before=["execute"]  # Pause here for approval
)

# Run until interrupt
config = {"configurable": {"thread_id": "approval-1"}}
state = app.invoke(
    {"task": "refund order #12345", "approved": False, "proposed_action": "", "result": ""},
    config=config
)
print(f"Proposed: {state['proposed_action']}")

# Human reviews and approves...
# Resume with approval
app.update_state(config, {"approved": True})
final = app.invoke(None, config=config)  # Resume from checkpoint
print(f"Result: {final['result']}")
```
Rules of thumb for gating: always gate sending messages, modifying production data, financial transactions, and deploying code. Light review for draft content and non-production data changes. No gate needed for reading data, internal computations, search queries, and draft generation.
Agent Safety & Reliability
Plain Language
Agents introduce unique safety challenges that do not exist with simple LLM calls. A chatbot that generates incorrect text is annoying; an agent that executes incorrect actions can delete data, send embarrassing emails, or burn through your entire cloud budget. Safety for agents means: limiting what tools they can access, constraining the scope of each tool, implementing timeouts and cost limits, logging every action for audit, and having kill switches to stop runaway agents.
The principle of least privilege applies directly to agent tools. A database tool should only have read access unless write access is specifically needed. An email tool should have a whitelist of allowed recipients during development. A code execution tool should run in a sandboxed environment with no network access. Each tool should validate its inputs against a strict schema and reject anything outside expected parameters. Defense in depth — multiple layers of safety checks — is essential because any single check might fail.
Observability is non-negotiable for production agents. Every tool call, every LLM reasoning step, every decision branch must be logged with timestamps, input/output data, and latency metrics. When an agent produces a wrong answer or takes an incorrect action, you need to trace back through the entire execution to understand what went wrong. Tools like LangSmith, Phoenix (Arize), and custom structured logging provide this visibility. Without it, debugging agent failures is essentially impossible.
Deep Dive
```python
import logging
import time
from functools import wraps

logger = logging.getLogger("agent")

# --- Safety Wrapper for Tools ---
def safe_tool(max_retries: int = 2, timeout_seconds: int = 30):
    """Decorator that adds retry, timing, and logging to agent tools."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                start = time.perf_counter()
                try:
                    result = func(*args, **kwargs)
                    elapsed = time.perf_counter() - start
                    logger.info(f"Tool={func.__name__} status=success "
                                f"elapsed={elapsed:.2f}s attempt={attempt+1}")
                    # Note: this only warns after the fact; enforcing a hard
                    # timeout needs asyncio.timeout or a thread/process pool.
                    if elapsed > timeout_seconds:
                        logger.warning(f"Tool {func.__name__} exceeded timeout")
                    return result
                except Exception as e:
                    logger.error(f"Tool={func.__name__} error={e} attempt={attempt+1}")
                    if attempt == max_retries:
                        return f"Error after {max_retries+1} attempts: {e}"
        return wrapper
    return decorator

# --- Cost Tracking ---
class AgentBudget:
    """Track and limit agent spending."""
    def __init__(self, max_llm_calls: int = 20, max_tokens: int = 50_000):
        self.max_llm_calls = max_llm_calls
        self.max_tokens = max_tokens
        self.llm_calls = 0
        self.total_tokens = 0

    def track(self, tokens_used: int):
        self.llm_calls += 1
        self.total_tokens += tokens_used
        if self.llm_calls > self.max_llm_calls:
            raise RuntimeError(f"Agent exceeded max LLM calls: {self.max_llm_calls}")
        if self.total_tokens > self.max_tokens:
            raise RuntimeError(f"Agent exceeded token budget: {self.max_tokens}")
```
| Safety Layer | What It Prevents | Implementation |
|---|---|---|
| Max iterations | Infinite loops | Counter in agent loop |
| Token budget | Runaway costs | Token counter + hard limit |
| Tool timeouts | Hung tool calls | asyncio.timeout or threading |
| Input validation | Injection attacks | Pydantic schemas per tool |
| Output filtering | PII/sensitive data leaks | Regex + LLM guardrail check |
| Audit logging | Untrackable actions | Structured logs per step |
| Human approval | Irreversible actions | LangGraph interrupt_before |
Interview Ready
How to Explain This in 2 Minutes
Agents are the leap from LLMs that answer questions to LLMs that take actions. A basic LLM generates one response and stops, but an agent follows the ReAct loop — Reason about the task, Act by calling a tool (an API, a database query, a code interpreter), Observe the result, and repeat until the goal is met. Tool use is implemented through function calling: you declare tool schemas in the API request, the model returns a structured JSON call instead of plain text, your code executes the tool and feeds the result back, and the model continues reasoning. For complex workflows with branching, cycles, and state, LangGraph models the agent as a stateful graph where nodes are processing steps and edges are conditional transitions. When a single agent is not enough, multi-agent architectures let specialized agents — a planner, a researcher, a coder, a reviewer — collaborate through message passing or a shared state, orchestrated by a supervisor agent. Human-in-the-loop checkpoints ensure that high-stakes actions (sending emails, modifying databases, approving transactions) require explicit human approval before execution. Safety comes from sandboxing tool execution, setting token and iteration budgets, and implementing structured audit logging at every step.
Likely Interview Questions
| Question | What They're Really Asking |
|---|---|
| Walk me through the ReAct pattern and how it differs from a single-shot LLM call. | Do you understand the Reason-Act-Observe loop and why iterative tool use produces better results than one-pass generation? |
| How does function calling work at the API level, and how do you define tool schemas? | Can you explain the mechanics — JSON schema declarations, the model returning tool call objects, your code executing and returning results — not just the concept? |
| When would you use LangGraph over a simple chain, and how do you model conditional workflows? | Do you know when a DAG or cyclic graph is needed, and can you reason about nodes, edges, state, and conditional routing? |
| How do you design a multi-agent system, and how do agents communicate? | Can you articulate supervisor vs. peer-to-peer topologies, message passing patterns, and when to split responsibilities across agents? |
| What safety mechanisms would you implement for an agent that can take real-world actions? | Do you understand sandboxing, human-in-the-loop approval gates, token/iteration budgets, audit logging, and how to prevent runaway agent behavior? |
Model Answers
1. The ReAct pattern — ReAct (Reasoning + Acting) interleaves chain-of-thought reasoning with tool execution. In a single-shot call, the LLM generates one response from its training data and has no way to verify facts or gather new information. In ReAct, the model first produces a Thought explaining its reasoning and what it needs to find out, then generates an Action specifying which tool to call with what arguments, the system executes the tool and returns an Observation, and the model incorporates this new evidence into its next Thought. This loop repeats until the model has enough information to produce a final answer. The key insight is that reasoning guides better tool selection, and tool results ground the reasoning in real data — each reinforces the other. In practice, you implement this by providing tool schemas in the system prompt or API tools parameter, parsing the model's structured output to detect tool calls, executing them, and appending the results as new messages in the conversation history.
2. Function calling mechanics —
Function calling is the API-level mechanism that makes tool use reliable. You declare tools as JSON Schema
objects in the API request — each tool has a name, description, and a parameters schema defining the expected
arguments with types and constraints. When the model decides to use a tool, instead of returning a plain text
response, it returns a message with tool_calls — an array of objects each containing the tool name
and arguments as structured JSON. Your application code parses these tool calls, executes the corresponding
functions (API calls, database queries, calculations), and sends the results back as tool-role messages. The
model then continues generating, either calling another tool or producing the final response. This structured
approach is far more reliable than prompting the model to output tool calls as text, because the API enforces
valid JSON and the schema constrains the arguments to valid types and values.
```python
# Function calling with OpenAI API
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

# Agent loop
messages = [{"role": "user", "content": "Weather in Tokyo?"}]
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)

# Model returns tool_call → execute → feed back
tool_call = response.choices[0].message.tool_calls[0]
result = get_weather(**json.loads(tool_call.function.arguments))
messages.append(response.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result)
})
```
3. LangGraph vs simple chains —
A simple LangChain chain is a linear sequence: input goes through step A, then B, then C, then output.
LangGraph is needed when your workflow has conditional branching (take different paths based on classification),
cycles (retry or refine until quality is sufficient), parallel execution (run multiple tools simultaneously),
or persistent state that accumulates across steps. In LangGraph, you define a StateGraph with a
typed state schema, add nodes (Python functions that read and update state), and connect them with edges —
including conditional edges that route to different nodes based on state values. The graph compiles into a
runnable that manages state transitions automatically. A common pattern is a router node that classifies the
user's intent and routes to specialized subgraphs, with a quality-check node that loops back to retry if the
output does not meet a threshold. LangGraph also provides built-in checkpointing for persistence and
interrupt_before / interrupt_after for human-in-the-loop approval gates.
4. Multi-agent system design — In a multi-agent system, you decompose a complex task into roles and assign each role to a specialized agent with its own system prompt, tools, and model configuration. The two main topologies are supervisor-based (a coordinator agent delegates subtasks to worker agents and synthesizes their outputs) and peer-to-peer (agents communicate directly through a shared message bus or state object). For example, a research report system might have a Planner agent that breaks the query into sub-questions, a Researcher agent with web search tools, a Writer agent that drafts sections, and a Reviewer agent that critiques and requests revisions. Communication happens through a shared state in LangGraph — each agent reads the current state, performs its work, and writes results back. The supervisor pattern is easier to debug because there is a single control flow, while peer-to-peer scales better for loosely coupled tasks. The key design decision is granularity: too many agents add coordination overhead and latency; too few lose the specialization benefit.
5. Agent safety mechanisms —
Safety for action-taking agents requires multiple layers. First, sandboxing: tool execution should happen in
isolated environments (Docker containers, serverless functions) so a malicious or buggy tool call cannot access
the host system. Second, human-in-the-loop gates: use LangGraph's interrupt_before to pause
execution before high-stakes actions (sending emails, modifying production databases, making purchases) and
require explicit human approval. Third, budget controls: set maximum token usage, maximum number of iterations
(typically 5–15 for most tasks), and wall-clock timeouts to prevent runaway loops where the agent keeps calling
tools without converging. Fourth, structured audit logging: log every thought, action, tool call, tool result,
and decision at each step with timestamps and trace IDs, so you can replay and debug any agent run. Fifth,
least-privilege tool access: each agent should only have access to the specific tools it needs, with scoped
API keys and permissions — a research agent should not have write access to production databases.
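Layers three and four, budget controls and structured audit logging, can be combined in one agent loop. The sketch below is a hedged illustration: `llm_step` is a stub standing in for a real LLM call, the tool registry is hypothetical, and the specific budget values are examples, but the enforcement points (iteration cap, token budget, wall-clock timeout, per-step audit record with a trace ID) are the mechanisms described above.

```python
import time
import uuid

def run_agent(task, tools, max_iters=8, token_budget=4000, timeout_s=60):
    """Agent loop with layered budget controls and structured audit logging."""
    trace_id = str(uuid.uuid4())
    audit_log = []
    tokens_used = 0
    start = time.monotonic()
    for step in range(max_iters):                      # iteration cap
        if time.monotonic() - start > timeout_s:       # wall-clock timeout
            return {"status": "timeout", "log": audit_log}
        thought, action, args, cost = llm_step(task, audit_log)  # stub LLM call
        tokens_used += cost
        if tokens_used > token_budget:                 # token budget
            return {"status": "budget_exhausted", "log": audit_log}
        result = tools[action](**args)
        # Structured audit record: every thought, action, args, and result,
        # with a timestamp and trace ID, so any run can be replayed.
        audit_log.append({"trace_id": trace_id, "step": step, "ts": time.time(),
                          "thought": thought, "action": action,
                          "args": args, "result": result})
        if action == "finish":
            return {"status": "done", "answer": result, "log": audit_log}
    return {"status": "max_iterations", "log": audit_log}

def llm_step(task, log):
    # Stub reasoning step: searches twice, then finishes. Returns
    # (thought, action, args, token_cost).
    if len(log) < 2:
        return ("need more info", "search", {"q": task}, 300)
    return ("have enough", "finish", {"answer": f"answer to {task}"}, 200)

TOOLS = {
    "search": lambda q: f"results for {q}",
    "finish": lambda answer: answer,
}
```

When a budget trips, the loop returns a status the caller can act on (retry, escalate, or surface a partial answer) rather than failing silently.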
System Design Scenario
You are building a multi-agent customer support system for an e-commerce platform handling 10,000 tickets per day. The system must classify incoming tickets, retrieve relevant order and product information, draft responses, escalate complex issues to human agents, and learn from human corrections. Some actions are irreversible (issuing refunds, canceling orders). Design the complete agent architecture.
A strong answer should cover:
- Agent topology — a supervisor/router agent that classifies ticket intent (order status, refund request, product question, complaint) and delegates to specialized worker agents, each with tailored system prompts and tool access scoped to their role
- Tool integration — function calling schemas for order lookup, refund processing, inventory checking, and CRM updates, with each tool returning structured JSON that the agent can reason about and cite in its response
- Human-in-the-loop gates — LangGraph interrupt_before on all irreversible actions (refunds over $100, order cancellations, account modifications), with a review queue that presents the agent's reasoning and proposed action to a human supervisor for approval
- Safety and guardrails — iteration limits (max 8 tool calls per ticket), token budgets, sandboxed tool execution, PII redaction in logs, and a confidence threshold below which tickets are automatically escalated to human agents rather than auto-resolved
- Feedback loop — logging human corrections and escalation patterns to identify weak spots, using these corrections to refine system prompts and add new few-shot examples, and tracking resolution rate, escalation rate, and customer satisfaction as key metrics
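Two of the guardrails above, the confidence threshold and the approval gate on irreversible actions, can be sketched as a single dispatch function. The action names, the $100 refund cutoff, and the 0.7 threshold follow the scenario; the in-memory review queue is an assumption standing in for a real review system.

```python
# Dispatch sketch: escalate low-confidence tickets to humans, queue
# irreversible actions for approval, and auto-resolve the rest.

APPROVAL_REQUIRED = {"issue_refund", "cancel_order", "modify_account"}
CONFIDENCE_THRESHOLD = 0.7
review_queue = []  # stand-in for a real human-review queue

def dispatch(ticket_id, proposed_action, confidence, refund_amount=0):
    # Below the confidence threshold: never auto-resolve, hand to a human.
    if confidence < CONFIDENCE_THRESHOLD:
        return {"route": "human_escalation", "ticket": ticket_id}
    # Irreversible actions pause for approval; refunds only above the cutoff.
    needs_approval = (proposed_action in APPROVAL_REQUIRED
                      and (proposed_action != "issue_refund"
                           or refund_amount > 100))
    if needs_approval:
        review_queue.append({"ticket": ticket_id, "action": proposed_action,
                             "confidence": confidence})
        return {"route": "pending_approval", "ticket": ticket_id}
    return {"route": "auto_resolve", "action": proposed_action}
```

In a LangGraph implementation the `pending_approval` branch would be the node guarded by interrupt_before, with the queued record shown to the supervisor.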
Common Mistakes
- Building an agent when a simple chain would suffice — Agents add complexity, latency, and cost. If your task is a linear pipeline (classify, retrieve, generate), use a chain. Agents are only justified when the task requires dynamic decision-making, variable numbers of tool calls, or iterative refinement. The overhead of the ReAct loop — multiple LLM calls per request — can increase latency by 3–5x and cost by 5–10x compared to a single call.
- Not setting iteration and token budgets — Without explicit limits, an agent can enter infinite loops — for example, repeatedly calling a search tool with slightly different queries because it is never satisfied with the results. Always set a maximum iteration count (e.g., 10), a maximum token budget, and a wall-clock timeout. When the budget is exhausted, the agent should return the best answer it has so far with an explicit disclaimer, not silently fail or crash.
- Giving agents unrestricted tool access without human approval gates — An agent with write access to production systems and no approval checkpoints is a liability. A single hallucinated tool call argument — wrong customer ID, wrong refund amount, wrong email recipient — can cause real damage. Every irreversible action should require human approval in production, and all tool executions should be logged with full arguments and results for audit and rollback.
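The least-privilege discipline this mistake calls for can be enforced with a scoped tool registry: each agent role only sees the tools it has been granted, so a hallucinated call to an ungranted tool fails loudly instead of executing. The role names and tool implementations below are illustrative.

```python
# Least-privilege tool scoping: tool access is checked per role at call time.

ALL_TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
    "issue_refund": lambda order_id, amount: {"order_id": order_id,
                                              "refunded": amount},
    "web_search": lambda q: [f"result for {q}"],
}

# Each role is granted only the tools its job requires.
ROLE_GRANTS = {
    "research_agent": {"web_search"},
    "support_agent": {"lookup_order", "issue_refund"},
}

def call_tool(role, tool_name, **kwargs):
    # Deny by default: an ungranted tool raises instead of executing.
    if tool_name not in ROLE_GRANTS.get(role, set()):
        raise PermissionError(f"{role} is not granted {tool_name}")
    return ALL_TOOLS[tool_name](**kwargs)
```

In production the grant table would map to scoped API keys and IAM permissions rather than an in-process dict, but the deny-by-default check sits in the same place: between the agent's proposed action and its execution.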