What Is an Agent? The ReAct Loop Explained


You ask an AI assistant: “Find the cheapest flight from NYC to London next Friday and book it.” A standard chatbot gives you a paragraph about how to search for flights. An agent actually does it: it reasons that it needs to search flights, calls a flight search API, examines the results, identifies the cheapest option, reasons that it should book it, calls the booking API, and returns a confirmation.

The difference is not intelligence - it is architecture. A chatbot generates text. An agent generates text, interprets that text as actions, executes those actions, observes results, and generates more text to decide what to do next. It is a loop of reasoning and acting - the ReAct pattern.

Every agentic system you have heard of - GitHub Copilot agent mode, ChatGPT with tools, Claude with computer use, AutoGPT - runs some variant of this loop. Understanding it is the foundation for building reliable agentic applications.

What an agent actually is

An agent is an LLM wrapped in a loop that gives it the ability to:

  1. Reason about what to do next (text generation)
  2. Act by calling external tools (APIs, databases, code execution)
  3. Observe the results of those actions
  4. Repeat until the task is complete or a stopping condition is met

The key distinction from a simple LLM call: agents maintain state across multiple steps and can take actions in the real world. A single LLM call is stateless - input in, output out. An agent persists through multiple calls, accumulating context and making progress toward a goal.

graph TD
  START["User Task"] --> THINK["Reason<br/>(What should I do next?)"]
  THINK --> ACT["Act<br/>(Call a tool/API)"]
  ACT --> OBS["Observe<br/>(Read the result)"]
  OBS --> CHECK{"Task complete?"}
  CHECK -->|"No"| THINK
  CHECK -->|"Yes"| DONE["Return Final Answer"]

  style START fill:#EEEDFE,stroke:#534AB7,color:#3C3489
  style THINK fill:#E1F5EE,stroke:#0F6E56,color:#085041
  style ACT fill:#FAEEDA,stroke:#854F0B,color:#633806
  style OBS fill:#F1EFE8,stroke:#888780,color:#444441
  style DONE fill:#EEEDFE,stroke:#534AB7,color:#3C3489

The ReAct pattern

ReAct (Reasoning + Acting) is the foundational agent loop, introduced by Yao et al. (2022). The model alternates between:

  • Thought: The model reasons about the current state and what action to take next.
  • Action: The model specifies a tool call with parameters.
  • Observation: The tool executes and returns a result.

User: What is the population of the capital of France?

Thought: I need to find the capital of France first, then look up its population.
Action: search("capital of France")
Observation: The capital of France is Paris.

Thought: Now I know the capital is Paris. I need to find Paris's population.
Action: search("population of Paris 2024")
Observation: The population of Paris is approximately 2.1 million in the city proper.

Thought: I have the answer. The capital of France is Paris with a population of 2.1 million.
Action: finish("The population of Paris, the capital of France, is approximately 2.1 million.")

The model does not just answer from memory - it breaks the problem into steps, uses tools to gather information, and synthesizes the result. Each step is conditioned on all previous steps.
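To act on a trace like the one above, the orchestrator has to turn each Action line into a tool name and an argument. The following is a minimal sketch, assuming the single-string-argument format used in the example; the `parse_action` helper and its regular expression are illustrative and not taken from the ReAct paper.

```python
import re

# Matches lines such as: Action: search("capital of France")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\("(.*)"\)\s*$', re.MULTILINE)

def parse_action(model_output):
    """Return (tool_name, argument) from the first Action line, or None."""
    match = ACTION_RE.search(model_output)
    if match is None:
        return None
    return match.group(1), match.group(2)

step = (
    "Thought: I need to find the capital of France first.\n"
    'Action: search("capital of France")'
)
print(parse_action(step))  # → ('search', 'capital of France')
```

Modern APIs usually return structured tool calls instead of free text, so a parser like this mostly matters for text-only ReAct prompts.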

Why the loop matters

Decomposition of complex tasks

A single LLM call handles simple questions. But “research competitors, summarize their pricing, and draft a comparison table” requires multiple information-gathering steps, each dependent on previous results. The loop lets the model plan and execute multi-step tasks.

Error recovery

If a tool call fails or returns unexpected results, the model can reason about the failure and try a different approach:

Thought: The API returned an error. Let me try a different search query.
Action: search("Paris France population census 2024")
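
For the model to reason about a failure, the failure has to reach it as an observation rather than crash the loop. One possible way to do that is sketched below; `safe_execute` and `flaky_search` are hypothetical names used only for illustration.

```python
def safe_execute(tool_fn, **arguments):
    """Run a tool and always return a string observation, even on failure."""
    try:
        return str(tool_fn(**arguments))
    except Exception as exc:  # broad on purpose: any failure becomes an observation
        return f"ERROR: {type(exc).__name__}: {exc}"

def flaky_search(query):
    # Hypothetical tool that always fails, used to illustrate the error path
    raise TimeoutError("search backend timed out")

observation = safe_execute(flaky_search, query="population of Paris 2024")
print(observation)  # → ERROR: TimeoutError: search backend timed out
```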

Grounding in real data

Instead of hallucinating answers, the agent retrieves real data through tool calls. Each fact in the final answer traces back to a specific observation from a tool.

The architecture

graph TD
  subgraph agent["Agent Architecture"]
      LLM["LLM<br/>(reasoning engine)"]
      TOOLS["Tool Registry<br/>(available actions)"]
      MEM["Context/Memory<br/>(accumulated observations)"]
      ORCH["Orchestrator<br/>(manages the loop)"]
  end
  
  ORCH -->|"prompt with history"| LLM
  LLM -->|"thought + action"| ORCH
  ORCH -->|"execute tool"| TOOLS
  TOOLS -->|"observation"| ORCH
  ORCH -->|"append to context"| MEM
  MEM -->|"included in prompt"| ORCH

  style LLM fill:#EEEDFE,stroke:#534AB7,color:#3C3489
  style TOOLS fill:#E1F5EE,stroke:#0F6E56,color:#085041
  style MEM fill:#FAEEDA,stroke:#854F0B,color:#633806
  style ORCH fill:#F1EFE8,stroke:#888780,color:#444441

The orchestrator

The orchestrator manages the ReAct loop:

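# Assumes `llm` (a chat client), `system_prompt`, and `execute_tool` are defined
# elsewhere. The response fields (`content`, `tool_calls`, `finish_reason`, and
# `tool_call.id`) mirror common chat-completion APIs and may differ in yours.
def agent_loop(task, tools, max_steps=10):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task},
    ]

    for step in range(max_steps):
        # Reason: get the model's next action
        response = llm.generate(messages, tools=tools)

        # Check if done: no tool call means the model produced its final answer
        if response.finish_reason == "stop" or not response.tool_calls:
            return response.content

        # Record the assistant turn (text plus requested tool calls) before acting
        messages.append({
            "role": "assistant",
            "content": response.content,
            "tool_calls": response.tool_calls,
        })

        # Act and observe: run every requested tool call, add each result to context
        for tool_call in response.tool_calls:
            result = execute_tool(tool_call.name, tool_call.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result),
            })

    return "Max steps reached without resolution"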

Tool definitions

Tools are functions the agent can call, defined with descriptions and parameter schemas:

tools = [
    {
        "name": "search_web",
        "description": "Search the web for current information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "run_code",
        "description": "Execute Python code and return the output",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python code to execute"}
            },
            "required": ["code"]
        }
    }
]
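
The definitions above only describe tools to the model; the orchestrator still needs a way to map a tool name to a real function. The sketch below shows one way `execute_tool` could be written. The `TOOL_REGISTRY` dictionary and both tool bodies are placeholders, not real implementations.

```python
import json

def search_web(query):
    # Placeholder: a real implementation would call a search API
    return f"results for: {query}"

def run_code(code):
    # Placeholder: real code execution belongs in a sandbox, not in-process
    return "execution disabled in this sketch"

TOOL_REGISTRY = {"search_web": search_web, "run_code": run_code}

def execute_tool(name, arguments):
    """Dispatch a tool call by name; `arguments` may be a dict or a JSON string."""
    if name not in TOOL_REGISTRY:
        return f"ERROR: unknown tool '{name}'"
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    return TOOL_REGISTRY[name](**arguments)

print(execute_tool("search_web", '{"query": "capital of France"}'))
# → results for: capital of France
```

Returning an error string for an unknown tool, rather than raising, lets the model see the mistake and pick a valid tool on the next step.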

Where agents break

Infinite loops

The agent gets stuck repeating the same action or oscillating between two approaches. Without explicit loop detection and step limits, agents can burn through tokens indefinitely.

Mitigation: Hard step limits, loop detection (same action + parameters repeated), and explicit “give up gracefully” instructions in the system prompt.
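
Loop detection on "same action + parameters repeated" can be approximated by counting identical calls. A minimal sketch follows; the `LoopDetector` class and its `max_repeats` threshold are illustrative choices.

```python
import json
from collections import Counter

class LoopDetector:
    """Flags a tool call once the same name + arguments has been seen too often."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def is_looping(self, name, arguments):
        # Serialize with sorted keys so argument order does not matter
        key = (name, json.dumps(arguments, sort_keys=True))
        self.seen[key] += 1
        return self.seen[key] > self.max_repeats

detector = LoopDetector(max_repeats=2)
calls = [("search", {"query": "Paris population"})] * 3
flags = [detector.is_looping(name, args) for name, args in calls]
print(flags)  # → [False, False, True]
```

When the detector fires, the orchestrator could stop, or append a message telling the model that the call has already been tried and a different approach is needed.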

Error cascades

An early mistake (wrong tool call, misinterpreted result) compounds through subsequent steps. The agent builds on faulty assumptions and produces confidently wrong final answers.

Mitigation: Include verification steps in the system prompt (“After gathering information, verify key facts before providing your final answer”).

Tool selection failures

The agent calls the wrong tool, passes wrong parameters, or misinterprets results. “Search for flights” might return HTML that the model cannot parse, or the model might try to call a tool that does not exist.

Mitigation: Clear tool descriptions, well-typed parameters, and few-shot examples of correct tool usage.
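
In addition to clearer descriptions, the orchestrator can check each call against the tool's schema before executing it and feed any problems back to the model. The sketch below covers only a small subset of JSON Schema (required keys and basic types); `validate_call` and `PY_TYPES` are hypothetical names.

```python
PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def validate_call(name, arguments, tools):
    """Return a list of problems; an empty list means the call looks valid."""
    spec = next((t for t in tools if t["name"] == name), None)
    if spec is None:
        known = ", ".join(t["name"] for t in tools)
        return [f"unknown tool '{name}'; available tools: {known}"]
    params = spec["parameters"]
    problems = []
    for key in params.get("required", []):
        if key not in arguments:
            problems.append(f"missing required parameter '{key}'")
    for key, value in arguments.items():
        prop = params["properties"].get(key)
        if prop is None:
            problems.append(f"unexpected parameter '{key}'")
        elif not isinstance(value, PY_TYPES.get(prop["type"], object)):
            problems.append(f"parameter '{key}' should be of type {prop['type']}")
    return problems

tools = [{
    "name": "search_web",
    "description": "Search the web for current information",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Search query"}},
        "required": ["query"],
    },
}]

print(validate_call("search_web", {"query": 42}, tools))
print(validate_call("book_flight", {}, tools))
```

For production use, a full schema validator library would likely be a better fit than this hand-rolled check.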

Context window exhaustion

Each reasoning step and its observation add to the context. After 10+ steps with verbose tool outputs, the context window can fill up. Once older messages are truncated to make room, the agent loses access to its earlier reasoning.

Mitigation: Summarize observations before appending, limit tool output length, compress earlier steps.
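
Two of these mitigations can be sketched without another model call: capping tool output length and replacing old tool outputs with a stub. Note that this swaps in simple truncation where a real system might summarize with an LLM. The helper names and default limits are assumptions.

```python
def truncate(text, max_chars=500):
    """Cap a tool output, noting how much was dropped."""
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return text[:max_chars] + f"\n[truncated {dropped} characters]"

def compact_history(messages, keep_recent=4, stub="[earlier tool output omitted]"):
    """Replace tool outputs older than the last `keep_recent` messages with a stub."""
    cutoff = max(len(messages) - keep_recent, 0)
    compacted = []
    for index, message in enumerate(messages):
        if index < cutoff and message["role"] == "tool":
            message = {**message, "content": stub}  # copy, do not mutate the input
        compacted.append(message)
    return compacted

history = [{"role": "user", "content": "task"}]
history += [{"role": "tool", "content": "x" * 1000} for _ in range(6)]
compact = compact_history(history, keep_recent=4)
print([len(m["content"]) for m in compact])
```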

Cost explosion

Each step is an LLM call (potentially with large context). A 10-step agent task with 10K context tokens per step processes about 100K input tokens, roughly 10x the cost of a single 10K-token call, and more if the context grows at every step. At scale, agentic workloads are significantly more expensive than single-shot generation.
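
The arithmetic can be made concrete with a small helper. Because every step re-sends the full history, input tokens add up across steps. The function below is a sketch, and the 2K and 1K figures in the second example are illustrative assumptions, not measurements.

```python
def total_input_tokens(base_tokens, tokens_added_per_step, steps):
    """Sum the input tokens over all steps when each step re-sends the whole history."""
    return sum(base_tokens + step * tokens_added_per_step for step in range(steps))

# Constant context: 10 steps at 10K tokens each
print(total_input_tokens(10_000, 0, 10))     # → 100000

# Growing context: start at 2K tokens and add 1K per step
print(total_input_tokens(2_000, 1_000, 10))  # → 65000
```

In the growing-context case, the total rises faster than the step count, which is why long agent runs get expensive quickly.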

Real-world agent implementations

  • ChatGPT (tool use) - ReAct loop with web browsing, code interpreter, and DALL-E tools
  • Claude (computer use) - agent that can interact with desktop applications via screenshots and clicks
  • GitHub Copilot Workspace - decomposes coding tasks into steps: plan → implement → test → iterate
  • Devin - software engineering agent that plans, writes code, runs tests, and debugs autonomously
  • AutoGPT / BabyAGI - early autonomous agent experiments demonstrating the ReAct loop with minimal guardrails

How to apply in practice

Start with a limited tool set. Give the agent 3-5 well-defined tools, not 50. Fewer tools means less confusion about which to use and easier debugging when things go wrong.

Always set step limits. A hard cap (10-20 steps) prevents runaway loops and cost explosions. Include a “wrap up with best effort” instruction when approaching the limit.
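
One way to deliver the wrap-up instruction is to inject a message when few steps remain. This is a sketch; `budget_message`, the `warn_within` threshold, and the wording of the instruction are all assumptions.

```python
def budget_message(step, max_steps, warn_within=2):
    """Return a wrap-up instruction when few steps remain, else None."""
    remaining = max_steps - step
    if remaining <= warn_within:
        return {
            "role": "system",
            "content": (
                f"You have {remaining} step(s) left. Stop gathering information "
                "and give your best-effort final answer now."
            ),
        }
    return None

for step in range(5):
    note = budget_message(step, max_steps=5)
    print(step, note["content"] if note else "-")
```

The orchestrator would append the returned message to the conversation before the next model call.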

Log everything. Record every thought, action, and observation. Agent debugging requires seeing the full reasoning chain. You cannot fix what you cannot trace.
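
A simple way to make runs traceable is one JSON line per step. The sketch below writes to any text stream; the field names are an assumed convention, and `io.StringIO` stands in for a log file.

```python
import io
import json
import time

def log_step(stream, step, thought, action, observation):
    """Append one JSON line per agent step to any writable text stream."""
    record = {
        "ts": time.time(),
        "step": step,
        "thought": thought,
        "action": action,
        "observation": observation,
    }
    stream.write(json.dumps(record) + "\n")

# io.StringIO stands in for a log file here
trace = io.StringIO()
log_step(
    trace, 0, "Find the capital first.",
    {"tool": "search", "args": {"query": "capital of France"}},
    "The capital of France is Paris.",
)
print(trace.getvalue())
```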

Build for human oversight initially. Before deploying autonomous agents, build “agent with approval” - the agent proposes actions, a human approves, then the action executes. Graduate to autonomy only after measuring reliability.
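
An "agent with approval" setup can be sketched as a gate in front of tool execution. The names below are hypothetical. The `approve` callable is injected so that a console prompt, a review queue, or a test stub can be swapped in.

```python
def execute_with_approval(name, arguments, run_tool, approve):
    """Run the tool only if `approve(name, arguments)` returns True."""
    if not approve(name, arguments):
        return f"REJECTED: a human reviewer declined {name}({arguments})"
    return run_tool(name, arguments)

def console_approve(name, arguments):
    # Simple interactive gate; a production system might use a review queue instead
    answer = input(f"Allow {name}({arguments})? [y/N] ")
    return answer.strip().lower() == "y"

# Non-interactive demonstration with stand-in callables
result = execute_with_approval(
    "book_flight", {"flight_id": "XY123"},
    run_tool=lambda name, args: "booked",
    approve=lambda name, args: False,
)
print(result)
```

Returning the rejection as an observation lets the agent explain the outcome or propose an alternative instead of failing silently.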

Test adversarially. Give the agent tasks designed to confuse it: ambiguous instructions, tools that return errors, contradictory information. Your production users will send worse.
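
Tools that return errors can be simulated by wrapping real tools with injected failures. This is a minimal sketch; `make_flaky` and the stand-in `search` tool are hypothetical.

```python
import random

def make_flaky(tool_fn, failure_rate, rng=None):
    """Wrap a tool so that a fraction of calls raise an injected error."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise RuntimeError("injected failure: tool unavailable")
        return tool_fn(*args, **kwargs)

    return wrapper

def search(query):
    # Stand-in tool for the demonstration
    return f"results for: {query}"

# A seeded generator makes the failure pattern reproducible across test runs
flaky_search = make_flaky(search, failure_rate=0.5, rng=random.Random(0))
outcomes = []
for _ in range(6):
    try:
        flaky_search("Paris population")
        outcomes.append("ok")
    except RuntimeError:
        outcomes.append("error")
print(outcomes)
```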

FAQ

Q: What is the difference between an agent and a chatbot with tools?

A chatbot with tools makes one tool call per user message and returns the result. An agent makes multiple tool calls autonomously, using the result of each to decide what to do next, without returning to the user between steps. The key difference is the loop: agents iterate toward a goal, chatbots respond to individual messages. In practice, the line is blurry - a “chatbot with tools” that makes 2-3 chained tool calls before responding is agent-like.

Q: Do agents need large models, or can smaller models work?

Agents require strong instruction following, tool use capability, and multi-step reasoning - all of which tend to improve with model size. Models below roughly 70B parameters often struggle with reliable tool selection and multi-step planning. For open-ended agents, models in the class of GPT-4, Claude 3.5+, and Gemini 1.5 Pro are a practical minimum for reliable behavior. Smaller models (7B-13B) can still work for narrow, well-defined agent loops with few tools and simple decisions.

Q: How do I prevent an agent from taking dangerous actions?

Defense in depth: (1) Tool-level permissions - the agent cannot access tools it should not use. (2) Parameter validation - tool inputs are validated before execution. (3) Action confirmation - high-risk actions require human approval. (4) Sandboxing - code execution happens in isolated environments. (5) Audit logging - every action is recorded for review. (6) Rollback capability - actions should be reversible where possible.
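
Layers (1) and (3) can be combined into a single authorization check that runs before any tool executes. This is a sketch only; the tool names and the allow/needs-approval/deny tiers are illustrative assumptions.

```python
# Hypothetical policy: which tools this agent may use, and which need sign-off
ALLOWED_TOOLS = {"search_web", "read_file", "send_email"}
HIGH_RISK_TOOLS = {"send_email"}

def authorize(name):
    """Classify a requested tool call as 'deny', 'needs_approval', or 'allow'."""
    if name not in ALLOWED_TOOLS:
        return "deny"
    if name in HIGH_RISK_TOOLS:
        return "needs_approval"
    return "allow"

for tool in ("search_web", "send_email", "delete_database"):
    print(tool, "->", authorize(tool))
# → search_web -> allow
# → send_email -> needs_approval
# → delete_database -> deny
```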

Interview questions

Q: Design an agent that can help a developer debug a failing test in a codebase. What tools would you give it and how would the ReAct loop work?

Tools: (1) read_file - read source files, (2) run_tests - execute specific test(s) and return output, (3) search_codebase - find files/functions by name or content, (4) edit_file - make code changes.

ReAct loop:

Thought: I need to understand why this test fails.
Action: run_tests(failing_test)
Observation: The error message points to function X.

Thought: The error is in function X, let me read it.
Action: read_file(file_with_X)
Observation: The source code of X.

Thought: I see the bug - it is not handling null input.
Action: edit_file(fix)
Observation: The edit is applied.

Thought: Rerun the failing test to confirm the fix.
Action: run_tests(failing_test)
Observation: The test passes. Done.

Key considerations: limit file edits to the relevant files, validate that the fix does not break other tests (run the full suite), and have a rollback plan if the fix introduces new failures.

Q: An agent in production is completing tasks but taking 15+ steps on average, causing high latency and cost. How would you optimize it?

Diagnose first: analyze step logs to identify patterns. Common causes:

  1. The agent gathers information it does not need (overly cautious). Fix: make the system prompt more specific about what information is sufficient.
  2. The agent retries failed actions without changing approach. Fix: add retry limits per tool and explicit “try alternative approach” instructions.
  3. Tool outputs are verbose and the agent re-reads them. Fix: summarize tool outputs before appending to context.
  4. The task is genuinely complex and needs decomposition. Fix: break it into sub-agents that handle parts independently.
  5. The model is not capable enough to do in one step what it currently spreads over several. Fix: try a more capable model that can combine steps.

Target: most well-designed agents should complete tasks in 3-7 steps.
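
The diagnosis step could start with a simple tally over logged steps: how often each tool is called, and which identical calls repeat. The sketch below assumes each log record has a `tool` name and a string `args` field, which is a hypothetical log format.

```python
from collections import Counter

def summarize_steps(steps):
    """Count calls per tool and identical repeated calls in a list of step records."""
    per_tool = Counter(s["tool"] for s in steps)
    per_call = Counter((s["tool"], s["args"]) for s in steps)
    repeats = {call: n for call, n in per_call.items() if n > 1}
    return per_tool, repeats

steps = [
    {"tool": "search", "args": "pricing competitor A"},
    {"tool": "search", "args": "pricing competitor A"},
    {"tool": "read_page", "args": "https://example.com/pricing"},
    {"tool": "search", "args": "pricing competitor A"},
]
per_tool, repeats = summarize_steps(steps)
print(per_tool.most_common())  # → [('search', 3), ('read_page', 1)]
print(repeats)                 # → {('search', 'pricing competitor A'): 3}
```

A high repeat count for one call points toward causes (2) or (3), while many distinct calls to information-gathering tools points toward cause (1).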