Agentic Loops & Termination: Knowing When to Stop


You deploy an AI agent that researches competitors and generates reports. The first run works perfectly - 6 steps, clean report. The second run gathers information for 23 steps, spending $4.50 in tokens, because it keeps finding “one more thing to check.” The third run hits your 30-step limit and returns a half-finished report with an abrupt cutoff. The fourth run completes in 2 steps with a superficial report because the model decided it was “done” after one search.

Four runs, four different behaviors - and the only variable is the model’s judgment about when it has enough information. This is the termination problem: agents need to know when to stop, and getting this right is harder than getting the loop itself to work.

Production agents fail in two directions: they run too long (wasting resources, degrading quality from context bloat) or stop too early (incomplete results, missed steps). Reliable termination requires explicit design, not just “the model will know when it is done.”

The anatomy of an agent loop

Every agent loop has four components:

  1. Entry condition - What triggers the loop to start
  2. Step logic - What happens in each iteration (reason → act → observe)
  3. Continue condition - What determines if the loop should run again
  4. Exit conditions - What stops the loop (success or failure)

graph TD
  ENTRY["Entry: User task received"] --> STEP["Step: Reason → Act → Observe"]
  STEP --> CHECK{"Exit condition met?"}
  CHECK -->|"Success: task complete"| SUCCESS["Return result"]
  CHECK -->|"Failure: max steps"| FAIL_STEPS["Return best effort + explanation"]
  CHECK -->|"Failure: error"| FAIL_ERR["Return error + partial progress"]
  CHECK -->|"Continue"| GUARD{"Guard checks pass?"}
  GUARD -->|"Yes"| STEP
  GUARD -->|"No: budget exceeded"| FAIL_BUDGET["Return partial + cost warning"]

  style ENTRY fill:#EEEDFE,stroke:#534AB7,color:#3C3489
  style SUCCESS fill:#E1F5EE,stroke:#0F6E56,color:#085041
  style FAIL_STEPS fill:#FCEBEB,stroke:#A32D2D,color:#791F1F
  style FAIL_ERR fill:#FCEBEB,stroke:#A32D2D,color:#791F1F
  style FAIL_BUDGET fill:#FAEEDA,stroke:#854F0B,color:#633806
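
The diagram above can be sketched as a small driver function. This is a minimal illustration, not a reference implementation: `Outcome`, `StepResult`, and `run_loop` are hypothetical names, and the step function stands in for one reason → act → observe iteration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class Outcome(Enum):
    SUCCESS = "success"
    MAX_STEPS = "max_steps"
    ERROR = "error"
    BUDGET = "budget_exceeded"


@dataclass
class StepResult:
    done: bool          # True when the step reports the task is complete
    output: str         # Best output available after this step
    cost: float = 0.0   # Cost of this step, in whatever unit the budget uses


def run_loop(step: Callable[[int], StepResult], max_steps: int, budget: float) -> Tuple[Outcome, str]:
    """Run `step` until an exit condition fires; always return (outcome, output)."""
    spent = 0.0
    last_output = ""
    for i in range(max_steps):
        try:
            result = step(i)
        except Exception as exc:  # Failure exit: error, keep partial progress
            return Outcome.ERROR, f"{last_output} [stopped: {exc}]".strip()
        spent += result.cost
        last_output = result.output
        if result.done:           # Success exit
            return Outcome.SUCCESS, last_output
        if spent >= budget:       # Guard check before continuing
            return Outcome.BUDGET, last_output
    return Outcome.MAX_STEPS, last_output  # Failure exit: step limit
```

Every exit path returns the latest output together with the reason for stopping, so the caller can decide how to present partial results.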

Termination strategies

Strategy 1: Model-decided termination

The model decides when it is done by generating a specific stop action:

tools = [
    {"name": "search", ...},
    {"name": "calculate", ...},
    {"name": "submit_answer", 
     "description": "Call this when you have gathered enough information to provide a complete answer."}
]

Pros: The model can judge task completion naturally. Cons: Unreliable - models sometimes stop too early (lazy) or never stop (perfectionist). Quality varies with temperature.
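
On the application side, the loop only has to watch for that stop action. The sketch below assumes a hypothetical `next_call` function that asks the model for its next tool call and returns a dict with `name` and `arguments` keys; real provider APIs shape tool calls differently.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

ToolCall = Dict[str, Any]  # Assumed shape: {"name": "search", "arguments": {...}}


def run_until_submit(
    next_call: Callable[[List[ToolCall]], ToolCall],
    max_steps: int = 10,
) -> Tuple[Optional[str], List[ToolCall]]:
    """Loop until the model calls submit_answer; the step cap is only a backstop."""
    history: List[ToolCall] = []
    for _ in range(max_steps):
        call = next_call(history)  # Hypothetical: ask the model for its next tool call
        history.append(call)
        if call["name"] == "submit_answer":
            return call["arguments"]["answer"], history
        # Otherwise the application would execute the tool and record the observation here.
    return None, history  # The model never chose to stop


# Usage with a scripted stand-in for the model
script = [
    {"name": "search", "arguments": {"query": "competitor pricing"}},
    {"name": "submit_answer", "arguments": {"answer": "Report text"}},
]
answer, history = run_until_submit(lambda h: script[len(h)])
```

Even here a step cap is present, because nothing forces the model to ever call `submit_answer`.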

Strategy 2: Structured completion criteria

Define explicit criteria in the system prompt that must be met before stopping:

You are done when:
1. You have found the answer to the user's question with supporting evidence
2. You have verified the answer from at least 2 sources
3. You have checked for any contradictions

Do NOT stop if:
- You only found one source
- The information seems outdated (check dates)
- The user asked for multiple items and you only found some

Pros: More predictable behavior, consistent quality bar. Cons: Rigid - some tasks need flexibility. Hard to define criteria for open-ended tasks.
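
Criteria written in the prompt can also be checked in code, so the application does not rely on the model alone to enforce them. A minimal sketch, assuming the loop tracks a hypothetical `ResearchState` with the fields shown:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResearchState:
    answer: str = ""
    sources: List[str] = field(default_factory=list)
    contradictions_checked: bool = False
    items_requested: int = 1
    items_found: int = 0


def unmet_criteria(state: ResearchState) -> List[str]:
    """Return the criteria that still block stopping; an empty list means done."""
    unmet = []
    if not state.answer:
        unmet.append("no answer yet")
    if len(set(state.sources)) < 2:
        unmet.append("fewer than 2 distinct sources")
    if not state.contradictions_checked:
        unmet.append("contradictions not checked")
    if state.items_found < state.items_requested:
        unmet.append("some requested items are missing")
    return unmet
```

The returned list doubles as feedback: it can be injected into the conversation to tell the model exactly what is still missing. The "information seems outdated" rule is left out because it needs date extraction that this sketch does not attempt.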

Strategy 3: Hard limits with graceful degradation

Set absolute caps and design the system to produce useful output even at the limit:

MAX_STEPS = 15
STEP_BUDGET = {
    "research": 8,      # Max 8 steps for gathering info
    "synthesis": 3,     # Max 3 steps for combining info
    "verification": 2,  # Max 2 steps for fact-checking
    "formatting": 2     # Max 2 steps for output formatting
}

def agent_with_budget(task):
    phase = "research"
    phase_steps = 0
    
    for total_steps in range(MAX_STEPS):
        if phase_steps >= STEP_BUDGET[phase]:
            phase = next_phase(phase)
            phase_steps = 0
            inject_message(f"Moving to {phase} phase. Summarize findings so far.")
        
        # Normal step execution
        result = execute_step(...)
        phase_steps += 1
        
        if is_complete(result):
            return result
    
    # Hit limit - return best effort
    return synthesize_partial_results()
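
The snippet above calls a `next_phase` helper without defining it. One possible definition, assuming the phases always run in the order they appear in `STEP_BUDGET`:

```python
from typing import Optional

# Assumed fixed ordering of the phases named in STEP_BUDGET
PHASE_ORDER = ["research", "synthesis", "verification", "formatting"]


def next_phase(phase: str) -> Optional[str]:
    """Return the phase that follows `phase`, or None after the last one."""
    index = PHASE_ORDER.index(phase)  # Raises ValueError for an unknown phase
    if index + 1 < len(PHASE_ORDER):
        return PHASE_ORDER[index + 1]
    return None
```

With the budgets shown (8 + 3 + 2 + 2 = 15 = `MAX_STEPS`), the loop ends before a transition out of the last phase is needed. If the numbers change, the caller should treat a `None` result as a signal to stop and return partial results.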

Strategy 4: Token/cost budget

Cap the total token expenditure rather than step count:

MAX_TOKEN_BUDGET = 50000  # Total tokens across all steps
WARN_AT = 0.8 * MAX_TOKEN_BUDGET  # Nudge the model to wrap up at 80% of budget
tokens_used = 0
warned = False

while tokens_used < MAX_TOKEN_BUDGET:
    response = llm.generate(messages, tools=tools)
    tokens_used += response.usage.total_tokens
    
    # Warn once, not on every step after the threshold
    if not warned and tokens_used > WARN_AT:
        inject_message("You are approaching your budget. Wrap up with available information.")
        warned = True
    
    # ... process response
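
A token cap translates into a dollar cap once prices are known. The sketch below shows the arithmetic only; the per-token prices are hypothetical placeholders, not any provider's actual rates.

```python
# Hypothetical prices in USD per million tokens; substitute your provider's real rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def run_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a run from its cumulative token counts."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def over_budget(input_tokens: int, output_tokens: int, cap_usd: float) -> bool:
    """True once the estimated cost reaches the cap."""
    return run_cost_usd(input_tokens, output_tokens) >= cap_usd
```

Under these assumed prices, a run that has consumed 40,000 input tokens and 10,000 output tokens costs 0.12 + 0.15 = 0.27 USD. Tracking input and output tokens separately matters because the two are usually priced differently.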

Strategy 5: Quality-gated termination

Use a separate evaluator to judge whether the output is ready:

def quality_gate(task, current_output):
    eval_prompt = f"""
    Task: {task}
    Current output: {current_output}
    
    Is this output complete and correct? Score 1-10.
    Score >= 7: ready to return
    Score < 7: needs more work (explain what's missing)
    """
    score, feedback = evaluator.assess(eval_prompt)
    return score >= 7, feedback

graph LR
  subgraph strategies["Termination Strategies"]
      S1["Model-decided: flexible but unreliable"]
      S2["Criteria-based: predictable but rigid"]
      S3["Step limits: simple, safe"]
      S4["Token budget: cost-controlled"]
      S5["Quality-gated: best output, expensive"]
  end

  style S1 fill:#FAEEDA,stroke:#854F0B,color:#633806
  style S2 fill:#E1F5EE,stroke:#0F6E56,color:#085041
  style S3 fill:#F1EFE8,stroke:#888780,color:#444441
  style S4 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
  style S5 fill:#E1F5EE,stroke:#0F6E56,color:#085041
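
A quality gate such as the one in Strategy 5 needs its own bound, otherwise the evaluator can keep the agent revising forever. A minimal sketch of a bounded refinement loop, where `gate` returns `(ready, feedback)` and `revise` is a hypothetical function that produces a new draft from the feedback:

```python
from typing import Callable, Tuple


def refine_until_ready(
    draft: str,
    gate: Callable[[str], Tuple[bool, str]],  # Returns (ready, feedback)
    revise: Callable[[str, str], str],        # Hypothetical: new draft from old draft + feedback
    max_rounds: int = 2,
) -> Tuple[str, bool]:
    """Revise a draft until the gate passes or the round limit is reached."""
    for _ in range(max_rounds):
        ready, feedback = gate(draft)
        if ready:
            return draft, True
        draft = revise(draft, feedback)
    ready, _ = gate(draft)  # Final check on the last revision
    return draft, ready
```

The second return value tells the caller whether the draft passed, so a failing draft can still be returned with a caveat rather than discarded. A two-argument gate like `quality_gate(task, current_output)` can be adapted with `lambda d: quality_gate(task, d)`.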

Common loop failure patterns

The infinite researcher

The agent keeps finding “just one more source” to check. It is thorough to a fault, accumulating information without synthesizing it.

Fix: Explicit phase transitions. After N research steps, force transition to synthesis: “You have gathered enough information. Synthesize your findings now.”

The premature optimizer

The agent generates an answer after one tool call without verifying or expanding. It satisfies the literal task (“find information”) without meeting the spirit (a comprehensive answer).

Fix: Minimum step requirements for complex tasks. Structured completion criteria that require verification.
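
One way to apply this fix is to treat the model's stop request as a proposal that the application can reject. A small sketch, with `accept_stop` as a hypothetical helper and `unmet` as a list of completion criteria that are still open:

```python
from typing import List, Tuple


def accept_stop(steps_taken: int, min_steps: int, unmet: List[str]) -> Tuple[bool, str]:
    """Decide whether a stop request from the model should be honored."""
    if steps_taken < min_steps:
        return False, f"Only {steps_taken} of at least {min_steps} steps taken; keep working."
    if unmet:
        return False, "Still unmet: " + "; ".join(unmet)
    return True, ""
```

When the request is rejected, the returned message can be injected into the conversation so the model knows why it must continue.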

The error loop

A tool call fails, the agent retries the same call identically, fails again, retries again - consuming steps without making progress.

Fix: Track consecutive failures. After 2 failures with the same tool, force a different approach:

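consecutive_failures = 0
last_failed_tool = None

def on_tool_error(tool_name, error):
    # Module-level counters are reassigned here, so they must be declared global
    global consecutive_failures, last_failed_tool
    if tool_name == last_failed_tool:
        consecutive_failures += 1
    else:
        consecutive_failures = 1
        last_failed_tool = tool_name  # Remember which tool failed
    
    if consecutive_failures >= 2:
        inject_message(f"The {tool_name} tool has failed twice. Try a different approach or tool.")
        consecutive_failures = 0
        last_failed_tool = None
    # Also reset both counters after any successful tool call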

The context bloat spiral

Each step adds observations to context. After 10+ steps, the context can grow so large that the model struggles to attend to the relevant information and makes worse decisions, which in turn causes more steps.

Fix: Summarize observations periodically. Replace verbose tool outputs with concise summaries every 5 steps:

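if step_count > 0 and step_count % 5 == 0:
    # Summarize everything older than the last 4 messages (messages[0] is assumed
    # to be the system prompt) so nothing is dropped without being summarized
    summary = summarize_recent_observations(messages[1:-4])
    messages = [system_prompt, summary] + messages[-4:]  # Keep recent + summary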

Designing robust agent loops

The “plan then execute” pattern

Instead of interleaving planning and execution, separate them:

def plan_then_execute(task):
    # Phase 1: Create a plan (one LLM call)
    plan = llm.generate(f"Create a step-by-step plan for: {task}")
    
    # Phase 2: Execute each step (bounded by plan length)
    results = []
    for step in plan.steps:
        result = execute_step(step)
        results.append(result)
    
    # Phase 3: Synthesize results (one LLM call)
    return llm.generate(f"Synthesize these results: {results}")

Advantage: Predictable step count, clear progress, easier to resume from failures.

The “checkpoint and resume” pattern

Save state at each step so the agent can resume from the last successful step after a failure:

def resumable_agent(task, checkpoint_store):
    # Load existing progress
    state = checkpoint_store.load(task.id) or initial_state(task)
    
    while not state.is_complete:
        step_result = execute_step(state)
        state.update(step_result)
        checkpoint_store.save(task.id, state)  # Persist after each step
    
    return state.final_output
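
The `checkpoint_store` above can be anything that offers `load` and `save`. As an illustration, here is a hypothetical file-backed store; it assumes the state is a JSON-serializable dict rather than the object used in the snippet above, and that task ids are safe to use as file names.

```python
import json
import os
import tempfile
from typing import Optional


class JsonCheckpointStore:
    """Hypothetical file-backed store: one JSON file per task id."""

    def __init__(self, directory: str):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, task_id: str) -> str:
        return os.path.join(self.directory, f"{task_id}.json")

    def load(self, task_id: str) -> Optional[dict]:
        """Return the saved state, or None if the task has no checkpoint yet."""
        try:
            with open(self._path(task_id), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def save(self, task_id: str, state: dict) -> None:
        # Write to a temp file and rename it into place, so a crash mid-write
        # does not corrupt the previous checkpoint.
        fd, tmp = tempfile.mkstemp(dir=self.directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp, self._path(task_id))
```

Returning `None` from `load` for an unknown task matches the `checkpoint_store.load(task.id) or initial_state(task)` idiom used above.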

The “escalation ladder” pattern

If the agent cannot complete the task within budget, escalate rather than fail silently:

ESCALATION_LEVELS = [
    {"steps": 5, "action": "try harder"},
    {"steps": 10, "action": "simplify approach"},
    {"steps": 15, "action": "return partial + flag for human review"},
]
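
The table does not say how it is consulted. One reading is that each entry fires when the step counter reaches its threshold; a sketch under that assumption, with `escalation_action` as a hypothetical helper:

```python
ESCALATION_LEVELS = [
    {"steps": 5, "action": "try harder"},
    {"steps": 10, "action": "simplify approach"},
    {"steps": 15, "action": "return partial + flag for human review"},
]


def escalation_action(step_count: int):
    """Return the action whose threshold was just reached, or None."""
    for level in ESCALATION_LEVELS:
        if step_count == level["steps"]:
            return level["action"]
    return None
```

The loop would call this once per step and, when it returns an action, inject a matching instruction or hand the task off for human review.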

Real-world loop implementations

  • LangChain AgentExecutor - configurable max_iterations and max_execution_time, with early_stopping_method controlling what is returned when a limit is hit
  • AutoGen - multi-agent loops with conversation-based termination (agents agree they are done)
  • CrewAI - task-based termination with explicit deliverables per agent
  • OpenAI Assistants - run-level limits with status polling and cancellation
  • Anthropic Claude (tool use) - the response's stop_reason is tool_use while the model wants to call a tool and end_turn when it is done; step limits are enforced at the application level

How to apply in practice

Combine multiple termination strategies. Use model-decided + hard limits + token budget. The model tries to finish naturally, hard limits prevent runaway, and token budget prevents cost explosions.

Tune limits per task type. A “lookup a fact” task should complete in 2-3 steps. A “research report” task might need 10-15. Classify the task complexity before setting limits.
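
Per-task limits can live in a simple lookup keyed by task class. The classes and numbers below are hypothetical and only mirror the examples in the text; real values should come from your own step-count data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_steps: int
    max_tokens: int


# Hypothetical classes and numbers; derive real values from production data.
LIMITS_BY_TASK_TYPE = {
    "lookup": Limits(max_steps=3, max_tokens=10_000),
    "research_report": Limits(max_steps=15, max_tokens=50_000),
}
DEFAULT_LIMITS = Limits(max_steps=10, max_tokens=30_000)


def limits_for(task_type: str) -> Limits:
    """Return the limits for a task class, falling back to a conservative default."""
    return LIMITS_BY_TASK_TYPE.get(task_type, DEFAULT_LIMITS)
```

How the task type is determined (a classifier call, a routing rule, or user input) is outside this sketch.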

Always return something useful. Even when hitting a limit, return partial results with a clear explanation of what was completed and what was not. “I found information about X and Y but could not verify Z due to reaching my step limit” is more useful than an error.

Monitor step count distributions. Track how many steps tasks take in production. If the median is 4 but the p95 is 22, you have tasks that occasionally spiral. Investigate the p95 cases to find patterns.
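
Computing these two numbers from logged step counts takes only a few lines. The sketch below uses the nearest-rank definition of a percentile; other definitions interpolate and can give slightly different values. It assumes a non-empty list.

```python
from statistics import median
from typing import List


def percentile(values: List[int], p: int) -> int:
    """Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # Integer ceiling of p * n / 100
    return ordered[rank - 1]


def step_report(step_counts: List[int]) -> dict:
    """Summarize a batch of per-task step counts."""
    return {"median": median(step_counts), "p95": percentile(step_counts, 95)}


# 20 logged runs: most finish in 3-6 steps, two spiral to 22
counts = [3] * 5 + [4] * 10 + [6] * 3 + [22] * 2
report = step_report(counts)  # median is 4, p95 is 22
```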

Implement dead-man switches. If the agent has not made progress in 3 consecutive steps (same observations, no new information), terminate and escalate.
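
A crude version of this check compares the most recent observations for exact equality. The `StallDetector` class below is a hypothetical sketch; it will not catch observations that differ in wording but carry the same information.

```python
import hashlib
from collections import deque


class StallDetector:
    """Flags a stall when the last `window` observations are all identical."""

    def __init__(self, window: int = 3):
        self.recent = deque(maxlen=window)

    def record(self, observation: str) -> bool:
        """Record an observation; return True if the agent appears stuck."""
        # Hash the whitespace-trimmed text so large observations are cheap to compare
        digest = hashlib.sha256(observation.strip().encode("utf-8")).hexdigest()
        self.recent.append(digest)
        full = len(self.recent) == self.recent.maxlen
        return full and len(set(self.recent)) == 1
```

The loop would call `record` after each step and terminate or escalate as soon as it returns `True`.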

FAQ

Q: How do I set the right max steps for my agent?

Start conservative (10 steps), monitor production distributions, and increase only where needed. Analyze completed tasks: if 95% finish in 6 steps, a limit of 10 provides reasonable headroom. If tasks frequently hit the limit with incomplete results, either the limit is too low or the agent needs better planning to be more efficient. Track the correlation between step count and output quality - past a certain point, more steps degrade quality rather than improve it.

Q: Should the agent always use all available steps, or is finishing early better?

Finishing early is almost always better. More steps mean more cost, more latency, and more accumulated context (which can degrade reasoning). An agent that completes a task in 3 focused steps typically produces better output than one that takes 12 steps gathering marginally relevant information. Optimize for efficiency, not thoroughness beyond the quality bar.

Q: How do I handle the case where the agent needs human input mid-loop?

Pause the loop and return to the user with a specific question. This is not a failure - it is a feature. Design the loop to support “pause and resume”: save the current state, return a clarifying question to the user, and resume from the saved state when the user responds. The alternative (the agent guessing) leads to wasted steps on the wrong path.

Interview questions

Q: Design the loop control for an AI agent that generates a research report by searching the web, finding relevant papers, and synthesizing findings. How do you ensure consistent quality without excessive resource usage?

Use a three-phase loop with a quality gate:

  1. Discovery phase (max 6 steps): search for sources, evaluate relevance, collect links. Stop when 5+ relevant sources are found or 6 steps are reached.
  2. Deep-read phase (max 4 steps): read the top 3-4 sources and extract key findings. Stop when key claims have been extracted from each source.
  3. Synthesis phase (max 2 steps): combine findings, identify agreements and contradictions, generate the report.

Quality gate: after synthesis, run an evaluator. If the score is below 7, allow 2 more refinement steps; if it is still low, return the report with a caveat. Total budget: 12 steps normally, 14 at most. Cost cap: $2 per report. Dead-man switch: if consecutive steps add fewer than 100 new tokens of useful information, force a phase transition.

Q: Your agent completes tasks correctly 85% of the time but the other 15% either loop excessively or return incomplete results. How do you diagnose and improve termination reliability?

Diagnosis: categorize the failing 15%, then fix each category:

  • (A) Looping too long: look at which step patterns repeat. The agent is likely retrying failed actions, so add consecutive-failure detection and force an alternative approach.
  • (B) Stopping too early: identify which criteria were not met. Add structured completion criteria and minimum step requirements for complex tasks.
  • (C) Hitting limits with partial results: determine whether the limits were too low or the task was too complex. Classify task complexity at entry and set dynamic limits per complexity class.

Cross-cutting fix: add a “progress check” every 3 steps in which the agent explicitly states what it has accomplished and what remains. If “what remains” is empty, terminate. If it is the same as 3 steps ago, the agent is stuck.