AI Agent Masterclass: Long-Term Memory, Swarms, Checkpointing & Human-in-the-Loop

A hands-on Next.js project that builds on advanced AI agent concepts — adding long-term episodic and semantic memory, a swarm architecture, state checkpointing with time travel debugging, and human-in-the-loop controls for high-risk database mutations.

Wayne Cheng

30 min read

Complete Tutorial Code

Follow along with the complete source code for this AI Agent Masterclass. Includes five chapters covering long-term memory, swarm architecture, state checkpointing, and human-in-the-loop controls.

View on GitHub: https://github.com/audoir/ai-agent-masterclass

Introduction
Chapter 1: Multi-Agent System (Recap)
Chapter 2: Long-Term Memory
Chapter 3: Swarm Architecture
Chapter 4: State Checkpointing
Chapter 5: Human-in-the-Loop
Conclusion

Introduction

github.com/audoir/ai-agent-masterclass — README.md

⚠️ Prerequisites

This tutorial is a continuation of the Advanced AI Agent Tutorial, which covers multi-agent systems with MCP, distributed tracing with OpenTelemetry, building a native eval framework, agent topics, and data pipeline theory. Complete that tutorial first before proceeding here.

Once you have the advanced AI agent fundamentals down — multi-agent systems, observability, evals, and agent topics — the next frontier is building systems that are truly production-ready: systems that remember users across sessions, can be paused and resumed mid-pipeline, and require human oversight for high-risk actions.

This masterclass picks up where the Advanced AI Agent Tutorial left off and introduces four new capabilities. You'll add long-term episodic and semantic memory so the Orchestrator learns user preferences over time, explore a swarm architecture where agents hand off to each other directly, implement state checkpointing for time travel debugging, and build a human-in-the-loop pipeline for safe database mutations.

What's Included

Tab	Description
🗄️ View Database	Browse the in-memory SQLite database — inventory, customers, sales, users, sessions, agent topics, and the agent registry
🤖 Orchestrator	An Orchestrator Agent that delegates to 3 specialist agents (Researcher → Writer → Editor) via MCP tool calls, with outputs persisted as named topics in the database
🐝 Swarm Agents	A swarm of autonomous agents (Researcher, Writer, Editor) that hand off control to each other directly — no central orchestrator
🔖 Checkpoints	State checkpointing with time travel debugging — stop a run mid-pipeline, roll back to any step, and rerun with a new prompt
🧑‍💻 HITL	Human-in-the-Loop database mutations — INSERT executes immediately, UPDATE/DELETE require explicit human approval before executing

Getting Started

Requirements

Node.js (v18 or higher)
OpenAI API key
Completed the Advanced AI Agent Tutorial

Installation Steps

1. Clone the repository and change into it:
git clone https://github.com/audoir/ai-agent-masterclass.git
cd ai-agent-masterclass
2. Install dependencies:
npm install
3. Configure environment: create a .env.local file containing your OpenAI API key:
OPENAI_API_KEY=sk-...
4. Start the dev server:
npm run dev
Open http://localhost:3000 in your browser.

Key Dependencies

Package	Purpose
`ai`	Vercel AI SDK — `generateText`, `streamText`
`@ai-sdk/openai`	OpenAI provider
`@ai-sdk/react`	React hooks — `useCompletion`
`@ai-sdk/mcp`	MCP client for the AI SDK
`mcp-handler`	MCP server handler for Next.js
`@modelcontextprotocol/sdk`	Official MCP TypeScript SDK
`better-sqlite3`	Synchronous SQLite driver
`zod`	Schema validation for tool inputs

Chapter 1: Multi-Agent System (Recap)

github.com/audoir/ai-agent-masterclass — docs/chapter-01-multi-agent-system.md

This is a recap chapter. The multi-agent system described here combines what was Chapter 1 (Orchestrator + SubAgents) and Chapter 4 (Agent Topics) of the Advanced AI Agent Tutorial into a single, unified implementation. If you completed that tutorial, this chapter explains what's already in the codebase and how the pieces fit together.

Architecture

User prompt
    ↓
POST /api/orchestrator/default { prompt, runId, userId }
    ↓
🤖 Orchestrator Agent  (gpt + tools from /api/mcp/agents/orchestrator/mcp)
    │
    ├── write_topic("research_topic_v0", <user's prompt>)
    │
    ├── researcher_agent(readTopics=["research_topic_v0"], writeTopic="research_v0")
    │       └── Researcher Agent queries /api/mcp/database/read/mcp (SQL tools)
    │           → writes research report to chat_sessions.topics JSON
    │           → returns short confirmation
    │
    ├── writer_agent(readTopics=["research_v0"], writeTopic="draft_v0")
    │       └── Writer Agent reads research from DB, writes blog post draft
    │           → returns short confirmation
    │
    └── editor_agent(readTopics=["draft_v0"], writeTopic="final_v0")
            └── Editor Agent reads draft from DB, writes polished final article
                → returns short confirmation
    ↓
Orchestrator streams narration to user

Two Layers of MCP

MCP Server	Route	Exposes
Database MCP	`/api/mcp/database/read/mcp`	`read-inventory`, `read-customers`, `read-sales` SQL tools (SELECT only)
Topic Agent MCP	`/api/mcp/agents/orchestrator/mcp`	`researcher_agent`, `writer_agent`, `editor_agent`

The Agent Topics Pattern

This project uses the agent topics pattern. Instead of passing large content strings between agents through the Orchestrator's context window, each agent reads its input from and writes its output to a named slot stored as JSON in the chat_sessions table.

Without Topics

Orchestrator context grows with each step:

[tool call: researcher_agent("electronics")]

[tool result: "## Research Report\n\n...(2,000 chars)..."]

[tool call: writer_agent("electronics", "## Research Report\n\n...(2,000 chars pasted again)...")]

[tool result: "# The Electronics Revolution\n\n...(3,000 chars)..."]

[tool call: editor_agent("# The Electronics Revolution\n\n...(3,000 chars pasted again)...")]

With Topics

Orchestrator context stays small:

[tool call: write_topic("research_topic_v0", "best-selling electronics")]

[tool result: "Wrote 24 chars to topic research_topic_v0"]

[tool call: researcher_agent(readTopics=["research_topic_v0"], writeTopic="research_v0")]

[tool result: "Done. Read from ["research_topic_v0"], wrote 1842 chars to research_v0."]

[tool call: writer_agent(readTopics=["research_v0"], writeTopic="draft_v0")]

[tool result: "Done. Read from ["research_v0"], wrote 2931 chars to draft_v0."]

Why Topics?

Benefit	How topics provide it
No context bloat	Only the topic name (a short string) is passed between agents. The actual content stays in the DB.
Persistence	Every intermediate output is stored in SQLite. If the pipeline fails, completed stages are preserved.
Resumability	A failed pipeline can be resumed from the last successful topic write — no need to re-run earlier agents.
Inspectability	Any topic can be viewed in the 🔑 User State tab of each agent view.
Versioning	Topic names include a version suffix (_v0, _v1, etc.) — the Orchestrator increments the version for refinements, never overwriting previous versions.
Live progress	The UI polls GET /api/orchestrator/default?runId=... every second to show which topics have been written as the pipeline runs.
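
To make the pattern concrete, here is a minimal sketch of a topic store. It uses an in-memory Map as a stand-in for the chat_sessions.topics JSON column, and the helper names (writeTopic, readTopics, nextVersion) are illustrative assumptions rather than the repository's actual functions. It shows the two properties the table relies on: agents pass back only a short confirmation, and a refinement gets a new version suffix instead of overwriting the previous one.

```typescript
// In-memory stand-in for the chat_sessions.topics JSON column (assumption).
type Topics = Map<string, string>;

// Store content in a named slot and return the short confirmation that
// travels back to the Orchestrator instead of the content itself.
function writeTopic(topics: Topics, name: string, content: string): string {
  if (topics.has(name)) {
    throw new Error(`Topic ${name} already exists; write a new version instead`);
  }
  topics.set(name, content);
  return `Wrote ${content.length} chars to topic ${name}`;
}

// Load the named slots an agent was told to read and join them into one input.
function readTopics(topics: Topics, names: string[]): string {
  return names
    .map((name) => {
      const content = topics.get(name);
      if (content === undefined) throw new Error(`Unknown topic: ${name}`);
      return `## ${name}\n${content}`;
    })
    .join("\n\n");
}

// Find the next free versioned name for a base: draft_v0, then draft_v1, ...
function nextVersion(topics: Topics, base: string): string {
  let version = 0;
  while (topics.has(`${base}_v${version}`)) version++;
  return `${base}_v${version}`;
}

const topics: Topics = new Map();
const firstName = nextVersion(topics, "research_topic");
const confirmation = writeTopic(topics, firstName, "best-selling electronics");
console.log(confirmation); // Wrote 24 chars to topic research_topic_v0
console.log(nextVersion(topics, "research_topic")); // research_topic_v1
```

Refusing to overwrite an existing slot is one way to enforce the versioning rule in code; the project itself may simply rely on the Orchestrator's instructions to increment the suffix.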

Observability

The project includes OpenTelemetry tracing. Every generateText and streamText call emits spans automatically via the AI SDK's experimental_telemetry option. The Orchestrator also propagates the OTel trace context to the Agent MCP server via W3C traceparent headers, so all four agents appear in a single unified trace in Jaeger.

docker run --rm --name jaeger \
  -p 16686:16686 -p 4317:4317 -p 4318:4318 \
  cr.jaegertracing.io/jaegertracing/jaeger:2.18.0

Open http://localhost:16686 and select the ai-agent-masterclass service.
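
For readers unfamiliar with the traceparent header mentioned above, the sketch below formats and parses one by hand. In the project this is handled by OpenTelemetry's propagation machinery, so the two helper functions here are purely illustrative assumptions; only the header layout (version, 32-hex trace id, 16-hex parent span id, 2-hex flags) comes from the W3C Trace Context format.

```typescript
// Fields carried by a W3C traceparent header.
interface TraceContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters (the caller's span)
  sampled: boolean;
}

// Serialize a context as version-traceId-spanId-flags.
function formatTraceparent(ctx: TraceContext): string {
  const flags = ctx.sampled ? "01" : "00";
  return `00-${ctx.traceId}-${ctx.spanId}-${flags}`;
}

// Parse a header, returning null for anything malformed.
function parseTraceparent(header: string): TraceContext | null {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  // All-zero ids are defined as invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

const header = formatTraceparent({
  traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
  spanId: "00f067aa0ba902b7",
  sampled: true,
});
console.log(header); // 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
console.log(parseTraceparent(header)?.traceId); // 4bf92f3577b34da6a3ce929d0e0e4736
```

Because every sub-agent call carries the same trace id, Jaeger can stitch the Orchestrator's spans and the sub-agents' spans into one trace.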

POST /api/orchestrator/default · Agent Topics · Database MCP · OpenTelemetry · Jaeger

Chapter 2: Long-Term Memory

github.com/audoir/ai-agent-masterclass — docs/chapter-02-long-term-memory.md

In Chapter 1 we built an Orchestrator that drives a pipeline of specialist sub-agents. Each run is self-contained: the agents do their work, write topics to the database, and the session ends. The next time the user sends a prompt, the Orchestrator starts fresh — no memory of what the user asked for before, no knowledge of their preferences.

This chapter adds long-term memory: the ability for the system to remember what happened in past sessions and to learn the user's preferences over time, so agents can apply them automatically without the user having to repeat themselves.

Types of AI Agent Memory

Short-Term Memory (In-Context)

The conversation history inside the current context window. The Orchestrator already has this — it sees the full message history for the current session. It is fast and always available, but it disappears when the session ends.

Long-Term Memory (Persistent)

Episodic — A record of what happened in a specific past session (like a diary entry)

Semantic — Stable, generalised facts about the user (like a fact-sheet or profile)

The Memory Pipeline

Orchestrator responds to the user (onFinish fires)
    ↓
Episodic Memory Agent  [runs in background via after()]
    → reads session history from chat_sessions.messages JSON
    → appends a factual summary to chat_sessions.episodic_memories JSON array
    ↓
Semantic Memory Agent  [runs immediately after episodic agent]
    → reads the new episodic summary + previous semantic memory
    → appends an updated user preference fact-sheet to users.semantic_memories JSON array
    ↓
Next session starts
    → Orchestrator reads semantic memory (preferences) + recent episodic memories (context)
    → injects both into its system prompt
    → applies user preferences automatically
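
The ordering of the pipeline can be sketched without any LLM calls. In the snippet below the two agents are replaced by plain functions (summarize and distill), and the store is a simplified single-user object rather than the chat_sessions and users tables; all names are assumptions made for illustration. What it shows is the data flow: each run appends one episodic entry, then appends one semantic entry derived from that episode plus the previous fact-sheet.

```typescript
interface EpisodicMemory { sessionId: string; summary: string }
interface SemanticMemory { factSheet: string }

// Simplified single-user store (assumption): both arrays are append-only.
interface MemoryStore { episodic: EpisodicMemory[]; semantic: SemanticMemory[] }

// Stand-ins for the two LLM-backed agents.
type Summarize = (messages: string[]) => string;
type Distill = (episode: string, previousFactSheet: string | null) => string;

function runMemoryPipeline(
  store: MemoryStore,
  sessionId: string,
  messages: string[],
  summarize: Summarize,
  distill: Distill,
): void {
  // 1. Episodic: record what happened in this session.
  const summary = summarize(messages);
  store.episodic.push({ sessionId, summary });

  // 2. Semantic: fold the new episode into the latest fact-sheet.
  const previous =
    store.semantic.length > 0 ? store.semantic[store.semantic.length - 1].factSheet : null;
  store.semantic.push({ factSheet: distill(summary, previous) });
}

const NO_PREFERENCES = "(No explicit preferences recorded yet.)";
const summarize: Summarize = (messages) => `The user said: ${messages.join(" ")}`;
// Toy rule: only an explicitly stated format request becomes a preference.
const distill: Distill = (episode, previous) =>
  episode.includes("bullet-point")
    ? "## User Preferences\n- The user prefers bullet-point format over prose."
    : (previous ?? NO_PREFERENCES);

const store: MemoryStore = { episodic: [], semantic: [] };
runMemoryPipeline(store, "s1", ["Write a blog post about our best-selling electronics"], summarize, distill);
runMemoryPipeline(store, "s1", ["Write the blog post in bullet-point format"], summarize, distill);
runMemoryPipeline(store, "s2", ["Write a report on customer trends."], summarize, distill);
console.log(store.semantic.map((entry) => entry.factSheet));
```

The third run states no new preference, so the toy distill function carries the existing fact-sheet forward unchanged.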

Step 1: Triggering Memory After a Session

Memory updates run after the Orchestrator responds to the user, using Next.js's after() function. The user gets their response immediately and the memory agents run in the background without adding latency.

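// lib/agents/orchestrator/default.ts (excerpt from the streamText options)

import { after } from "next/server";
import { updateLongTermMemory } from "@/lib/memory";

// Fires once the Orchestrator has finished streaming its reply.
onFinish: async ({ text }) => {
  await agentMcpClient.close();

  // Schedule the memory agents to run after the response has been sent,
  // so they add no latency for the user.
  after(() => updateLongTermMemory({ userId, sessionId: runId, finalText: text }));
},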

Step 2: The Episodic Memory Agent

The Episodic Memory Agent reads the session's chat history and appends a factual 2–4 sentence summary to chat_sessions.episodic_memories.

⚠️ Key Design Principle

Only record what the user explicitly stated; never infer preferences from the nature of the task. If the user asks for a blog post, that tells you the task. It does not tell you that the user prefers blog posts.

✅ Correct episodic memory:

The user asked for a blog post about their best-selling products. The agents ran the standard research → write → edit pipeline. No explicit preferences were stated.

❌ Incorrect (false positives):

The user preferred engaging, brand-friendly content with an upbeat tone. (The user never said this — the agents inferred it from the task type.)

Step 3: The Semantic Memory Agent

The Semantic Memory Agent reads the new episodic summary and the user's previous semantic memory, then appends an updated preference fact-sheet to users.semantic_memories. It also removes stale preferences — if a new episodic memory contradicts a previously stored preference, the old one is updated or deleted.

✅ After user states a preference:

## User Preferences

- The user prefers content under 300 words.

- The user prefers bullet-point format over prose.

✅ When no preference stated:

(No explicit preferences recorded yet.)

Step 4: Injecting Memory into the Orchestrator

At the start of each new session, the Orchestrator reads both memory types and injects them into its system prompt. Semantic memory comes first because it contains the most actionable, durable preferences.

## User Preferences (Semantic Memory)

This is a distilled fact-sheet of what is consistently true about this user,
built up from all their past sessions. You must apply these preferences
automatically — do not ask the user to repeat them.

[Semantic Memory #3 · 2026-06-02 18:30:00]
## User Preferences
- The user prefers content under 300 words.
- The user wants bullet points instead of prose.

## Recent Session History (Episodic Memory)

Summaries of the user's most recent sessions. Use these to understand what
they have been working on and to avoid repeating work unnecessarily.

[Session abc123 · 2026-06-01 14:22:00]
The user asked for a blog post about their best-selling electronics...
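
A prompt section like the one above can be assembled with a small pure function. The sketch below is an assumption about how that might look: the entry shapes, the function name buildMemoryPrompt, and the limit of three recent episodes are all hypothetical, since the original does not specify them. It keeps the ordering described in this step, with the latest semantic fact-sheet first and recent episodic summaries second.

```typescript
interface SemanticEntry { index: number; createdAt: string; factSheet: string }
interface EpisodicEntry { sessionId: string; createdAt: string; summary: string }

// Build the memory section of the system prompt.
// maxEpisodes is an assumed cap on how much session history to inject.
function buildMemoryPrompt(
  semantic: SemanticEntry[],
  episodic: EpisodicEntry[],
  maxEpisodes = 3,
): string {
  const sections: string[] = [];

  // Semantic memory first: only the most recent fact-sheet is current.
  if (semantic.length > 0) {
    const latest = semantic[semantic.length - 1];
    sections.push(
      [
        "## User Preferences (Semantic Memory)",
        `[Semantic Memory #${latest.index} · ${latest.createdAt}]`,
        latest.factSheet,
      ].join("\n"),
    );
  }

  // Then the most recent episodic summaries, oldest to newest.
  const recent = episodic.slice(-maxEpisodes);
  if (recent.length > 0) {
    sections.push(
      [
        "## Recent Session History (Episodic Memory)",
        ...recent.map((e) => `[Session ${e.sessionId} · ${e.createdAt}]\n${e.summary}`),
      ].join("\n"),
    );
  }

  return sections.join("\n\n");
}

const memoryPrompt = buildMemoryPrompt(
  [{ index: 3, createdAt: "2026-06-02 18:30:00", factSheet: "- The user prefers content under 300 words." }],
  [{ sessionId: "abc123", createdAt: "2026-06-01 14:22:00", summary: "The user asked for a blog post." }],
);
console.log(memoryPrompt);
```

A brand-new user has neither kind of memory, in which case the function returns an empty string and the system prompt is unchanged.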

The Full Memory Flow (Example)

Session 1: No preference stated

Episodic: "The user asked for a blog post about their best-selling electronics. No explicit preferences were stated."
Semantic: "(No explicit preferences recorded yet.)"

Session 1 continued: User states a preference

User: "Write the blog post in bullet-point format"
Semantic updated: "## User Preferences\n- The user prefers bullet-point format over prose."

Session 2: Preference applied automatically

User: "Write a report on customer trends."
→ Orchestrator reads semantic memory → passes bullet-point preference to Writer Agent automatically

Episodic Memory · Semantic Memory · after() · chat_sessions.episodic_memories · users.semantic_memories

Chapter 3: Swarm Architecture

github.com/audoir/ai-agent-masterclass — docs/chapter-03-swarm.md

In Chapters 1 and 2 we built an Orchestrator that drives a fixed pipeline of sub-agents and remembers user preferences across sessions. The Orchestrator is a hub-and-spoke model: one central agent holds all the context, decides what to do next, and delegates to specialists one at a time.

This chapter introduces a fundamentally different architecture: the Swarm. Instead of a central boss, you have a team of autonomous specialists that hand off control to each other directly — no middleman required.

Architecture

User prompt
    ↓
POST /api/swarm { prompt, runId, userId }
    ↓
Swarm Loop (app/api/swarm/route.ts)
    │
    ├── Start: researcher (first prompt) or last active agent (follow-up)
    │
    ├── 🔍 Researcher Agent
    │       ├── Queries business database via MCP (inventory, customers, sales)
    │       ├── Writes research findings to chat_sessions.topics (e.g. "research_v0")
    │       └── Calls handoff(writer, summary, instructions, readTopics)
    │
    ├── ✍️ Writer Agent
    │       ├── Reads research from topics via list_topics() / read_topic()
    │       ├── Writes blog post draft to topics (e.g. "draft_v0")
    │       └── Calls handoff(editor, summary, instructions, readTopics)
    │
    └── 📝 Editor Agent
            ├── Reads draft from topics via list_topics() / read_topic()
            ├── Writes polished article to topics (e.g. "final_v0")
            └── Responds with text (no handoff = done)
    ↓
Final response streamed to browser via SSE

The Handoff Tool

The core primitive of the swarm is the handoff tool. Every agent has access to it. An agent first saves its output to a named topic, then calls handoff() to pass control to the next agent along with instructions and a list of topic names to read.

// lib/agents/swarm/tools.ts

import { tool } from "ai";
import { z } from "zod";

// SWARM_AGENT_CONFIG (defined elsewhere in the repo) lists, per agent, which
// agents it may hand off to. Building the enum from that list means the schema
// itself rejects any handoff the topology does not allow.
export function buildHandoffTool({ agentName, onHandoff }) {
  const config = SWARM_AGENT_CONFIG[agentName];

  return tool({
    description: "Hand off control to another agent. Call this when your work is done.",
    inputSchema: z.object({
      agentName: z.enum(config.handoffs),
      summary: z.string().describe("A brief summary of what you did and why you are handing off."),
      instructions: z.string().describe("Clear instructions for the next agent."),
      readTopics: z.array(z.string()).describe("Named topic slots the next agent should read."),
    }),
    execute: async ({ agentName: nextAgent, summary, instructions, readTopics }) => {
      // Record the handoff for the swarm loop, then return a short confirmation.
      onHandoff({ nextAgent, summary, instructions, readTopics });
      return `Handing off to ${nextAgent}.`;
    },
  });
}
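
The z.enum(config.handoffs) line is what constrains the swarm's topology. The sketch below expresses the same idea without the AI SDK or zod so it can run on its own. The HANDOFFS table is a hypothetical configuration inferred from the flows shown in this chapter; the real SWARM_AGENT_CONFIG may differ.

```typescript
type AgentName = "researcher" | "writer" | "editor";

// Hypothetical topology: which agents each agent may hand off to.
const HANDOFFS: Record<AgentName, AgentName[]> = {
  researcher: ["writer"],
  writer: ["editor"],
  editor: ["researcher", "writer"],
};

interface HandoffRequest {
  agentName: string; // target requested by the model
  summary: string;
  instructions: string;
  readTopics: string[];
}

interface Handoff {
  nextAgent: AgentName;
  summary: string;
  instructions: string;
  readTopics: string[];
}

// Accept a handoff only if the target is in the caller's allow-list.
function validateHandoff(from: AgentName, request: HandoffRequest): Handoff {
  const allowed = HANDOFFS[from];
  const next = allowed.find((candidate) => candidate === request.agentName);
  if (!next) {
    throw new Error(
      `${from} may not hand off to ${request.agentName}; allowed: ${allowed.join(", ")}`,
    );
  }
  return {
    nextAgent: next,
    summary: request.summary,
    instructions: request.instructions,
    readTopics: request.readTopics,
  };
}

const accepted = validateHandoff("researcher", {
  agentName: "writer",
  summary: "Research complete.",
  instructions: "Draft a blog post from the research.",
  readTopics: ["research_v0"],
});
console.log(accepted.nextAgent); // writer
```

Restricting each agent to a short allow-list keeps routing predictable and is one guard against the handoff loops discussed later in this chapter.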

The Swarm Loop

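// app/api/swarm/route.ts (excerpt)

// Determine starting agent:
//   - First prompt: start at "researcher"
//   - Follow-up: resume from the last agent that finished (from registry)
const lastFinished = getLastFinishedAgent(runId);
let agentName = lastFinished || "researcher";

// Assumed initialisation (not shown in the original excerpt): the first agent
// receives the user's prompt and has no topics to read yet.
let input = { instructions: prompt, readTopics: [] as string[] };
let hops = 0;

// MAX_HOPS caps the number of agent turns so a handoff cycle cannot run forever.
while (hops < MAX_HOPS) {
  hops++;
  const result = await runSwarmAgent({ db, runId, agentName, input });

  // After each agent turn, send the updated messages snapshot via SSE
  send({ type: "messages", messages: getMessages(runId) });

  // An agent that replies with text instead of calling handoff() ends the run.
  if (result.type === "done") break;

  agentName = result.nextAgent;
  input = { instructions: result.instructions, readTopics: result.readTopics };
}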
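
To see the loop's control flow in isolation, the following sketch replaces the LLM-backed agents with synchronous stub functions. Everything here is an assumption for illustration: the stub agents, the result shapes, and the MAX_HOPS value of 6 (the original does not state the real limit). It runs one normal pipeline and one deliberately cyclic swarm to show the hop cap ending the run.

```typescript
type AgentName = "researcher" | "writer" | "editor";
interface AgentInput { instructions: string; readTopics: string[] }
type AgentResult =
  | { type: "handoff"; nextAgent: AgentName; instructions: string; readTopics: string[] }
  | { type: "done"; text: string };
type Agent = (input: AgentInput, topics: Map<string, string>) => AgentResult;
interface SwarmRun { visited: AgentName[]; topics: Map<string, string>; text: string | null }

const MAX_HOPS = 6; // assumed value

function runSwarm(agents: Record<AgentName, Agent>, start: AgentName, userPrompt: string): SwarmRun {
  const topics = new Map<string, string>();
  const visited: AgentName[] = [];
  let agentName = start;
  let input: AgentInput = { instructions: userPrompt, readTopics: [] };

  for (let hops = 0; hops < MAX_HOPS; hops++) {
    visited.push(agentName);
    const result = agents[agentName](input, topics);
    if (result.type === "done") return { visited, topics, text: result.text };
    // Handoff: the next agent gets only instructions and topic names.
    agentName = result.nextAgent;
    input = { instructions: result.instructions, readTopics: result.readTopics };
  }
  return { visited, topics, text: null }; // hop budget exhausted
}

const pipeline: Record<AgentName, Agent> = {
  researcher: (input, topics) => {
    topics.set("research_v0", `Findings for: ${input.instructions}`);
    return { type: "handoff", nextAgent: "writer", instructions: "Draft a post.", readTopics: ["research_v0"] };
  },
  writer: (input, topics) => {
    topics.set("draft_v0", `Draft based on ${input.readTopics.join(", ")}`);
    return { type: "handoff", nextAgent: "editor", instructions: "Polish the draft.", readTopics: ["draft_v0"] };
  },
  editor: (input, topics) => {
    topics.set("final_v0", `Edited ${input.readTopics.join(", ")}`);
    return { type: "done", text: "Final article written to topic: final_v0" };
  },
};

// A misbehaving editor that always sends work back to the writer.
const looping: Record<AgentName, Agent> = {
  ...pipeline,
  editor: () => ({ type: "handoff", nextAgent: "writer", instructions: "Revise again.", readTopics: ["draft_v0"] }),
};

const finished = runSwarm(pipeline, "researcher", "Write a blog post about our best-selling electronics");
const cappedRun = runSwarm(looping, "researcher", "Write a blog post");
console.log(finished.visited.join(" -> ")); // researcher -> writer -> editor
console.log(cappedRun.visited.length, cappedRun.text); // 6 null
```

Without the cap, the second swarm would bounce between writer and editor indefinitely, which is why a hop limit is worth having even when the topology looks safe.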

Orchestrator vs. Swarm: Analysis

Orchestrator (Hub-and-Spoke)

User → Orchestrator → Researcher
                    → Writer
                    → Editor

Advantages

✅ Single-purpose agents — easy to design and test
✅ Predictable — Orchestrator controls the sequence
✅ Easy to debug — one agent makes all routing decisions
✅ Centralised context — Orchestrator sees everything

Disadvantages

❌ Context bloat — sub-agent outputs pass through Orchestrator
❌ Single point of failure
❌ No memory of active agent — follow-ups always re-route through manager

Swarm (Peer-to-Peer)

User → Researcher → Writer → Editor → User
           ↑                   ↓
           └───────────────────┘

Advantages

✅ No context bloat — each agent only sees its own context
✅ Resilient — no single point of failure
✅ Active agent memory — follow-ups resume from last active agent
✅ Scalable — add new specialists easily

Disadvantages

❌ Agents are more complex — must make routing decisions too
❌ Harder to predict — each agent decides its own next step
❌ Harder to debug — inspect each agent's decision individually
❌ Potential for loops without topology constraints

💡 When to Use Each

Start with the Orchestrator. Don't reach for a swarm unless you have a specific reason to.

Problem	Why swarm helps
Orchestrator's context window is growing too large	Each swarm agent only sees its own context — no central accumulation
Follow-up messages need to resume with the last active specialist	The agent_registry JSON tracks the active agent; follow-ups go directly to them
Many specialists with complex, dynamic routing	Each agent decides its own next step based on what it knows

Example: Multi-Turn Conversation

Turn 1: "Write a blog post about our best-selling electronics"

🔍 researcher → ✍️ writer → 📝 editor
Done. Final article written to topic: final_v0

Turn 2: "Who bought the USB-C Hub?"

📝 editor (last active) → hands off to 🔍 researcher (data question)
Researcher queries database → responds with buyer list

Turn 3: "OK add this info to the blog"

🔍 researcher (last active) → ✍️ writer → 📝 editor
Done. Updated article written to topic: final_v1

Swarm Loop · Handoff Tool · Agent Registry · SSE Streaming · MAX_HOPS

Chapter 4: State Checkpointing

github.com/audoir/ai-agent-masterclass — docs/chapter-04-checkpointing.md

In Chapters 1–3 we built an Orchestrator, added long-term memory, and explored a swarm architecture. Every run was a one-way trip: the agents did their work, wrote topics to the database, and the session ended. If you wanted to change something mid-run, you had to start over from scratch.

This chapter adds state checkpointing: the ability to snapshot the conversation state before every step, roll back to any snapshot, and re-run the pipeline from that point — optionally with a new prompt. This is sometimes called time travel debugging for AI agents.

Why Checkpointing?

Long-running agent pipelines are expensive. A full Researcher → Writer → Editor run can take 30–60 seconds and cost several cents in API calls. If the Writer produces a draft you don't like, you shouldn't have to re-run the Researcher. You should be able to roll back to just before the Writer ran and give it different instructions.

The key insight is that the entire state of an agent pipeline is just the messages JSON array and the topics JSON object stored in chat_sessions. If you can snapshot both before each step and restore them on demand, you get full time travel for free.

Architecture

User prompt
    ↓
POST /api/orchestrator/checkpoints/start { prompt, runId, userId }
    ↓
runOrchestratorAgent (lib/agents/orchestrator/checkpoints.ts)
    │
    ├── checkpointBeforeMessage()  ← snapshot before user message
    ├── initChatSession()          ← write user message to DB
    │
    └── runOrchestratorCore()
            │
            ├── streamText({ abortSignal: req.signal, ... })
            │
            ├── experimental_onToolCallStart:
            │       └── checkpointBeforeMessage()  ← snapshot before tool call
            │
            ├── onStepFinish (tool call step):
            │       ├── saveAssistantMessage()
            │       ├── saveToolCallMessage()
            │       └── saveToolMessage()
            │
            ├── onAbort:
            │       └── agentMcpClient.close()     ← clean up on stop
            │
            └── onFinish:
                    ├── checkpointBeforeMessage()  ← snapshot before assistant reply
                    └── saveAssistantMessage()

Schema

No new tables are added. Checkpoints are stored as a JSON array in a new checkpoints column on the existing chat_sessions table:

interface StoredCheckpoint {
  message_id: string;          // short UUID (first 8 chars) — also the id of the next message
  messages_snapshot: StoredMessage[];          // full copy of messages at this point
  topics_snapshot: Record<string, unknown>;   // full copy of topics at this point
  created_at: string;
}

Step 1: Saving Checkpoints

Checkpoints are saved at three kinds of points in the pipeline: before the user message is written, before each tool call, and before the final assistant reply. The checkpoint id and the id of the next message written are always the same value: checkpointBeforeMessage generates the UUID, saves the checkpoint, and returns the UUID so the caller can pass it to the message writer.

Checkpoint[0] = state before user message → "undo the whole prompt"

Checkpoint[1] = state before tool call step 1 → "rerun from step 1"

Checkpoint[2] = state before tool call step 2 → "rerun from step 2"

Checkpoint[3] = state before assistant reply → "regenerate the reply"
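
The shared-id convention is easier to follow in code. The sketch below keeps the session in memory instead of SQLite, uses a counter-based id in place of the short UUID so the output is deterministic, and assumes a simple StoredMessage shape that the original does not show. It is a sketch of the idea, not the repository's checkpointBeforeMessage.

```typescript
// Assumed message shape; the original does not show StoredMessage.
interface StoredMessage { id: string; role: "user" | "assistant" | "tool"; content: string }
interface StoredCheckpoint {
  message_id: string;
  messages_snapshot: StoredMessage[];
  topics_snapshot: Record<string, unknown>;
  created_at: string;
}
interface Session {
  messages: StoredMessage[];
  topics: Record<string, unknown>;
  checkpoints: StoredCheckpoint[];
}

// Deterministic stand-in for the 8-character short UUID.
let counter = 0;
function newShortId(): string {
  counter += 1;
  return counter.toString(16).padStart(8, "0");
}

// Deep copy so later mutations cannot change a stored snapshot.
function deepCopy<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

// Snapshot the current state and return the id the next message must use.
function checkpointBeforeMessage(session: Session): string {
  const messageId = newShortId();
  session.checkpoints.push({
    message_id: messageId,
    messages_snapshot: deepCopy(session.messages),
    topics_snapshot: deepCopy(session.topics),
    created_at: new Date().toISOString(),
  });
  return messageId;
}

const session: Session = { messages: [], topics: {}, checkpoints: [] };

// Checkpoint[0]: state before the user message.
const userMessageId = checkpointBeforeMessage(session);
session.messages.push({ id: userMessageId, role: "user", content: "Write a blog post" });

// Checkpoint[1]: state before the first tool call.
const toolMessageId = checkpointBeforeMessage(session);
session.topics["research_v0"] = "research report";
session.messages.push({ id: toolMessageId, role: "tool", content: "Done. Wrote research_v0." });

console.log(session.checkpoints.map((cp) => cp.messages_snapshot.length)); // [ 0, 1 ]
console.log(session.checkpoints[1].message_id === session.messages[1].id); // true
```

Because a checkpoint and the message that follows it share an id, a "rerun from here" action on any message can look up its checkpoint directly by that message's id.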

Step 2: Why `experimental_onToolCallStart`?

The AI SDK fires experimental_onToolCallStart before the tool executes — which is exactly when we need to snapshot the state. onStepFinish fires after the tool has already run and returned its result, so it is too late to checkpoint the pre-tool state there.

Step 3: Restoring a Checkpoint

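// lib/chat-session.ts

export function restoreCheckpoint(db, sessionId, messageId) {
  // Load the session's checkpoints JSON array. (This lookup is not shown in the
  // original excerpt; the query is an assumption based on the schema above.)
  const row = db
    .prepare("SELECT checkpoints FROM chat_sessions WHERE id = ?")
    .get(sessionId);
  const checkpoints = row?.checkpoints ? JSON.parse(row.checkpoints) : [];

  const checkpoint = checkpoints.find((cp) => cp.message_id === messageId);
  if (!checkpoint) return null;

  // Restore messages AND topics from the snapshot.
  // The checkpoints column is left untouched, so later checkpoints remain available.
  db.prepare(
    "UPDATE chat_sessions SET messages = ?, topics = ?, updated_at = datetime('now') WHERE id = ?"
  ).run(
    JSON.stringify(checkpoint.messages_snapshot),
    JSON.stringify(checkpoint.topics_snapshot),
    sessionId,
  );

  return storedMessagesToModelMessages(checkpoint.messages_snapshot);
}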

⚠️ Topics Are Also Snapshotted

Each checkpoint captures both messages_snapshot and topics_snapshot. This is necessary because topics are written by sub-agents during the pipeline — rolling back only the messages without rolling back the topics would leave the session in an inconsistent state (e.g. a draft_v0 topic written by the Writer would still exist after rolling back to before the Writer ran).
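
The inconsistency described above is easy to reproduce. This sketch simulates a run in memory and then rolls back to just before the Writer twice: once restoring only the messages, once restoring messages and topics. The session shape, the checkpoint ids, and the function names are simplified assumptions for the demonstration.

```typescript
type Topics = Record<string, string>;
interface Checkpoint { message_id: string; messages_snapshot: string[]; topics_snapshot: Topics }
interface Session { messages: string[]; topics: Topics; checkpoints: Checkpoint[] }

function takeCheckpoint(session: Session, messageId: string): void {
  session.checkpoints.push({
    message_id: messageId,
    messages_snapshot: [...session.messages],
    topics_snapshot: { ...session.topics },
  });
}

// Roll back to a checkpoint; includeTopics toggles the behaviour under test.
function restoreTo(session: Session, messageId: string, includeTopics: boolean): boolean {
  const checkpoint = session.checkpoints.find((cp) => cp.message_id === messageId);
  if (!checkpoint) return false;
  session.messages = [...checkpoint.messages_snapshot];
  if (includeTopics) session.topics = { ...checkpoint.topics_snapshot };
  return true; // checkpoints themselves are never deleted
}

// Simulate: user message, Researcher step, Writer step.
function simulateRun(): Session {
  const session: Session = { messages: [], topics: {}, checkpoints: [] };
  takeCheckpoint(session, "cp-user");
  session.messages.push("user: Write a blog post");
  takeCheckpoint(session, "cp-researcher");
  session.topics["research_v0"] = "research report";
  session.messages.push("tool: researcher_agent done");
  takeCheckpoint(session, "cp-writer");
  session.topics["draft_v0"] = "first draft";
  session.messages.push("tool: writer_agent done");
  return session;
}

const messagesOnly = simulateRun();
restoreTo(messagesOnly, "cp-writer", false);

const fullRestore = simulateRun();
restoreTo(fullRestore, "cp-writer", true);

console.log(Object.keys(messagesOnly.topics)); // [ 'research_v0', 'draft_v0' ]
console.log(Object.keys(fullRestore.topics)); // [ 'research_v0' ]
```

In the messages-only case the history says the Writer never ran, yet its draft_v0 topic is still present. Restoring both snapshots removes that mismatch while keeping the Researcher's output.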

Step 4: Aborting a Run

The AI SDK's abortSignal parameter lets the client cancel a stream mid-run. When the user clicks the Stop button, useCompletion's stop() function cancels the HTTP request. The DB state stays consistent because onStepFinish only fires for fully completed steps. Any step that gets aborted mid-stream simply doesn't get written to the DB — so the last checkpoint is always valid.
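
The "only completed steps are written" guarantee can be illustrated with the standard AbortController API. The sketch below is a simplified synchronous model, not the AI SDK's streaming implementation: the Step type and runPipeline function are assumptions, and the abort is triggered from inside the Writer step to imitate a user pressing Stop while it runs.

```typescript
interface Step { name: string; run: () => string }

// Run steps in order, committing each one only after it completes un-aborted.
function runPipeline(steps: Step[], signal: AbortSignal, committed: string[]): "finished" | "aborted" {
  const stopped = (): boolean => signal.aborted;
  for (const step of steps) {
    if (stopped()) return "aborted";
    const output = step.run();
    // A step interrupted while running is discarded, never half-written.
    if (stopped()) return "aborted";
    committed.push(`${step.name}: ${output}`);
  }
  return "finished";
}

const controller = new AbortController();
const committed: string[] = [];

const outcome = runPipeline(
  [
    { name: "researcher", run: () => "research_v0 written" },
    {
      name: "writer",
      run: () => {
        controller.abort(); // the user clicks Stop while the Writer is running
        return "partial draft";
      },
    },
    { name: "editor", run: () => "final_v0 written" },
  ],
  controller.signal,
  committed,
);

console.log(outcome); // aborted
console.log(committed); // [ 'researcher: research_v0 written' ]
```

The committed list ends at the last fully completed step, which mirrors why the most recent checkpoint is always a safe place to resume from.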

Example: Stop and Rerun with a New Prompt

Step 1 — Start a run:

Send: "Write a blog post about our best-selling electronics"
Watch the Messages tab fill in as the Orchestrator calls Researcher, Writer, Editor.

Step 2 — Stop during the Writer:

Click the red ■ Stop button while the Writer agent is running.
The stream stops immediately. DB state is consistent — only fully completed steps are written.

Step 3 — Rerun from the Writer with a new prompt:

Click 🔖 Rerun from here on the writer_agent tool call message.
Type: "Make sure the blog is in bullet point format"
Click ▶ Run.

Result: Researcher's work is preserved. Only Writer and Editor re-run with the new instruction.

Further Exploration: Branching

The current implementation supports linear time travel: roll back to any checkpoint and re-run from that point. But because checkpoints are never deleted, the data model already supports branching — like a version control system for your agent runs. Checkpoints from the original run remain intact even after a restore, so you can always navigate back to any branch. No schema changes needed.

checkpointBeforeMessage · restoreCheckpoint · experimental_onToolCallStart · abortSignal · Time Travel Debugging

Chapter 5: Human-in-the-Loop (HITL)

github.com/audoir/ai-agent-masterclass — docs/chapter-05-hitl.md

In Chapters 1–4 we built an Orchestrator, added long-term memory, explored a swarm architecture, and added state checkpointing. Every pipeline ran autonomously from start to finish — the agent decided what to do, called the tools, and reported back.

This chapter adds Human-in-the-Loop (HITL): the ability for an agent to deliberately pause mid-pipeline and wait for a human to either approve a high-risk action or provide missing information before continuing.

Why HITL?

1. Authorization (The Gatekeeper)

The agent halts before taking a high-risk action and waits for a human to click "Approve" or "Reject".

• Deleting records from a database
• Updating prices or customer data
• Sending an email to a client
• Executing a shell command

2. Steering (The Co-Pilot)

The agent halts because it lacks context and needs the human to clarify before it can proceed correctly.

• "I found three users named John Smith. Which one?"
• "The product doesn't exist. Did you mean X?"
• "This will delete 47 records. Are you sure?"

Architecture: The Two-Turn Approval Flow

[UPDATE/DELETE path]

Turn 0 — first pass:
  write_topic("database-mutation_v0", <user's request>)
  write_topic("user-approval_v0", "false")
  database_mutator_agent(readTopics=["database-mutation_v0", "user-approval_v0"],
                         writeTopic="mutation-result_v0")
      └── Agent finds records, writes STATUS: fail + description of what will change
  read_topic("mutation-result_v0")
  request_human_approval(action_summary, question_for_human)
      └── ⚠️ LOOP BREAKS HERE — streamText returns to the browser
          The user sees the question and types a reply

Turn 1 — user replies "yes" or "no":
  write_topic("database-mutation_v1", <original request>)
  write_topic("user-approval_v1", "true" or "false")
  database_mutator_agent(readTopics=["database-mutation_v1", "user-approval_v1"],
                         writeTopic="mutation-result_v1")
      └── user-approval_v1 = "true"  → executes, writes STATUS: success
          user-approval_v1 = "false" → writes STATUS: fail (cancelled)
  Orchestrator narrates final outcome

Step 1: The HITL Tool

The key insight is that HITL is implemented as a tool that intentionally breaks the execution loop. The request_human_approval tool executes, returns a JSON payload describing the pending action, and then the Orchestrator — following its system prompt instructions — stops streaming and presents the question to the user.

// lib/agents/orchestrator/mutator-tools.ts

export const mutatorTools = {
  request_human_approval: tool({
    description:
      "Stop execution and present a confirmation question to the user for a destructive " +
      "UPDATE or DELETE operation. After calling this tool, stop and wait for the user's reply.",
    inputSchema: z.object({
      action_summary: z.string().describe("A clear description of what records will be modified."),
      question_for_human: z.string().describe("The confirmation question to present to the user."),
    }),
    execute: async ({ action_summary, question_for_human }) => {
      return JSON.stringify({
        status: "awaiting_human_approval",
        action_summary,
        question_for_human,
        instructions: "Present the question_for_human to the user and stop.",
      });
    },
  }),
};
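
Because the tool returns a JSON string with a fixed status field, other code can recognise a pending approval without asking the model. The helper below is a hypothetical addition, not part of the repository: it parses a tool result and returns the approval request if that is what the result contains. A UI could use something like it to render Approve and Reject buttons instead of waiting for a typed reply.

```typescript
interface ApprovalRequest {
  status: "awaiting_human_approval";
  action_summary: string;
  question_for_human: string;
}

// Return the approval request carried by a tool result, or null if the
// result is anything else (plain text, other JSON, malformed JSON).
function parseApprovalRequest(toolResult: string): ApprovalRequest | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(toolResult);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const { status, action_summary, question_for_human } = parsed as Record<string, unknown>;
  if (
    status !== "awaiting_human_approval" ||
    typeof action_summary !== "string" ||
    typeof question_for_human !== "string"
  ) {
    return null;
  }
  return { status: "awaiting_human_approval", action_summary, question_for_human };
}

const pending = parseApprovalRequest(
  JSON.stringify({
    status: "awaiting_human_approval",
    action_summary: "Update USB-C Hub (id=2) price from $34.99 to $27.99",
    question_for_human: "Do you want to proceed?",
    instructions: "Present the question_for_human to the user and stop.",
  }),
);
console.log(pending?.question_for_human); // Do you want to proceed?
console.log(parseApprovalRequest("Done. Wrote 1842 chars to research_v0.")); // null
```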

Step 2: The Database Mutator Agent

The database_mutator_agent is the specialist that actually reads and writes the database. Its system prompt encodes the approval logic:

INSERT: Verify required fields exist → execute immediately → write STATUS: success

UPDATE/DELETE (first pass, approval = "false"): Read records → describe what will change → write STATUS: fail → do NOT execute

UPDATE/DELETE (second pass, approval = "true"): Read records again → execute mutation → write STATUS: success

Not found: Write STATUS: fail immediately — do NOT ask for confirmation
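
In the project these rules live in the agent's system prompt, so the model applies them. Writing the same rules as a plain function is a useful way to check that they cover every case. The sketch below does that; the request and decision shapes and the function name decideMutation are assumptions made for illustration.

```typescript
type MutationKind = "INSERT" | "UPDATE" | "DELETE";

interface MutationRequest {
  kind: MutationKind;
  approved: boolean; // value of the user-approval topic
  matchedRecords: number; // rows an UPDATE or DELETE would touch
  missingFields: string[]; // required fields absent from an INSERT
}

interface MutationDecision {
  execute: boolean;
  status: "success" | "fail";
  reason: string;
}

function decideMutation(req: MutationRequest): MutationDecision {
  if (req.kind === "INSERT") {
    // INSERT never needs approval, but required fields must be present.
    if (req.missingFields.length > 0) {
      return { execute: false, status: "fail", reason: `Missing required fields: ${req.missingFields.join(", ")}` };
    }
    return { execute: true, status: "success", reason: "INSERT executed immediately" };
  }

  // Not found: fail straight away, no confirmation round-trip.
  if (req.matchedRecords === 0) {
    return { execute: false, status: "fail", reason: "No matching records found" };
  }

  // First pass: describe the change and wait for approval.
  if (!req.approved) {
    return {
      execute: false,
      status: "fail",
      reason: `Awaiting approval to ${req.kind} ${req.matchedRecords} record(s)`,
    };
  }

  // Second pass: approval granted.
  return { execute: true, status: "success", reason: `${req.kind} approved and executed` };
}

const firstPass = decideMutation({ kind: "UPDATE", approved: false, matchedRecords: 1, missingFields: [] });
const secondPass = decideMutation({ kind: "UPDATE", approved: true, matchedRecords: 1, missingFields: [] });
console.log(firstPass.status, firstPass.execute); // fail false
console.log(secondPass.status, secondPass.execute); // success true
```

Note that the first pass and a user rejection both end in STATUS: fail without executing anything, so the only path that mutates data for UPDATE or DELETE is an explicit approval.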

The Stateless Advantage

Other frameworks (LangGraph, AutoGen, CrewAI) generally implement HITL by interrupting a run and resuming it later. Depending on the framework, that means either keeping a process blocked while it waits for input or persisting the run's execution state and providing a mechanism to resume it.

The approach used here with the Vercel AI SDK is different: the "pause" is just the HTTP stream ending, and the "resume" is the next HTTP request. There is no running process to keep alive, no separate execution state to serialize, and no worker to wake up. The entire conversation state lives in the database as a JSON array of messages, which the pipeline writes anyway.

✅ Benefits of Stateless HITL

• Scales horizontally — any server instance can handle any turn
• Survives restarts — conversation resumes exactly where it left off
• No timeouts — the "pause" can last indefinitely (hours, days)
• Easy to inspect — every turn is a normal HTTP request you can replay

Example: The Four Suggestion Prompts

1. "Add a new product: Bluetooth Speaker, category Electronics, price $79.99"

Steering pattern — INSERT, but supplier field is missing

Agent: "I can't complete this INSERT yet — the supplier field is required but was not provided. What supplier should I use?"
User: "AudioWorld"
Agent: INSERT succeeds → STATUS: success

2. "Update the price of the USB-C Hub to $27.99"

Authorization pattern — UPDATE requires approval

Agent: "Found USB-C Hub (id=2), currently $34.99. Do you want to proceed with updating to $27.99?"
User: "yes"
Agent: UPDATE executes → STATUS: success

3. "Delete all sales records older than 2026-02-01"

Authorization pattern — bulk DELETE requires approval

Agent: "Found 10 sales records. Do you want to permanently delete these 10 records?"
User: "no"
Agent: Cancelled → STATUS: fail (cancelled)

4. "Delete the product 'Gaming Chair' from inventory"

Not found edge case — no confirmation needed

Agent: "No product named 'Gaming Chair' was found. Available furniture: Standing Desk Mat, Office Chair, Desk Lamp. Did you mean one of these?"

request_human_approval · database_mutator_agent · Two-Turn Flow · Stateless HITL · useCompletion

Conclusion

Across five chapters we built a complete AI agent system from the ground up — an Orchestrator driving specialist sub-agents, long-term memory, a swarm architecture, state checkpointing, and human-in-the-loop controls. Each chapter adds a capability that makes the system more production-ready.

Key Lessons

Lesson	Takeaway
Long-term memory	Episodic + semantic memory turns a stateless tool into a learning system. Only record explicitly stated preferences — never infer.
Swarm vs. Orchestrator	Single-purpose agents connected to an Orchestrator are simpler, more deterministic, and easier to maintain. Scale with hierarchies of Orchestrators, not peer-to-peer swarms.
State checkpointing	Snapshot messages + topics before every step. Enables mid-run stops, rollbacks, and instruction injection without re-running the full pipeline.
HITL	Implement as a tool, not middleware. Covers both authorization (gatekeeper for destructive actions) and steering (co-pilot for missing information).
Stateless architecture	All state in the database. No running processes between turns. Scales horizontally, survives restarts, and supports indefinite pauses.
How to build	Start with simple, single-purpose agents. Connect to an Orchestrator. Scale with hierarchies. Add memory, checkpointing, and HITL only when you have a concrete reason to.
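The state checkpointing lesson in the table above (snapshot messages and topics before every step, then restore to roll back or inject a new instruction) can be sketched in a few lines. This is an illustrative sketch under stated assumptions. The tutorial persists checkpoints in its SQLite database, while this version keeps them in an in-memory array. `Message`, `Checkpoint`, `CheckpointStore`, the step labels, and the topic keys are all hypothetical names chosen for the example.

```typescript
// Hypothetical shapes; the tutorial stores checkpoints in SQLite, this sketch uses an array.
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

interface Checkpoint {
  step: number;
  label: string; // e.g. the agent or tool that is about to run
  messages: Message[];
  topics: Record<string, string>;
}

// Deep-copy plain JSON data so a snapshot is isolated from later mutations.
function clone<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

class CheckpointStore {
  private checkpoints: Checkpoint[] = [];

  // Snapshot both messages and topics BEFORE a step runs; returns the step index.
  save(label: string, messages: Message[], topics: Record<string, string>): number {
    const step = this.checkpoints.length;
    this.checkpoints.push({ step, label, messages: clone(messages), topics: clone(topics) });
    return step;
  }

  // Return a fresh copy of the state as it was just before the given step.
  restore(step: number): Checkpoint {
    const checkpoint = this.checkpoints[step];
    if (!checkpoint) {
      throw new Error(`No checkpoint at step ${step}`);
    }
    return clone(checkpoint);
  }
}

// Usage: run two steps, then roll back to just before the second one.
const store = new CheckpointStore();
const messages: Message[] = [
  { role: "user", content: "Write a blog post about our best-selling electronics" },
];
const topics: Record<string, string> = {};

store.save("analyst", messages, topics);
topics["research"] = "sales summary"; // the first step writes a topic
messages.push({ role: "tool", content: "research complete" });

const beforeWriter = store.save("writer", messages, topics);
topics["draft"] = "first draft"; // the second step writes another topic
messages.push({ role: "tool", content: "draft complete" });

// Time travel: restore the pre-writer state and inject a new instruction.
const restored = store.restore(beforeWriter);
restored.messages.push({ role: "user", content: "Keep it under 200 words" });
```

Because `restore` returns a copy, the injected instruction changes only the resumed run; the stored snapshot stays intact and can be restored again. Snapshotting topics alongside messages matters for the same reason: the restored state above no longer contains the `draft` topic written by the step being rerun.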

Learning Outcomes

By working through this masterclass, you will have gained practical experience with:

• Adding episodic and semantic long-term memory to an Orchestrator agent
• Building a swarm architecture where agents hand off control to each other directly
• Understanding when to use an Orchestrator vs. a swarm (and why to prefer the Orchestrator)
• Implementing state checkpointing with full time travel debugging for agent pipelines
• Building human-in-the-loop controls as a tool — not middleware — for both authorization and steering
• Designing stateless, horizontally scalable agent pipelines that survive restarts and support indefinite pauses

About the Author

Wayne Cheng is the founder and AI app developer at Audoir, LLC. Prior to founding Audoir, he worked as a hardware design engineer for Silicon Valley startups and an audio engineer for creative organizations. He holds an MSEE from UC Davis and a Music Technology degree from Foothill College.

Further Exploration

Explore the complete masterclass repository and experiment with extending the examples. Consider adding new specialist agents, implementing branching checkpoints with a visual tree UI, or extending HITL with structured approval buttons instead of plain text replies.

New to advanced AI agents? Start with the Advanced AI Agent Tutorial, which covers multi-agent systems, OpenTelemetry observability, evals, and data pipeline theory.

For more AI-powered development tools and tutorials, visit Audoir.
