AI Agent Masterclass: Long-Term Memory, Swarms, Checkpointing & Human-in-the-Loop
A hands-on Next.js project that builds on advanced AI agent concepts — adding long-term episodic and semantic memory, a swarm architecture, state checkpointing with time travel debugging, and human-in-the-loop controls for high-risk database mutations.
Complete Tutorial Code
Follow along with the complete source code for this AI Agent Masterclass. Includes five chapters covering long-term memory, swarm architecture, state checkpointing, and human-in-the-loop controls.
Introduction
Prerequisites
This tutorial is a continuation of the Advanced AI Agent Tutorial, which covers multi-agent systems with MCP, distributed tracing with OpenTelemetry, building a native eval framework, agent topics, and data pipeline theory. Complete that tutorial first before proceeding here.
Once you have the advanced AI agent fundamentals down — multi-agent systems, observability, evals, and agent topics — the next frontier is building systems that are truly production-ready: systems that remember users across sessions, can be paused and resumed mid-pipeline, and require human oversight for high-risk actions.
This masterclass picks up where the Advanced AI Agent Tutorial left off and introduces four new capabilities. You'll add long-term episodic and semantic memory so the Orchestrator learns user preferences over time, explore a swarm architecture where agents hand off to each other directly, implement state checkpointing for time travel debugging, and build a human-in-the-loop pipeline for safe database mutations.
What's Included
| Tab | Description |
|---|---|
| 🗄️ View Database | Browse the in-memory SQLite database — inventory, customers, sales, users, sessions, agent topics, and the agent registry |
| 🤖 Orchestrator | An Orchestrator Agent that delegates to 3 specialist agents (Researcher → Writer → Editor) via MCP tool calls, with outputs persisted as named topics in the database |
| 🐝 Swarm Agents | A swarm of autonomous agents (Researcher, Writer, Editor) that hand off control to each other directly — no central orchestrator |
| 🔖 Checkpoints | State checkpointing with time travel debugging — stop a run mid-pipeline, roll back to any step, and rerun with a new prompt |
| 🧑‍💻 HITL | Human-in-the-Loop database mutations: INSERT executes immediately, UPDATE/DELETE require explicit human approval before executing |
Getting Started
Requirements
- Node.js (v18 or higher)
- OpenAI API key
- Completed the Advanced AI Agent Tutorial
Installation Steps
1. Clone the repository:
   git clone https://github.com/audoir/ai-agent-masterclass.git
2. Install dependencies:
   npm install
3. Configure environment by creating a .env.local file with your OpenAI API key:
   OPENAI_API_KEY=sk-...
4. Start the dev server:
   npm run dev
   Open http://localhost:3000 in your browser.
Key Dependencies
| Package | Purpose |
|---|---|
| ai | Vercel AI SDK (generateText, streamText) |
| @ai-sdk/openai | OpenAI provider |
| @ai-sdk/react | React hooks (useCompletion) |
| @ai-sdk/mcp | MCP client for the AI SDK |
| mcp-handler | MCP server handler for Next.js |
| @modelcontextprotocol/sdk | Official MCP TypeScript SDK |
| better-sqlite3 | Synchronous SQLite driver |
| zod | Schema validation for tool inputs |
Chapter 1: Multi-Agent System (Recap)
This is a recap chapter. The multi-agent system described here combines what was Chapter 1 (Orchestrator + SubAgents) and Chapter 4 (Agent Topics) of the Advanced AI Agent Tutorial into a single, unified implementation. If you completed that tutorial, this chapter explains what's already in the codebase and how the pieces fit together.
Architecture
User prompt
    ↓
POST /api/orchestrator/default { prompt, runId, userId }
    ↓
🤖 Orchestrator Agent (gpt + tools from /api/mcp/agents/orchestrator/mcp)
    │
    ├── write_topic("research_topic_v0", <user's prompt>)
    │
    ├── researcher_agent(readTopics=["research_topic_v0"], writeTopic="research_v0")
    │     └── Researcher Agent queries /api/mcp/database/read/mcp (SQL tools)
    │           → writes research report to chat_sessions.topics JSON
    │           → returns short confirmation
    │
    ├── writer_agent(readTopics=["research_v0"], writeTopic="draft_v0")
    │     └── Writer Agent reads research from DB, writes blog post draft
    │           → returns short confirmation
    │
    └── editor_agent(readTopics=["draft_v0"], writeTopic="final_v0")
          └── Editor Agent reads draft from DB, writes polished final article
                → returns short confirmation
    ↓
Orchestrator streams narration to user
Two Layers of MCP
| MCP Server | Route | Exposes |
|---|---|---|
| Database MCP | /api/mcp/database/read/mcp | read-inventory, read-customers, read-sales SQL tools (SELECT only) |
| Topic Agent MCP | /api/mcp/agents/orchestrator/mcp | researcher_agent, writer_agent, editor_agent |
The Agent Topics Pattern
This project uses the agent topics pattern. Instead of passing large content strings between agents through the Orchestrator's context window, each agent reads its input from and writes its output to a named slot stored as JSON in the chat_sessions table.
Without Topics
Orchestrator context grows with each step:
With Topics
Orchestrator context stays small:
Why Topics?
| Benefit | How topics provide it |
|---|---|
| No context bloat | Only the topic name (a short string) is passed between agents. The actual content stays in the DB. |
| Persistence | Every intermediate output is stored in SQLite. If the pipeline fails, completed stages are preserved. |
| Resumability | A failed pipeline can be resumed from the last successful topic write — no need to re-run earlier agents. |
| Inspectability | Any topic can be viewed in the 🔑 User State tab of each agent view. |
| Versioning | Topic names include a version suffix (_v0, _v1, etc.) — the Orchestrator increments the version for refinements, never overwriting previous versions. |
| Live progress | The UI polls GET /api/orchestrator/default?runId=... every second to show which topics have been written as the pipeline runs. |
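The pattern can be illustrated without a database. The following is a minimal sketch, assuming an in-memory object in place of the chat_sessions.topics JSON column. TopicStore, writeTopic, readTopic, and nextVersion are hypothetical names chosen for this illustration, not the repository's API.

```typescript
// Hypothetical in-memory stand-in for the chat_sessions.topics JSON column.
type Topics = Record<string, string>;

class TopicStore {
  private topics: Topics = {};

  // Write content to a named slot; refuse to overwrite an existing version.
  writeTopic(name: string, content: string): string {
    if (name in this.topics) {
      throw new Error(`Topic "${name}" already exists; write a new version instead.`);
    }
    this.topics[name] = content;
    // Only a short confirmation goes back to the caller, never the content.
    return `Wrote ${content.length} characters to ${name}.`;
  }

  readTopic(name: string): string {
    const content = this.topics[name];
    if (content === undefined) throw new Error(`Topic "${name}" not found.`);
    return content;
  }

  // Next unused versioned name for a base: research_v0, research_v1, ...
  nextVersion(base: string): string {
    let v = 0;
    while (`${base}_v${v}` in this.topics) v++;
    return `${base}_v${v}`;
  }
}

const store = new TopicStore();
const researchTopic = store.nextVersion("research"); // "research_v0"
const confirmation = store.writeTopic(researchTopic, "example research report");
// A downstream agent receives only the topic name and reads the content itself.
const writerInput = store.readTopic(researchTopic);
const refinedTopic = store.nextVersion("research"); // "research_v1"
```

The caller's context only ever holds the topic name and the short confirmation string, which is the property that keeps the Orchestrator's context small.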
Observability
The project includes OpenTelemetry tracing. Every generateText and streamText call emits spans automatically via the AI SDK's experimental_telemetry option. The Orchestrator also propagates the OTel trace context to the Agent MCP server via W3C traceparent headers, so all four agents appear in a single unified trace in Jaeger.
docker run --rm --name jaeger \
  -p 16686:16686 -p 4317:4317 -p 4318:4318 \
  cr.jaegertracing.io/jaegertracing/jaeger:2.18.0
Open http://localhost:16686 and select the ai-agent-masterclass service.
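For readers unfamiliar with the traceparent header mentioned above, here is a small sketch of its format as defined by the W3C Trace Context specification. In the project the OpenTelemetry tooling produces and consumes this header; formatTraceparent and parseTraceparent below are hypothetical helpers written only to show what travels between the Orchestrator and the Agent MCP server.

```typescript
// Shape of the W3C Trace Context "traceparent" header:
//   version "00" - 32 hex trace id - 16 hex parent span id - 2 hex flags
interface TraceContext {
  traceId: string;  // shared by every span in the trace
  parentId: string; // id of the calling span
  sampled: boolean; // lowest bit of the flags byte
}

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.parentId}-${ctx.sampled ? "01" : "00"}`;
}

function parseTraceparent(header: string): TraceContext | null {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const traceId = match[1];
  const parentId = match[2];
  const flags = match[3];
  // All-zero ids are invalid per the specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Example ids taken from the sample header in the W3C specification.
const outgoing = formatTraceparent({
  traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
  parentId: "00f067aa0ba902b7",
  sampled: true,
});
// The caller would send this value in a "traceparent" request header.
const incoming = parseTraceparent(outgoing);
```

Because the receiving server continues the same trace id, spans from all agents can be grouped under one trace.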
Chapter 2: Long-Term Memory
In Chapter 1 we built an Orchestrator that drives a pipeline of specialist sub-agents. Each run is self-contained: the agents do their work, write topics to the database, and the session ends. The next time the user sends a prompt, the Orchestrator starts fresh — no memory of what the user asked for before, no knowledge of their preferences.
This chapter adds long-term memory: the ability for the system to remember what happened in past sessions and to learn the user's preferences over time, so agents can apply them automatically without the user having to repeat themselves.
Types of AI Agent Memory
Short-Term Memory (In-Context)
The conversation history inside the current context window. The Orchestrator already has this — it sees the full message history for the current session. It is fast and always available, but it disappears when the session ends.
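Long-Term Memory (Persistent)
Memory stored in the database that survives across sessions. This project uses two kinds: episodic memory, a factual summary of each past session, and semantic memory, a distilled fact-sheet of the preferences the user has stated.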
The Memory Pipeline
Orchestrator responds to the user (onFinish fires)
    ↓
Episodic Memory Agent [runs in background via after()]
    → reads session history from chat_sessions.messages JSON
    → appends a factual summary to chat_sessions.episodic_memories JSON array
    ↓
Semantic Memory Agent [runs immediately after episodic agent]
    → reads the new episodic summary + previous semantic memory
    → appends an updated user preference fact-sheet to users.semantic_memories JSON array
    ↓
Next session starts
    → Orchestrator reads semantic memory (preferences) + recent episodic memories (context)
    → injects both into its system prompt
    → applies user preferences automatically
Step 1: Triggering Memory After a Session
Memory updates run after the Orchestrator responds to the user, using Next.js's after() function. The user gets their response immediately and the memory agents run in the background without adding latency.
// lib/agents/orchestrator/default.ts
import { after } from "next/server";
import { updateLongTermMemory } from "@/lib/memory";
// Excerpt from the options passed to streamText():
onFinish: async ({ text }) => {
  await agentMcpClient.close();
  // Schedule the memory agents to run after the response has been sent.
  after(() => updateLongTermMemory({ userId, sessionId: runId, finalText: text }));
},
Step 2: The Episodic Memory Agent
The Episodic Memory Agent reads the session's chat history and appends a factual 2–4 sentence summary to chat_sessions.episodic_memories.
⚠️ Key Design Principle
Only record what the user explicitly stated; never infer preferences from the nature of the task. If the user asks for a blog post, that tells you the task. It does not tell you that the user prefers blog posts.
✅ Correct episodic memory:
❌ Incorrect (false positives):
Step 3: The Semantic Memory Agent
The Semantic Memory Agent reads the new episodic summary and the user's previous semantic memory, then appends an updated preference fact-sheet to users.semantic_memories. It also removes stale preferences — if a new episodic memory contradicts a previously stored preference, the old one is updated or deleted.
✅ After user states a preference:
✅ When no preference stated:
Step 4: Injecting Memory into the Orchestrator
At the start of each new session, the Orchestrator reads both memory types and injects them into its system prompt. Semantic memory comes first because it contains the most actionable, durable preferences.
## User Preferences (Semantic Memory)
This is a distilled fact-sheet of what is consistently true about this user,
built up from all their past sessions. You must apply these preferences
automatically — do not ask the user to repeat them.
[Semantic Memory #3 · 2026-06-02 18:30:00]
## User Preferences
- The user prefers content under 300 words.
- The user wants bullet points instead of prose.
## Recent Session History (Episodic Memory)
Summaries of the user's most recent sessions. Use these to understand what
they have been working on and to avoid repeating work unnecessarily.
[Session abc123 · 2026-06-01 14:22:00]
The user asked for a blog post about their best-selling electronics...
The Full Memory Flow (Example)
Session 1: No preference stated
Semantic: "(No explicit preferences recorded yet.)"
Session 1 continued: User states a preference
Semantic updated: "## User Preferences\n- The user prefers bullet-point format over prose."
Session 2: Preference applied automatically
→ Orchestrator reads semantic memory → passes bullet-point preference to Writer Agent automatically
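The injection step from Step 4 can be sketched as a pure function that turns stored memories into a system prompt section. This is an illustration under assumptions: the record shapes, the buildMemoryPrompt name, the choice to inject only the latest semantic fact-sheet, and the recentLimit cap are all hypothetical and may differ from the repository.

```typescript
// Hypothetical record shapes; the repository's actual JSON layout may differ.
interface SemanticMemory { index: number; createdAt: string; content: string; }
interface EpisodicMemory { sessionId: string; createdAt: string; summary: string; }

function buildMemoryPrompt(
  semantic: SemanticMemory[],
  episodic: EpisodicMemory[],
  recentLimit = 3, // assumed cap on how many past sessions to include
): string {
  const sections: string[] = [];
  // Semantic memory goes first. Assumption: only the latest fact-sheet is injected.
  const latest = semantic[semantic.length - 1];
  if (latest) {
    sections.push(
      "## User Preferences (Semantic Memory)\n" +
        `[Semantic Memory #${latest.index} · ${latest.createdAt}]\n` +
        latest.content,
    );
  }
  // Then the most recent session summaries, oldest first.
  const recent = episodic.slice(-recentLimit);
  if (recent.length > 0) {
    sections.push(
      "## Recent Session History (Episodic Memory)\n" +
        recent
          .map((m) => `[Session ${m.sessionId} · ${m.createdAt}]\n${m.summary}`)
          .join("\n"),
    );
  }
  return sections.join("\n\n");
}

const memoryPrompt = buildMemoryPrompt(
  [{ index: 3, createdAt: "2026-06-02 18:30:00", content: "- The user prefers content under 300 words." }],
  [{ sessionId: "abc123", createdAt: "2026-06-01 14:22:00", summary: "The user asked for a blog post." }],
);
```

A new user with no stored memories yields an empty string, so the base system prompt is left unchanged.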
Chapter 3: Swarm Architecture
In Chapters 1 and 2 we built an Orchestrator that drives a fixed pipeline of sub-agents and remembers user preferences across sessions. The Orchestrator is a hub-and-spoke model: one central agent holds all the context, decides what to do next, and delegates to specialists one at a time.
This chapter introduces a fundamentally different architecture: the Swarm. Instead of a central boss, you have a team of autonomous specialists that hand off control to each other directly — no middleman required.
Architecture
User prompt
    ↓
POST /api/swarm { prompt, runId, userId }
    ↓
Swarm Loop (app/api/swarm/route.ts)
    │
    ├── Start: researcher (first prompt) or last active agent (follow-up)
    │
    ├── 🔍 Researcher Agent
    │     ├── Queries business database via MCP (inventory, customers, sales)
    │     ├── Writes research findings to chat_sessions.topics (e.g. "research_v0")
    │     └── Calls handoff(writer, summary, instructions, readTopics)
    │
    ├── ✍️ Writer Agent
    │     ├── Reads research from topics via list_topics() / read_topic()
    │     ├── Writes blog post draft to topics (e.g. "draft_v0")
    │     └── Calls handoff(editor, summary, instructions, readTopics)
    │
    └── 📝 Editor Agent
          ├── Reads draft from topics via list_topics() / read_topic()
          ├── Writes polished article to topics (e.g. "final_v0")
          └── Responds with text (no handoff = done)
    ↓
Final response streamed to browser via SSE
The Handoff Tool
The core primitive of the swarm is the handoff tool. Every agent has access to it. An agent first writes its output to a named topic, then calls handoff() to pass control to the next agent along with a summary, instructions, and a list of topic names to read.
// lib/agents/swarm/tools.ts
import { tool } from "ai";
import { z } from "zod";
// SWARM_AGENT_CONFIG (not shown) lists, per agent, the agents it may hand off to.
export function buildHandoffTool({ agentName, onHandoff }) {
  const config = SWARM_AGENT_CONFIG[agentName];
  return tool({
    description: "Hand off control to another agent. Call this when your work is done.",
    inputSchema: z.object({
      // Restricting the enum to config.handoffs limits which agents can be targeted.
      agentName: z.enum(config.handoffs),
      summary: z.string().describe("A brief summary of what you did and why you are handing off."),
      instructions: z.string().describe("Clear instructions for the next agent."),
      readTopics: z.array(z.string()).describe("Named topic slots the next agent should read."),
    }),
    execute: async ({ agentName: nextAgent, summary, instructions, readTopics }) => {
      // Notify the swarm loop; the loop decides which agent runs next.
      onHandoff({ nextAgent, summary, instructions, readTopics });
      return `Handing off to ${nextAgent}.`;
    },
  });
}
The Swarm Loop
// app/api/swarm/route.ts
// Determine starting agent:
//   - First prompt: start at "researcher"
//   - Follow-up: resume from the last agent that finished (from registry)
const lastFinished = getLastFinishedAgent(runId);
let agentName = lastFinished ? lastFinished : "researcher";
// `input` is initialized before the loop (not shown in this excerpt).
let hops = 0; // MAX_HOPS caps the number of agent turns per request
while (hops < MAX_HOPS) {
  hops++;
  const result = await runSwarmAgent({ db, runId, agentName, input });
  // After each agent turn, send the updated messages snapshot via SSE
  send({ type: "messages", messages: getMessages(runId) });
  if (result.type === "done") break;
  agentName = result.nextAgent;
  input = { instructions: result.instructions, readTopics: result.readTopics };
}
Orchestrator vs. Swarm: Analysis
Orchestrator (Hub-and-Spoke)
Orchestrator → Researcher
Orchestrator → Writer
Orchestrator → Editor
Advantages
- ✅ Single-purpose agents — easy to design and test
- ✅ Predictable — Orchestrator controls the sequence
- ✅ Easy to debug — one agent makes all routing decisions
- ✅ Centralised context — Orchestrator sees everything
Disadvantages
- ❌ Context bloat — sub-agent outputs pass through Orchestrator
- ❌ Single point of failure
- ❌ No memory of active agent — follow-ups always re-route through manager
Swarm (Peer-to-Peer)
Researcher → Writer → Editor, with control able to loop back to an earlier agent
Advantages
- ✅ No context bloat — each agent only sees its own context
- ✅ Resilient — no single point of failure
- ✅ Active agent memory — follow-ups resume from last active agent
- ✅ Scalable — add new specialists easily
Disadvantages
- ❌ Agents are more complex — must make routing decisions too
- ❌ Harder to predict — each agent decides its own next step
- ❌ Harder to debug — inspect each agent's decision individually
- ❌ Potential for loops without topology constraints
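The last disadvantage is usually handled with two guards: an allowlist of handoff targets per agent and a cap on the number of hops. The sketch below illustrates both in a synchronous toy loop. ALLOWED_HANDOFFS, the value of MAX_HOPS, and the stub agents are assumptions for illustration; the real agents are LLM calls and the real topology lives in the project's swarm configuration.

```typescript
type AgentName = "researcher" | "writer" | "editor";

// Assumed topology: each agent lists the only agents it may hand off to.
const ALLOWED_HANDOFFS: Record<AgentName, AgentName[]> = {
  researcher: ["writer"],
  writer: ["editor"],
  editor: ["researcher", "writer"],
};
const MAX_HOPS = 6; // assumed safety cap on agent turns per request

type TurnResult = { type: "done" } | { type: "handoff"; nextAgent: AgentName };
type RunAgent = (agent: AgentName) => TurnResult;

// Runs agents until one finishes, a handoff is disallowed, or the hop cap is hit.
function runSwarm(start: AgentName, runAgent: RunAgent): AgentName[] {
  const visited: AgentName[] = [];
  let agent = start;
  for (let hop = 0; hop < MAX_HOPS; hop++) {
    visited.push(agent);
    const result = runAgent(agent);
    if (result.type === "done") return visited;
    if (ALLOWED_HANDOFFS[agent].indexOf(result.nextAgent) === -1) {
      throw new Error(`${agent} may not hand off to ${result.nextAgent}`);
    }
    agent = result.nextAgent;
  }
  return visited; // hop budget exhausted: stop instead of looping forever
}

// Stub agent that follows the normal pipeline.
function pipelineAgent(agent: AgentName): TurnResult {
  if (agent === "researcher") return { type: "handoff", nextAgent: "writer" };
  if (agent === "writer") return { type: "handoff", nextAgent: "editor" };
  return { type: "done" };
}

// Stub agent that never finishes: the editor keeps sending work back to the writer.
function pingPongAgent(agent: AgentName): TurnResult {
  if (agent === "researcher") return { type: "handoff", nextAgent: "writer" };
  if (agent === "writer") return { type: "handoff", nextAgent: "editor" };
  return { type: "handoff", nextAgent: "writer" };
}

const normalRun = runSwarm("researcher", pipelineAgent);
const loopingRun = runSwarm("researcher", pingPongAgent);
```

The allowlist rejects routes that should never happen, while the hop cap bounds the cost of routes that are individually valid but cycle.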
💡 When to Use Each
Start with the Orchestrator. Don't reach for a swarm unless you have a specific reason to.
| Problem | Why swarm helps |
|---|---|
| Orchestrator's context window is growing too large | Each swarm agent only sees its own context — no central accumulation |
| Follow-up messages need to resume with the last active specialist | The agent_registry JSON tracks the active agent; follow-ups go directly to them |
| Many specialists with complex, dynamic routing | Each agent decides its own next step based on what it knows |
Example: Multi-Turn Conversation
Turn 1: "Write a blog post about our best-selling electronics"
Done. Final article written to topic: final_v0
Turn 2: "Who bought the USB-C Hub?"
Researcher queries database → responds with buyer list
Turn 3: "OK add this info to the blog"
Done. Updated article written to topic: final_v1
Chapter 4: State Checkpointing
In Chapters 1–3 we built an Orchestrator, added long-term memory, and explored a swarm architecture. Every run was a one-way trip: the agents did their work, wrote topics to the database, and the session ended. If you wanted to change something mid-run, you had to start over from scratch.
This chapter adds state checkpointing: the ability to snapshot the conversation state before every step, roll back to any snapshot, and re-run the pipeline from that point — optionally with a new prompt. This is sometimes called time travel debugging for AI agents.
Why Checkpointing?
Long-running agent pipelines are expensive. A full Researcher → Writer → Editor run can take 30–60 seconds and cost several cents in API calls. If the Writer produces a draft you don't like, you shouldn't have to re-run the Researcher. You should be able to roll back to just before the Writer ran and give it different instructions.
The key insight is that the entire state of an agent pipeline is just the messages JSON array and the topics JSON object stored in chat_sessions. If you can snapshot both before each step and restore them on demand, you get full time travel for free.
Architecture
User prompt
    ↓
POST /api/orchestrator/checkpoints/start { prompt, runId, userId }
    ↓
runOrchestratorAgent (lib/agents/orchestrator/checkpoints.ts)
    │
    ├── checkpointBeforeMessage()    ← snapshot before user message
    ├── initChatSession()            ← write user message to DB
    │
    └── runOrchestratorCore()
          │
          ├── streamText({ abortSignal: req.signal, ... })
          │
          ├── experimental_onToolCallStart:
          │     └── checkpointBeforeMessage()    ← snapshot before tool call
          │
          ├── onStepFinish (tool call step):
          │     ├── saveAssistantMessage()
          │     ├── saveToolCallMessage()
          │     └── saveToolMessage()
          │
          ├── onAbort:
          │     └── agentMcpClient.close()       ← clean up on stop
          │
          └── onFinish:
                ├── checkpointBeforeMessage()    ← snapshot before assistant reply
                └── saveAssistantMessage()
Schema
No new tables are added. Checkpoints are stored as a JSON array in a new checkpoints column on the existing chat_sessions table:
interface StoredCheckpoint {
  message_id: string;                       // short UUID (first 8 chars); also the id of the next message
  messages_snapshot: StoredMessage[];       // full copy of messages at this point
  topics_snapshot: Record<string, unknown>; // full copy of topics at this point
  created_at: string;
}
Step 1: Saving Checkpoints
Checkpoints are saved at three points in the pipeline: before the user message is written, before each tool call, and before the final assistant reply. The checkpoint id and the id of the next message written are always the same value. checkpointBeforeMessage generates the UUID, saves the checkpoint, and returns the UUID so the caller can pass it to the message writer.
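The id-sharing contract can be sketched with an in-memory session object. This is an illustration, not the repository's implementation: the Session shape, newId (a counter standing in for the short UUID), deepCopy, and saveMessage are hypothetical, and the real code persists to the chat_sessions row instead.

```typescript
interface StoredMessage { id: string; role: string; content: string; }
interface StoredCheckpoint {
  message_id: string;
  messages_snapshot: StoredMessage[];
  topics_snapshot: Record<string, unknown>;
  created_at: string;
}
// In-memory stand-in for one chat_sessions row (assumption for this sketch).
interface Session {
  messages: StoredMessage[];
  topics: Record<string, unknown>;
  checkpoints: StoredCheckpoint[];
}

let idCounter = 0;
// Placeholder id generator; the source describes a short UUID (first 8 chars).
function newId(): string {
  idCounter++;
  return ("0000000" + idCounter.toString(16)).slice(-8);
}

// JSON round trip gives an independent copy, matching how the data is stored.
function deepCopy<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

// Snapshot the current state and return the id the next message must use.
function checkpointBeforeMessage(session: Session): string {
  const id = newId();
  session.checkpoints.push({
    message_id: id,
    messages_snapshot: deepCopy(session.messages),
    topics_snapshot: deepCopy(session.topics),
    created_at: new Date().toISOString(),
  });
  return id;
}

function saveMessage(session: Session, id: string, role: string, content: string): void {
  session.messages.push({ id, role, content });
}

const chatSession: Session = { messages: [], topics: {}, checkpoints: [] };
const userMsgId = checkpointBeforeMessage(chatSession); // state before the user message
saveMessage(chatSession, userMsgId, "user", "Write a blog post");
chatSession.topics["research_v0"] = "example research report";
const replyId = checkpointBeforeMessage(chatSession); // state before the assistant reply
saveMessage(chatSession, replyId, "assistant", "Done.");
```

Because each checkpoint shares its id with the message written right after it, "roll back to before message X" is a simple lookup by that message's id.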
Step 2: Why experimental_onToolCallStart?
The AI SDK fires experimental_onToolCallStart before the tool executes — which is exactly when we need to snapshot the state. onStepFinish fires after the tool has already run and returned its result, so it is too late to checkpoint the pre-tool state there.
Step 3: Restoring a Checkpoint
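// lib/chat-session.ts
export function restoreCheckpoint(db, sessionId, messageId) {
  // Load the session's checkpoints (a JSON array in chat_sessions.checkpoints).
  // The original excerpt omits this step; the query below is an assumed reconstruction.
  const row = db.prepare("SELECT checkpoints FROM chat_sessions WHERE id = ?").get(sessionId);
  const checkpoints = row?.checkpoints ? JSON.parse(row.checkpoints) : [];
  const checkpoint = checkpoints.find((cp) => cp.message_id === messageId);
  if (!checkpoint) return null;
  // Restore messages AND topics from the snapshot.
  db.prepare(
    "UPDATE chat_sessions SET messages = ?, topics = ?, updated_at = datetime('now') WHERE id = ?"
  ).run(
    JSON.stringify(checkpoint.messages_snapshot),
    JSON.stringify(checkpoint.topics_snapshot),
    sessionId,
  );
  return storedMessagesToModelMessages(checkpoint.messages_snapshot);
}
⚠️ Topics Are Also Snapshotted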
Each checkpoint captures both messages_snapshot and topics_snapshot. This is necessary because topics are written by sub-agents during the pipeline — rolling back only the messages without rolling back the topics would leave the session in an inconsistent state (e.g. a draft_v0 topic written by the Writer would still exist after rolling back to before the Writer ran).
Step 4: Aborting a Run
The AI SDK's abortSignal parameter lets the client cancel a stream mid-run. When the user clicks the Stop button, useCompletion's stop() function cancels the HTTP request. The DB state stays consistent because onStepFinish only fires for fully completed steps. Any step that gets aborted mid-stream simply doesn't get written to the DB — so the last checkpoint is always valid.
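The commit-only-on-completion rule can be shown with a toy pipeline. This sketch is synchronous and uses a minimal AbortLike interface rather than a real AbortSignal (which exposes the same aborted flag); Step, runPipeline, and the stub steps are hypothetical and only mimic the behaviour described above.

```typescript
// Minimal shape for this sketch; the standard AbortSignal also has `aborted`.
interface AbortLike { aborted: boolean; }

interface Step { name: string; run: (signal: AbortLike) => string; }

// Runs steps in order and commits a step's output only if the step
// completed without the signal being aborted.
function runPipeline(steps: Step[], signal: AbortLike, committed: string[]): void {
  for (const step of steps) {
    if (signal.aborted) return; // do not start new work after a stop
    const output = step.run(signal);
    if (signal.aborted) return; // step was interrupted: discard its output
    committed.push(`${step.name}: ${output}`);
  }
}

const stopSignal: AbortLike = { aborted: false };
const committed: string[] = [];
runPipeline(
  [
    { name: "researcher", run: () => "research report" },
    // Simulates the user clicking Stop while the Writer is still working.
    { name: "writer", run: (s) => { s.aborted = true; return "partial draft"; } },
    { name: "editor", run: () => "final article" },
  ],
  stopSignal,
  committed,
);
// Only the Researcher's completed step was committed.
```

The partial Writer output never reaches storage, so a later rerun starts from a state that contains only completed steps.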
Example: Stop and Rerun with a New Prompt
Step 1 — Start a run:
Watch the Messages tab fill in as the Orchestrator calls Researcher, Writer, Editor.
Step 2 — Stop during the Writer:
The stream stops immediately. DB state is consistent — only fully completed steps are written.
Step 3 — Rerun from the Writer with a new prompt:
Type: "Make sure the blog is in bullet point format"
Click ▶ Run.
Result: Researcher's work is preserved. Only Writer and Editor re-run with the new instruction.
Further Exploration: Branching
The current implementation supports linear time travel: roll back to any checkpoint and re-run from that point. But because checkpoints are never deleted, the data model already supports branching — like a version control system for your agent runs. Checkpoints from the original run remain intact even after a restore, so you can always navigate back to any branch. No schema changes needed.
Chapter 5: Human-in-the-Loop (HITL)
In Chapters 1–4 we built an Orchestrator, added long-term memory, explored a swarm architecture, and added state checkpointing. Every pipeline ran autonomously from start to finish — the agent decided what to do, called the tools, and reported back.
This chapter adds Human-in-the-Loop (HITL): the ability for an agent to deliberately pause mid-pipeline and wait for a human to either approve a high-risk action or provide missing information before continuing.
Why HITL?
1. Authorization (The Gatekeeper)
The agent halts before taking a high-risk action and waits for a human to click "Approve" or "Reject".
- Deleting records from a database
- Updating prices or customer data
- Sending an email to a client
- Executing a shell command
2. Steering (The Co-Pilot)
The agent halts because it lacks context and needs the human to clarify before it can proceed correctly.
- "I found three users named John Smith. Which one?"
- "The product doesn't exist. Did you mean X?"
- "This will delete 47 records. Are you sure?"
Architecture: The Two-Turn Approval Flow
[UPDATE/DELETE path]
Turn 0 (first pass):
  write_topic("database-mutation_v0", <user's request>)
  write_topic("user-approval_v0", "false")
  database_mutator_agent(readTopics=["database-mutation_v0", "user-approval_v0"],
                         writeTopic="mutation-result_v0")
    └── Agent finds records, writes STATUS: fail + description of what will change
  read_topic("mutation-result_v0")
  request_human_approval(action_summary, question_for_human)
    └── ⚠️ LOOP BREAKS HERE: streamText returns to the browser
        The user sees the question and types a reply
Turn 1 (user replies "yes" or "no"):
  write_topic("database-mutation_v1", <original request>)
  write_topic("user-approval_v1", "true" or "false")
  database_mutator_agent(readTopics=["database-mutation_v1", "user-approval_v1"],
                         writeTopic="mutation-result_v1")
    └── user-approval_v1 = "true"  → executes, writes STATUS: success
        user-approval_v1 = "false" → writes STATUS: fail (cancelled)
  Orchestrator narrates final outcome
Step 1: The HITL Tool
The key insight is that HITL is implemented as a tool that intentionally breaks the execution loop. The request_human_approval tool executes, returns a JSON payload describing the pending action, and then the Orchestrator — following its system prompt instructions — stops streaming and presents the question to the user.
// lib/agents/orchestrator/mutator-tools.ts
import { tool } from "ai";
import { z } from "zod";
export const mutatorTools = {
  request_human_approval: tool({
    description:
      "Stop execution and present a confirmation question to the user for a destructive " +
      "UPDATE or DELETE operation. After calling this tool, stop and wait for the user's reply.",
    inputSchema: z.object({
      action_summary: z.string().describe("A clear description of what records will be modified."),
      question_for_human: z.string().describe("The confirmation question to present to the user."),
    }),
    // The tool performs no mutation; it only returns a payload telling the model to stop.
    execute: async ({ action_summary, question_for_human }) => {
      return JSON.stringify({
        status: "awaiting_human_approval",
        action_summary,
        question_for_human,
        instructions: "Present the question_for_human to the user and stop.",
      });
    },
  }),
};
Step 2: The Database Mutator Agent
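The database_mutator_agent is the specialist that actually reads and writes the database. Its system prompt encodes the approval logic: INSERT statements execute immediately, while UPDATE and DELETE statements execute only when the user-approval topic the agent reads is "true". Otherwise the agent writes STATUS: fail along with a description of what would change.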
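In the project this rule is enforced through the agent's prompt, not through code. Purely as an illustration, the same decision can be written as a function. decideMutation, its parameters, and the MutationDecision shape are hypothetical, and the handling of the not-found case is an assumption based on the fourth suggestion prompt described later in this chapter.

```typescript
type Operation = "INSERT" | "UPDATE" | "DELETE";

interface MutationDecision {
  execute: boolean;
  status: "success" | "fail";
  reason: string;
}

// Decide whether a mutation may run, given the value of the user-approval topic.
function decideMutation(
  op: Operation,
  matchedRows: number,
  userApproval: string,
): MutationDecision {
  // INSERT is treated as low risk and runs without approval.
  if (op === "INSERT") {
    return { execute: true, status: "success", reason: "INSERT executes immediately." };
  }
  // Assumption: nothing to change means nothing to approve.
  if (matchedRows === 0) {
    return { execute: false, status: "fail", reason: "No matching records found." };
  }
  if (userApproval === "true") {
    return { execute: true, status: "success", reason: `${op} approved by the user.` };
  }
  // First pass, or the user said no: report what would change and do nothing.
  return {
    execute: false,
    status: "fail",
    reason: `${op} would change ${matchedRows} record(s); approval required.`,
  };
}

const firstPass = decideMutation("UPDATE", 1, "false");
const afterApproval = decideMutation("UPDATE", 1, "true");
```

Encoding the rule in a prompt keeps the agent flexible, but a deterministic check like this one could also run server-side as a second line of defence.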
The Stateless Advantage
Frameworks such as LangGraph, AutoGen, and CrewAI generally implement HITL by suspending a run and resuming it when a signal arrives. That requires a persistent store for the suspended state and a mechanism to resume the run, and in some designs a background worker as well.
The Vercel AI SDK approach is different: the "pause" is just the HTTP stream ending, and the "resume" is the next HTTP request. There is no running process to keep alive, no separate pause state to serialize, and no worker to wake up. The entire conversation state lives in the database as a JSON array of messages.
✅ Benefits of Stateless HITL
- Scales horizontally: any server instance can handle any turn
- Survives restarts: conversation resumes exactly where it left off
- No timeouts: the "pause" can last indefinitely (hours, days)
- Easy to inspect: every turn is a normal HTTP request you can replay
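These properties follow from each turn being a pure function of stored state. The toy handler below sketches that idea: fakeDb stands in for the chat_sessions table, and handleTurn, loadMessages, and hasPendingApproval are hypothetical names. In the real system an LLM interprets the user's reply; the string comparison here is only a placeholder for that step.

```typescript
interface ChatMessage { role: "user" | "assistant" | "tool"; content: string; }

// Stand-in for the chat_sessions table: session id -> messages serialized as JSON.
const fakeDb: Record<string, string> = {};

function loadMessages(sessionId: string): ChatMessage[] {
  const raw = fakeDb[sessionId];
  return raw === undefined ? [] : (JSON.parse(raw) as ChatMessage[]);
}

// True if an approval request was issued after the most recent user message.
function hasPendingApproval(messages: ChatMessage[]): boolean {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === "user") return false;
    if (m.role === "tool" && m.content.indexOf("awaiting_human_approval") !== -1) return true;
  }
  return false;
}

// Handles one HTTP turn. Nothing is kept in process memory between calls.
function handleTurn(sessionId: string, userText: string): string {
  const messages = loadMessages(sessionId);
  const pending = hasPendingApproval(messages);
  messages.push({ role: "user", content: userText });
  let reply: string;
  if (pending) {
    // Placeholder for the model interpreting the user's answer.
    reply = userText.trim().toLowerCase() === "yes"
      ? "Approved: mutation executed."
      : "Cancelled: nothing was changed.";
  } else {
    // Placeholder for the model calling request_human_approval.
    messages.push({ role: "tool", content: JSON.stringify({ status: "awaiting_human_approval" }) });
    reply = "This will update 1 record. Proceed? (yes/no)";
  }
  messages.push({ role: "assistant", content: reply });
  fakeDb[sessionId] = JSON.stringify(messages);
  return reply;
}

const firstReply = handleTurn("run-1", "Update the price of the USB-C Hub to $27.99");
const secondReply = handleTurn("run-1", "yes");
```

Any amount of time, or a server restart, can pass between the two calls because the second call rebuilds everything it needs from the stored messages.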
Example: The Four Suggestion Prompts
1. "Add a new product: Bluetooth Speaker, category Electronics, price $79.99"
Steering pattern — INSERT, but supplier field is missing
User: "AudioWorld"
Agent: INSERT succeeds → STATUS: success
2. "Update the price of the USB-C Hub to $27.99"
Authorization pattern — UPDATE requires approval
User: "yes"
Agent: UPDATE executes → STATUS: success
3. "Delete all sales records older than 2026-02-01"
Authorization pattern — bulk DELETE requires approval
User: "no"
Agent: Cancelled → STATUS: fail (cancelled)
4. "Delete the product 'Gaming Chair' from inventory"
Not found edge case — no confirmation needed
Conclusion
Across five chapters we built a complete AI agent system from the ground up — an Orchestrator driving specialist sub-agents, long-term memory, a swarm architecture, state checkpointing, and human-in-the-loop controls. Each chapter adds a capability that makes the system more production-ready.
Key Lessons
| Lesson | Takeaway |
|---|---|
| Long-term memory | Episodic + semantic memory turns a stateless tool into a learning system. Only record explicitly stated preferences — never infer. |
| Swarm vs. Orchestrator | Single-purpose agents connected to an Orchestrator are simpler, more deterministic, and easier to maintain. Scale with hierarchies of Orchestrators, not peer-to-peer swarms. |
| State checkpointing | Snapshot messages + topics before every step. Enables mid-run stops, rollbacks, and instruction injection without re-running the full pipeline. |
| HITL | Implement as a tool, not middleware. Covers both authorization (gatekeeper for destructive actions) and steering (co-pilot for missing information). |
| Stateless architecture | All state in the database. No running processes between turns. Scales horizontally, survives restarts, and supports indefinite pauses. |
| How to build | Start with simple, single-purpose agents. Connect to an Orchestrator. Scale with hierarchies. Add memory, checkpointing, and HITL only when you have a concrete reason to. |
Learning Outcomes
By working through this masterclass, you will have gained practical experience with:
- Adding episodic and semantic long-term memory to an Orchestrator agent
- Building a swarm architecture where agents hand off control to each other directly
- Understanding when to use an Orchestrator vs. a swarm (and why to prefer the Orchestrator)
- Implementing state checkpointing with full time travel debugging for agent pipelines
- Building human-in-the-loop controls as a tool, not middleware, for both authorization and steering
- Designing stateless, horizontally scalable agent pipelines that survive restarts and support indefinite pauses
About the Author
Wayne Cheng is the founder and AI app developer at Audoir, LLC. Prior to founding Audoir, he worked as a hardware design engineer for Silicon Valley startups and an audio engineer for creative organizations. He holds an MSEE from UC Davis and a Music Technology degree from Foothill College.
Further Exploration
Explore the complete masterclass repository and experiment with extending the examples. Consider adding new specialist agents, implementing branching checkpoints with a visual tree UI, or extending HITL with structured approval buttons instead of plain text replies.
New to advanced AI agents? Start with the Advanced AI Agent Tutorial first, which covers multi-agent systems, OpenTelemetry observability, evals, and data pipeline theory.
For more AI-powered development tools and tutorials, visit Audoir.