Advanced AI Agents: Multi-Agent Systems, Observability, Evals & Data Pipelines

A hands-on Next.js project that walks you through advanced AI concepts — multi-agent systems with MCP, distributed tracing with OpenTelemetry, building a native eval framework, agent topics for context-free pipelines, and the theory behind AI agent data pipelines.

Wayne Cheng

25 min read

Complete Tutorial Code

Follow along with the complete source code for this advanced AI agent tutorial. Includes five chapters covering multi-agent systems, OpenTelemetry tracing, evals, agent topics, and data pipeline theory.

View on GitHub

Prerequisites

This tutorial is a continuation of the AI Agent Tutorial, which covers building AI agents from scratch — from a simple streaming chatbot to a LangGraph-powered agent with database tools served over MCP. Complete that tutorial first before proceeding here.

Introduction

Once you have the fundamentals of AI agents down — streaming chatbots, tool-calling, MCP, and LangGraph — the next frontier is building systems that are robust, observable, and scalable. This tutorial picks up where the AI Agent Tutorial left off and introduces five advanced concepts that separate production-grade AI systems from prototypes.

You'll build a multi-agent content pipeline where an Orchestrator Agent coordinates a Researcher, Writer, and Editor — all via MCP. Then you'll add OpenTelemetry tracing to see exactly what's happening inside, build a native eval framework to measure quality, refactor the pipeline to use database-backed "agent topics" to eliminate context bloat, and finally explore the theory behind AI agent data pipelines.

Tutorial Overview

The tutorial is structured as five chapters, each building on the previous concepts while introducing new capabilities:

Multi-Agent Systems

Build an Orchestrator Agent that dynamically coordinates three specialist agents (Researcher, Writer, Editor) via two stacked MCP servers. The Orchestrator reasons about which agents to call and in what order — no hardcoded pipelines.

POST /api/multi-agent · Agent MCP · Database MCP · streamText

Observability with OpenTelemetry

Add distributed tracing to the multi-agent system using OpenTelemetry and Jaeger. See every LLM call, tool invocation, and agent hop in a single unified trace waterfall — including token counts and latency per step.

OpenTelemetry · Jaeger · OTLP/HTTP · experimental_telemetry

Evals for AI Agents

Build a native eval framework — no third-party tools — with a dataset, runner, scorer, and LLM-as-judge. Three eval suites (researcher, pipeline, safety) measure quality across 12 test cases with composable code-based checker functions.

Dataset · Runner · Scorer · LLM-as-judge

Agent Topics

Refactor the pipeline to use database-backed "agent topics" — named slots where agents write their output instead of passing large strings through the Orchestrator. Inspired by pub/sub messaging, this pattern eliminates context bloat and makes pipelines resumable and inspectable.

agent_topics table · runId · readTopic / writeTopic · Pub/Sub pattern

Data Pipelines for AI Agents

A conceptual deep-dive into how AI agent pipelines differ from traditional ETL. Covers the six stages of an AI agent data pipeline — ingestion, processing, storage, retrieval (RAG), reasoning, and the action/feedback loop — and how each chapter in this tutorial implements a piece of it.

ETL vs. AI Pipelines · RAG · Vector Databases · Memory Architecture

Prerequisites and Setup

Before diving into the examples, ensure you have the necessary tools and environment configured:

Requirements

Node.js (v18 or higher)
OpenAI API key
Completed the AI Agent Tutorial
Docker (for Chapter 2 — Jaeger)

Installation Steps

1. Clone the repository and enter it:
git clone https://github.com/audoir/advanced-ai-tutorial.git
cd advanced-ai-tutorial
2. Install dependencies:
npm install
3. Configure the environment by creating a .env.local file with your OpenAI API key:
OPENAI_API_KEY=sk-...
4. Start the dev server:
npm run dev
Open http://localhost:3000 in your browser.

Chapter 1: Multi-Agent Systems

A multi-agent system is a setup where multiple AI agents — each with a specialized role — collaborate to complete a task that would be difficult for a single agent to do well alone. Rather than hardcoding the order of agent calls, an Orchestrator Agent dynamically decides which specialist agents to invoke, in what order, and what to pass between them — all via MCP tool calls.

Single-Agent: Advantages

✅ Simple to build and debug
✅ Low latency — no coordination overhead
✅ Cheap — fewer LLM calls
✅ Predictable behavior

Single-Agent: Disadvantages

❌ Context window limits
❌ Jack of all trades, master of none
❌ Hard to parallelize
❌ Brittle for complex tasks

Multi-Agent: Advantages

✅ Specialization — higher quality per step
✅ Parallelism — independent agents run concurrently
✅ Scalability — add a new agent, not a new prompt
✅ Dynamic orchestration — adapts at runtime

Multi-Agent: Disadvantages

❌ More complexity and moving parts
❌ Higher latency — each hop adds time
❌ Higher cost — more LLM calls
❌ Error propagation across agents

Architecture: Two Layers of MCP

This chapter uses two MCP servers stacked together. The Orchestrator Agent connects to the Agent MCP server, which exposes three specialist agents as tools. The Researcher Agent (running inside the Agent MCP server) connects to the Database MCP server to query real data.

User prompt
    ↓
POST /api/multi-agent
    ↓
🤖 Orchestrator Agent  (gpt + agent MCP tools)
    │
    ├── calls researcher_agent(topic)
    │       └── Researcher Agent queries Database MCP (SQL tools)
    │           → returns research report
    │
    ├── calls writer_agent(topic, research)
    │       └── Writer Agent drafts a blog post
    │           → returns article draft
    │
    └── calls editor_agent(draft)
            └── Editor Agent reviews and polishes
                → returns final article + editorial notes
    ↓
Orchestrator synthesizes and streams final response

The Specialist Agents

🔍 researcher_agent — Queries the SQLite database via MCP, returns a data report

✍️ writer_agent — Takes topic + research, writes a 400–600 word blog post draft

📝 editor_agent — Reviews the draft, returns editorial feedback + polished final article

Why Orchestrator + MCP Instead of a Sequential Pipeline?

A traditional sequential pipeline hardcodes the order: researcher() → writer(research) → editor(draft). The Orchestrator Agent via MCP reasons about which tools to call and in what order. It can adapt — calling the researcher twice if the first result is insufficient, or skipping the editor for a simple summary request.
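
The difference can be sketched without any LLM or MCP calls. In the minimal example below, the three agents are hypothetical stand-in functions, and `chooseNextTool` is an assumed rule that only imitates the decision a real Orchestrator would make with a model. The point is the control flow: the sequential version fixes the order in code, while the loop asks "what next?" after every step.

```typescript
// Hypothetical stand-ins for the three specialist agents (no LLM or MCP calls).
type AgentName = "researcher_agent" | "writer_agent" | "editor_agent";

interface PipelineState {
  topic: string;
  research?: string;
  draft?: string;
  final?: string;
  calls: AgentName[]; // order in which agents were invoked
}

const researcherAgent = (topic: string): string => `report on ${topic}`;
const writerAgent = (topic: string, research: string): string =>
  `draft about ${topic} using [${research}]`;
const editorAgent = (draft: string): string => `polished: ${draft}`;

// Hardcoded pipeline: the call order is fixed in code.
function sequentialPipeline(topic: string): string {
  return editorAgent(writerAgent(topic, researcherAgent(topic)));
}

// Stand-in for the Orchestrator's reasoning step.
// ASSUMPTION: a real Orchestrator makes this choice with an LLM; this rule
// only imitates "skip the editor for a simple summary request".
function chooseNextTool(state: PipelineState, wantsPolish: boolean): AgentName | null {
  if (state.research === undefined) return "researcher_agent";
  if (state.draft === undefined) return "writer_agent";
  if (wantsPolish && state.final === undefined) return "editor_agent";
  return null; // nothing left to do
}

// Tool-calling loop: ask for the next tool, run it, repeat until done.
function orchestrate(topic: string, wantsPolish: boolean, maxSteps = 5): PipelineState {
  const state: PipelineState = { topic, calls: [] };
  for (let step = 0; step < maxSteps; step++) {
    const next = chooseNextTool(state, wantsPolish);
    if (next === null) break;
    state.calls.push(next);
    if (next === "researcher_agent") {
      state.research = researcherAgent(topic);
    } else if (next === "writer_agent") {
      state.draft = writerAgent(topic, state.research ?? "");
    } else {
      state.final = editorAgent(state.draft ?? "");
    }
  }
  return state;
}

const fullRun = orchestrate("electronics", true);
const quickRun = orchestrate("electronics", false);
console.log(fullRun.calls.join(" -> "));
console.log(quickRun.calls.join(" -> "));
```

The `maxSteps` cap is a design choice worth keeping in any real loop of this kind: a model-driven planner can keep requesting tools, so the loop needs its own budget.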

Chapter 2: Observability with OpenTelemetry

Observability is the ability to understand what's happening inside your system by examining its outputs. For AI agent systems, it's especially important because a single user request can trigger 5–15 LLM calls across multiple agents, and without tracing you can't tell which step is slow, which agent is burning the most tokens, or where a failure originated.

This chapter adds OpenTelemetry (OTel) tracing to the multi-agent system and visualizes the traces in Jaeger. The AI SDK's experimental_telemetry option emits spans automatically for every generateText and streamText call.

# Start Jaeger with Docker
docker run --rm --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  cr.jaegertracing.io/jaegertracing/jaeger:2.18.0

# Then open http://localhost:16686 to view traces

What You'll See in Jaeger

With manual context propagation, all four agents produce one unified trace waterfall:

▼ orchestrator.handleRequest                    12.4s
  ▼ ai.streamText  [orchestrator-agent]          12.4s
      ai.toolCall: researcher_agent               5.1s
      ai.toolCall: writer_agent                   4.2s
      ai.toolCall: editor_agent                   3.1s
  ▼ researcher_agent.run                          5.1s
    ▼ ai.generateText  [researcher-agent]         5.0s
        ai.toolCall: inventory                    0.08s
        ai.toolCall: sales                        0.09s
  ▼ writer_agent.run                              4.2s
    ▼ ai.generateText  [writer-agent]             4.2s
  ▼ editor_agent.run                              3.1s
    ▼ ai.generateText  [editor-agent]             3.1s
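
Once spans are available as data, questions like "which step is slow?" become simple aggregations. The sketch below is a hypothetical helper, not part of Jaeger or the tutorial repository. It takes the three agent-level durations shown in the waterfall above and reports the slowest span and each span's share of the total.

```typescript
interface SpanRecord {
  name: string;        // span name as shown in the waterfall
  durationSec: number; // wall-clock duration in seconds
}

// Durations copied from the agent-level spans in the waterfall above.
const agentSpans: SpanRecord[] = [
  { name: "researcher_agent.run", durationSec: 5.1 },
  { name: "writer_agent.run", durationSec: 4.2 },
  { name: "editor_agent.run", durationSec: 3.1 },
];

// Returns the span with the largest duration.
function slowestSpan(spans: SpanRecord[]): SpanRecord {
  if (spans.length === 0) throw new Error("no spans to compare");
  return spans.reduce((worst, s) => (s.durationSec > worst.durationSec ? s : worst));
}

// Maps each span name to its fraction of the summed duration.
function shareOfTotal(spans: SpanRecord[]): Map<string, number> {
  const total = spans.reduce((sum, s) => sum + s.durationSec, 0);
  const shares = new Map<string, number>();
  for (const s of spans) {
    shares.set(s.name, total === 0 ? 0 : s.durationSec / total);
  }
  return shares;
}

const slowest = slowestSpan(agentSpans);
const shares = shareOfTotal(agentSpans);
console.log(`slowest: ${slowest.name}`);
for (const [name, share] of shares) {
  console.log(`${name}: ${(share * 100).toFixed(1)}%`);
}
```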

Enabling Telemetry on AI SDK Calls

The AI SDK's telemetry is opt-in per call via the experimental_telemetry option:

const result = streamText({
  model: openai(DEFAULT_MODEL),
  experimental_telemetry: {
    isEnabled: true,
    functionId: "orchestrator-agent",   // shown as resource.name in Jaeger
    metadata: { sessionId },            // custom attributes on the span
  },
  // ... rest of options
});

Fixing Orphaned Spans with Context Propagation

The specialist agents are invoked over HTTP through the MCP protocol, using fetch. OpenTelemetry's automatic HTTP instrumentation patches Node's built-in http/https modules but not the Web fetch API, so the outgoing request carries no traceparent header. Without manual context propagation, each agent therefore starts a new, orphaned trace with no parent. The fix is two-sided:

Orchestrator: Inject Context

import { context, propagation } from "@opentelemetry/api";

// Serialize OTel context into headers
const traceCarrier: Record<string, string> = {};
propagation.inject(context.active(), traceCarrier);

const agentMcpClient = await createMCPClient({
  transport: {
    type: "http",
    url: "http://localhost:3000/api/mcp/agents-otel/mcp",
    headers: traceCarrier,  // ← inject traceparent
  },
});

Agent MCP: Extract Context

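import { context, propagation, trace } from "@opentelemetry/api";

const tracer = trace.getTracer("agent-mcp"); // tracer name is illustrative

// Extract parent context from request headers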
const carrier: Record<string, string> = {};
request.headers.forEach((v, k) => { carrier[k] = v; });
const parentContext = propagation.extract(
  context.active(), carrier
);

// Restore parent context for all child spans
return context.with(parentContext, () =>
  tracer.startActiveSpan("researcher_agent.run", ...)
);
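
It can help to see what actually travels in the carrier. Assuming the default W3C Trace Context propagator is in use (which the `traceparent` comment above suggests), the injected header has four dash-separated fields: version, trace ID, parent span ID, and trace flags. The sketch below parses and formats that header by hand. `parseTraceParent` and `formatTraceParent` are hypothetical helper names for illustration only; in real code `propagation.inject` and `propagation.extract` do this work.

```typescript
interface TraceParent {
  version: string;  // "00" for the current spec version
  traceId: string;  // 32 lowercase hex characters, shared by every span in the trace
  spanId: string;   // 16 lowercase hex characters, the caller's span (the parent)
  sampled: boolean; // lowest bit of the trace-flags byte
}

function formatTraceParent(tp: TraceParent): string {
  return `${tp.version}-${tp.traceId}-${tp.spanId}-${tp.sampled ? "01" : "00"}`;
}

// Parses the four-field shape used by version 00; returns null if malformed.
function parseTraceParent(header: string): TraceParent | null {
  const parts = header.trim().split("-");
  if (parts.length !== 4) return null;
  const [version, traceId, spanId, flags] = parts as [string, string, string, string];
  const valid =
    /^[0-9a-f]{2}$/.test(version) && version !== "ff" &&
    /^[0-9a-f]{32}$/.test(traceId) && !/^0+$/.test(traceId) && // all-zero IDs are invalid
    /^[0-9a-f]{16}$/.test(spanId) && !/^0+$/.test(spanId) &&
    /^[0-9a-f]{2}$/.test(flags);
  if (!valid) return null;
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Example header in the format defined by the W3C Trace Context spec.
const incomingHeaders: Record<string, string> = {
  traceparent: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
};
const parent = parseTraceParent(incomingHeaders["traceparent"] ?? "");
console.log(parent);
```

A child span created under this parent keeps the same `traceId` and gets a new `spanId`, which is why the agent spans join the Orchestrator's waterfall instead of starting their own trace.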

Chapter 3: Evals for AI Agents

Evaluations (evals) are systematic tests that measure how well AI models and agents perform at specific tasks. Unlike traditional unit tests, evals target a non-deterministic system whose outputs can vary between runs. They therefore assert properties of an output (required keywords, format, length, absence of forbidden content) rather than exact string equality.

This chapter builds a native eval framework — no third-party tools — with three components:

1. Dataset
`lib/evals/dataset.ts`

12 test cases across 3 suites. Each case has an input, description, and an array of composable checks (pure functions that return true/false).

2. Runner
`lib/evals/runner.ts`

Model-agnostic harness that feeds inputs to the agent and collects outputs. Two runners: one for the researcher agent directly, one for the full pipeline.

3. Scorer
`lib/evals/scorer.ts`

Code-based scoring (fast, deterministic, zero cost) plus opt-in LLM-as-judge scoring for qualitative dimensions like relevance, accuracy, coherence, and completeness.
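
The three components fit together roughly as follows. This is a minimal sketch under stated assumptions, not the code in `lib/evals`: the type names, the synchronous `Agent` signature, and the canned `stubAgent` are all hypothetical, and a real runner would await an LLM-backed agent. It shows only the code-based scoring path, with no LLM-as-judge.

```typescript
type Check = (output: string) => boolean;

interface EvalCase {
  id: string;
  input: string;
  description: string;
  checks: Check[];
}

interface EvalResult {
  id: string;
  score: number;   // fraction of checks that passed, from 0 to 1
  passed: boolean; // true only if every check passed
}

// ASSUMPTION: the agent under test is modeled as a plain synchronous function.
type Agent = (input: string) => string;

// Scorer: run every check against the output.
function scoreOutput(output: string, checks: Check[]): number {
  if (checks.length === 0) return 1;
  return checks.filter((check) => check(output)).length / checks.length;
}

// Runner: feed each input to the agent and score what comes back.
function runSuite(agent: Agent, dataset: EvalCase[]): EvalResult[] {
  return dataset.map((testCase) => {
    const score = scoreOutput(agent(testCase.input), testCase.checks);
    return { id: testCase.id, score, passed: score === 1 };
  });
}

// Canned agent with made-up outputs, used only to exercise the harness.
const stubAgent: Agent = (input) =>
  input.toLowerCase().includes("revenue")
    ? "## Revenue\nElectronics revenue was $1,250.00."
    : "I can only answer questions about the store database.";

const dataset: EvalCase[] = [
  {
    id: "researcher-1",
    input: "What is the electronics revenue?",
    description: "mentions the category and a dollar amount",
    checks: [(o) => o.toLowerCase().includes("electronics"), (o) => /\$[\d,]+\.\d{2}/.test(o)],
  },
  {
    id: "pipeline-1",
    input: "Write a post about headphones",
    description: "mentions the product and is non-trivial in length",
    checks: [(o) => o.toLowerCase().includes("headphone"), (o) => o.length > 10],
  },
];

const results = runSuite(stubAgent, dataset);
const passRate = results.filter((r) => r.passed).length / results.length;
console.log(results, passRate);
```

Reporting a fractional score per case, rather than only pass or fail, makes partial regressions visible when one check out of several starts failing.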

The Three Eval Suites

🔍 Researcher Suite (5 tests)

Tests the researcher agent's ability to query the database and return factual data. Fast — calls the researcher directly, no writer or editor.

📝 Pipeline Suite (4 tests)

End-to-end tests of the full orchestrator pipeline (researcher → writer → editor). Verifies the complete content generation workflow including markdown structure and length.

🛡️ Safety Suite (3 tests)

Tests that the agent refuses or redirects harmful and off-topic requests — SQL injection attempts, off-topic queries, and hallucination prevention.

Built-in Scorer Helper Functions

// Composable scorer helpers — each returns (output: string) => boolean
containsAll("electronics", "revenue")   // output must contain ALL keywords
containsAny("keyboard", "headphone")    // output must contain AT LEAST ONE
containsNone("table dropped")           // output must NOT contain any
matchesRegex(/\$[\d,]+\.\d{2}/)        // output must match the regex
lengthBetween(500, 8000)                // output length must be in range
hasMarkdownHeadings(2)                  // output must have ≥ 2 ## headings
containsDollarAmount()                  // output must contain $12.99-style amount
containsNumber()                        // output must contain a number
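
One plausible way to implement a few of these helpers is shown below. The helper names come from the listing above, but the bodies are a sketch: case-insensitive keyword matching and the exact heading pattern are assumptions, and `allOf` is a hypothetical combinator added to show how the checks compose.

```typescript
type Check = (output: string) => boolean;

// ASSUMPTION: keyword matching is case-insensitive.
const containsAll = (...keywords: string[]): Check => (output) => {
  const text = output.toLowerCase();
  return keywords.every((k) => text.includes(k.toLowerCase()));
};

const containsAny = (...keywords: string[]): Check => (output) => {
  const text = output.toLowerCase();
  return keywords.some((k) => text.includes(k.toLowerCase()));
};

const containsNone = (...phrases: string[]): Check => (output) => {
  const text = output.toLowerCase();
  return phrases.every((p) => !text.includes(p.toLowerCase()));
};

// String.prototype.search ignores the regex's lastIndex, so a pattern with
// the g flag behaves the same on every call.
const matchesRegex = (pattern: RegExp): Check => (output) => output.search(pattern) !== -1;

const lengthBetween = (min: number, max: number): Check => (output) =>
  output.length >= min && output.length <= max;

// Counts lines that start with exactly "## " (level-2 markdown headings).
const hasMarkdownHeadings = (minCount: number): Check => (output) =>
  (output.match(/^##\s+\S/gm)?.length ?? 0) >= minCount;

// Hypothetical combinator: passes only if every inner check passes.
const allOf = (...checks: Check[]): Check => (output) => checks.every((c) => c(output));

const researcherCheck = allOf(
  containsAll("electronics", "revenue"),
  matchesRegex(/\$[\d,]+\.\d{2}/),
  containsNone("table dropped"),
);
console.log(researcherCheck("Electronics revenue reached $1,299.00 this quarter."));
```

Because every helper returns the same `(output: string) => boolean` shape, a test case can hold any mix of them in a single array.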

Chapter 4: Agent Topics

In Chapter 1, the Orchestrator passes the full output of each agent as a string argument to the next agent. This works, but it bloats the context window: the full research report enters the Orchestrator's context as a tool result and is then repeated verbatim as an argument to the writer, so each large payload costs tokens at least twice.

An agent topic is a named slot in the database where an agent writes its output. The next agent reads from that topic directly — it doesn't receive the data as a function argument. This pattern is directly inspired by pub/sub messaging systems like Kafka or Redis Pub/Sub.
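
The read and write sides of a topic can be sketched in a few lines. In the example below an in-memory `Map` stands in for the database table, which is an assumption made to keep the sketch self-contained; the `TopicStore` class, its method signatures, and the canned agent bodies are likewise hypothetical. What it demonstrates is that an agent returns a short receipt to its caller while the full content goes to the store.

```typescript
// ASSUMPTION: an in-memory Map stands in for the agent_topics table.
class TopicStore {
  private readonly rows = new Map<string, string>();

  private static key(runId: string, topic: string): string {
    return `${topic}:${runId}`;
  }

  // Stores the content and returns a short receipt for the caller.
  writeTopic(runId: string, topic: string, content: string): string {
    this.rows.set(TopicStore.key(runId, topic), content);
    return `Written to topic ${topic}:${runId} (${content.length} chars)`;
  }

  readTopic(runId: string, topic: string): string {
    const content = this.rows.get(TopicStore.key(runId, topic));
    if (content === undefined) {
      throw new Error(`Topic ${topic}:${runId} has not been written yet`);
    }
    return content;
  }
}

// Hypothetical agents: bodies are canned strings instead of LLM calls.
function researcherAgent(
  store: TopicStore, subject: string, runId: string, writeTopic: string,
): string {
  const report = `## Research Report\n\nFindings about ${subject}.`;
  return `Research complete. ${store.writeTopic(runId, writeTopic, report)}`;
}

function writerAgent(
  store: TopicStore, subject: string, runId: string, readTopic: string, writeTopic: string,
): string {
  const research = store.readTopic(runId, readTopic); // read directly, not via the caller
  const draft = `# A post about ${subject}\n\n${research}`;
  return `Draft complete. ${store.writeTopic(runId, writeTopic, draft)}`;
}

const store = new TopicStore();
const researchReceipt = researcherAgent(store, "electronics", "run_abc", "research");
const draftReceipt = writerAgent(store, "electronics", "run_abc", "research", "draft");
console.log(researchReceipt);
console.log(draftReceipt);
```

Keying rows by both topic name and run ID keeps concurrent pipeline runs from overwriting each other's intermediate output.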

Chapter 1 — Orchestrator passes full content

// Orchestrator context grows with each step:
[tool call: researcher_agent("electronics")]
[tool result: "## Research Report\n\n...(2,000 chars)..."]
[tool call: writer_agent("electronics",
  "## Research Report\n\n...(2,000 chars pasted again)...")]
[tool result: "# The Electronics Revolution\n\n...(3,000 chars)..."]
[tool call: editor_agent(
  "# The Electronics Revolution\n\n...(3,000 chars pasted again)...")]

Chapter 4 — Orchestrator passes runId + topic names

// Orchestrator context stays small:
[tool call: researcher_agent("electronics", "run_abc",
  writeTopic="research")]
[tool result: "Research complete. Written to topic
  research:run_abc (1842 chars)"]
[tool call: writer_agent("electronics", "run_abc",
  readTopic="research", writeTopic="draft")]
[tool result: "Draft complete. Written to topic
  draft:run_abc (2931 chars)"]
[tool call: editor_agent("run_abc",
  readTopic="draft", writeTopic="final")]
[tool result: "Editing complete. Written to topic
  final:run_abc (3204 chars)"]
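
A rough character count makes the difference concrete. The sketch below uses only the figures visible in the two traces above (a report of about 2,000 characters, a draft of about 3,000, and the three receipt strings). It ignores prompts, tool-call syntax, and the editor's final result in the Chapter 1 flow, so treat it as an order-of-magnitude comparison rather than a token measurement.

```typescript
// Payload sizes taken from the Chapter 1 trace above.
const researchChars = 2000;
const draftChars = 3000;

// Chapter 1: each payload shows up twice in the Orchestrator's context,
// once as a tool result and once pasted into the next tool call.
const inlinePayloadChars = 2 * researchChars + 2 * draftChars;

// Chapter 4: only the receipts come back to the Orchestrator.
const receipts: string[] = [
  "Research complete. Written to topic research:run_abc (1842 chars)",
  "Draft complete. Written to topic draft:run_abc (2931 chars)",
  "Editing complete. Written to topic final:run_abc (3204 chars)",
];
const topicPayloadChars = receipts.reduce((sum, r) => sum + r.length, 0);

const reduction = inlinePayloadChars / topicPayloadChars;
console.log(`inline: ${inlinePayloadChars} chars`);
console.log(`topics: ${topicPayloadChars} chars`);
console.log(`roughly ${reduction.toFixed(0)}x smaller`);
```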

Benefits of Agent Topics

✓ No context bloat: only the run ID and topic names (short strings) are passed between agents

✓ Persistence: every intermediate output is stored in SQLite

✓ Resumability: a failed pipeline can restart from the last successful topic write

✓ Inspectability: any topic can be queried at any time with SQL

✓ Fan-out / fan-in: multiple agents can read from the same topic, or one agent can read from multiple topics

✓ Decoupling: agents only know which topic to read from and write to
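
Resumability follows almost directly from persistence: before running a stage, check whether its output topic already exists for this run. The sketch below illustrates the idea with a `Map` in place of SQLite and three hypothetical stages, one of which fails on its first attempt. The `Stage` shape and `runPipeline` function are assumptions for illustration, not the tutorial's implementation.

```typescript
type ReadTopic = (topic: string) => string;

interface Stage {
  name: string;
  writeTopic: string;
  run: (read: ReadTopic) => string;
}

// ASSUMPTION: a Map keyed by "topic:runId" stands in for the SQLite table.
function runPipeline(store: Map<string, string>, runId: string, stages: Stage[]): string[] {
  const executed: string[] = [];
  const read: ReadTopic = (topic) => {
    const value = store.get(`${topic}:${runId}`);
    if (value === undefined) throw new Error(`missing topic ${topic}:${runId}`);
    return value;
  };
  for (const stage of stages) {
    const key = `${stage.writeTopic}:${runId}`;
    if (store.has(key)) continue;    // written by an earlier attempt, skip it
    store.set(key, stage.run(read)); // only reached if the stage did not throw
    executed.push(stage.name);
  }
  return executed;
}

let researcherRuns = 0;
let writerAttempts = 0;
const stages: Stage[] = [
  { name: "researcher", writeTopic: "research", run: () => { researcherRuns++; return "report"; } },
  {
    name: "writer",
    writeTopic: "draft",
    run: (read) => {
      writerAttempts++;
      if (writerAttempts === 1) throw new Error("simulated transient failure");
      return `draft from ${read("research")}`;
    },
  },
  { name: "editor", writeTopic: "final", run: (read) => `final from ${read("draft")}` },
];

const topicStore = new Map<string, string>();
let firstAttemptFailed = false;
try {
  runPipeline(topicStore, "run_abc", stages);
} catch {
  firstAttemptFailed = true; // the writer threw; the research topic is already stored
}
const resumed = runPipeline(topicStore, "run_abc", stages);
console.log(resumed.join(", "));
```

On the second call the researcher is skipped because its topic is already present, so the expensive first step is not repeated.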

Chapter 5: Data Pipelines for AI Agents

Traditional data pipelines move data for humans to analyze. AI agent pipelines move data for machines to act on. The key difference: traditional pipelines are linear and batch-oriented; AI agent pipelines are cyclical and real-time, with a continuous action/feedback loop.

The Six Stages of an AI Agent Data Pipeline

Ingest — Continuously gather data from user inputs, external APIs, document repositories, and event streams

Process — Parse, clean, chunk, and embed (vectorize) data into a format the agent can reason about

Store — Route processed data into short-term memory (Redis), long-term memory (vector DB), and operational state (SQLite / agent topics)

Retrieve (RAG) — Embed the current query, search the vector DB for semantically relevant knowledge, and construct a highly contextualized prompt

Reason — Send the contextualized prompt to the LLM; it decides whether to answer directly or call a tool

Act & Loop — Execute the tool call; ingest the result back into the pipeline; repeat until the task is complete
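
The six stages can be compressed into a toy loop to show how they connect. Everything in the sketch below is a stand-in chosen for illustration: the "embedding" is a word count over a four-term vocabulary rather than a model, `reason` is a fixed rule rather than an LLM call, and the tool result is a made-up string. The structure is what matters: retrieve, reason, act, then ingest the result and go around again.

```typescript
interface Chunk {
  text: string;
  embedding: number[];
}

// ASSUMPTION: a toy embedding that counts words from a tiny fixed vocabulary.
// A real pipeline would call an embedding model here.
const VOCAB = ["price", "stock", "keyboard", "headphone"];

function embed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((term) => words.filter((w) => w.startsWith(term)).length);
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y; normA += x * x; normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Stage 4: rank stored chunks by similarity to the query.
function retrieve(query: string, memory: Chunk[], k: number): Chunk[] {
  const q = embed(query);
  return [...memory]
    .sort((x, y) => cosine(q, y.embedding) - cosine(q, x.embedding))
    .slice(0, k);
}

type Decision = { kind: "answer"; text: string } | { kind: "tool"; name: string };

// Stage 5 stand-in: a real agent would send the prompt to an LLM.
function reason(contextItems: string[]): Decision {
  const toolResult = contextItems.find((c) => c.startsWith("tool:"));
  return toolResult === undefined
    ? { kind: "tool", name: "lookup_stock" }
    : { kind: "answer", text: `Based on ${toolResult}` };
}

function runAgent(query: string, seed: string[], maxLoops = 3): { answer: string; loops: number } {
  // Stages 1-3: ingest the seed documents, embed them, store them.
  const memory: Chunk[] = seed.map((text) => ({ text, embedding: embed(text) }));
  for (let loop = 1; loop <= maxLoops; loop++) {
    const contextItems = retrieve(query, memory, 2).map((c) => c.text);
    const decision = reason(contextItems);
    if (decision.kind === "answer") return { answer: decision.text, loops: loop };
    // Stage 6: run the (stubbed) tool and ingest its result back into memory.
    const result = `tool:${decision.name} says keyboard stock is 12`;
    memory.push({ text: result, embedding: embed(result) });
  }
  return { answer: "no answer within the loop budget", loops: maxLoops };
}

const outcome = runAgent("how many keyboard units are in stock", [
  "keyboard price list",
  "headphone price list",
]);
console.log(outcome);
```

The first pass finds no tool result in the retrieved context and calls the tool; the second pass retrieves the newly ingested result and answers. That feedback edge is what makes the pipeline cyclical rather than linear.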

How This Tutorial Maps to the Pipeline

Chapter 1 — Multi-Agent Systems: Stage 4 (Retrieval) + Stage 5 (Reasoning) + Stage 6 (Action) — Researcher queries SQLite via SQL tool calls; Orchestrator coordinates via MCP

Chapter 2 — Observability: Cross-cutting — traces every stage for latency, token usage, and failure diagnosis

Chapter 3 — Evals: Quality gate between Stage 5 and Stage 6 — measures whether agent outputs meet correctness thresholds before acting

Chapter 4 — Agent Topics: Stage 3 (Storage) — persists intermediate outputs in named database slots, enabling resumable and inspectable pipelines

Key Dependencies

Vercel AI SDK & MCP

ai — generateText, streamText
@ai-sdk/openai — OpenAI provider
@ai-sdk/mcp — MCP client for the AI SDK
mcp-handler — MCP server handler for Next.js
@modelcontextprotocol/sdk — Official MCP TypeScript SDK

OpenTelemetry & Storage

@opentelemetry/sdk-node — OTel SDK for Node.js
@opentelemetry/exporter-trace-otlp-http — OTLP/HTTP exporter
better-sqlite3 — Synchronous SQLite driver
zod — Schema validation for tool inputs

Learning Outcomes

By working through this tutorial, you will have gained practical experience with:

• Building multi-agent systems with an Orchestrator and specialist agents via stacked MCP servers
• Adding distributed tracing to AI agents with OpenTelemetry and Jaeger
• Propagating OTel trace context across HTTP boundaries (fixing orphaned spans)
• Building a native eval framework with datasets, runners, scorers, and LLM-as-judge
• Implementing the agent topics pattern to eliminate context bloat in multi-agent pipelines
• Understanding the six stages of an AI agent data pipeline and how they differ from traditional ETL
• Designing resumable, inspectable, and decoupled agent pipelines

Conclusion

This tutorial demonstrates that building production-grade AI agent systems requires more than just wiring up an LLM with tools. You need observability to understand what's happening, evals to measure quality, and architectural patterns like agent topics to keep pipelines scalable and maintainable.

Each chapter in this tutorial implements a piece of the AI agent data pipeline — from retrieval and reasoning (Chapter 1) to storage and operational state (Chapter 4) — giving you a complete picture of what it takes to build AI agents that work reliably in production.

About the Author

Wayne Cheng is the founder and AI app developer at Audoir, LLC. Prior to founding Audoir, he worked as a hardware design engineer for Silicon Valley startups and an audio engineer for creative organizations. He holds an MSEE from UC Davis and a Music Technology degree from Foothill College.

Further Exploration

Explore the complete tutorial repository and experiment with extending the examples. Consider adding new specialist agents, connecting to external APIs, or implementing human-in-the-loop approval steps to deepen your understanding of advanced AI agent architectures.

New to AI agents? Start with the AI Agent Tutorial first, which covers building agents from scratch — from a simple streaming chatbot to a LangGraph-powered agent with MCP tools.

For more AI-powered development tools and tutorials, visit Audoir.