After this lesson you'll know:
- How to implement Gateway, Router, Orchestrator, and Pipeline patterns
- When to use each pattern and how to combine them
- Trade-offs between simplicity, flexibility, and operational cost
- Real-world examples from production systems at scale
Pattern 1: The Gateway
Figure: AI Architecture Patterns. 01 Gateway (auth, rate limit, validate inputs) → 02 Router (direct to the right model) → 03 Pipeline (fixed stages, predictable flow) → 04 Orchestrator (dynamic replanning per step).
Production systems compose all four -- Gateway fronts, Router dispatches, Pipelines handle known work, Orchestrators handle the unknown.
Production tip: Always validate token counts at the gateway, not at the model call. By the time you hit an API token limit, you've already consumed compute for preprocessing. Fail fast, fail cheap.
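A minimal sketch of the fail-fast checks described above, assuming a plain in-process gateway. The names `Gateway`, `Request`, `GatewayRejection`, and `estimate_tokens` are hypothetical, and the four-characters-per-token estimate is a rough assumption; a real gateway would use the target model's tokenizer and a time-windowed rate limiter.

```python
from dataclasses import dataclass


class GatewayRejection(Exception):
    """Raised when a request fails a gateway check (hypothetical name)."""


@dataclass
class Request:
    user_id: str
    text: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token for English text.
    # Replace with the model's real tokenizer for exact counts.
    return max(1, len(text) // 4)


class Gateway:
    def __init__(self, max_input_tokens=4000, max_requests_per_user=60):
        self.max_input_tokens = max_input_tokens
        self.max_requests_per_user = max_requests_per_user
        self._request_counts = {}  # user_id -> admitted requests (no time window)

    def admit(self, request: Request) -> Request:
        # 1. Input validation: reject empty input before any other work.
        text = request.text.strip()
        if not text:
            raise GatewayRejection("empty input")

        # 2. Rate limiting: a simple per-user counter.
        count = self._request_counts.get(request.user_id, 0)
        if count >= self.max_requests_per_user:
            raise GatewayRejection("rate limit exceeded")

        # 3. Token budget: fail here, before preprocessing or model calls.
        tokens = estimate_tokens(text)
        if tokens > self.max_input_tokens:
            raise GatewayRejection(
                f"input too long: ~{tokens} tokens > {self.max_input_tokens}"
            )

        self._request_counts[request.user_id] = count + 1
        return Request(request.user_id, text)
```

Every check runs before any embedding, retrieval, or model call, so a rejected request costs almost nothing.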
Pattern 2: The Router
The Router pattern directs requests to the right handler based on intent, complexity, or content type. This is where you stop paying GPT-4 prices for GPT-3.5 work.
```python
class ModelRouter:
    ROUTES = {
        "simple_faq": {"model": "gpt-4o-mini", "max_tokens": 200},
        "technical": {"model": "claude-sonnet-4-20250514", "max_tokens": 2000},
        "complex": {"model": "claude-opus-4-20250514", "max_tokens": 4000},
        "code_gen": {"model": "claude-sonnet-4-20250514", "max_tokens": 8000},
    }

    async def route(self, request):
        # Classify with a fast, cheap model
        intent = await self.classifier.classify(request.text)

        # Select configuration
        config = self.ROUTES.get(intent, self.ROUTES["complex"])

        # Check cache before calling model
        cached = await self.cache.get(request.text, intent)
        if cached:
            return cached

        return await self.call_model(request, config)
```
The classifier itself is typically a fine-tuned small model or even a rules-based system. You do not need AI to decide which AI to use -- a regex or keyword match handles 70% of routing decisions.
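A rules-based classifier of the kind mentioned above could look like the sketch below. The keyword patterns are illustrative assumptions, not a tested rule set; the intent names mirror the `ROUTES` keys, and anything unmatched falls through to `"complex"`, matching the router's default.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("code_gen", re.compile(
        r"\b(write|generate|implement)\b.*\b(function|script|class|code)\b", re.I)),
    ("technical", re.compile(
        r"\b(error|exception|stack trace|api|config|install)\b", re.I)),
    ("simple_faq", re.compile(
        r"\b(pricing|price|refund|opening hours|reset my password)\b", re.I)),
]


def classify(text: str, default: str = "complex") -> str:
    """Return the first matching intent, or the default when no rule fires."""
    for intent, pattern in RULES:
        if pattern.search(text):
            return intent
    return default
```

This function is synchronous, so it would need a thin async wrapper to sit behind the `await self.classifier.classify(...)` call. Requests that match no rule are the ones worth handing to a model-based classifier.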
Cost impact: Proper routing typically reduces API costs by 40-60%. A team at a Series B startup reported dropping from $12K/month to $4.8K/month by routing simple queries to GPT-4o-mini instead of sending everything through Claude Opus.
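The quoted figures work out to a 60% reduction, the top of the 40-60% range. The sketch below shows the arithmetic; `blended_cost` is a hypothetical back-of-the-envelope model, and its inputs in the usage note are made up for illustration, not taken from the startup's data.

```python
def savings_fraction(before: float, after: float) -> float:
    """Fraction of spend removed: (before - after) / before."""
    return (before - after) / before


def blended_cost(baseline: float, cheap_share: float, cheap_price_ratio: float) -> float:
    """Estimated spend after moving `cheap_share` of the baseline spend to a
    model that costs `cheap_price_ratio` times as much per request.
    Hypothetical model: ignores classifier cost and cache hits."""
    return baseline * ((1 - cheap_share) + cheap_share * cheap_price_ratio)


# Figures quoted in the text: $12K/month down to $4.8K/month.
print(f"{savings_fraction(12_000, 4_800):.0%}")  # 60%
```

As a usage example, `blended_cost(12_000, 0.5, 0.1)` estimates the bill if half the spend moved to a model one tenth the price. Savings scale with the share of traffic that is genuinely simple.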
Pattern 3: The Orchestrator
The Orchestrator pattern coordinates multiple AI calls and tools to complete complex tasks. Unlike a simple chain, the orchestrator makes dynamic decisions about what to do next based on intermediate results.
```python
class TaskOrchestrator:
    MAX_ITERATIONS = 10  # hard cap so a confused planner cannot loop forever

    async def execute(self, task):
        plan = await self.planner.create_plan(task)
        pending = list(plan.steps)
        results = []
        iterations = 0

        while pending and iterations < self.MAX_ITERATIONS:
            iterations += 1
            step = pending.pop(0)

            # Execute step with appropriate tool
            if step.type == "retrieval":
                result = await self.retriever.search(step.query)
            elif step.type == "analysis":
                result = await self.analyzer.analyze(step.data, results)
            elif step.type == "generation":
                result = await self.generator.generate(step.prompt, context=results)
            else:
                raise ValueError(f"Unknown step type: {step.type}")
            results.append(result)

            # Dynamic replanning based on results. Assumes replan() returns
            # a plan whose steps replace the ones still pending.
            if result.confidence < 0.7:
                plan = await self.planner.replan(task, results)
                pending = list(plan.steps)

        return await self.synthesizer.combine(results)
```
The key differentiator is dynamic replanning. A pipeline always executes the same steps. An orchestrator adapts based on what it learns at each stage. This is the pattern behind AI agents, coding assistants, and research tools.
Guard rail: Always set a maximum iteration count on orchestrators. Without it, a confused planner can loop indefinitely, burning tokens and time. Most production systems cap at 5-10 iterations.
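To make the iteration cap concrete, here is a self-contained sketch with stub callables, so the cap can be seen tripping. `run_bounded`, `StepResult`, and `IterationLimitExceeded` are hypothetical names. Raising an exception at the cap is one possible policy; returning partial results is another.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StepResult:
    output: str
    confidence: float


class IterationLimitExceeded(RuntimeError):
    """Raised when the orchestrator hits its step budget."""


async def run_bounded(steps, execute_step, replan,
                      max_iterations=8, min_confidence=0.7):
    """Execute steps in order, replanning on low confidence, and never
    executing more than `max_iterations` steps in total."""
    pending = list(steps)
    results = []
    iterations = 0
    while pending:
        if iterations >= max_iterations:
            raise IterationLimitExceeded(f"stopped after {iterations} steps")
        iterations += 1
        result = await execute_step(pending.pop(0))
        results.append(result)
        if result.confidence < min_confidence:
            # The planner's new steps replace whatever was still pending.
            pending = list(await replan(results))
    return results


# Stubs that simulate a confused planner: never confident, always retries.
async def always_unsure(step):
    return StepResult(output=step, confidence=0.1)


async def retry_forever(results):
    return ["retry"]
```

Calling `asyncio.run(run_bounded(["start"], always_unsure, retry_forever, max_iterations=5))` raises `IterationLimitExceeded` after five steps instead of spinning indefinitely.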
Pattern 4: The Pipeline
The Pipeline pattern chains deterministic stages in a fixed sequence. Each stage transforms the data and passes it forward. Unlike the orchestrator, there is no branching or replanning.
```python
class DocumentPipeline:
    stages = [
        ChunkStage(max_tokens=500, overlap=50),
        EmbedStage(model="text-embedding-3-small"),
        RetrieveStage(top_k=5, threshold=0.75),
        AugmentStage(template="answer_with_sources"),
        GenerateStage(model="claude-sonnet-4-20250514", max_tokens=2000),
        ValidateStage(checks=["no_hallucination", "has_citations"]),
    ]

    async def run(self, document, query):
        context = {"document": document, "query": query}
        for stage in self.stages:
            context = await stage.process(context)
            if context.get("abort"):
                return context["abort_reason"]
        return context["response"]
```
Pipelines are ideal for RAG, document processing, and any workflow where the steps are known in advance. They are simpler to debug, test, and monitor than orchestrators because each stage has a fixed input/output contract.
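The fixed input/output contract can be illustrated with a runnable toy version. The stage classes below are simplified stand-ins invented for this sketch: word-based chunking and keyword overlap replace real tokenization, embeddings, and generation. Each stage reads known keys from the context dict and adds its own.

```python
import asyncio


class WordChunkStage:
    """Reads 'document'; adds 'chunks'."""
    def __init__(self, max_words):
        self.max_words = max_words

    async def process(self, context):
        words = context["document"].split()
        chunks = [" ".join(words[i:i + self.max_words])
                  for i in range(0, len(words), self.max_words)]
        return {**context, "chunks": chunks}


class KeywordRetrieveStage:
    """Reads 'query' and 'chunks'; adds 'retrieved' or aborts."""
    async def process(self, context):
        terms = set(context["query"].lower().split())
        hits = [c for c in context["chunks"] if terms & set(c.lower().split())]
        if not hits:
            return {**context, "abort": True,
                    "abort_reason": "no relevant chunks found"}
        return {**context, "retrieved": hits}


class TemplateAnswerStage:
    """Reads 'retrieved'; adds 'response'."""
    async def process(self, context):
        return {**context, "response": "Sources: " + " | ".join(context["retrieved"])}


async def run_pipeline(stages, document, query):
    context = {"document": document, "query": query}
    for stage in stages:
        context = await stage.process(context)
        if context.get("abort"):
            return context["abort_reason"]
    return context["response"]


STAGES = [WordChunkStage(max_words=3), KeywordRetrieveStage(), TemplateAnswerStage()]
```

Because each stage only depends on documented keys, it can be unit-tested in isolation by passing a hand-built context dict.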
Combining patterns: Most production systems combine all four. A Gateway fronts the system, a Router dispatches to different Pipelines, and complex requests get an Orchestrator. The patterns are composable, not competing.
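A rough sketch of how the four layers might be wired together, using deliberately trivial stand-ins for each. All function names and the keyword rule are hypothetical; the point is only the order of calls and the fallback from known pipelines to the orchestrator.

```python
import asyncio


def gateway(text: str, max_chars: int = 2000) -> str:
    # Gateway: validate before anything expensive happens.
    text = text.strip()
    if not text or len(text) > max_chars:
        raise ValueError("request rejected at gateway")
    return text


def router(text: str) -> str:
    # Router: a single keyword rule stands in for a real classifier.
    return "faq" if "refund" in text.lower() else "open_ended"


async def faq_pipeline(text: str) -> str:
    # Pipeline: stand-in for a fixed sequence of stages.
    return f"[pipeline:faq] {text}"


async def orchestrator(text: str) -> str:
    # Orchestrator: stand-in for dynamic planning and replanning.
    return f"[orchestrator] {text}"


PIPELINES = {"faq": faq_pipeline}


async def handle(raw_text: str) -> str:
    text = gateway(raw_text)
    intent = router(text)
    # Known intents go to a pipeline; everything else to the orchestrator.
    handler = PIPELINES.get(intent, orchestrator)
    return await handler(text)
```

With these stubs, `asyncio.run(handle("How do refunds work?"))` takes the pipeline path, while a request with no matching rule falls through to the orchestrator.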
Choosing the Right Pattern
| Pattern | Best For | Complexity | Debuggability |
|---------|----------|------------|---------------|
| Gateway | All systems (mandatory) | Low | High |
| Router | Multi-model, cost optimization | Medium | High |
| Pipeline | RAG, document processing, ETL | Medium | High |
| Orchestrator | Agents, complex reasoning, multi-step tasks | High | Medium |

Start with Gateway + Pipeline. Add Router when costs matter. Add Orchestrator only when tasks require dynamic decision-making. Over-engineering early is the second most common failure mode after under-engineering.
Quiz
1. What is the primary difference between a Pipeline and an Orchestrator?
2. Why should token counts be validated at the Gateway rather than at the model call?
Vocabulary
What does the Gateway pattern handle?
Authentication, rate limiting, input validation, token count enforcement, and request normalization. It protects the system and enforces contracts without making AI decisions.
How much can proper model routing reduce API costs?
Typically 40-60%. By routing simple queries to cheaper models (e.g., GPT-4o-mini instead of Opus), you avoid overpaying for tasks that don't need frontier intelligence.
What safeguard is essential for Orchestrator patterns?
A maximum iteration count. Without it, a confused planner can loop indefinitely, burning tokens and time. Production systems typically cap at 5-10 iterations.
When should you use a Pipeline vs. an Orchestrator?
Use Pipelines when steps are known in advance (RAG, document processing). Use Orchestrators when the task requires dynamic decision-making based on intermediate results (agents, complex reasoning).
How do the four patterns compose in production?
Gateway fronts the system (mandatory). Router dispatches to different Pipelines. Complex requests get an Orchestrator. The patterns are composable layers, not competing alternatives.