AI Architecture Patterns.

Four foundational patterns behind most production AI deployments.

After this lesson you'll know

How to implement Gateway, Router, Orchestrator, and Pipeline patterns
When to use each pattern and how to combine them
Trade-offs between simplicity, flexibility, and operational cost
Real-world examples from production systems at scale

Pattern 1: The Gateway

The Gateway pattern is your front door. Every request enters through a single point that handles authentication, rate limiting, input validation, and request normalization before anything touches a model.

```python
class RateLimitError(Exception):
    def __init__(self, retry_after):
        super().__init__("rate limit exceeded")
        self.retry_after = retry_after


class TokenLimitError(Exception):
    def __init__(self, max_tokens):
        super().__init__(f"input exceeds {max_tokens} tokens")
        self.max_tokens = max_tokens


class AIGateway:
    def __init__(self, rate_limiter, validator, auth, processor):
        self.rate_limiter = rate_limiter
        self.validator = validator
        self.auth = auth
        self.processor = processor  # downstream layer used in Step 4

    async def handle(self, request):
        # Step 1: Authenticate
        user = self.auth.verify(request.token)

        # Step 2: Rate limit per user tier
        if not self.rate_limiter.allow(user.tier, user.id):
            raise RateLimitError(retry_after=self.rate_limiter.retry_after(user.id))

        # Step 3: Validate and sanitize input
        clean_input = self.validator.sanitize(request.body)
        if len(clean_input.tokens) > user.tier.max_tokens:
            raise TokenLimitError(max_tokens=user.tier.max_tokens)

        # Step 4: Forward to processing layer
        return await self.processor.process(clean_input, user)
```

The Gateway never makes AI decisions. It protects the system, normalizes inputs, and enforces contracts. Think of it as the bouncer: it decides who gets in, not what happens inside.
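The lesson does not show the `rate_limiter` used in Step 2. One way it could work, as a minimal sketch, is a per-user token bucket whose capacity depends on the tier. The class name `TokenBucketLimiter`, the tier table, and the injectable `clock` are all hypothetical, and `tier` is a plain string here for simplicity even though the gateway treats it as an object.

```python
import time


class TokenBucketLimiter:
    """Per-user token bucket; burst capacity and refill rate depend on tier."""

    # Hypothetical tiers: (burst capacity, refill rate in requests per second).
    TIERS = {"free": (5, 0.5), "pro": (50, 5.0)}

    def __init__(self, clock=time.monotonic):
        self.clock = clock   # injectable so tests can control time
        self.buckets = {}    # user_id -> (tokens_left, last_refill_time, tier)

    def _refill(self, tier, user_id):
        # Credit the bucket for the time elapsed since it was last touched.
        capacity, rate = self.TIERS[tier]
        now = self.clock()
        tokens, last, _ = self.buckets.get(user_id, (capacity, now, tier))
        tokens = min(capacity, tokens + (now - last) * rate)
        self.buckets[user_id] = (tokens, now, tier)
        return tokens

    def allow(self, tier, user_id):
        """Spend one token if available; return whether the request may pass."""
        tokens = self._refill(tier, user_id)
        if tokens < 1:
            return False
        _, now, _ = self.buckets[user_id]
        self.buckets[user_id] = (tokens - 1, now, tier)
        return True

    def retry_after(self, user_id):
        """Seconds until one token is available (0.0 if allowed right now)."""
        if user_id not in self.buckets:
            return 0.0
        tier = self.buckets[user_id][2]
        tokens = self._refill(tier, user_id)
        return max(0.0, (1 - tokens) / self.TIERS[tier][1])
```

The method signatures mirror the calls the gateway makes, `allow(tier, user_id)` and `retry_after(user_id)`. State lives in process memory, so a deployment with several gateway instances would need a shared store instead.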

Production tip: Always validate token counts at the gateway, not at the model call. By the time you hit an API token limit, you've already consumed compute for preprocessing. Fail fast, fail cheap.
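A sketch of that fail-fast check, assuming a rough heuristic of about four characters per token for English text. The heuristic and the names `estimate_tokens` and `check_token_budget` are illustrative assumptions; a real gateway would count with the tokenizer that matches the target model.

```python
import math


class TokenLimitError(Exception):
    """Raised when a request exceeds the caller's token budget."""


def estimate_tokens(text, chars_per_token=4):
    # Rough heuristic only; swap in the target model's tokenizer for real counts.
    return math.ceil(len(text) / chars_per_token)


def check_token_budget(text, max_tokens, count_tokens=estimate_tokens):
    """Reject oversized input before any preprocessing or model call runs."""
    used = count_tokens(text)
    if used > max_tokens:
        raise TokenLimitError(f"input is about {used} tokens; limit is {max_tokens}")
    return used
```

Passing the counter in as `count_tokens` keeps the check independent of any one tokenizer, so the estimate can be replaced without touching the gateway logic.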

Pattern 2: The Router

The Router pattern directs requests to the right handler based on intent, complexity, or content type. This is where you stop paying GPT-4 prices for GPT-3.5 work.

```python
class ModelRouter:
    ROUTES = {
        "simple_faq": {"model": "gpt-4o-mini", "max_tokens": 200},
        "technical": {"model": "claude-sonnet-4-20250514", "max_tokens": 2000},
        "complex": {"model": "claude-opus-4-20250514", "max_tokens": 4000},
        "code_gen": {"model": "claude-sonnet-4-20250514", "max_tokens": 8000},
    }

    def __init__(self, classifier, cache):
        self.classifier = classifier
        self.cache = cache

    async def route(self, request):
        # Classify with a fast, cheap model
        intent = await self.classifier.classify(request.text)

        # Select configuration; unknown intents fall back to the most capable route
        config = self.ROUTES.get(intent, self.ROUTES["complex"])

        # Check cache before calling model
        cached = await self.cache.get(request.text, intent)
        if cached is not None:
            return cached

        return await self.call_model(request, config)

    async def call_model(self, request, config):
        # Provider-specific call, not shown in this lesson
        raise NotImplementedError
```

The classifier itself is typically a fine-tuned small model or even a rules-based system. You often do not need AI to decide which AI to use: a regex or keyword match can handle roughly 70% of routing decisions.
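As a sketch of the rules-based option, a handful of regular expressions can assign an intent before any model is called. The patterns and the `classify_by_rules` helper are hypothetical, chosen only to match the route names above; returning `None` signals that the request should fall through to a model-based classifier.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("code_gen", re.compile(
        r"\b(write|generate|implement)\b.*\b(function|script|class|code)\b", re.I)),
    ("technical", re.compile(
        r"\b(error|stack trace|exception|api|timeout)\b", re.I)),
    ("simple_faq", re.compile(
        r"\b(pricing|refund|opening hours|reset (my )?password)\b", re.I)),
]


def classify_by_rules(text):
    """Return an intent name, or None when no rule matches."""
    for intent, pattern in RULES:
        if pattern.search(text):
            return intent
    return None
```

In `route`, a helper like this could run first, with `self.classifier.classify` called only when it returns `None`, so that the cheap path handles the easy cases.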

Cost impact: Proper routing can reduce API costs by 40-60%. A team at a Series B startup reported dropping from $12K/month to $4.8K/month by routing simple queries to GPT-4o-mini instead of sending everything through Claude Opus.
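The quoted figures can be checked with simple arithmetic: going from $12K to $4.8K per month is a 60% reduction, the top of the 40-60% range. The sketch below computes that, plus a blended per-request cost for a routing mix. The traffic shares and relative costs in the usage lines are made-up illustrative numbers, not real model prices.

```python
import math


def savings(before, after):
    """Fractional reduction when spend drops from `before` to `after`."""
    return (before - after) / before


def blended_cost(traffic_share, cost_per_request):
    """Average cost per request for a routing mix.

    traffic_share maps route name -> fraction of requests (must sum to 1).
    cost_per_request maps route name -> cost in any consistent unit.
    """
    if not math.isclose(sum(traffic_share.values()), 1.0):
        raise ValueError("traffic shares must sum to 1")
    return sum(share * cost_per_request[route]
               for route, share in traffic_share.items())


# Figures quoted in the text: $12K/month down to $4.8K/month.
print(f"{savings(12_000, 4_800):.0%}")  # prints "60%"

# Made-up mix: 60% of requests go to a model costing a tenth as much.
mix = {"small": 0.6, "large": 0.4}
relative_cost = {"small": 0.1, "large": 1.0}
print(f"{savings(1.0, blended_cost(mix, relative_cost)):.0%}")  # prints "54%"
```

The second calculation shows why the share of traffic that can be routed to the cheaper model matters as much as the price gap between models.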
