
Tracking AI System Health

An AI system can be "up" and still be broken — returning hallucinated answers, burning through budget, or degrading silently. Monitoring AI requires watching things traditional observability tools don't track.

What you'll learn

  • What to monitor in AI systems beyond uptime
  • Building dashboards that catch AI-specific failures
  • Logging strategies for debugging AI pipelines
  • Setting up alerts that actually tell you something useful

What Makes AI Monitoring Different

Traditional monitoring asks: Is the server up? Is latency acceptable? Are error rates normal? AI monitoring asks all of that plus: Are the responses accurate? Is the model behaving as expected? Are we spending more than we should?

A 200 OK response from your AI endpoint might contain complete nonsense. Your monitoring needs to catch that. This is the fundamental difference — in AI systems, "working" and "working correctly" are two very different things.

What to Track

Latency per AI call: Track p50, p95, and p99 latency for every AI provider call. LLM response times can range from 500ms to 30 seconds, so know your distribution.
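
As a concrete sketch, here is a nearest-rank percentile calculation over raw latency samples. All code in this lesson is TypeScript; the percentile helper and sample values are illustrative, and in production your metrics backend would typically compute these from histograms:

```typescript
// Minimal nearest-rank percentile helper for latency samples in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  // Index of the p-th percentile under the nearest-rank method.
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

// Example samples: note the long tail typical of LLM calls.
const latenciesMs = [620, 780, 840, 950, 1100, 1500, 4300, 29000];
console.log({
  p50: percentile(latenciesMs, 50), // 950
  p95: percentile(latenciesMs, 95), // 29000
  p99: percentile(latenciesMs, 99), // 29000
});
```

The gap between p50 and p99 is the point: averages hide the tail that your users actually feel.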

Token usage: Log input tokens, output tokens, and total tokens for every call. This directly maps to cost and helps you identify expensive prompts or unexpectedly verbose responses.

Cost per request: Calculate and log the actual dollar cost of each AI operation. Aggregate by user, feature, and time period.
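
Cost falls out of the token counts and a per-model price table. A minimal sketch, with made-up model names and placeholder rates (always use your provider's published pricing):

```typescript
// Token-based cost accounting. Rates are per million tokens and are
// placeholders; check your provider's current pricing page.
type ModelPricing = { inputPerMTok: number; outputPerMTok: number };

const PRICING: Record<string, ModelPricing> = {
  "example-large-model": { inputPerMTok: 3.0, outputPerMTok: 15.0 },
  "example-small-model": { inputPerMTok: 0.25, outputPerMTok: 1.25 },
};

function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`No pricing configured for model: ${model}`);
  return (inputTokens / 1_000_000) * p.inputPerMTok +
         (outputTokens / 1_000_000) * p.outputPerMTok;
}

// e.g. 1,200 input tokens and 350 output tokens on the large model:
console.log(costUsd("example-large-model", 1200, 350).toFixed(6)); // "0.008850"
```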

Error rates by provider: Track 4xx and 5xx responses from each AI provider separately. If one provider's error rate spikes, you want to know immediately — especially if you have fallback logic.
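
One way to make per-provider error rates actionable is a sliding-window counter per provider. A sketch; the class, window size, threshold, and provider names are illustrative choices, not a prescribed design:

```typescript
// Tracks the fraction of calls returning 4xx/5xx within a sliding window.
// In practice this lives in your metrics system; the class is illustrative.
class ErrorRateTracker {
  private events: { ts: number; isError: boolean }[] = [];
  constructor(private windowMs: number = 5 * 60 * 1000) {}

  record(statusCode: number): void {
    this.events.push({ ts: Date.now(), isError: statusCode >= 400 });
  }

  // Fraction of calls in the window that returned 4xx or 5xx.
  rate(): number {
    const cutoff = Date.now() - this.windowMs;
    this.events = this.events.filter((e) => e.ts >= cutoff);
    if (this.events.length === 0) return 0;
    return this.events.filter((e) => e.isError).length / this.events.length;
  }
}

// One tracker per provider so a spike is attributable immediately.
const trackers = {
  providerA: new ErrorRateTracker(),
  providerB: new ErrorRateTracker(),
};
trackers.providerA.record(429);
if (trackers.providerA.rate() > 0.1) {
  console.warn("providerA error rate above 10%; consider triggering fallback");
}
```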

Cache hit rates: If you're caching AI responses (you should be for common queries), track how often the cache serves a response vs. making a fresh API call. Low cache hit rates mean you're spending more than necessary.
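
A cache wrapper can count hits and misses as a side effect of serving requests. A minimal in-memory sketch; a production cache would more likely be Redis or a database table with TTLs:

```typescript
// In-memory response cache keyed by a hash of the prompt, with hit/miss counters.
const cache = new Map<string, string>();
let hits = 0;
let misses = 0;

async function cachedCompletion(
  key: string,
  callModel: () => Promise<string>,
): Promise<string> {
  const cached = cache.get(key);
  if (cached !== undefined) {
    hits++;
    return cached; // cache hit: no API spend
  }
  misses++;
  const fresh = await callModel(); // cache miss: the expensive path
  cache.set(key, fresh);
  return fresh;
}

function cacheHitRate(): number {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}
```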

Response quality signals: Track user feedback (thumbs up/down), response length anomalies, and any automated quality checks you run on outputs.
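
Automated quality checks can be simple. As one example, here is a crude response-length anomaly detector that flags outputs far from the running mean; the z-score threshold and warm-up count are arbitrary choices:

```typescript
// Flags responses whose length deviates sharply from the history seen so far.
const lengths: number[] = [];

function isLengthAnomaly(response: string, zThreshold = 3): boolean {
  const len = response.length;
  lengths.push(len);
  if (lengths.length < 30) return false; // not enough history yet
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((a, b) => a + (b - mean) ** 2, 0) / lengths.length;
  const std = Math.sqrt(variance);
  if (std === 0) return false;
  return Math.abs(len - mean) / std > zThreshold;
}
```

A flagged response is not necessarily wrong, but an empty or ten-times-longer-than-usual output is worth a look, and this signal costs nothing to compute.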

Structured Logging for AI Pipelines

Every AI operation should produce a structured log entry with: timestamp, user ID, function name, provider, model, input token count, output token count, latency in milliseconds, estimated cost, and success/failure status.

For debugging, also log the prompt template used (not the full prompt — that may contain user data) and any retrieval context that was injected (document IDs, similarity scores). When something goes wrong, you need to reconstruct the full pipeline state.
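
Putting those fields together, one possible shape for a structured log entry (the field names are illustrative, not a standard schema):

```typescript
// One log entry per AI operation. The point is that every call emits the same shape.
interface AiLogEntry {
  timestamp: string;        // ISO 8601
  userId: string;
  functionName: string;     // which feature or pipeline step made the call
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  estimatedCostUsd: number;
  success: boolean;
  // Debugging context: template ID only, never the full prompt (may contain user data).
  promptTemplateId?: string;
  retrievedDocIds?: string[];
  similarityScores?: number[];
}
```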

Store logs in a queryable format. Supabase tables work well for this — you get full SQL query power over your AI operation logs. For higher volume, consider a dedicated logging service like Datadog or a simple time-series database.
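
With Supabase, writing an entry is a single insert via the supabase-js client. A sketch that assumes a table named ai_logs whose columns mirror the AiLogEntry shape above:

```typescript
import { createClient } from "@supabase/supabase-js";

// Server-side client; never ship the service role key to browsers.
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!,
);

async function writeLog(entry: AiLogEntry): Promise<void> {
  const { error } = await supabase.from("ai_logs").insert(entry);
  // Logging must never take down the request path; record the failure and move on.
  if (error) console.error("failed to write AI log:", error.message);
}
```

Once the rows are in Postgres, questions like "cost per user per day" or "p95 latency by model this week" become ordinary SQL queries.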
