Building Your Complete Infrastructure
Nine lessons of theory and technique. Now it's time to put it all together into a production-ready AI infrastructure stack — one that's secure, cost-efficient, observable, and ready to scale.
What you'll learn
- How to assemble a complete AI infrastructure from the ground up
- A reference architecture you can adapt to your own project
- The order of operations for building each layer
- Common pitfalls and how to avoid them
Reference Architecture
Here's the complete stack, layer by layer. This is the architecture that Like One runs on — proven in production, affordable for indie developers, and scalable when growth demands it.
Frontend: Next.js on Vercel. Auto-deploys from GitHub. Edge middleware for auth checks and rate limiting. Streaming responses for AI-generated content so users see results immediately.
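For the middleware piece, here is a minimal sketch of an auth gate in Next.js middleware. The cookie name and the /api/ai route prefix are assumptions that depend on how you wire up Supabase auth; treat this as a starting shape, not the implementation.

// middleware.ts: reject unauthenticated API requests at the edge, before
// they reach an edge function. The "sb-access-token" cookie name is an
// assumption; it depends on your Supabase auth configuration.
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

export function middleware(req: NextRequest) {
  // Only guard AI API routes; let static pages through untouched.
  if (req.nextUrl.pathname.startsWith("/api/ai")) {
    const token = req.cookies.get("sb-access-token")?.value;
    if (!token) {
      return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
    }
  }
  return NextResponse.next();
}

export const config = { matcher: ["/api/:path*"] };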
Backend / API: Supabase edge functions. Serverless, auto-scaling, close to the database. These handle AI orchestration — receiving requests, checking caches, calling providers, and returning results.
Database: PostgreSQL on Supabase with pgvector enabled. One database for application data, vector embeddings, operation logs, and cached responses. Row-level security for multi-tenant isolation.
AI Layer: Tiered provider setup. Free embeddings via HuggingFace. Mid-tier model for simple tasks. Flagship model for complex reasoning. Semantic cache in front of everything.
Monitoring: Structured logs in a dedicated Supabase table. Cost tracking per operation. Alerts via cron-triggered edge functions to Slack or email.
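As one concrete example of that alerting path, here is a sketch of a scheduled edge function that sums the last 24 hours of spend and posts to Slack. The SLACK_WEBHOOK_URL secret and the $5/day threshold are assumptions; adjust them to your budget.

// Scheduled cost alert: sum yesterday's estimated spend from the operations
// log and ping Slack if it crosses a threshold. The $5 limit is an assumption.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
);

Deno.serve(async () => {
  const since = new Date(Date.now() - 86_400_000).toISOString();
  const { data } = await supabase
    .from("ai_api_calls")
    .select("estimated_cost")
    .gte("created_at", since);

  // Number() guards against numeric columns coming back as strings.
  const total = (data ?? []).reduce((sum, row) => sum + Number(row.estimated_cost), 0);
  if (total > 5) {
    await fetch(Deno.env.get("SLACK_WEBHOOK_URL")!, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: `AI spend in the last 24h: $${total.toFixed(2)}` }),
    });
  }
  return new Response(JSON.stringify({ total }));
});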
What to Build First
Don't try to build everything at once. This is the order that minimizes rework and gets you to production fastest.
Week 1: Foundation. Set up your Vercel project and Supabase database. Deploy a basic app that serves pages. Confirm your CI/CD pipeline works — push to main, see it live.
Week 2: AI Integration. Add your first AI API call through a Supabase edge function. Store the API key in environment variables. Add basic logging — every call writes to your operations log table.
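A sketch of what that first function can look like, assuming an ANTHROPIC_API_KEY secret and the ai_api_calls log table; the model ID is an assumption. The full orchestrator later in this lesson grows out of this same shape.

// A first AI call with logging, kept deliberately small.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
);

Deno.serve(async (req) => {
  const { message, userId } = await req.json();

  const res = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": Deno.env.get("ANTHROPIC_API_KEY")!,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-3-5-haiku-latest", // assumption: use whichever model your account offers
      max_tokens: 1024,
      messages: [{ role: "user", content: message }],
    }),
  });
  const data = await res.json();

  // Every call writes to the operations log, from day one.
  await supabase.from("ai_api_calls").insert({
    user_id: userId,
    provider: "anthropic",
    input_tokens: data.usage?.input_tokens,
    output_tokens: data.usage?.output_tokens,
    status: res.ok ? "success" : "error",
  });

  return new Response(JSON.stringify({ response: data.content?.[0]?.text }));
});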
Week 3: Vector Search. Enable pgvector. Create your embeddings table. Build a basic RAG pipeline: embed content, store vectors, query by similarity, inject context into your AI prompts.
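The ingestion half of that pipeline is a short script: embed each piece of content, then store the vector. A sketch, assuming a documents table with a vector(384) embedding column to match the 384-dimension model used throughout this lesson.

// One-off ingestion sketch: embed a piece of content and store it for
// similarity search. The table and column names are assumptions.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
);

async function ingest(content: string) {
  const res = await fetch(
    "https://api-inference.huggingface.co/pipeline/feature-extraction/BAAI/bge-small-en-v1.5",
    {
      method: "POST",
      headers: { Authorization: `Bearer ${Deno.env.get("HF_TOKEN")}` },
      body: JSON.stringify({ inputs: content }),
    }
  );
  const embedding = await res.json(); // a 384-dimension vector for this model

  await supabase.from("documents").insert({
    content,
    embedding: JSON.stringify(embedding), // pgvector accepts the JSON array form
  });
}

await ingest("Your first document goes here.");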
Week 4: Hardening. Add rate limiting, input validation, and output checking. Implement response caching. Set up cost alerts. Write your first post-deploy smoke test.
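The smoke test can be a single Deno script that your deploy pipeline runs against the live function. A minimal sketch; the APP_URL variable, the ai-orchestrator function name, and the expected response shape are assumptions.

// Post-deploy smoke test: call the deployed function once and assert the
// response shape. Exit nonzero so CI fails on regression.
const res = await fetch(`${Deno.env.get("APP_URL")}/functions/v1/ai-orchestrator`, {
  method: "POST",
  headers: {
    // Supabase edge functions expect a JWT by default; the anon key works here.
    Authorization: `Bearer ${Deno.env.get("SUPABASE_ANON_KEY")}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ message: "ping", userId: "smoke-test" }),
});

if (!res.ok) {
  console.error(`Smoke test failed: HTTP ${res.status}`);
  Deno.exit(1);
}
const body = await res.json();
if (typeof body.response !== "string") {
  console.error("Smoke test failed: no response field in body");
  Deno.exit(1);
}
console.log("Smoke test passed.");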
After four weeks, you have a production-grade AI infrastructure. Everything after this is optimization and scaling — which you do when you need it, not before.
Complete Edge Function — AI Orchestrator
Here's a production-ready Supabase edge function that ties together everything from this course: rate limiting, caching, RAG, provider fallback, cost logging, and input validation, all in one function. The two provider helpers it calls, callClaude and callGPT, are sketched after the listing.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
);

const JSON_HEADERS = { "Content-Type": "application/json" };

Deno.serve(async (req) => {
  const startTime = Date.now();
  const { message, userId } = await req.json();

  // 1. INPUT VALIDATION: reject empty, oversized, or obviously hostile input
  if (!message || message.length > 10_000) {
    return new Response(JSON.stringify({ error: "Invalid input" }), {
      status: 400,
      headers: JSON_HEADERS,
    });
  }
  const injectionPatterns = [
    /ignore\s+previous\s+instructions/i,
    /reveal\s+(your|the)\s+prompt/i,
  ];
  if (injectionPatterns.some((p) => p.test(message))) {
    return new Response(JSON.stringify({ error: "Invalid request" }), {
      status: 400,
      headers: JSON_HEADERS,
    });
  }

  // 2. RATE LIMITING: max 20 calls per user per rolling hour
  const windowStart = new Date(Date.now() - 3600_000).toISOString();
  const { count } = await supabase
    .from("ai_api_calls")
    .select("*", { count: "exact", head: true })
    .eq("user_id", userId)
    .gte("created_at", windowStart);
  if ((count ?? 0) >= 20) {
    return new Response(
      JSON.stringify({ error: "Rate limit exceeded. Try again later." }),
      { status: 429, headers: JSON_HEADERS }
    );
  }

  // 3. SEMANTIC CACHE CHECK: embed the query, then look for a near-identical
  // previous query. The feature-extraction endpoint returns a raw vector array.
  const embedRes = await fetch(
    "https://api-inference.huggingface.co/pipeline/feature-extraction/BAAI/bge-small-en-v1.5",
    {
      method: "POST",
      headers: { Authorization: `Bearer ${Deno.env.get("HF_TOKEN")}` },
      body: JSON.stringify({ inputs: message }),
    }
  );
  const embedding = await embedRes.json();
  const queryVec = JSON.stringify(embedding); // pgvector accepts the JSON array form

  const { data: cached } = await supabase.rpc("find_cached_response", {
    query_vec: queryVec,
    similarity_threshold: 0.95,
  });
  if (cached && cached.length > 0) {
    await logOperation(userId, "cache_hit", 0, 0, Date.now() - startTime, 0);
    return new Response(JSON.stringify({ response: cached[0].response_text }), {
      headers: JSON_HEADERS,
    });
  }

  // 4. RAG RETRIEVAL: pull the most relevant documents into the prompt
  const { data: docs } = await supabase.rpc("match_documents", {
    query_embedding: queryVec,
    match_threshold: 0.7,
    match_count: 5,
  });
  const context = docs?.map((d: any) => d.content).join("\n\n") ?? "";

  // 5. LLM CALL WITH FALLBACK: try the primary provider, fall back to the
  // secondary, and degrade gracefully if both fail
  let response: string;
  let provider = "anthropic";
  try {
    response = await callClaude(message, context);
  } catch {
    provider = "openai";
    try {
      response = await callGPT(message, context);
    } catch {
      response = "All AI providers are currently unavailable. Please try again.";
      provider = "none";
    }
  }

  // 6. COST LOGGING: rough token estimate at ~4 characters per token, priced
  // at $3/$15 per million tokens (Claude Sonnet) or $2.50/$10 (GPT-4o)
  const latency = Date.now() - startTime;
  const inputTokens = Math.ceil((message.length + context.length) / 4);
  const outputTokens = Math.ceil(response.length / 4);
  const cost =
    provider === "anthropic"
      ? (inputTokens * 3 + outputTokens * 15) / 1_000_000
      : provider === "openai"
      ? (inputTokens * 2.5 + outputTokens * 10) / 1_000_000
      : 0;
  await logOperation(userId, provider, inputTokens, outputTokens, latency, cost);

  // 7. CACHE THE RESPONSE: only cache real answers, never the failure message
  if (provider !== "none") {
    await supabase.from("semantic_cache").insert({
      query_text: message,
      query_embedding: queryVec,
      response_text: response,
      provider,
      model: provider === "anthropic" ? "claude-sonnet" : "gpt-4o",
    });
  }

  return new Response(JSON.stringify({ response, provider, latency }), {
    headers: JSON_HEADERS,
  });
});

async function logOperation(
  userId: string,
  provider: string,
  input: number,
  output: number,
  latency: number,
  cost: number
) {
  await supabase.from("ai_api_calls").insert({
    user_id: userId,
    provider,
    input_tokens: input,
    output_tokens: output,
    latency_ms: latency,
    estimated_cost: cost,
    status: "success",
  });
}
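The orchestrator depends on two provider helpers that the listing doesn't define. Here is a minimal sketch of both against the Anthropic and OpenAI REST APIs; the model IDs and the way context is injected are assumptions, so adapt them to your own tiering strategy.

// Provider helpers for the orchestrator above: a sketch, not a definitive
// implementation. Both throw on non-OK responses so the fallback chain engages.
async function callClaude(message: string, context: string): Promise<string> {
  const res = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": Deno.env.get("ANTHROPIC_API_KEY")!,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-20250514", // assumption: use your account's Sonnet model
      max_tokens: 1024,
      system: `Answer using this context where relevant:\n${context}`,
      messages: [{ role: "user", content: message }],
    }),
  });
  if (!res.ok) throw new Error(`Anthropic error: ${res.status}`);
  const data = await res.json();
  return data.content[0].text;
}

async function callGPT(message: string, context: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${Deno.env.get("OPENAI_API_KEY")}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: [
        { role: "system", content: `Answer using this context where relevant:\n${context}` },
        { role: "user", content: message },
      ],
    }),
  });
  if (!res.ok) throw new Error(`OpenAI error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}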
This single function implements 7 of the 10 lessons in this course. Study it, understand each layer, then adapt it for your own project. The patterns are the same regardless of your specific use case — input validation, rate limiting, caching, RAG, provider fallback, cost logging, and response caching.