Managing AI API Keys, Rate Limits & Costs
Every AI API call costs money. Every leaked key is a disaster. Every rate limit hit is a broken user experience. API management isn't glamorous — it's essential.
What you'll learn
- How to store and rotate API keys securely
- Rate limiting strategies that protect your budget and your users
- Cost tracking and alerting before bills spiral
- Building resilient API calls with retries and fallbacks
API Key Management
An exposed OpenAI or Anthropic API key can rack up thousands of dollars in minutes. This isn't theoretical — it happens constantly. Keys get committed to GitHub, hardcoded in frontend JavaScript, or left in plain text configs.
Rule 1: API keys live in environment variables, never in code. Every platform (Vercel, Supabase, AWS) has a secure way to store env vars. Use it.
Rule 2: AI API calls happen server-side only. Never call OpenAI or Anthropic from the browser. Your edge function or backend makes the call; the frontend gets the result.
Rule 3: Rotate keys regularly and set spending caps. Both OpenAI and Anthropic let you set monthly limits. Set them lower than you think you need.
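Rules 1 and 2 can be enforced with a tiny helper that reads the key server-side and fails fast at startup instead of sending unauthenticated requests at runtime. This is a minimal sketch: `requireEnv` is an illustrative name, `OPENAI_API_KEY` is just the conventional variable name, and the `declare` line stands in for Node's type definitions.

```typescript
// Minimal ambient typing so this sketch compiles without @types/node.
declare const process: { env: Record<string, string | undefined> };

// Fail fast if a required secret is missing, rather than
// discovering it on the first failed API call.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example usage inside a request handler — server-side only,
// so the key never reaches the browser:
// const apiKey = requireEnv("OPENAI_API_KEY");
```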
Rate Limiting
AI APIs have their own rate limits — requests per minute, tokens per minute, concurrent requests. But you also need your own rate limits to protect your budget from abusive or runaway usage.
Per-user limits prevent any single user from burning through your API budget. A reasonable starting point: 20 requests per hour for free users, 100 for paid users.
Global limits act as a circuit breaker. If your total API spend hits a threshold, slow down or pause non-critical calls rather than letting costs run away.
Implement rate limiting at the edge — before the request hits your AI provider. Supabase edge functions or Vercel middleware are ideal places for this logic.
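A per-user limit like the ones above can be sketched as a token bucket: each user gets a burst allowance that refills at a steady rate. The names (`RateLimiter`, `allow`) are illustrative, and this in-memory `Map` only works for a single instance — a real edge deployment would back the buckets with a shared store such as Redis or Postgres.

```typescript
interface Bucket {
  tokens: number;     // remaining request allowance
  lastRefill: number; // ms timestamp of the last refill
}

class RateLimiter {
  private buckets = new Map<string, Bucket>();

  constructor(
    private capacity: number,        // max burst size
    private refillPerSecond: number, // sustained requests/second
  ) {}

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(userId: string, now: number = Date.now()): boolean {
    const bucket =
      this.buckets.get(userId) ?? { tokens: this.capacity, lastRefill: now };
    // Refill proportionally to elapsed time, capped at capacity.
    const elapsedSeconds = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSeconds * this.refillPerSecond,
    );
    bucket.lastRefill = now;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(userId, bucket);
    return allowed;
  }
}
```

Rejected requests should get an HTTP 429 with a `Retry-After` hint rather than a silent failure.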
Cost Tracking and Alerting
Log every AI API call with its token count and estimated cost. This data is gold — it tells you which features are expensive, which users are heavy consumers, and whether your pricing model actually works.
Set up alerts at 50%, 75%, and 90% of your monthly budget. The first alert is informational. The second triggers investigation. The third should activate automatic throttling.
Track cost per feature, not just total cost. Your chatbot might cost $0.02 per conversation while your document analysis feature costs $0.50. This insight drives product and pricing decisions.
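Logging per-call token counts and aggregating cost per feature might look like the sketch below. The per-million-token rates are placeholders for illustration only — check your provider's current pricing page, since rates change often, and the `UsageRecord` shape is an assumption, not any SDK's type.

```typescript
// One row per AI API call — the raw data behind cost dashboards.
interface UsageRecord {
  userId: string;
  feature: string; // e.g. "chat", "doc-analysis"
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Hypothetical per-1M-token rates; substitute your provider's real pricing.
const PRICING = {
  input: 3.0,   // $ per 1M input tokens
  output: 15.0, // $ per 1M output tokens
};

function estimateCost(inputTokens: number, outputTokens: number): number {
  return (inputTokens * PRICING.input + outputTokens * PRICING.output) / 1_000_000;
}

// Aggregate cost per feature — the view that drives pricing decisions.
function costByFeature(records: UsageRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r.feature, (totals.get(r.feature) ?? 0) + r.costUsd);
  }
  return totals;
}
```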
Retries, Fallbacks, and Graceful Degradation
AI APIs go down. They time out. They return errors. Your app needs to handle all of this gracefully.
Exponential backoff: When a call fails, wait 1 second, then 2, then 4, and add a little random jitter so many clients don't retry in lockstep. Don't hammer a struggling API — you'll make it worse and get rate-limited.
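The 1s → 2s → 4s schedule plus a retry wrapper can be sketched like this. Jitter is omitted here to keep the example deterministic, but is worth adding in production; the function names and the injectable `sleep` parameter are illustrative choices, not a library API.

```typescript
// Delay doubles on each failed attempt, capped at maxMs.
// (Production code should add random jitter to avoid synchronized retries.)
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry an async call with exponential backoff between attempts.
// `sleep` is injectable so tests don't have to actually wait.
async function withRetries<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

Only retry errors that are plausibly transient (timeouts, 429s, 5xx); retrying a malformed request just burns attempts.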
Provider fallback: If Claude is down, can you fall back to GPT? If your primary vector DB is slow, do you have a cache layer? Multi-provider setups are more work but dramatically more reliable.
Graceful degradation: If all AI providers are down, your app should still function — just without AI features. Show cached responses, display a status message, or queue requests for later processing.
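A provider fallback chain — try each provider in priority order, and surface an error only when all of them fail — might look like the sketch below. The `Provider` shape is an assumption for illustration, not any real SDK's interface; in practice each `complete` would wrap a call to Anthropic, OpenAI, etc.

```typescript
type Completion = { text: string; provider: string };

// Illustrative provider interface: a name plus one completion call.
type Provider = {
  name: string;
  complete: (prompt: string) => Promise<string>;
};

async function completeWithFallback(
  providers: Provider[],
  prompt: string,
): Promise<Completion> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      // First provider to succeed wins; record which one answered.
      return { text: await p.complete(prompt), provider: p.name };
    } catch (err) {
      errors.push(`${p.name}: ${String(err)}`);
    }
  }
  // All providers failed — the caller should degrade gracefully:
  // cached response, status message, or queue the request for later.
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Note that different providers need different prompts and return slightly different output, so a fallback path deserves its own testing — don't assume it works just because the primary path does.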
API Call Checklist
Before every AI API integration, verify:
- Key stored in an environment variable
- Calls made server-side only
- Rate limited per user
- Cost logged per call
- Retry logic implemented
- Spending cap set
- Fallback defined
Try it yourself
Build a simple edge function that proxies requests to an AI API. Add rate limiting (max 10 requests per minute per IP), cost logging (track token counts), and exponential backoff retry logic. Deploy it to Supabase or Vercel.