Managing AI API Keys, Rate Limits & Costs
Every AI API call costs money. Every leaked key is a disaster. Every rate limit hit is a broken user experience. API management isn't glamorous — it's essential.
What you'll learn
- How to store and rotate API keys securely
- Rate limiting strategies that protect your budget and your users
- Cost tracking and alerting before bills spiral
- Building resilient API calls with retries and fallbacks
API Key Management
An exposed OpenAI or Anthropic API key can rack up thousands of dollars in minutes. This isn't theoretical — it happens constantly. Keys get committed to GitHub, hardcoded in frontend JavaScript, or left in plain text configs.
Rule 1: API keys live in environment variables, never in code. Every platform (Vercel, Supabase, AWS) has a secure way to store env vars. Use it.
Rule 2: AI API calls happen server-side only. Never call OpenAI or Anthropic from the browser. Your edge function or backend makes the call; the frontend gets the result.
Rule 3: Rotate keys regularly and set spending caps. Both OpenAI and Anthropic let you set monthly limits. Set them lower than you think you need.
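Rules 1 and 2 can be enforced with a tiny helper that reads the key server-side and fails fast at startup instead of sending unauthenticated requests at runtime. This is a minimal sketch: `requireEnv` is an illustrative name, `OPENAI_API_KEY` is just the conventional variable name, and the `declare` line stands in for Node's type definitions.

```typescript
// Minimal ambient typing so this sketch compiles without @types/node.
declare const process: { env: Record<string, string | undefined> };

// Fail fast if a required secret is missing, rather than
// discovering it on the first failed API call.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example usage inside a request handler — server-side only,
// so the key never reaches the browser:
// const apiKey = requireEnv("OPENAI_API_KEY");
```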
Rate Limiting
AI APIs have their own rate limits — requests per minute, tokens per minute, concurrent requests. But you also need your own rate limits to protect your budget from abusive or runaway usage.
Per-user limits prevent any single user from burning through your API budget. A reasonable starting point: 20 requests per hour for free users, 100 for paid users.
Global limits act as a circuit breaker. If your total API spend hits a threshold, slow down or pause non-critical calls rather than letting costs run away.
Implement rate limiting at the edge — before the request hits your AI provider. Supabase edge functions or Vercel middleware are ideal places for this logic.
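A per-user limit like the ones above can be sketched as a token bucket: each user gets a burst allowance that refills at a steady rate. The names (`RateLimiter`, `allow`) are illustrative, and this in-memory `Map` only works for a single instance — a real edge deployment would back the buckets with a shared store such as Redis or Postgres.

```typescript
interface Bucket {
  tokens: number;     // remaining request allowance
  lastRefill: number; // ms timestamp of the last refill
}

class RateLimiter {
  private buckets = new Map<string, Bucket>();

  constructor(
    private capacity: number,        // max burst size
    private refillPerSecond: number, // sustained requests/second
  ) {}

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(userId: string, now: number = Date.now()): boolean {
    const bucket =
      this.buckets.get(userId) ?? { tokens: this.capacity, lastRefill: now };
    // Refill proportionally to elapsed time, capped at capacity.
    const elapsedSeconds = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSeconds * this.refillPerSecond,
    );
    bucket.lastRefill = now;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(userId, bucket);
    return allowed;
  }
}
```

Rejected requests should get an HTTP 429 with a `Retry-After` hint rather than a silent failure.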
Cost Tracking and Alerting
Log every AI API call with its token count and estimated cost. This data is gold — it tells you which features are expensive, which users are heavy consumers, and whether your pricing model actually works.
Set up alerts at 50%, 75%, and 90% of your monthly budget. The first alert is informational. The second triggers investigation. The third should activate automatic throttling.
Track cost per feature, not just total cost. Your chatbot might cost $0.02 per conversation while your document analysis feature costs $0.50. This insight drives product and pricing decisions.
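Logging per-call token counts and aggregating cost per feature might look like the sketch below. The per-million-token rates are placeholders for illustration only — check your provider's current pricing page, since rates change often, and the `UsageRecord` shape is an assumption, not any SDK's type.

```typescript
// One row per AI API call — the raw data behind cost dashboards.
interface UsageRecord {
  userId: string;
  feature: string; // e.g. "chat", "doc-analysis"
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Hypothetical per-1M-token rates; substitute your provider's real pricing.
const PRICING = {
  input: 3.0,   // $ per 1M input tokens
  output: 15.0, // $ per 1M output tokens
};

function estimateCost(inputTokens: number, outputTokens: number): number {
  return (inputTokens * PRICING.input + outputTokens * PRICING.output) / 1_000_000;
}

// Aggregate cost per feature — the view that drives pricing decisions.
function costByFeature(records: UsageRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r.feature, (totals.get(r.feature) ?? 0) + r.costUsd);
  }
  return totals;
}
```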
Retries, Fallbacks, and Graceful Degradation
AI APIs go down. They time out. They return errors. Your app needs to handle all of this gracefully.
Exponential backoff: When a call fails, wait 1 second, then 2, then 4, and add a little random jitter so many clients don't retry in lockstep. Don't hammer a struggling API — you'll make it worse and get rate-limited.
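The 1s → 2s → 4s schedule plus a retry wrapper can be sketched like this. Jitter is omitted here to keep the example deterministic, but is worth adding in production; the function names and the injectable `sleep` parameter are illustrative choices, not a library API.

```typescript
// Delay doubles on each failed attempt, capped at maxMs.
// (Production code should add random jitter to avoid synchronized retries.)
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry an async call with exponential backoff between attempts.
// `sleep` is injectable so tests don't have to actually wait.
async function withRetries<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

Only retry errors that are plausibly transient (timeouts, 429s, 5xx); retrying a malformed request just burns attempts.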
Provider fallback: If Claude is down, can you fall back to GPT? If your primary vector DB is slow, do you have a cache layer? Multi-provider setups are more work but dramatically more reliable.
Graceful degradation: If all AI providers are down, your app should still function — just without AI features. Show cached responses, display a status message, or queue requests for later processing.
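A provider fallback chain — try each provider in priority order, and surface an error only when all of them fail — might look like the sketch below. The `Provider` shape is an assumption for illustration, not any real SDK's interface; in practice each `complete` would wrap a call to Anthropic, OpenAI, etc.

```typescript
type Completion = { text: string; provider: string };

// Illustrative provider interface: a name plus one completion call.
type Provider = {
  name: string;
  complete: (prompt: string) => Promise<string>;
};

async function completeWithFallback(
  providers: Provider[],
  prompt: string,
): Promise<Completion> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      // First provider to succeed wins; record which one answered.
      return { text: await p.complete(prompt), provider: p.name };
    } catch (err) {
      errors.push(`${p.name}: ${String(err)}`);
    }
  }
  // All providers failed — the caller should degrade gracefully:
  // cached response, status message, or queue the request for later.
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Note that different providers need different prompts and return slightly different output, so a fallback path deserves its own testing — don't assume it works just because the primary path does.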
API Call Checklist
Before every AI API integration, verify:
- Key stored in an environment variable
- Calls made server-side only
- Rate limited per user
- Cost logged per call
- Retry logic implemented
- Spending cap set
- Fallback defined
Try it yourself
Build a simple edge function that proxies requests to an AI API. Add rate limiting (max 10 requests per minute per IP), cost logging (track token counts), and exponential backoff retry logic. Deploy it to Supabase or Vercel.