Secure by Default
Everything you have learned in this course — injection, jailbreaking, exfiltration, guardrails, monitoring — comes together here. Security-first architecture means designing your AI system so that the safe path is the easy path. Security is not bolted on after development. It is the foundation you build on.
The three principles of security-first AI architecture are: least privilege (minimize what the agent can do), defense in depth (multiple layers), and fail-safe defaults (when something goes wrong, the system locks down rather than opens up).
Principle 1: Least Privilege
Every agent, tool, and connection should have the minimum access needed to do its job. Nothing more.
BAD: One agent with all permissions
Agent → Full database access
      → All API keys
      → Read/write entire filesystem
      → Send emails to anyone
      → Run any Bash command
GOOD: Scoped agents with minimum access
Support Agent → get_order_status(id) only
              → Read FAQ documents only
              → Cannot access user PII
              → Cannot send external emails
              → No Bash access
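One way to make the scoped-agent pattern concrete is a deny-by-default tool allowlist: an agent can only invoke tools that its scope explicitly names. The sketch below is illustrative, not a real framework API; the names (AgentScope, dispatch, get_order_status) are hypothetical.

```python
# Minimal sketch of least-privilege tool scoping (all names hypothetical).
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentScope:
    """Declares the only tools an agent may invoke; everything else is denied."""
    name: str
    allowed_tools: frozenset = field(default_factory=frozenset)

    def authorize(self, tool_name: str) -> bool:
        # Deny by default: a tool must be explicitly listed to be callable.
        return tool_name in self.allowed_tools

# The support agent gets exactly two capabilities, matching the diagram above.
support_agent = AgentScope(
    name="support",
    allowed_tools=frozenset({"get_order_status", "read_faq"}),
)

def dispatch(scope: AgentScope, tool_name: str):
    # Fail-safe: an unauthorized call raises instead of silently proceeding.
    if not scope.authorize(tool_name):
        raise PermissionError(f"{scope.name} agent may not call {tool_name}")
    ...  # look up and invoke the tool here
```

Because the default is denial, adding a new tool to the system grants no agent access to it until someone deliberately widens a scope.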
Principle 2: Defense in Depth
Layer your defenses so no single failure compromises the system:
User Input
|
[Layer 1: Input Validation]
| Pattern matching, classifier, length limits
|
[Layer 2: Prompt Boundary]
| User input framed as data, not instructions
|
[Layer 3: Hardened System Prompt]
| Reinforced rules, explicit refusal instructions
|
[Layer 4: Model Processing]
| Claude processes with built-in safety training
|
[Layer 5: Output Validation]
| PII scan, prompt leak detection, content policy
|
[Layer 6: Tool Permissions]
| Scoped tools, PreToolUse hooks, rate limits
|
[Layer 7: Monitoring]
| Anomaly detection, abuse signals, audit logging
|
Response to User
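The layered pipeline above can be sketched as a chain of independent checks, where any layer's failure stops the request (fail-safe defaults). This is a simplified illustration covering Layers 1, 2, and 5; the function names, the injection regex, and the PII pattern are placeholder assumptions, not production-grade detectors.

```python
# Minimal sketch of defense in depth (hypothetical names; toy detection rules).
import re

MAX_INPUT_LEN = 4000

def layer1_validate_input(text: str) -> str:
    # Layer 1: length limits plus crude pattern matching for injection phrases.
    if len(text) > MAX_INPUT_LEN:
        raise ValueError("input too long")
    if re.search(r"ignore (all |previous )?instructions", text, re.IGNORECASE):
        raise ValueError("suspected prompt injection")
    return text

def layer2_frame_as_data(text: str) -> str:
    # Layer 2: delimit user input so the model treats it as data, not instructions.
    return f"<user_input>\n{text}\n</user_input>"

def layer5_validate_output(text: str) -> str:
    # Layer 5: block responses that leak the system prompt or match a PII pattern.
    if "SYSTEM PROMPT" in text.upper():
        raise ValueError("possible prompt leak")
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", text):  # crude SSN-shaped pattern
        raise ValueError("possible PII in output")
    return text

def handle_request(user_text: str, model_call) -> str:
    # Each layer can reject independently; no single bypass reaches the user.
    framed = layer2_frame_as_data(layer1_validate_input(user_text))
    return layer5_validate_output(model_call(framed))
```

The design point is that the layers do not trust each other: even if an injection slips past the Layer 1 patterns, the output scan and tool permission checks still stand between it and the user.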