Production Patterns
From prototype to production: headless browsers, cloud execution, cost management, and security.
A vision agent that works on your laptop is a demo. A vision agent that runs reliably in production, handles thousands of tasks, stays within budget, and operates within security boundaries -- that is a product. This lesson teaches the patterns that make it real.
What you'll learn
- Headless browser execution: running agents without a visible display
- Cloud deployment: spinning up visual agents on demand
- Cost management: reducing API spend per task by 80%+
- Security boundaries: containing visual agents safely
Headless Browser Execution
In production, you do not want a visible browser window. Headless browsers run without a display, executing everything in memory. The screenshots are still captured -- they are just never shown on a physical screen.
Xvfb (X Virtual Frame Buffer). Creates a virtual display in memory. Applications think they have a screen, but the pixels are only in RAM. This is the standard approach for Linux servers. Run your browser against Xvfb and capture screenshots from the virtual display.
Playwright headless mode. Playwright can run Chromium in headless mode -- no virtual display needed. Screenshots are captured directly from the browser's rendering engine. Simpler setup, but some websites detect headless browsers and block them.
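As a minimal sketch of the headless capture step, the function below launches Chromium through Playwright, loads a page, and returns a PNG screenshot. It assumes the `playwright` package is installed; `captureScreenshot`, its parameters, and the default viewport are illustrative names and values, not part of any existing agent.

```javascript
// Capture one screenshot from headless Chromium.
// `chromium` is the object exported by the `playwright` package; it is passed
// in by the caller so this function has no hard dependency of its own.
async function captureScreenshot(chromium, url, viewport = { width: 1280, height: 720 }) {
  const browser = await chromium.launch({ headless: true });
  try {
    const context = await browser.newContext({ viewport });
    const page = await context.newPage();
    await page.goto(url);
    // Resolves to a Buffer of PNG bytes; nothing is drawn on a physical screen.
    return await page.screenshot({ type: 'png' });
  } finally {
    // Always release the browser, even if navigation or capture fails.
    await browser.close();
  }
}
```

A caller would typically write `const { chromium } = require('playwright');` and then `await captureScreenshot(chromium, 'https://example.com')`.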
Docker containers. Package your agent, browser, and virtual display into a Docker container. Each task gets a fresh container with a clean browser profile. Containers are isolated, reproducible, and disposable -- perfect for production agents.
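# Dockerfile for a production vision agent
FROM node:20-slim
# Install Chromium and the virtual display server
RUN apt-get update && apt-get install -y \
    chromium xvfb fonts-liberation \
    && rm -rf /var/lib/apt/lists/*
# Point the browser at the virtual display
ENV DISPLAY=:99
WORKDIR /app
# Install agent dependencies
COPY package.json ./
RUN npm install
COPY . .
# Start Xvfb when the container starts, then run the agent.
# (A RUN step only executes at build time, so a display started there
# would not exist in the running container.)
CMD ["sh", "-c", "Xvfb :99 -screen 0 1920x1080x24 & exec node agent.js"]
Cloud Deployment Patterns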
Running visual agents in the cloud enables scale and reliability. Here are the production patterns:
On-demand containers. Spin up a container for each task, run the workflow, save the results, destroy the container. No persistent state on the machine. Clean environment every time. Services like AWS Fargate, Google Cloud Run, or Azure Container Instances support this pattern.
Queue-based execution. Tasks go into a queue (SQS, Cloud Tasks, Redis queue). Worker containers pull tasks, execute them, report results. If a worker crashes, the task goes back in the queue for another worker. This pattern handles load spikes and failures gracefully.
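The retry behavior described above can be sketched with an in-memory array standing in for the real queue. This is an illustration only: `runWorker`, the `attempts` field, and the `maxAttempts` limit are assumptions, and a managed queue such as SQS would handle redelivery for you rather than relying on the worker to push the task back.

```javascript
// In-memory stand-in for a task queue (SQS, Cloud Tasks, Redis), for illustration.
// Failed tasks go back on the queue until they reach maxAttempts.
async function runWorker(queue, handler, maxAttempts = 3) {
  const results = [];
  while (queue.length > 0) {
    const task = queue.shift();
    const attempts = (task.attempts || 0) + 1;
    try {
      const output = await handler(task);
      results.push({ id: task.id, status: 'done', output });
    } catch (err) {
      if (attempts < maxAttempts) {
        // Put the task back so it can be tried again later.
        queue.push({ ...task, attempts });
      } else {
        // Give up and record the failure instead of retrying forever.
        results.push({ id: task.id, status: 'failed', error: err.message });
      }
    }
  }
  return results;
}
```

Capping the number of attempts matters in practice: a task that always fails would otherwise cycle through the queue indefinitely and keep consuming API budget.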
Scheduled execution. For recurring tasks (daily reports, weekly form submissions), use cron jobs or cloud schedulers. Trigger the container at the scheduled time, run the workflow, store results. No always-running infrastructure -- you pay only for execution time.
Result storage. Save screenshots, GIF recordings, and JSON logs to cloud storage (S3, GCS, R2). Tag with the task ID, timestamp, and outcome. This creates a searchable archive for debugging and compliance.
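One way to make the archive searchable is to encode the tags in the object key itself. The sketch below builds such a key; the `runs/` prefix and the ordering of the segments are assumptions chosen for illustration, not a required layout.

```javascript
// Build a storage key that encodes date, outcome, task ID, and timestamp.
// The layout (runs/<day>/<outcome>/<taskId>/...) is a hypothetical convention.
function resultKey(taskId, outcome, fileName, finishedAt = new Date()) {
  const iso = finishedAt.toISOString();          // e.g. 2024-05-01T12:30:00.000Z
  const day = iso.slice(0, 10);                  // date prefix for listing by day
  const stamp = iso.replace(/[:.]/g, '-');       // avoid ':' and '.' inside the key
  return `runs/${day}/${outcome}/${taskId}/${stamp}-${fileName}`;
}
```

With this layout, listing everything under `runs/<day>/failure/` returns that day's failed runs without a separate index.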
Cost Management
Computer use costs compound fast in production. Each screenshot sent to Claude consumes image tokens. A 10-step workflow might cost $0.10-0.50. Run 1,000 workflows per day and you are spending $100-500 daily on API calls alone. Here is how to cut that by 80%+:
Minimize screenshots. Do not screenshot after every tiny action. Group related actions: type the email, Tab to password, type the password, THEN screenshot to verify both fields. One screenshot instead of three.
Reduce resolution. For pages where you only need to read large text and find big buttons, 1280x720 or even 960x540 is sufficient. Lower resolution = fewer image tokens = lower cost. Use 1920x1080 only when you need to read small text or dense UI.
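To get a feel for the savings, the sketch below compares screenshot sizes by pixel count. The `pixelsPerToken` divisor of 750 is an assumption based on a rule of thumb for estimating image tokens; check your provider's current documentation, and note that providers may downscale large images before counting tokens. The relative comparison does not depend on the divisor.

```javascript
// Rough image-token estimate. The divisor is an ASSUMPTION (a rule of thumb),
// so treat absolute numbers as approximate and measure real usage from API responses.
function estimateImageTokens(width, height, pixelsPerToken = 750) {
  return Math.ceil((width * height) / pixelsPerToken);
}

// Pixel count of one resolution relative to another (1.0 means the same size).
function relativePixelCost(candidate, baseline) {
  return (candidate.width * candidate.height) / (baseline.width * baseline.height);
}

const fullHd = { width: 1920, height: 1080 };
console.log(relativePixelCost({ width: 1280, height: 720 }, fullHd)); // a bit under half the pixels
console.log(relativePixelCost({ width: 960, height: 540 }, fullHd));  // a quarter of the pixels
```

Halving both dimensions cuts the pixel count to a quarter, which is why dropping resolution is one of the cheapest optimizations to try first.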
Use DOM when possible. The hybrid approach from Lesson 7 is also a cost strategy. Every step done via DOM instead of vision saves a screenshot round-trip. If the login page always has the same selectors, use DOM for login and save vision for the unpredictable parts.
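A minimal sketch of the DOM-first fallback, assuming a Playwright `page` object. `clickTarget` and `visionLocate` are hypothetical names: `visionLocate` stands for whatever function takes a screenshot, asks the model where the target is, and returns `{ x, y }` coordinates.

```javascript
// Try a cheap DOM lookup first; only pay for a vision round-trip if it fails.
// `visionLocate` is a hypothetical callback that returns { x, y } for the target.
async function clickTarget(page, selector, visionLocate) {
  const handle = await page.$(selector);   // resolves to null when nothing matches
  if (handle) {
    await handle.click();
    return 'dom';
  }
  const { x, y } = await visionLocate(page);
  await page.mouse.click(x, y);
  return 'vision';
}
```

Returning which path was taken makes it easy to log how often the vision fallback is actually needed, which in turn shows where stable selectors would save the most.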
Cache page analysis. If the same page is visited repeatedly (a login page, a dashboard), cache the element locations from the first analysis. Subsequent visits skip the "analyze screenshot" step and go directly to action using cached coordinates. Invalidate the cache if the page changes.
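The caching idea can be sketched as a small class keyed by URL. The `fingerprint` argument is an assumption: it stands for any cheap signal that the page has changed, such as a hash of the DOM structure or of a downscaled screenshot. `analyze` is a hypothetical callback that performs the expensive screenshot analysis.

```javascript
// Cache element locations per URL; re-analyze only when the fingerprint changes.
class PageAnalysisCache {
  constructor() {
    this.entries = new Map();
  }

  // `analyze` is only called on a miss or when the page fingerprint differs.
  async get(url, fingerprint, analyze) {
    const hit = this.entries.get(url);
    if (hit && hit.fingerprint === fingerprint) {
      return hit.elements;                       // cache hit: no model call
    }
    const elements = await analyze();            // miss or stale: pay for analysis
    this.entries.set(url, { fingerprint, elements });
    return elements;
  }
}
```

Cached coordinates are only valid at the viewport size they were captured at, so it is worth including the resolution in the fingerprint or the key.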
Use smaller models for simple tasks. Not every screenshot needs the most powerful model. Simple tasks (is this a login page? where is the Submit button?) can use cheaper, faster models. Reserve the expensive model for complex decisions.
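Model routing can be as simple as a lookup on the task type. In the sketch below, the task type names and the model identifiers are placeholders, not real model names; substitute the cheap and expensive models you actually use.

```javascript
// Placeholder identifiers: replace with the real model names for your provider.
const MODEL_TIERS = {
  simple: 'small-model-placeholder',
  complex: 'large-model-placeholder',
};

// Hypothetical task types that a cheaper model is assumed to handle well.
const SIMPLE_TASKS = new Set(['classify_page', 'locate_element']);

// Default to the stronger model for anything not explicitly marked simple.
function chooseModel(taskType) {
  return SIMPLE_TASKS.has(taskType) ? MODEL_TIERS.simple : MODEL_TIERS.complex;
}
```

Defaulting unknown task types to the stronger model trades some cost for safety: a misrouted simple task wastes a little money, while a misrouted complex task can fail the whole workflow.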
Security Boundaries
A visual agent with browser access can potentially reach any website, fill any form, and interact with any service. Security boundaries prevent accidents and contain damage:
URL allowlisting. The agent can only navigate to pre-approved domains. If it tries to navigate elsewhere, the action is blocked. This prevents the agent from being tricked into visiting malicious sites or interacting with unintended services.
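A minimal sketch of an allowlist check that runs before every navigation. The function name and the HTTPS-only rule are assumptions for illustration; the important detail is comparing the parsed hostname rather than searching the raw string, so that a host like `example.com.evil.test` does not slip through.

```javascript
// Return true only for https URLs whose host is an allowed domain or a subdomain of one.
function isAllowed(rawUrl, allowedHosts) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;                                 // not a parseable absolute URL
  }
  if (url.protocol !== 'https:') return false;    // blocks http:, file:, javascript:, ...
  return allowedHosts.some(
    (host) => url.hostname === host || url.hostname.endsWith('.' + host)
  );
}
```

For example, with `['example.com']` as the allowlist, `https://app.example.com/login` passes while `https://example.com.evil.test/` and `https://evilexample.com/` are rejected.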
Credential isolation. Keep credentials out of the agent's prompt and code. Store them in a secret manager, or at minimum inject them as environment variables when the container starts. The agent receives credentials only when needed and only for the current task. Mask password fields and redact secrets before writing logs or saving screenshots, so credentials never end up in the archive.
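A small sketch of the two halves of this rule: reading a credential from the environment at the moment it is needed, and scrubbing known secrets from any text before it is logged. `loadCredential` and `redact` are illustrative helper names, and the `[REDACTED]` marker is an arbitrary choice.

```javascript
// Read a credential by name from the environment; fail loudly if it is missing.
function loadCredential(name, env = process.env) {
  const value = env[name];
  if (!value) throw new Error(`Missing credential: ${name}`);
  return value;
}

// Replace every occurrence of each known secret before the text reaches a log.
function redact(text, secrets) {
  return secrets.reduce(
    (out, secret) => (secret ? out.split(secret).join('[REDACTED]') : out),
    String(text)
  );
}
```

String redaction only covers logs. Screenshots need separate handling, for example capturing them before the password is typed or relying on the page's masked password field.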
Action logging. Log every action the agent takes -- every URL visited, every field filled, every button clicked. These logs are your security audit trail. Review them regularly for unexpected behavior.
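One possible shape for the audit trail is a JSON line per action. In this sketch, `createActionLog`, the field names, and the default console sink are assumptions; in production the sink would write to your log pipeline or cloud storage.

```javascript
// Create a logger bound to one task. Each call emits one JSON line to `sink`.
function createActionLog(taskId, sink = (line) => console.log(line)) {
  let step = 0;
  return function logAction(type, details = {}) {
    step += 1;
    // Spread details first so they cannot overwrite the core audit fields.
    const entry = { ...details, taskId, step, type, at: new Date().toISOString() };
    sink(JSON.stringify(entry));
    return entry;
  };
}

const logAction = createActionLog('task-42');
logAction('navigate', { url: 'https://example.com/login' });
logAction('click', { target: 'Submit' });
```

One JSON object per line keeps the log easy to filter by task ID or action type with standard tools.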
Network isolation. Run the agent container in a network that can only reach the allowed domains. Block all other outbound traffic. Even if the agent tries to navigate somewhere unexpected, the network firewall prevents it.
Session management. Each task gets a fresh browser profile. No cookies, no saved passwords, no history carry over from previous tasks. This prevents credential leakage between tasks and ensures each run starts clean.
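With Playwright, a fresh profile per task maps naturally onto a browser context. The sketch below assumes an already launched Playwright `browser`; `withFreshContext` is an illustrative helper name, and `task` is any function that receives the page and does the work.

```javascript
// Run one task in its own browser context and discard the context afterwards.
async function withFreshContext(browser, task) {
  // A new context starts with no cookies, storage, or cache from earlier tasks.
  const context = await browser.newContext();
  try {
    const page = await context.newPage();
    return await task(page);
  } finally {
    // Closing the context throws away everything the task accumulated.
    await context.close();
  }
}
```

Closing the context in `finally` ensures that a failed task cannot leave a logged-in session behind for the next one.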