MCP + Computer Use
Programmatic tools meet visual agents. The best automation uses both.
MCP (Model Context Protocol) gives AI structured, programmatic access to tools and data. Computer use gives AI visual access to any screen. Combining them creates hybrid workflows that are faster than vision-only and more flexible than API-only.
What you'll learn
- How MCP servers provide structured tool access alongside computer use
- The decision framework: when to use MCP tools vs. computer use
- Building hybrid workflows that combine both approaches
- Architecture patterns for MCP + computer use agents
Two Superpowers, One Agent
Think of MCP and computer use as two different hands your agent can use. MCP is the precise hand -- it reaches directly into databases, APIs, and file systems with structured access. Computer use is the flexible hand -- it interacts with any visual interface, regardless of whether an API exists.
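To make the two hands concrete, here is a minimal sketch contrasting them. MCP messages use JSON-RPC 2.0, and invoking a tool is a `tools/call` request that carries the tool `name` and its `arguments`. The tool name `query_database` and its arguments are hypothetical. The computer-use side is modeled as a list of hypothetical action dicts (`screenshot`, `click`, `type`), not the exact schema of any particular computer-use tool.

```python
import json


def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request using JSON-RPC 2.0 framing."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Precise hand: one structured request (hypothetical tool and arguments).
request = mcp_tool_call(1, "query_database", {"sql": "SELECT id, status FROM orders LIMIT 5"})

# Flexible hand: the same lookup done visually, step by step.
# Action names and fields are hypothetical, not a real computer-use schema.
visual_plan = [
    {"action": "screenshot"},
    {"action": "click", "x": 120, "y": 340},    # open the Orders page
    {"action": "screenshot"},
    {"action": "type", "text": "status:open"},  # filter the table
    {"action": "screenshot"},                    # read the results off the screen
]

print(json.loads(request)["method"])  # tools/call
print(len(visual_plan), "visual steps vs 1 structured call")
```

The structured call returns data directly. The visual plan needs several actions, each followed by a fresh look at the screen.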
An agent with only MCP is limited to software that exposes structured APIs. An agent with only computer use is slow and expensive for tasks that could be done programmatically, because every click or keystroke requires a fresh screenshot and another round of model inference. An agent with both can choose the right approach for each step of a workflow.
This is not theoretical. Real production agents use MCP to read databases, send emails, and manage files -- then switch to computer use to navigate a government portal, fill an insurance form, or interact with legacy software that has no API.
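A minimal sketch of that hand-off follows. It assumes two stand-in clients: `FakeEmailMCP` plays the role of an email MCP server, and `FakeScreen` plays the role of a computer-use session. Both classes, their methods, and the URL are hypothetical placeholders. A real agent would route these calls through its MCP client and its computer-use tool.

```python
import re
from dataclasses import dataclass, field


class FakeEmailMCP:
    """Stand-in for an email MCP server (hypothetical interface)."""

    def read_latest(self) -> dict:
        return {
            "subject": "Renewal form",
            "body": "Please complete the form at https://portal.example.com/renew by Friday.",
        }


@dataclass
class FakeScreen:
    """Stand-in for a computer-use session. It only records the actions requested."""
    log: list = field(default_factory=list)

    def navigate(self, url: str) -> None:
        self.log.append(("navigate", url))

    def fill(self, field_label: str, value: str) -> None:
        self.log.append(("fill", field_label, value))

    def submit(self) -> None:
        self.log.append(("submit",))


def hybrid_renewal(email: FakeEmailMCP, screen: FakeScreen, applicant: str) -> list:
    # Step 1 (MCP): a structured read, no screenshots needed.
    message = email.read_latest()
    match = re.search(r"https?://\S+", message["body"])
    if match is None:
        raise ValueError("no link found in email")
    url = match.group(0).rstrip(".,")
    # Step 2 (computer use): the portal has no API, so drive it visually.
    screen.navigate(url)
    screen.fill("Full name", applicant)
    screen.submit()
    return screen.log


print(hybrid_renewal(FakeEmailMCP(), FakeScreen(), "Ada Lovelace"))
```

The MCP step produces the context (the link) that the computer-use step needs. That is the same division of labor described in the paragraph above.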
The Hybrid Agent Architecture
               ┌─────────────────┐
               │  Claude Agent   │
               │  (orchestrator) │
               └────────┬────────┘
                        │
         ┌──────────────┼──────────────┐
         │              │              │
┌────────▼─────┐   ┌────▼────┐   ┌─────▼────────┐
│ MCP Servers  │   │ Computer│   │    Brain     │
│ (structured) │   │   Use   │   │   (memory)   │
│              │   │(visual) │   │              │
│ - Database   │   │ - Click │   │ - State      │
│ - Email      │   │ - Type  │   │ - History    │
│ - Files      │   │ - Scroll│   │ - Directives │
│ - Calendar   │   │ - See   │   │ - Context    │
└──────────────┘   └─────────┘   └──────────────┘
 Fast, precise,  Slow, flexible,    Persistent
 API-dependent      universal    across sessions
The agent orchestrates all three. It reads the brain for context and directives. It uses MCP tools when structured access is available. It falls back to computer use when it needs to interact with a visual interface. The choice is made per-action, not per-workflow.
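The per-action choice can be sketched as a small dispatch loop. This is a hypothetical minimal design, not a specific framework. `Brain` holds directives, state, and history, and it serializes to JSON so it can persist across sessions. `mcp_servers` maps a target system to a callable that stands in for an MCP tool. `computer_use` is a stub for the visual fallback. Reading the directives to influence routing is omitted for brevity.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class Brain:
    """Persistent memory: directives, state, and an action history."""
    directives: List[str] = field(default_factory=list)
    state: Dict[str, str] = field(default_factory=dict)
    history: List[dict] = field(default_factory=list)

    def save(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def load(cls, raw: str) -> "Brain":
        return cls(**json.loads(raw))


@dataclass
class Step:
    target: str  # system the step touches, e.g. "email" or "insurance_portal"
    action: str  # what to do there


def computer_use(step: Step) -> str:
    # Stub for the visual path (screenshot, click, type, ...).
    return f"[visual] {step.action} on {step.target}"


def run(steps: List[Step], mcp_servers: Dict[str, Callable[[Step], str]], brain: Brain) -> Brain:
    for step in steps:
        handler = mcp_servers.get(step.target)
        # Decide per action: structured access if a server exists, otherwise fall back.
        if handler is not None:
            route, result = "mcp", handler(step)
        else:
            route, result = "computer_use", computer_use(step)
        brain.history.append({"target": step.target, "route": route, "result": result})
    return brain


servers = {"email": lambda s: f"[mcp] {s.action}"}
brain = run(
    [Step("email", "read latest"), Step("insurance_portal", "fill claim form")],
    servers,
    Brain(directives=["Prefer MCP when available"]),
)
restored = Brain.load(brain.save())  # round trip, e.g. to disk between sessions
print([h["route"] for h in restored.history])  # ['mcp', 'computer_use']
```

Because the route is picked inside the loop, a single workflow can mix both paths, and the history in the brain records which one each step used.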
The Decision Framework
For each step in a workflow, the agent decides which approach to use:
Use MCP when: An MCP server exists for the target system. The action is data-oriented (read, write, query, send). Speed matters. The API is stable and well-documented. Examples: reading email via Gmail MCP, querying a database via Supabase MCP, creating a calendar event via Google Calendar MCP.
Use computer use when: No API or MCP server exists. The task requires visual interaction (navigating a dashboard, filling a web form). The interface is non-standard or frequently changes. The agent needs to verify visual output. Examples: filing a government form, navigating a legacy internal tool, verifying a website looks correct.
Use both when: Part of the workflow has API access and part does not. MCP provides context that computer use needs. Computer use verifies what MCP reported. Example: use Gmail MCP to read an email with a link, then use computer use to navigate to that link and fill the form it leads to.
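The three rules can be condensed into a single routing function. This is a hypothetical sketch. The `StepProfile` fields are one possible encoding of the criteria in the text, and the precedence among them is an assumption, not a fixed rule. For example, the sketch assumes that a data-oriented step with an MCP server is upgraded to "both" when it also needs a screen.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    MCP = "mcp"
    COMPUTER_USE = "computer_use"
    BOTH = "both"


@dataclass
class StepProfile:
    has_mcp_server: bool      # an MCP server exists for the target system
    data_oriented: bool       # read, write, query, or send
    needs_visual_ui: bool     # e.g. navigating a dashboard or filling a web form
    needs_visual_check: bool  # the result must be verified on screen


def choose_route(step: StepProfile) -> Route:
    if not step.has_mcp_server:
        return Route.COMPUTER_USE  # no structured access, only the visual path remains
    if not step.data_oriented:
        return Route.COMPUTER_USE  # a server exists, but the task itself is visual
    if step.needs_visual_ui or step.needs_visual_check:
        return Route.BOTH          # MCP for the data, computer use for the screen
    return Route.MCP               # fast, precise, structured


# Reading email through an MCP server: pure data.
print(choose_route(StepProfile(has_mcp_server=True, data_oriented=True,
                               needs_visual_ui=False, needs_visual_check=False)))  # Route.MCP
# Government form with no API.
print(choose_route(StepProfile(has_mcp_server=False, data_oriented=False,
                               needs_visual_ui=True, needs_visual_check=False)))   # Route.COMPUTER_USE
# Read the email via MCP, then fill the linked form on screen.
print(choose_route(StepProfile(has_mcp_server=True, data_oriented=True,
                               needs_visual_ui=True, needs_visual_check=False)))   # Route.BOTH
```

Keeping the rule in one pure function makes it easy to unit-test and to adjust the precedence without touching the orchestration loop.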