Computer Use API
Your AI just got a mouse, a keyboard, and a pair of eyes.
Claude's computer use tool lets AI take screenshots, click at specific coordinates, type text, and scroll -- all through a structured API. This lesson gets you from zero to your first working computer-use session.
What you'll learn
- How Claude's computer use tool works under the hood
- The coordinate system: how the AI maps pixels to actions
- Setting up your first computer-use session with the Anthropic API
- The screenshot-action loop in practice
The Computer Use Tool
Claude's computer use is a special tool -- like web search or code execution, but for interacting with a graphical interface. When you enable it, Claude can request four core kinds of action, which your application then carries out on the screen:
Screenshot. Capture the current state of the screen as an image. This is the AI's vision -- it sees exactly what is displayed, pixel by pixel.
Click. Move the cursor to specific x,y coordinates and click (left, right, double, or middle click). This is how the AI presses buttons, selects options, and interacts with elements.
Type. Send keystrokes to the active element. This is how the AI fills in forms, enters search queries, and writes text into any input field.
Scroll. Scroll up or down on the page. This is how the AI reaches content below the fold, navigates long pages, and reveals hidden elements.
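To make the four capabilities concrete, here is a minimal sketch of how the requested actions could be modeled on the client side. The type `ComputerAction` and the function `describeAction` are hypothetical names for illustration. The field names (`action`, `coordinate`, `text`, `scroll_direction`, `scroll_amount`) are assumed to mirror the `computer_20250124` tool input, so check them against the current API reference before relying on them.

```typescript
// Hypothetical client-side model of the actions Claude can request.
// Field names are assumed to mirror the computer_20250124 tool input.
type ComputerAction =
  | { action: 'screenshot' }
  | {
      action: 'left_click' | 'right_click' | 'middle_click' | 'double_click';
      coordinate: [number, number];
    }
  | { action: 'type'; text: string }
  | {
      action: 'scroll';
      coordinate: [number, number];
      scroll_direction: 'up' | 'down'; // only the directions this lesson covers
      scroll_amount: number;
    };

// Turn an action into a readable log line, e.g. for an audit trail.
function describeAction(a: ComputerAction): string {
  switch (a.action) {
    case 'screenshot':
      return 'capture the screen';
    case 'type':
      return `type ${JSON.stringify(a.text)}`;
    case 'scroll':
      return `scroll ${a.scroll_direction} by ${a.scroll_amount} at (${a.coordinate[0]}, ${a.coordinate[1]})`;
    default:
      // The remaining variants are the click actions.
      return `${a.action} at (${a.coordinate[0]}, ${a.coordinate[1]})`;
  }
}

console.log(describeAction({ action: 'left_click', coordinate: [500, 325] }));
// → left_click at (500, 325)
```

Modeling the actions as a discriminated union lets the compiler check that every click carries a coordinate and every type action carries text before anything reaches the real screen.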
The Coordinate System
When the AI sees a screenshot, it needs to know where things are so it can click accurately. The coordinate system is straightforward:
Origin (0,0) is the top-left corner of the screen. X increases going right. Y increases going down. If your screen is 1920x1080 pixels, the bottom-right pixel is (1919, 1079), because coordinates start at zero.
Screen resolution matters. A button might be at (500, 300) on a 1920x1080 display but at (250, 150) on a 960x540 display. Always know your resolution and communicate it to the AI so coordinates are accurate.
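The resolution example above can be sketched as a small helper that maps a point from one resolution to another, for instance from a downscaled screenshot back to the real display. `Resolution` and `scalePoint` are illustrative names, not part of any SDK, and the sketch assumes both displays share the same aspect ratio.

```typescript
interface Resolution {
  width: number;
  height: number;
}

// Map a point from one resolution to another by scaling each axis.
function scalePoint(
  [x, y]: [number, number],
  from: Resolution,
  to: Resolution,
): [number, number] {
  const sx = Math.round((x * to.width) / from.width);
  const sy = Math.round((y * to.height) / from.height);
  // Clamp so the result is always a valid pixel on the target display.
  return [
    Math.min(Math.max(sx, 0), to.width - 1),
    Math.min(Math.max(sy, 0), to.height - 1),
  ];
}

const full: Resolution = { width: 1920, height: 1080 };
const half: Resolution = { width: 960, height: 540 };
console.log(scalePoint([500, 300], full, half)); // → [250, 150]
```

The clamp guards against off-by-one results at the edges, since valid pixel indices run from 0 to width - 1 and 0 to height - 1.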
Center of the element. When clicking a button, aim for the center, not the edge. A 200x50 pixel button whose top-left corner is at (400, 300) should be clicked at approximately (500, 325): half the width (100) added to x, and half the height (25) added to y. This gives the most reliable hits.
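The center-point arithmetic is short enough to write down directly. This is a minimal sketch; `Rect` and `centerOf` are hypothetical names, and it assumes the element's position refers to its top-left corner.

```typescript
interface Rect {
  x: number; // left edge, in pixels
  y: number; // top edge, in pixels
  width: number;
  height: number;
}

// Return the center of a rectangle, rounded to whole pixels.
function centerOf(r: Rect): [number, number] {
  return [Math.round(r.x + r.width / 2), Math.round(r.y + r.height / 2)];
}

console.log(centerOf({ x: 400, y: 300, width: 200, height: 50 })); // → [500, 325]
```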
Your First Computer Use Session
Here is the minimal code to start a computer-use session with Claude. Every line is commented so you understand exactly what is happening:
Setting Up the API Call
The computer use tool is passed as part of the tools array in your API request. Here is the structure:
// Import the Anthropic SDK
import Anthropic from '@anthropic-ai/sdk';

// Create the client; it reads ANTHROPIC_API_KEY from the environment
const client = new Anthropic();

// Send a message with computer use enabled (computer use is a beta feature)
const response = await client.beta.messages.create({
  model: 'claude-sonnet-4-20250514', // Model with vision capabilities
  max_tokens: 4096, // Room for the AI to think and act
  betas: ['computer-use-2025-01-24'], // Beta flag matching the tool version
  tools: [{
    type: 'computer_20250124', // The computer use tool type
    name: 'computer', // Tool name
    display_width_px: 1920, // Your screen width in pixels
    display_height_px: 1080, // Your screen height in pixels
    display_number: 0 // Optional: which X11 display to use
  }],
  messages: [{
    role: 'user',
    content: 'Take a screenshot and tell me what you see.'
  }]
});

When Claude responds, the reply contains a tool use request -- typically a screenshot first, or, if you have already supplied one, a click, type, or scroll action. Claude does not touch the screen itself: your code executes the requested action on the actual screen and sends the result back as a tool result, and the cycle repeats until the task is done.
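The request, execute, and report cycle described above can be sketched as a loop. This is a simplified illustration under stated assumptions, not the SDK's own agent loop: `runSession`, `callModel`, and `execute` are hypothetical names, the types are declared locally to mirror the `tool_use` and `tool_result` content blocks, and tool results are plain strings here, whereas a real session would return screenshots as image content.

```typescript
// Local types mirroring the Messages API content-block shapes, declared
// here so the sketch runs standalone without the SDK.
type ToolUse = { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> };
type TextBlock = { type: 'text'; text: string };
type ToolResult = { type: 'tool_result'; tool_use_id: string; content: string };
type Message = {
  role: 'user' | 'assistant';
  content: string | Array<ToolUse | TextBlock | ToolResult>;
};
type ModelReply = { stop_reason: 'tool_use' | 'end_turn'; content: Array<ToolUse | TextBlock> };

// Hypothetical hooks: `callModel` wraps the API request; `execute` performs
// the action on the real screen and returns a result to report back.
async function runSession(
  task: string,
  callModel: (messages: Message[]) => Promise<ModelReply>,
  execute: (input: Record<string, unknown>) => Promise<string>,
  maxTurns = 10, // hard cap so a confused session cannot loop forever
): Promise<Message[]> {
  const messages: Message[] = [{ role: 'user', content: task }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(messages);
    messages.push({ role: 'assistant', content: reply.content });
    if (reply.stop_reason !== 'tool_use') break; // no more actions requested

    // Execute every requested action and report all results in one user turn.
    const results: ToolResult[] = [];
    for (const block of reply.content) {
      if (block.type === 'tool_use') {
        const output = await execute(block.input);
        results.push({ type: 'tool_result', tool_use_id: block.id, content: output });
      }
    }
    messages.push({ role: 'user', content: results });
  }
  return messages;
}
```

Passing `callModel` and `execute` in as parameters keeps the loop testable: you can drive it with stubbed replies before pointing it at a real screen.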