Click, Type & Navigate
From seeing the screen to controlling it -- reliable interaction patterns that work.
Screenshots tell the AI what is on screen. Now it needs to act: click buttons, fill forms, navigate menus, and handle dropdowns. This lesson builds the interaction patterns that make vision agents reliable.
What you'll learn
- How to build reliable click targeting that hits the right element
- Form filling patterns: text inputs, dropdowns, checkboxes, radio buttons
- Menu navigation: expanding menus, selecting nested options
- Keyboard shortcuts and key combinations for faster navigation
The Interaction Problem
Clicking a button sounds simple. But when an AI does it, there are failure modes a human would handle without thinking. The button might be partially off-screen. The page might still be loading. A popup might appear between the screenshot and the click. The element might be behind a cookie consent banner.
Reliable computer use requires defensive interaction patterns. Not just "click the button" but "verify the button exists, verify it is visible, verify nothing is blocking it, click it, verify the click worked." This lesson teaches those patterns.
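The three pre-click checks named above (exists, visible, nothing blocking) can be sketched as a small guard that runs before any click. This is a minimal sketch under stated assumptions, not a definitive implementation: `analyze` is a hypothetical helper standing in for a vision-model call, assumed to resolve to an object with a boolean `success` field, and the question wording is illustrative.

```javascript
// Questions for the three pre-click checks: exists, visible, unobstructed.
// The wording is illustrative; tune it for the model you use.
const PRE_CLICK_CHECKS = [
  (target) => `Is "${target}" present on the screen?`,
  (target) => `Is "${target}" fully visible, not cut off at the screen edge?`,
  (target) => `Is "${target}" free of overlays such as popups or cookie banners?`,
];

// Runs the checks in order against one screenshot.
// `analyze(screenshot, question)` is a hypothetical helper assumed to
// resolve to { success: boolean }.
// Resolves to the first question that failed, or null if all passed.
async function firstFailedCheck(screenshot, target, analyze) {
  for (const buildQuestion of PRE_CLICK_CHECKS) {
    const question = buildQuestion(target);
    const result = await analyze(screenshot, question);
    if (!result.success) return question;
  }
  return null;
}
```

A caller would click only when `firstFailedCheck` resolves to `null`, and otherwise deal with the reported problem first (scroll, wait, or dismiss the overlay).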
Click Patterns
Single click. The most common action. Left-click at coordinates (x, y). Used for buttons, links, menu items, checkboxes, radio buttons. Always aim for the center of the element.
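Aiming for the center is simple arithmetic once the element's bounding box is known. The sketch below assumes a box given as `{ x, y, width, height }` in screen pixels with `(x, y)` at the top-left corner; that shape is an assumption for illustration, not something a particular tool guarantees.

```javascript
// Returns the center point of a bounding box, rounded to whole pixels.
// Assumes box = { x, y, width, height } with (x, y) at the top-left corner.
function centerOf(box) {
  return {
    x: Math.round(box.x + box.width / 2),
    y: Math.round(box.y + box.height / 2),
  };
}

// Example: an 80x40 button whose top-left corner is at (100, 200).
const buttonCenter = centerOf({ x: 100, y: 200, width: 80, height: 40 });
console.log(buttonCenter); // { x: 140, y: 220 }
```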
Double click. Used for selecting text, opening files in file managers, or activating edit mode in some interfaces. Less common in web automation but essential for desktop applications.
Right click. Opens context menus. Useful for accessing options not available through the main UI -- "Open in new tab," "Save image as," "Inspect element."
Click and verify. The most important pattern. After every click, take a screenshot and verify the expected change occurred. Did the button change state? Did a new page load? Did a modal appear? If the expected change did not happen, the click may have missed or a loading delay prevented it.
// The click-and-verify pattern
// computerTool, claude, and wait are assumed to be provided by the surrounding agent code.
async function reliableClick(x, y, expectedChange, retriesLeft = 2) {
  // 1. Click at the coordinates
  await computerTool.click(x, y);
  // 2. Wait a moment for the UI to respond
  await wait(500); // 500ms is a starting point; slow pages may need longer
  // 3. Take a screenshot to verify
  const screenshot = await computerTool.screenshot();
  // 4. Ask Claude to verify the expected change
  const verification = await claude.analyze(screenshot,
    `Did the following change occur? ${expectedChange}`
  );
  if (verification.success) return true;
  // 5. If not, retry a bounded number of times, then report failure
  if (retriesLeft === 0) {
    console.log('Click failed after all retries.');
    return false;
  }
  console.log('Click may have missed. Retrying...');
  return reliableClick(x, y, expectedChange, retriesLeft - 1);
}

Form Filling Patterns
Forms are the bread and butter of computer use automation. Here is how to handle each form element reliably:
Text inputs. Click the field to focus it. Verify the field has focus: a text cursor inside it or a highlighted border. Then type. After typing, take a screenshot to verify the text appeared correctly. Watch out for auto-complete dropdowns that may cover other elements.
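The focus, type, verify sequence for a text input might look like the following sketch. The `tool` object and the `analyze` helper are hypothetical stand-ins passed in as parameters; `type` in particular is an assumed method name, and `field.label` is an illustrative way to describe the field to the model.

```javascript
// Fills one text field and confirms the result from screenshots.
// `tool` is a hypothetical object with async click(x, y), type(text), and
// screenshot() methods; `analyze(screenshot, question)` is assumed to
// resolve to { success: boolean }.
async function fillTextField(tool, analyze, field, text) {
  // 1. Click the field to focus it.
  await tool.click(field.x, field.y);

  // 2. Confirm focus before typing, so keystrokes do not go elsewhere.
  const focused = await analyze(
    await tool.screenshot(),
    `Is the "${field.label}" field focused?`
  );
  if (!focused.success) return { ok: false, reason: 'not focused' };

  // 3. Type, then confirm the text appeared as intended.
  await tool.type(text);
  const typed = await analyze(
    await tool.screenshot(),
    `Does the "${field.label}" field contain exactly "${text}"?`
  );
  return typed.success ? { ok: true } : { ok: false, reason: 'text mismatch' };
}
```

Returning a reason rather than throwing lets the caller decide whether to click again, clear the field, or give up.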
Dropdowns. Click the dropdown to open it. Take a screenshot to see the options. Find the desired option and click it. Verify the dropdown closed and shows the selected value. Some dropdowns are searchable -- type the option name after opening to filter.
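The open, look, pick, confirm steps for a dropdown could be sketched as below. Here `locate` is a hypothetical helper assumed to resolve to the `{ x, y }` center of a described element in a screenshot, or `null` if it is not found; `tool` and `analyze` are also assumed stand-ins, described in the comments.

```javascript
// Selects an option from a dropdown and confirms the selection.
// `tool` is a hypothetical object with async click(x, y) and screenshot();
// `locate(screenshot, description)` is assumed to resolve to { x, y } or null;
// `analyze(screenshot, question)` is assumed to resolve to { success: boolean }.
async function selectDropdownOption(tool, locate, analyze, dropdown, optionText) {
  // 1. Open the dropdown.
  await tool.click(dropdown.x, dropdown.y);

  // 2. Take a fresh screenshot and find the option in the opened list.
  const option = await locate(await tool.screenshot(), `option "${optionText}"`);
  if (option === null) return { ok: false, reason: 'option not found' };

  // 3. Click the option.
  await tool.click(option.x, option.y);

  // 4. Confirm the list closed and the selection is displayed.
  const confirmed = await analyze(
    await tool.screenshot(),
    `Is the dropdown closed and showing "${optionText}"?`
  );
  return confirmed.success
    ? { ok: true }
    : { ok: false, reason: 'selection not shown' };
}
```

When `locate` finds nothing, a caller could fall back to typing the option name to filter a searchable dropdown, then try again.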
Checkboxes. Click the checkbox or its label text. Take a screenshot to verify the check mark appeared. Some checkboxes use custom styling that looks different from standard checkboxes -- the AI needs to verify the visual change, not assume a standard appearance.
Date pickers. Often the trickiest form element. Click to open the calendar. Navigate months if needed (click left/right arrows). Click the specific date. Verify the date field shows the correct value. Alternative: if the input accepts typed dates (MM/DD/YYYY), type directly instead of using the picker -- it is faster and more reliable.
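Typing a date directly only works if the string matches the format the input expects. As a small sketch of the MM/DD/YYYY case mentioned above, the hypothetical helper below (the name `formatDateForInput` is illustrative) zero-pads the parts and rejects impossible dates before anything is typed.

```javascript
// Formats a date as MM/DD/YYYY for typing into a date input.
// month is 1-12. Throws RangeError for dates that do not exist.
function formatDateForInput(year, month, day) {
  // Date silently rolls over invalid values (Feb 30, 2025 becomes Mar 2),
  // so build one and check that nothing rolled over.
  const probe = new Date(year, month - 1, day);
  const isReal =
    probe.getFullYear() === year &&
    probe.getMonth() === month - 1 &&
    probe.getDate() === day;
  if (!isReal) {
    throw new RangeError(`Not a calendar date: ${year}-${month}-${day}`);
  }
  const mm = String(month).padStart(2, '0');
  const dd = String(day).padStart(2, '0');
  return `${mm}/${dd}/${year}`;
}

console.log(formatDateForInput(2025, 3, 7)); // 03/07/2025
```

The verification step still applies after typing: some inputs reformat or reject typed dates, so check that the field shows the intended value.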