📚Academy
likeone
online

Testing Visual Agents

You cannot trust what you cannot see. Record, replay, and validate every agent run.

Visual agents interact with unpredictable environments. Testing them requires different tools than testing code. This lesson covers recording, replaying, validating, and building audit trails for visual agent workflows.

What you'll learn

  • How to record visual agent runs for debugging and compliance
  • GIF capture for human-reviewable audit trails
  • Replay testing: re-running workflows against recorded states
  • Regression testing strategies for visual agents

Why Visual Testing Is Different

Conventional code tests are deterministic: given the same input, you get the same output. Visual agent tests are not. The same webpage can look different at different times: ads change, layouts shift, content updates, sessions expire. A test that passed yesterday can fail today because a banner appeared or a button moved.

This means visual agent testing needs different strategies: recording every run for post-hoc review, building visual assertions that tolerate minor changes, and creating regression suites that catch real failures while ignoring cosmetic variations.
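One way to build an assertion that tolerates minor changes is to count how many pixels moved by more than a small amount, and fail only when that share crosses a threshold. The sketch below assumes the two frames have already been decoded into rows of 0-255 grayscale values. The function names and the default thresholds (16 brightness levels, 2% of pixels) are illustrative assumptions, not recommended values.

```python
from typing import Sequence

Gray = Sequence[Sequence[int]]  # rows of 0-255 grayscale values


def changed_fraction(baseline: Gray, current: Gray, pixel_tolerance: int = 16) -> float:
    """Fraction of pixels whose brightness differs by more than pixel_tolerance."""
    if len(baseline) != len(current) or any(
        len(a) != len(b) for a, b in zip(baseline, current)
    ):
        raise ValueError("frames must have identical dimensions")
    total = sum(len(row) for row in baseline)
    if total == 0:
        raise ValueError("frames must not be empty")
    changed = sum(
        1
        for row_a, row_b in zip(baseline, current)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > pixel_tolerance
    )
    return changed / total


def assert_visually_similar(
    baseline: Gray,
    current: Gray,
    pixel_tolerance: int = 16,   # assumed default: ignore small rendering noise
    max_changed: float = 0.02,   # assumed default: allow up to 2% of pixels to differ
) -> None:
    """Raise AssertionError only when too many pixels changed noticeably."""
    fraction = changed_fraction(baseline, current, pixel_tolerance)
    if fraction > max_changed:
        raise AssertionError(
            f"{fraction:.1%} of pixels changed (limit {max_changed:.1%})"
        )
```

Comparing a cropped region around the element the agent depends on, rather than the full page, is one way to keep rotating ads from tripping the check.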

The Recording Pipeline

Every visual agent run should produce a recording -- a complete log of what the agent saw and did. This serves three purposes:

Debugging. When a workflow fails, the recording shows exactly what happened. What did the agent see? What did it click? Where did it go wrong? Without a recording, you are guessing.

Compliance. For regulated industries (finance, healthcare, government), you may need to prove that an automated process followed the correct steps. A visual recording is an audit trail that anyone can review.

Improvement. Recordings are training data. Review them to find patterns: where does the agent hesitate? Where does it make wrong decisions? Where does it waste time? Use these insights to improve the agent's prompts, strategies, and error handling.
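Finding where the agent wastes time can start with the timestamps alone. The sketch below assumes each run is stored as a JSON-style object with a `started_at` field and a `steps` list whose entries carry `step` and `timestamp`, the shape this lesson uses for recordings. The helper names are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def step_durations(recording: dict) -> list[tuple[int, float]]:
    """Return (step number, seconds since the previous event) for each step."""
    previous = parse_ts(recording["started_at"])
    durations = []
    for step in recording["steps"]:
        current = parse_ts(step["timestamp"])
        durations.append((step["step"], (current - previous).total_seconds()))
        previous = current
    return durations


def slowest_steps(recording: dict, limit: int = 3) -> list[tuple[int, float]]:
    """Return the steps that took longest, slowest first."""
    ranked = sorted(step_durations(recording), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Long gaps point at steps worth reviewing by hand. The gap by itself does not say whether the page was slow or the agent was undecided, so treat it as a pointer into the recording, not a diagnosis.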

// Recording pipeline structure
{
  "run_id": "run_2026-04-29_001",
  "task": "Process refund for order #12345",
  "started_at": "2026-04-29T10:30:00Z",
  "steps": [
    {
      "step": 1,
      "action": "screenshot",
      "screenshot": "screenshots/step_001.png",
      "analysis": "Login page visible. Email and password fields present.",
      "timestamp": "2026-04-29T10:30:01Z"
    },
    {
      "step": 2,
      "action": "type",
      "target": "email field",
      "value": "admin@company.com",
      "screenshot_after": "screenshots/step_002.png",
      "verification": "Email field shows admin@company.com",
      "timestamp": "2026-04-29T10:30:04Z"
    }
  ],
  "result": "success",
  "completed_at": "2026-04-29T10:32:15Z"
}
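A small recorder object is one way to produce a log in this shape. The following is a minimal sketch, not a reference implementation: `RunRecorder`, `log_step`, and the injectable `clock` are hypothetical names, and the step fields beyond `step`, `action`, and `timestamp` are whatever the caller passes in.

```python
import json
from datetime import datetime, timezone


def utc_now() -> str:
    """Current UTC time formatted like the timestamps in the sample recording."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


class RunRecorder:
    """Collects one agent run into a dict with the fields of the sample recording."""

    def __init__(self, run_id: str, task: str, clock=utc_now):
        self._clock = clock  # injectable so tests can supply fixed timestamps
        self.record = {
            "run_id": run_id,
            "task": task,
            "started_at": clock(),
            "steps": [],
        }

    def log_step(self, action: str, **details) -> dict:
        """Append a numbered step; details are extra fields such as target or screenshot."""
        step = {
            "step": len(self.record["steps"]) + 1,
            "action": action,
            **details,
            "timestamp": self._clock(),
        }
        self.record["steps"].append(step)
        return step

    def finish(self, result: str) -> dict:
        """Mark the run complete and return the full record."""
        self.record["result"] = result
        self.record["completed_at"] = self._clock()
        return self.record

    def to_json(self) -> str:
        return json.dumps(self.record, indent=2)


if __name__ == "__main__":
    recorder = RunRecorder("run_2026-04-29_001", "Process refund for order #12345")
    recorder.log_step("screenshot", screenshot="screenshots/step_001.png")
    recorder.log_step("type", target="email field", value="admin@company.com")
    recorder.finish("success")
    print(recorder.to_json())
```

Writing the JSON to disk after every step, rather than only in `finish`, would keep a partial recording available when a run crashes midway.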

GIF Capture for Audit Trails

A JSON log tells you what happened. A GIF shows you. Capturing every screenshot in sequence and compiling them into an animated GIF creates a human-reviewable movie of the entire agent run. At one to two seconds per frame, a reviewer can watch a 50-step workflow in a minute or two and see exactly what the agent did.

Capture every screenshot. Save every screenshot taken during the run to a timestamped directory. Name them sequentially with zero padding, matching the paths used in the recording: step_001.png, step_002.png, step_003.png.

Annotate key frames. For screenshots where the agent took an action, overlay a visual marker: a red circle at the click coordinates, a highlight on the typed text, an arrow showing the scroll direction. This makes the GIF self-documenting.

Compile to GIF. Use a tool such as ffmpeg or ImageMagick to combine the annotated screenshots into an animated GIF. Set the frame duration to 1-2 seconds for comfortable viewing. The result is a shareable visual record of the run.

# Create a GIF from screenshot sequence using ffmpeg
ffmpeg -framerate 1 -i screenshots/step_%03d.png \
  -vf "scale=960:-1" -loop 0 audit_trail.gif

# Or with ImageMagick (simpler but larger files)
convert -delay 100 -loop 0 screenshots/step_*.png audit_trail.gif
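If the agent itself is written in Python, the naming and compile steps can be wrapped in two small helpers. This is a sketch under assumptions: `frame_path` and `ffmpeg_gif_command` are hypothetical names, the frames are assumed to follow the step_001.png pattern, and the command only mirrors the ffmpeg invocation above with an added `-y` to overwrite an existing output file.

```python
from pathlib import Path


def frame_path(run_dir: Path, step: int) -> Path:
    """Zero-padded frame name, so files sort in order and match ffmpeg's %03d pattern."""
    return run_dir / f"step_{step:03d}.png"


def ffmpeg_gif_command(
    run_dir: Path,
    output: Path,
    seconds_per_frame: float = 1.0,
    width: int = 960,
) -> list[str]:
    """Build the ffmpeg argument list for compiling numbered frames into a looping GIF."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    framerate = 1.0 / seconds_per_frame  # 2 seconds per frame -> 0.5 frames per second
    return [
        "ffmpeg",
        "-y",  # overwrite the output file if it already exists
        "-framerate", f"{framerate:g}",
        "-i", (run_dir / "step_%03d.png").as_posix(),
        "-vf", f"scale={width}:-1",
        "-loop", "0",
        output.as_posix(),
    ]
```

Passing the returned list to `subprocess.run(cmd, check=True)` runs the conversion and raises if ffmpeg exits with an error. Building the list separately keeps the command easy to log alongside the recording.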