Vision Mode
By default, you interact with page elements using refs from accessibility snapshots. For elements that the accessibility tree does not expose, such as canvas apps, maps, and custom widgets, use coordinate-based mouse commands with screenshots as your visual reference.
Commands
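| Command | Description |
|---|---|
| `mousemove <x> <y>` | Move the mouse to pixel coordinates |
| `mousedown [button]` | Press a mouse button (left, middle, right) |
| `mouseup [button]` | Release a mouse button |
| `mousewheel <dx> <dy>` | Scroll (dx = horizontal, dy = vertical) |
| `screenshot` | Capture the viewport for coordinate reference |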
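The commands above compose into a click: move to the point, press, release. A minimal sketch of that composition follows. The `click_at` helper and the `PWCLI` variable are hypothetical names introduced here for illustration, not part of the CLI; the optional button argument is simply passed through to `mousedown` and `mouseup`.

```shell
# Runner for the CLI; override it (e.g. PWCLI=echo) for a dry run.
PWCLI="${PWCLI:-playwright-cli}"

# click_at <x> <y> [button]
# Hypothetical helper: move to the point, then press and release the button.
click_at() {
  x="$1"; y="$2"; button="${3:-}"
  "$PWCLI" mousemove "$x" "$y" || return 1
  if [ -n "$button" ]; then
    "$PWCLI" mousedown "$button" && "$PWCLI" mouseup "$button"
  else
    "$PWCLI" mousedown && "$PWCLI" mouseup
  fi
}
```

With `PWCLI=echo`, `click_at 150 300` prints the three commands instead of running them, which is a cheap way to check coordinates before touching the page.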
Workflow: interacting with a canvas app
# Take a screenshot to see the canvas
playwright-cli screenshot --filename=canvas.png
# Agent identifies coordinates from the screenshot
# Click at position (150, 300)
playwright-cli mousemove 150 300
playwright-cli mousedown
playwright-cli mouseup
# Drag from (100, 200) to (400, 200)
playwright-cli mousemove 100 200
playwright-cli mousedown
playwright-cli mousemove 400 200
playwright-cli mouseup
# Verify the result
playwright-cli screenshot --filename=after-drag.png
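The drag in the workflow above jumps straight from the start point to the end point. Some canvas apps may respond better to a few intermediate moves, so the sketch below interpolates them. This is an assumption about app behavior, not something the source states, and `drag_between` and `PWCLI` are hypothetical names. With the default of one step it issues the same four commands as the workflow.

```shell
# Runner for the CLI; override it (e.g. PWCLI=echo) for a dry run.
PWCLI="${PWCLI:-playwright-cli}"

# drag_between <x1> <y1> <x2> <y2> [steps]
# Hypothetical helper: press at the start point, move in `steps` equal
# increments (integer arithmetic), then release at the end point.
drag_between() {
  x1="$1"; y1="$2"; x2="$3"; y2="$4"; steps="${5:-1}"
  [ "$steps" -ge 1 ] || return 1
  "$PWCLI" mousemove "$x1" "$y1" || return 1
  "$PWCLI" mousedown || return 1
  i=1
  while [ "$i" -le "$steps" ]; do
    x=$(( x1 + (x2 - x1) * i / steps ))
    y=$(( y1 + (y2 - y1) * i / steps ))
    # Stop moving on failure, but still release the button below.
    "$PWCLI" mousemove "$x" "$y" || break
    i=$(( i + 1 ))
  done
  "$PWCLI" mouseup
}
```

For example, `drag_between 100 200 400 200 3` moves through (200, 200) and (300, 200) before releasing at (400, 200). Take a screenshot afterwards to confirm the result, as in the workflow.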
Workflow: clicking an icon without an accessible name
# Snapshot doesn't show the gear icon
playwright-cli snapshot
# (no gear icon in output)
# Take a screenshot — agent sees gear icon at approximately (850, 45)
playwright-cli screenshot
# Click it
playwright-cli mousemove 850 45
playwright-cli mousedown
playwright-cli mouseup
# Settings panel opens with proper accessibility
playwright-cli snapshot
# - heading "Settings" [level=2]
# - textbox "Display name" [ref=e12]
# Now use refs for the rest
playwright-cli fill e12 "New Name"
When to use vision mode
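| Scenario | Approach |
|---|---|
| Standard web pages | Use refs from snapshots (default) |
| Canvas / WebGL apps | Use vision mode with screenshots |
| Map interactions | Use vision mode to pan and zoom |
| Image editors | Use vision mode for drawing |
| Charts / graphs | Use vision mode to click data points |
| Custom widgets without ARIA | Use vision mode as a fallback |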
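For the map row, zooming usually means hovering the target point and then scrolling. A minimal sketch follows, assuming the app zooms on wheel events at the pointer position. `scroll_at` and `PWCLI` are hypothetical names, and which direction a given `dy` zooms is app-specific, so verify with a screenshot.

```shell
# Runner for the CLI; override it (e.g. PWCLI=echo) for a dry run.
PWCLI="${PWCLI:-playwright-cli}"

# scroll_at <x> <y> <dx> <dy>
# Hypothetical helper: hover the target point, then send one wheel event.
scroll_at() {
  "$PWCLI" mousemove "$1" "$2" && "$PWCLI" mousewheel "$3" "$4"
}
```

A call such as `scroll_at 640 360 0 300` followed by `playwright-cli screenshot` shows whether the map moved the way you expected.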
For most web applications, the default snapshot-based approach is more reliable and more token-efficient. Use vision mode only when the accessibility tree doesn't cover your use case.
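The snapshot-first rule can be sketched as a simple check: look for the element's name in the snapshot text and fall back to vision mode only when it is missing. The sketch assumes the snapshot text has already been saved to a file in the format shown in the workflow comments (for example `- textbox "Display name" [ref=e12]`). `has_accessible_name` and `choose_mode` are hypothetical helpers, and exact matching on the quoted name is a simplification.

```shell
# has_accessible_name <snapshot-file> <name>
# Hypothetical helper: succeeds when the saved snapshot text contains the
# quoted name as a fixed string.
has_accessible_name() {
  grep -F -q -- "\"$2\"" "$1"
}

# choose_mode <snapshot-file> <name>
# Prints "refs" when the element is in the snapshot, otherwise "vision".
choose_mode() {
  if has_accessible_name "$1" "$2"; then
    echo refs
  else
    echo vision
  fi
}
```

In the gear-icon workflow, the first snapshot would yield `vision` for the gear, while the snapshot taken after the settings panel opens would yield `refs` for "Display name".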