Vision Mode
By default, Playwright MCP uses accessibility snapshots for all interactions. Vision mode adds coordinate-based tools that work with screenshots, enabling interaction with elements not exposed in the accessibility tree.
Enabling vision mode
Add the vision capability:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--caps=vision"]
    }
  }
}
Additional tools
With vision mode, these coordinate-based tools become available:
| Tool | Description |
|---|---|
| browser_mouse_move_xy | Move the mouse to x,y coordinates |
| browser_mouse_click_xy | Click at x,y (supports button, double-click, delay) |
| browser_mouse_drag_xy | Drag from start coordinates to end coordinates |
| browser_mouse_down | Press a mouse button |
| browser_mouse_up | Release a mouse button |
| browser_mouse_wheel | Scroll with the mouse wheel |
Workflow: interacting with a canvas app
You: Draw a rectangle on the canvas.
→ browser_take_screenshot
// LLM sees the canvas and identifies coordinates
→ browser_mouse_click_xy { x: 100, y: 150 }
→ browser_mouse_drag_xy { startX: 100, startY: 150, endX: 300, endY: 250 }
→ browser_take_screenshot
// LLM verifies the rectangle was drawn
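
The sequence above can be sketched as a small plan builder. This is a minimal illustration, not Playwright MCP's own API: the tool names and argument names (`x`, `y`, `startX`, `startY`, `endX`, `endY`) are copied from the workflow text, while `ToolCall`, `planRectangle`, `runPlan`, and the injected `callTool` function are hypothetical names introduced here.

```typescript
// One MCP tool invocation: the tool name plus its JSON arguments.
interface ToolCall {
  name: string;
  arguments: Record<string, number>;
}

interface Point {
  x: number;
  y: number;
}

// Build the call sequence from the workflow: screenshot, click, drag, screenshot.
// Argument names mirror the workflow text; they are not checked against a schema.
function planRectangle(from: Point, to: Point): ToolCall[] {
  return [
    { name: "browser_take_screenshot", arguments: {} },
    { name: "browser_mouse_click_xy", arguments: { x: from.x, y: from.y } },
    {
      name: "browser_mouse_drag_xy",
      arguments: { startX: from.x, startY: from.y, endX: to.x, endY: to.y },
    },
    { name: "browser_take_screenshot", arguments: {} },
  ];
}

// Hypothetical executor: `callTool` is injected so any MCP client can be plugged in.
type CallTool = (call: ToolCall) => Promise<unknown>;

async function runPlan(plan: ToolCall[], callTool: CallTool): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const call of plan) {
    // Calls run strictly in order so each screenshot reflects the prior action.
    results.push(await callTool(call));
  }
  return results;
}

const rectanglePlan = planRectangle({ x: 100, y: 150 }, { x: 300, y: 250 });
```

Keeping the plan as plain data makes it easy to log or inspect before anything touches the browser; in practice the coordinates would come from the model's reading of the first screenshot rather than being fixed up front.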
Workflow: clicking an icon without an accessible name
→ browser_snapshot
// The gear icon has no accessible name in the snapshot
→ browser_take_screenshot
// LLM sees the gear icon at approximately (850, 45)
→ browser_mouse_click_xy { x: 850, y: 45 }
→ browser_snapshot
// Settings panel is now open with proper accessibility
- heading "Settings" [level=2]
- textbox "Display name" [ref=e12]
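
If the model reports the icon as a bounding box rather than a single point, the click target can be taken as the box center. The sketch below assumes a hypothetical `Box` shape in page pixels; nothing in Playwright MCP defines this type, and the sample box is chosen only so that its center matches the (850, 45) used above.

```typescript
// Hypothetical shape: a bounding box reported by a vision model, in page pixels.
interface Box {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Return the rounded center of the box, suitable as x,y for browser_mouse_click_xy.
function centerOf(box: Box): { x: number; y: number } {
  return {
    x: Math.round(box.left + box.width / 2),
    y: Math.round(box.top + box.height / 2),
  };
}

// A 24x24 icon whose top-left corner is at (838, 33).
const gear = centerOf({ left: 838, top: 33, width: 24, height: 24 });
// gear is { x: 850, y: 45 }
```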
When to use vision mode
| Scenario | Approach |
|---|---|
| Standard web pages | Use refs from the snapshot (default) |
| Canvas / WebGL apps | Use vision mode with screenshots |
| Map interactions | Use vision mode for pan/zoom |
| Image editors | Use vision mode for drawing |
| Charts / graphs | Use vision mode to click data points |
| Custom widgets without ARIA | Use vision mode as a fallback |
For most web applications, the default snapshot-based approach is more reliable and token-efficient. Use vision mode only when the accessibility tree doesn't cover your use case.
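
The snapshot-first rule can be sketched as a lookup that falls back to vision only when no ref is found. This is an illustration under assumptions: it treats snapshot lines as having the `- role "name" [ref=...]` shape shown in the earlier example, which real snapshots may not always follow, and `findRef`, `chooseInteraction`, and `Interaction` are hypothetical names.

```typescript
// Result of the decision: use a snapshot ref, or fall back to coordinates.
type Interaction =
  | { mode: "snapshot"; ref: string }
  | { mode: "vision" };

// Look up an element ref by role and accessible name in snapshot text.
// Assumes lines shaped like: - textbox "Display name" [ref=e12]
function findRef(snapshot: string, role: string, name: string): string | undefined {
  for (const line of snapshot.split("\n")) {
    const match = line.match(/^\s*-\s+(\S+)\s+"([^"]*)".*\[ref=([^\]]+)\]/);
    if (match && match[1] === role && match[2] === name) {
      return match[3];
    }
  }
  return undefined;
}

// Prefer the snapshot ref; choose vision mode only when the tree has no match.
function chooseInteraction(snapshot: string, role: string, name: string): Interaction {
  const ref = findRef(snapshot, role, name);
  return ref !== undefined ? { mode: "snapshot", ref } : { mode: "vision" };
}

const sampleSnapshot = [
  '- heading "Settings" [level=2]',
  '- textbox "Display name" [ref=e12]',
].join("\n");
```

With `sampleSnapshot`, asking for the textbox named "Display name" yields its ref, while asking for an element that is absent from the tree (such as an unnamed gear icon) yields the vision fallback.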
Combining capabilities
Enable multiple capabilities:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--caps=vision,pdf,devtools"]
    }
  }
}
Or in the config file:
{
  "capabilities": ["core", "vision", "pdf", "devtools"]
}
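
When the capability list changes often, the `mcpServers` entry can be generated rather than hand-edited. The following is a minimal sketch: `buildServerEntry` and `ServerEntry` are hypothetical names, and the only behavior assumed is the comma-separated `--caps=` flag shown in the examples above.

```typescript
// Shape of one entry under "mcpServers", as used in the examples above.
interface ServerEntry {
  command: string;
  args: string[];
}

// Build the entry, joining capability names into a single --caps flag.
function buildServerEntry(caps: string[]): ServerEntry {
  const args = ["@playwright/mcp@latest"];
  if (caps.length > 0) {
    // Omit the flag entirely when no extra capabilities are requested.
    args.push(`--caps=${caps.join(",")}`);
  }
  return { command: "npx", args };
}

const generatedConfig = {
  mcpServers: { playwright: buildServerEntry(["vision", "pdf", "devtools"]) },
};

// Print the JSON that would go into the MCP client configuration.
console.log(JSON.stringify(generatedConfig, null, 2));
```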