Browser Harness: The Most Token-Efficient Browser Automation
4 files, under 900 lines of code. CDP-connected Chrome automation that uses 8x fewer tokens than Computer Use — with full access to your login sessions.
There are five ways to automate a browser with AI. Pick wrong and you’re not just slow — you’re burning money.
Five Approaches Compared
| Approach | Tokens/call | AI-friendliness | Flexibility | Keeps login state |
|---|---|---|---|---|
| Claude in Chrome | 5-8K | ★★★ | Medium | Yes |
| Computer Use | 15-20K | ★★ | Highest | No |
| Selenium/Playwright | N/A | ★ | Highest | No |
| Playwright MCP | 3-5K | ★★★★ | Medium | No |
| Browser Harness | 2-2.5K | ★★★★★ | High | Yes |
Browser Harness uses 1/8th the tokens of Computer Use. And it connects directly to the Chrome you’re already using — all your cookies, login sessions, and extensions are right there.
Architecture
Claude Code → CLI → Daemon process → CDP WebSocket → Your Chrome
The entire thing is 4 files, under 900 lines of Python. The protocol is Chrome DevTools Protocol (CDP), connected over WebSocket.
Here’s the fundamental difference from Selenium: Selenium pre-defines hundreds of tools for the AI to pick from. The AI frequently picks wrong. Browser Harness gives the AI one js() function (execute arbitrary JavaScript) plus screenshot verification. The AI writes whatever logic it needs. Minimal framework, maximum AI autonomy.
The Self-Healing System
agent_helpers.py: Starts as an empty file. When the agent runs a task, it writes custom helper functions (needs scroll_to_bottom()? Writes it. Next task imports it directly). Capabilities accumulate over time.
domain-skills/ directory: 30+ site-specific playbooks (TikTok, GitHub, Polymarket, and others). The key detail: humans didn’t write these. The agent generated them at runtime. They record DOM selectors, interaction sequences, and edge cases.
The positive feedback loop: First run explores the site, finds patterns, succeeds, records findings to domain-skills. Second run reads domain-skills, executes faster, hits higher confidence. Third run has 90% coverage of common scenarios — near-zero trial and error.
Core API
new_tab("https://example.com") # Opens a new tab (doesn't override user's page)
wait_for_load() # Waits for navigation to complete
info = page_info() # Returns {url, title, ...}
result = js("document.title") # Execute any JS, get return value
press_key("Enter") # Keyboard input
capture_screenshot("/path") # Screenshot for verification
All interactions go through JS:
js('document.querySelector("input").focus()')
js('document.execCommand("insertText", false, "your text here")')
js('document.querySelector("button").click()')
The minimal API is a deliberate design choice. The agent has a JS executor. Anything the browser can do, the agent can do. The framework doesn’t need to pre-define operations.
Real Example: Gemini Image Generation
Using Browser Harness to control Chrome and generate images through Gemini:
new_tab("https://gemini.google.com/app")— open Gemini- JS locates the editor (Quill’s
.ql-editoror a contenteditable element), usesdocument.execCommand("insertText")to type the prompt - Press Enter to send, verify URL change (contains conversation ID = send confirmed)
wait(30)for image generation, screenshot to confirm it’s not a loading spinner- JS finds the download button, clicks it, moves the file to the project directory
Token cost: roughly 2,000. The same task with Computer Use: 15,000+ (multiple rounds of screenshot-then-interpret). An 8x gap.
Installation
# 1. Install
uv tool install browser-harness
# or: pip install browser-harness
# 2. Enable Chrome remote debugging
# chrome://inspect/#remote-debugging → check the checkbox
# 3. Verify
browser-harness --version
When to Use What
First question: Is there a dedicated MCP? GitHub operations → GitHub MCP. Notion → Notion MCP. Slack → Slack MCP. API-backed MCPs are fastest and most reliable.
Second question: No dedicated MCP? Open the browser.
| Need | Best choice |
|---|---|
| Stable, structured operations | Playwright MCP (40+ predefined tools) |
| Token efficiency + flexibility | Browser Harness |
| Pixel-level control, desktop apps | Computer Use |
For roughly 70% of non-specialized web tasks, Browser Harness is the strongest option.
Rules to Live By
- Never use hardcoded coordinates — pixel positions shift across resolutions and screen sizes
- Verify URL changes after submissions — proof that the action went through
- Screenshot after visual operations — confirm images or content actually rendered
- Validate step by step — don’t skip confirmation checkpoints
Break any of these four rules and your automation reliability drops off a cliff. They’re non-negotiable.