Browser Harness: The Most Token-Efficient Browser Automation

There are five ways to automate a browser with AI. Pick wrong and you’re not just slow — you’re burning money.

Five Approaches Compared

Approach	Tokens/call	AI-friendliness	Flexibility	Keeps login state
Claude in Chrome	5-8K	★★★	Medium	Yes
Computer Use	15-20K	★★	Highest	No
Selenium/Playwright	N/A	★	Highest	No
Playwright MCP	3-5K	★★★★	Medium	No
Browser Harness	2-2.5K	★★★★★	High	Yes

Browser Harness uses roughly one-eighth the tokens of Computer Use (2-2.5K per call versus 15-20K). It also connects directly to the Chrome you’re already using, so all your cookies, login sessions, and extensions are right there.

Architecture

Claude Code → CLI → Daemon process → CDP WebSocket → Your Chrome

The entire thing is 4 files, under 900 lines of Python. The protocol is Chrome DevTools Protocol (CDP), connected over WebSocket.

Here’s the fundamental difference from Selenium: Selenium exposes a large, fixed API of predefined operations, and an AI choosing among them frequently picks the wrong one. Browser Harness gives the AI a single js() function (execute arbitrary JavaScript) plus screenshot verification, and the AI writes whatever logic it needs. Minimal framework, maximum AI autonomy.
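To make the one-function design concrete, here is a sketch of what a js() call could look like on the wire. It assumes js() wraps the CDP method Runtime.evaluate; that method and its expression, returnByValue, and awaitPromise parameters, plus the exceptionDetails field in the reply, come from the CDP spec. The helper names build_evaluate_request and extract_value are hypothetical, not Browser Harness’s actual code.

```python
import itertools
import json
from typing import Any, Optional

# Hypothetical sketch: how a js() wrapper might talk CDP over the daemon's
# WebSocket. Not Browser Harness's actual implementation.

_message_ids = itertools.count(1)  # CDP pairs each response with its request id


def build_evaluate_request(expression: str, session_id: Optional[str] = None) -> str:
    """Serialize a CDP Runtime.evaluate command for one JS expression."""
    message = {
        "id": next(_message_ids),
        "method": "Runtime.evaluate",
        "params": {
            "expression": expression,
            "returnByValue": True,  # return plain JSON, not a remote object handle
            "awaitPromise": True,  # wait for promises to settle before replying
        },
    }
    if session_id is not None:
        # On a browser-level connection, page commands are routed by sessionId.
        message["sessionId"] = session_id
    return json.dumps(message)


def extract_value(response_text: str) -> Any:
    """Return the value from a Runtime.evaluate reply, raising on JS or CDP errors."""
    response = json.loads(response_text)
    if "error" in response:
        # Protocol-level failure, e.g. an unknown method or bad params.
        raise RuntimeError(response["error"].get("message", "CDP error"))
    result = response["result"]
    if "exceptionDetails" in result:
        # The expression itself threw inside the page.
        raise RuntimeError(result["exceptionDetails"].get("text", "JavaScript exception"))
    return result["result"].get("value")
```

A daemon would send the request string over the CDP WebSocket and hand the reply to extract_value. Setting returnByValue means the agent gets plain JSON back in one round trip instead of an object handle it would have to resolve separately.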

The Self-Healing System

agent_helpers.py: Starts as an empty file. As the agent runs tasks, it writes its own helper functions into it. Needs scroll_to_bottom()? It writes one, and the next task imports it directly. Capabilities accumulate over time.
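Here is a minimal sketch of how such a file could grow, assuming the agent appends helpers as plain Python source and imports the file at the start of each task. The functions defined_functions, add_helper, and load_helpers are hypothetical; Browser Harness may manage agent_helpers.py differently.

```python
import ast
import importlib.util
from pathlib import Path
from types import ModuleType

# Hypothetical sketch of an agent growing agent_helpers.py over time.
# The function names and the "append only if missing" policy are assumptions.


def defined_functions(path: Path) -> set[str]:
    """Names of the top-level functions already in the helpers file."""
    if not path.exists():
        return set()
    tree = ast.parse(path.read_text())
    return {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}


def add_helper(path: Path, name: str, source: str) -> bool:
    """Append a helper unless one with that name exists. Returns True if written."""
    if name in defined_functions(path):
        return False  # a previous task already wrote it: reuse, don't rewrite
    tree = ast.parse(source)  # refuse to append code that does not parse
    if not any(isinstance(n, ast.FunctionDef) and n.name == name for n in tree.body):
        raise ValueError(f"source does not define {name}()")
    with path.open("a") as f:
        f.write("\n\n" + source.strip() + "\n")
    return True


def load_helpers(path: Path) -> ModuleType:
    """Import the helpers file so the current task can call its functions."""
    spec = importlib.util.spec_from_file_location("agent_helpers", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

A browser-facing helper such as scroll_to_bottom() would call js(...) in its body, so the helpers file would also need to import the harness functions; that wiring is left out here.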

domain-skills/ directory: 30+ site-specific playbooks (TikTok, GitHub, Polymarket, and others). The key detail: humans didn’t write these. The agent generated them at runtime. They record DOM selectors, interaction sequences, and edge cases.

The positive feedback loop: the first run explores the site, finds patterns, succeeds, and records its findings in domain-skills. The second run reads domain-skills first, executes faster, and succeeds more reliably. By the third run, the playbook covers roughly 90% of common scenarios, with near-zero trial and error.
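The loop boils down to: read first, explore only if nothing is recorded, then write back. This sketch assumes one JSON file per hostname with "selectors" and "notes" keys, an invented layout for illustration; the actual domain-skills playbooks may be structured differently. skill_path, load_skill, and record_skill are hypothetical names.

```python
import json
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

# Hypothetical sketch of the domain-skills feedback loop. The JSON layout
# (one file per hostname, with "selectors", "notes" and "runs") is assumed.


def skill_path(root: Path, url: str) -> Path:
    """Map a URL to its playbook file, e.g. www.github.com -> github.com.json."""
    host = urlparse(url).hostname or "unknown"
    if host.startswith("www."):
        host = host[len("www."):]
    return root / f"{host}.json"


def load_skill(root: Path, url: str) -> Optional[dict]:
    """Return the recorded playbook for this site, or None on a first visit."""
    path = skill_path(root, url)
    if not path.exists():
        return None
    return json.loads(path.read_text())


def record_skill(root: Path, url: str, selectors: dict, notes: list) -> dict:
    """Merge newly learned selectors and edge-case notes into the playbook."""
    skill = load_skill(root, url) or {"selectors": {}, "notes": [], "runs": 0}
    skill["selectors"].update(selectors)  # newer findings replace stale selectors
    for note in notes:
        if note not in skill["notes"]:  # keep each edge case once
            skill["notes"].append(note)
    skill["runs"] += 1
    root.mkdir(parents=True, exist_ok=True)
    skill_path(root, url).write_text(json.dumps(skill, indent=2))
    return skill
```

On each task the agent would call load_skill before touching the page, try the recorded selectors when a playbook exists, and call record_skill after a successful run so the next run starts further ahead.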

Core API

new_tab("https://example.com")     # Opens a new tab (leaves the user's current page alone)
wait_for_load()                    # Waits for navigation to complete
info = page_info()                 # Returns {url, title, ...}
result = js("document.title")      # Executes any JS and returns the value
press_key("Enter")                 # Sends keyboard input
capture_screenshot("/path")        # Saves a screenshot for verification

All interactions go through JS:

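# Focus the first <input> on the page
js('document.querySelector("input").focus()')
# Type at the cursor; insertText fires input events, so page frameworks see the change
js('document.execCommand("insertText", false, "your text here")')
# Click the first <button> on the page
js('document.querySelector("button").click()')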

The minimal API is a deliberate design choice. The agent has a JS executor, so anything page JavaScript can do, the agent can do, and the framework doesn’t need to pre-define operations.
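One practical wrinkle with the snippets above: pasting user text straight into a JS string breaks as soon as the text contains a quote or a newline. A small sketch of a fix, assuming you build the JS in Python before passing it to js(): json.dumps produces a valid JavaScript string literal, so it can escape both text and selectors. The helper names focus_js, insert_text_js, and click_js are hypothetical.

```python
import json

# Hypothetical helpers that build the JS strings passed to js(). json.dumps
# emits a valid JavaScript string literal, so quotes, backslashes and
# newlines in the input cannot break out of the snippet.


def focus_js(selector: str) -> str:
    """JS that focuses the first element matching selector."""
    return f"document.querySelector({json.dumps(selector)}).focus()"


def insert_text_js(text: str) -> str:
    """JS that types text at the cursor via execCommand('insertText')."""
    return f'document.execCommand("insertText", false, {json.dumps(text)})'


def click_js(selector: str) -> str:
    """JS that clicks the first match; evaluates to false instead of throwing if none."""
    sel = json.dumps(selector)
    return (
        f"(() => {{ const el = document.querySelector({sel}); "
        "if (!el) return false; el.click(); return true; })()"
    )
```

Usage would look like js(focus_js("input")) followed by js(insert_text_js(prompt)). Because click_js evaluates to false when nothing matches, the agent gets a value to check rather than an exception.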

Real Example: Gemini Image Generation

Using Browser Harness to control Chrome and generate images through Gemini:

new_tab("https://gemini.google.com/app") — open Gemini
JS locates the editor (Quill’s .ql-editor or a contenteditable element) and uses document.execCommand("insertText") to type the prompt
Press Enter to send, then verify the URL changed (a conversation ID in the URL confirms the send)
wait(30) for image generation, then screenshot to confirm it’s the finished image and not a loading spinner
JS finds and clicks the download button; the agent then moves the downloaded file to the project directory
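The editor lookup and the send check from the walkthrough above might look like the sketch below. The selectors are the ones named in the steps; the assumption that a sent conversation’s URL looks like https://gemini.google.com/app/<id> is ours, inferred from the conversation-ID check, and Gemini can change it at any time. type_prompt_js, conversation_id, and send_confirmed are hypothetical names.

```python
import json
import re
from typing import Optional

# Hypothetical helpers for the Gemini walkthrough. The selectors come from the
# steps above; the /app/<id> URL shape is an assumption and may change.

EDITOR_SELECTORS = [".ql-editor", '[contenteditable="true"]']
CONVERSATION_URL = re.compile(r"^https://gemini\.google\.com/app/([A-Za-z0-9_-]+)")


def type_prompt_js(prompt: str) -> str:
    """JS that focuses the first matching editor and inserts the prompt.

    Evaluates to execCommand's result, or false if no editor was found.
    """
    return (
        "(() => {"
        f" for (const sel of {json.dumps(EDITOR_SELECTORS)}) {{"
        " const el = document.querySelector(sel);"
        " if (el) { el.focus();"
        f" return document.execCommand('insertText', false, {json.dumps(prompt)}); }}"
        " }"
        " return false;"
        " })()"
    )


def conversation_id(url: str) -> Optional[str]:
    """Extract the conversation ID from a Gemini URL, if there is one."""
    match = CONVERSATION_URL.match(url)
    return match.group(1) if match else None


def send_confirmed(url_before: str, url_after: str) -> bool:
    """Treat the send as confirmed only if the URL changed and now carries an ID."""
    return url_after != url_before and conversation_id(url_after) is not None
```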

Token cost: roughly 2,000. The same task with Computer Use: 15,000+ (multiple rounds of screenshot-then-interpret). That’s a gap of about 7.5x or more.

Installation

# 1. Install
uv tool install browser-harness
# or: pip install browser-harness

# 2. Enable Chrome remote debugging
# chrome://inspect/#remote-debugging → check the checkbox

# 3. Verify
browser-harness --version
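If the version check passes but the agent still can’t reach Chrome, it helps to confirm that remote debugging is really on. A sketch, assuming Chrome writes a DevToolsActivePort file into its user data directory while debugging is enabled (port on the first line, browser WebSocket path on the second). Whether Browser Harness locates Chrome this way is an assumption, the user data directory varies by OS and profile, and the function names are hypothetical.

```python
from pathlib import Path

# Hypothetical connectivity check. Assumes the DevToolsActivePort format:
# line 1 = debugging port, line 2 = browser WebSocket path.


def parse_devtools_active_port(contents: str, host: str = "127.0.0.1") -> str:
    """Turn DevToolsActivePort contents into a browser-level CDP WebSocket URL."""
    lines = [line.strip() for line in contents.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a port line and a WebSocket path line")
    port = int(lines[0])  # raises ValueError if the first line is not a number
    path = lines[1]
    if not path.startswith("/devtools/browser/"):
        raise ValueError(f"unexpected WebSocket path: {path!r}")
    return f"ws://{host}:{port}{path}"


def read_ws_url(user_data_dir: Path) -> str:
    """Read the file from a Chrome user data directory (path is OS-specific)."""
    return parse_devtools_active_port((user_data_dir / "DevToolsActivePort").read_text())
```

A missing file usually means the remote-debugging checkbox is off or Chrome hasn’t been restarted since it was ticked.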

When to Use What

First question: Is there a dedicated MCP? GitHub operations → GitHub MCP. Notion → Notion MCP. Slack → Slack MCP. API-backed MCPs are fastest and most reliable.

Second question: No dedicated MCP? Open the browser.

Need	Best choice
Stable, structured operations	Playwright MCP (40+ predefined tools)
Token efficiency + flexibility	Browser Harness
Pixel-level control, desktop apps	Computer Use

For roughly 70% of non-specialized web tasks, Browser Harness is the strongest option.

Rules to Live By

Never use hardcoded coordinates — pixel positions shift across resolutions and screen sizes
Verify URL changes after submissions — proof that the action went through
Screenshot after visual operations — confirm images or content actually rendered
Validate step by step — don’t skip confirmation checkpoints

Break any of these four rules and your automation reliability drops off a cliff. They’re non-negotiable.
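The verification rules share one pattern: poll a concrete signal until it changes, and fail loudly if it never does. A minimal sketch, assuming you pass in any zero-argument getter; with Browser Harness that could be lambda: page_info()["url"], assuming page_info() returns a dict with a url key as the Core API comment suggests. wait_until and wait_for_url_change are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical verification helpers: poll instead of assuming an action worked.


def wait_until(get_value: Callable[[], T], predicate: Callable[[T], bool],
               timeout: float = 10.0, interval: float = 0.25) -> T:
    """Return the first value that satisfies predicate, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        value = get_value()
        if predicate(value):
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s; last value: {value!r}")
        time.sleep(interval)


def wait_for_url_change(get_url: Callable[[], str], old_url: str,
                        timeout: float = 10.0) -> str:
    """Count a submission as done only once the URL differs from old_url."""
    return wait_until(get_url, lambda url: url != old_url, timeout=timeout)
```

Raising on timeout, rather than returning False, keeps a silent failure from flowing into the next step.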