Closer to the Metal: Leaving Playwright for CDP

Goodbye Playwright, Hello CDP

Playwright and Puppeteer are great for making QA tests and automation scripts short and readable, but as AI browser companies have been learning the hard way over the last year, sometimes these adapters obscure important details about the underlying browsers.

We decided to peek behind the curtain and figure out what the browser was really doing, and it made us decide to drop playwright entirely and just speak the browser's native tongue: CDP.

By switcing to raw CDP we've massively increased the speed of element extraction, screenshots, and all our default actions. We've also managed to add new async reaction capabilities to the agent, and proper cross-origin iframe support.

Obviously we ignored the (wise) advice in the header of the getting-started-with-cdp docs 👌

The Curse of Abstraction

Building AI browser automation is like building on top of a jenga tower of complexity. Every layer presents its own leaky abstractions, its own subtle crashes, and its own resource constraints.

If you've ever heavily depended on an adapter library and build up a large codebase around it, you know the feeling that eventually comes when you realize the adapter library is no longer saving you any time by "hiding the true complexity". In our case that time has finally come for Browser-Use and playwright-python, the library that we've historically used to drive our browsers with LLM-powered tool calls like click, input_text, go_to_url.

At first glance it may seem foolish to throw out such a mature adapter library and reinvent the wheel, but luckily the needs of AI browser agents are much narrower than the entire surface area that playwright provides, and we believe we can implement the calls we need with more specialized logic to better suit AI drivers.

Playwright also introduces a 2nd network hop going through a node.js playwright server websocket, which incurs a meaningful amount of latency when we do thousands of CDP calls to check for element position, opacity, paint order, JS event listeners, aria properties, etc.

📜 A Quick History of Browser Automation

To really understand why the browser automation is in the state it's in today, we have to look back at some history.

The Millenial Era

1992 – Lynx (text-mode browser) could browse and automate keystroke inputs from a script, still useful today!
1995–1997 – Netscape Navigator (Unix) exposed netscape -remote "openURL(http://…)") to control an already-running GUI browser
1997 – Internet Explorer (Windows) exposed a COM automation object (InternetExplorer.Application) so VB/VBA/WSH could launch, navigate, read/write the DOM, handle events, etc.
1998 – Mercury’s web-focused tool appears: Astra QuickTest (which evolved into QuickTest Professional/QTP, later HP UFT), also WinRunner/XRunner
1999–2002 – headless & macro tools: HttpUnit (1999) (HTTP/HTML-level, no real browser), iMacros (2001) (record/replay in the browser), HtmlUnit (2002) (headless Java “browser”)
2001–2003 – Watir (Ruby) grows out of driving IE via COM/OLE and starts offering more general APIs

Selenium Dominance Era

2004 – Selenium comes out
2009 - Selenium + WebDriver join forces
2011–2017 — Before headless Chrome, PhantomJS (a headless WebKit-based browser) filled the gap for scripting “like a browser,” with mixed reliability.

The Pre-Headless Era (the Dark Ages)

2011 — Chrome ships remote debugging; work happens “upstream” in WebKit so other ports can adopt it (post by Pavel Feldman).
2012 — WebKit Remote Debugging Protocol v1.0 announced; early docs/talks outline the domains/events model that CDP still uses.
2013–2014 — Blink forks from WebKit; the protocol solidifies on the Chromium side and becomes known as the Chrome DevTools Protocol (CDP). Extensions can tunnel it via chrome.debugger and --remote-debugging flag.
2014 — Chrome’s chrome.automation (accessibility/automation) extension API appears (exposes the accessibility tree; separate from CDP).

Headless Chrome & CDP Era

Apr 2017 — Headless Chrome announced; Puppeteer introduced as a Chrome team Node library to drive Chrome (headless/full) via CDP.
Jan 2018 — Puppeteer 1.0 ships.
Jun 2018 — WebDriver becomes a W3C Recommendation (cross-browser standard). ChromeDriver implements W3C WebDriver (and later BiDi) and is tightly coupled to Chrome releases.
2019 — Google I/O talk by Andrey Lushnikov & Joel Einbinder (DevTools/Puppeteer team) popularizes modern testing with Puppeteer.

Multi-Browser Standardization Era

2019–early 2020 — Several core Puppeteer engineers leave Google for Microsoft and start Playwright (cross-browser automation/test framework) 🎭 (oooo drama)
Jan 31, 2020 — Playwright 0.x public release.
May 6, 2020 — Playwright 1.0 ships.
Sep–Oct 2020 → — Multi-language support begins (e.g., Playwright for Python announced Sep 30, 2020).
2023 — ChromeDriver adds WebDriver BiDi support (alongside classic WebDriver).
2024 — Puppeteer adds WebDriver BiDi support; Selenium “welcomes Puppeteer to the WebDriver world.”

This wheel has been reinvented every few years it seems.

Modern Times: A Multitude of Choice

Now in 2025 we are lucky to have many high quality driver libraries to choose between, our favorites include:

⭐️ pydoll (best python-first playwright replacement)
⭐️ go-rod (best CDP reference implementation), chromedp (great CDP debug tooling)
puppeteer (best native chrome behavior), playwright (best cross-browser support)
selenium (and Selenium Grid) still a great mature option today, cypress (automate with old-school WebDrivers)
appium automate via system-level accessibility APIs on Android, iOS, macOS, Windows

So why did we feel the need to write our own with cdp-use? Well for all the same reasons as everyone else: everlasting desire to be closer to the metal and have more detailed control over every step.

How do Browser Drivers Work?

So what APIs does the browser actually expose anyway? What sits underneath all these "drivers"?

🔌 What are the automation APIs that Chromium actually exposes?

All these adapter libraries, drivers, and AI helper extensions really just exist to pass messages and make RPC calls to these underlying browser APIs:

Chrome Extension APIs
- chrome.tabs.captureVisibleTab()
- chrome.automation.getTree()
- chrome.scripting.executeScript()
- chrome.debugger.sendCommand({tabId: 123}, "Page.navigate", {url})
- ... and many more ...
  
  Chrome Extension APIs appear to be the most powerful at first glance because they encompass CDP with chrome.debugger, but raw CDP lets you access some calls that are not available through chrome.debugger, and allows parallel connections to multiple targets.
CDP APIs (via pure CDP Websocket or WebDriver BIDI socket)
- Page.navigate({url})
- Target.createTarget()
- DOMSnapshot.captureSnapshot()
- Page.handleJavaScriptDialog({accept: true})
- Browser.setDownloadBehavior()
- ... and many more ...
OS-Level Accessibility & screenreader APIs (NVDA/Voice Over/AppleScript/Appium/etc.)
- get a tree/rotor view of all elements shown to screenreaders (links, buttons, inputs, ...)
- script copy/paste, mouse, keypress, and element rotor/tab-based navigation
Internal Chromium C++ APIs (within Chromium source code)
- you can call arbitrary helpers content/browser/devtools/protocol/*_handler.cc
- you can edit the CDP spec and add commands to call custom C++ APIs third_party/blink/public/devtools_protocol/browser_protocol.pdl
- ... anything is possible when the call is coming from inside the house ...
Launch Flags, User Data Dir, and Preferences
- we've tested over 300+ chrome launch flags, launching is a whole world of complex behavior: https://peter.sh/experiments/chromium-command-line-switches/
- user data dir state can drastically affect browser behavior, there are preferences files, state dbs, cookie stores https://github.com/thewh1teagle/rookie and more
- Profile Preferences (e.g. theme color, downloads dir, extension display settings, etc.), enterprise / registry options, and chrome://flags options

Ignoring These for Now

Classic WebDriver W3C / ChromeDriver REST APIs
- /session/{id}/url
- /session/{id}/element/{eid}/click
- /session/{id}/actions
- ... etc. These are not actually exposed by any browser directly, rather they are the W3C standardized REST API shape recommended for drivers (ChromeDriver, GeckoDriver, WebKitDriver, selenium) to provide to clients above raw browser calls via CDP/BIDI.
Webdriver BIDI (websocket)
- merger of the old REST-API based WebDriver system + CDP in a single websocket, official release delayed for years now, not feature complete yet. check back in 2027

🎭 How does Playwright work?

Playwright achieves multi-languge support by using a client-server model between clients in various languages and a single core implementation that runs as a node.js websocket server.

The playwright node.js relay server accepts standardized "playwright protocol" RPC calls from playwright clients, and then sends out CDP or BIDI calls to the browser to execute them.

This API is elegant in some ways, the "playwright protocol" of commands provides a nicely typed RPC interface and standardizes behavior across languages. Playwright also nicely abstracts lower-level browser ideas like targets, frames, and sessions into simple Page and BrowserContext handles and (usually) manages to keep those handles in sync and not deadlocked across node.js, the browser, and python.

Unfortunately the double RPC through the node.js relay means some state inevitably drifts across the 3 places (and across three different languages and runtimes):

live browser
playwright node.js relay process
python client process

When a tab crashes in the browser or some operation is performed without focusing a page correctly, there are edge cases where the node.js process can hang indefinitely waiting for a browser reply, meanwhile the python client needs to send the CDP call the browser is expecting in order to proceed. Currently we have no recourse but to kill -9 and attempt to reconnect to the browser from scratch with a new playwright instance.

There are numerous cases like that only crop up in 1% of cases with specific slow network conditions, but edge cases can quickly drag down overall success scores when we run thousands of steps per eval.

🩸 Playwright's Sharp Edges

The playwright happy paths usually work fine, but the devil is in the details:

fullPage=True screenshot on pages longer than >16,000px high (reliably crashes playwright)
alert()/confirm()/onbeforeunload handling
attempting to keyboard/mouse/dialog input without focusing a page
file upload & download handling on remote browsers
about:*, chrome://*, chrome-error://, chrome-extension://, PDF tab handling
chrome preferences and enterprise/registry configuration management
crashed tab handling

DEBUG=pw:api helps but it only goes so far, at a certain point it doens't make sense to build workarounds around a relay layer that we're fighting to customize and control anyway.

Sometimes when you are forced to thoroughly stretch every nook and cranny of an adapter layer, you start to see the ugly truths of the underlying resource, and you no longer want the "pretty version" as a veil pulled over your eyes, you'd rather see the ugly truth.

🍳 Starting From Scratch: Out of the frying pan and into the fire

Delivering a reliable experience when so many of the underlying components are inherently unreliable (or actively adversarial) is a monumental engineering challenge.

Did you know there are at least 10 different ways a tab can crash in Chrome?

all targets start in a briefly semi-"crashed"/unresponsive state while initial requests are inflight, before the main page JS thread starts
chrome zygote/root process can crash (slow user_data_dir/filesystem io, oom, cpu lag, etc.)
GPU process can crash, there's even a helpful CDP call Browser.crashGpuProcess
page renderers can crash due to exceptions raised within chrome source (sigsev, oom, etc.)
page renderers can crash because the page exceeds allowed resources (Page.crash())
page can spinlock/oom due to infinite loops or crypto mining in its JS main thread
scrolling/input/screenshot before activateTarget focus can crash targets (5sec delayed!)
handling a JS popup before activateTarget or attempting to handle it after already closing
parent frame navigation during child onbeforeunload "are you sure you want to leave?"
any of the above crashes in a nested OOPIF leading to subtle issues in the parent target

Playwright handled about half of these well, and presented impassible barrier to solving the other half, so we made the call to switch. But now we're faced with the difficult challenge of solving 100% of these cases ourself.

We take on this challenge with glee, we'd rather lose sleep thinking about these things so you can build reliable apps on top of us 💪.

Case Studies: Key Changes in the Migration

New CDP-USE Library Providing Python Type Bindings

A type-safe Python client generator for the Chrome DevTools Protocol (CDP). This library automatically generates Python bindings with full TypeScript-like type safety from the official CDP protocol specifications. It's only shallow type bindings, no complex logic for session management, pages, elements, etc. just 100% direct access.

Check out the library here: github.com/browser-use/cdp-use ➡️

New Event-Driven Architecturre

We used to only update our view of the world between actions, right before sending the next state summary to the LLM. This makes sense when your assumption is that the page contents will only change as a result of actions, but this is not always true!

Take for example a slowly loading list of results that stream in, an animated carousel, or a bit of JS that runs every 3s. All of these are examples of things that can happen at any point in the agent action/runloop cycle.

We've introduced a new event-driven architecture to better fit the underlying event-driven architecture of CDP. Now we can subscribe to and respond to CDP events, which we set up in "watchdog" services that monitor for various things.

For example, our downloads_watchdog watches for any file downloads that start spontaneously, whether triggered by a click, js executing, or any other method. crash_watchdog.py can now watch for page crashes in a single place by just subscribing to a crash event, and we no longer have to scatter crash detection and retry logic all over the rest of the codebase.

Check out the library powering this: github.com/browser-use/bubus ➡️

New Extracted Element Handle that works across OOPIFs

A tab is not a page; it’s a constellation of targets (root + cross-origin iframes + workers), each hosting frames, each containing nodes. Abstract that away and you lose the ability to route input, correlate events, and re-find elements after DOM churn.

We now represent nodes with "super-selectors" that include targetId, frameId, backendNodeId, x/y position, and fallback selectors:

@dataclass(frozen=True)
class EnhancedDOMTreeNode:
    target_id: str                 # which DevTools target owns the renderer
    frame_id: str                  # which frame inside that target
    backend_node_id: int           # renderer-local node handle
    frame_path: Tuple[str, ...]    # root → ... → leaf, useful for sanity checks
    element_index: int             # LLM-friendly stable ordinal for UX
    ...

Minimal routing helpers

class BrowserSession:
    # caches are kept warm by watchers listening to Target.* and Page.*
    def cdp_client_for_frame(self, frame_id: str):
        target_id = self.target_id_by_frame_id(frame_id)
        return self.cdp_clients_for_target(target_id)[0]  # long-lived session
 
    def route_to_node(self, ref: EnhancedDOMRef):
        client = self.cdp_client_for_frame(ref.frame_id)
        return client, {"session_id": self.session_id_by_frame_id(ref.frame_id)}

Outcome: zero guessing about who owns the node or where input should land, even with nested cross-origin iframes and DOM element shifts after actions.

Time is a Flat Circle

Back in my first startup job in 2014 we were using PhantomJS and some RPC between python and JS, and in a way it's surprising how little has changed since then. Now it's 2025 I'm still dealing with all the same issues: tab crash handling, page load retrying, JS+Python RPC translation issues, python asyncio headaches, mouse movement fuzzing, etc.

Luckily a lot has improved since 2014, and we finally have a big light at end of the tunnel leading out of the manual QA automation mines: AI.

We aim to continue solving all the complexities of browser automation and CDP for our users. Our agents shouldn't have to know the nuances of CDP Targets in order to Get Stuff Done™️, and neither should you.

Try out our new libraries and beta releases with cdp-use and let us know your feedback!

Closer to the Metal: Leaving Playwright for CDP

Goodbye Playwright, Hello CDP

The Curse of Abstraction

📜 A Quick History of Browser Automation

The Millenial Era

Selenium Dominance Era

The Pre-Headless Era (the Dark Ages)

Headless Chrome & CDP Era

Multi-Browser Standardization Era

Modern Times: A Multitude of Choice

How do Browser Drivers Work?

🔌 What are the automation APIs that Chromium actually exposes?

Ignoring These for Now

🎭 How does Playwright work?

🩸 Playwright's Sharp Edges

🍳 Starting From Scratch: Out of the frying pan and into the fire

Case Studies: Key Changes in the Migration

New CDP-USE Library Providing Python Type Bindings

New Event-Driven Architecturre

New Extracted Element Handle that works across OOPIFs

Time is a Flat Circle

Read More

Celebrating One Year of Progress in Browser Agents

Speed Matters: How Browser Use Achieves the Fastest Agent Execution

The Fastest Web Agent in the World

Goodbye Playwright, Hello CDP

The Curse of Abstraction

📜 A Quick History of Browser Automation

The Millenial Era

Selenium Dominance Era

The Pre-Headless Era (~~the Dark Ages~~)

Headless Chrome & CDP Era

Multi-Browser Standardization Era

Modern Times: A Multitude of Choice

How do Browser Drivers Work?

🔌 What are the automation APIs that Chromium actually exposes?

Ignoring These for Now

🎭 How does Playwright work?

🩸 Playwright's Sharp Edges

🍳 Starting From Scratch: Out of the frying pan and into the fire

Case Studies: Key Changes in the Migration

New CDP-USE Library Providing Python Type Bindings

New Event-Driven Architecturre

New Extracted Element Handle that works across OOPIFs

Time is a Flat Circle

Read More

Celebrating One Year of Progress in Browser Agents

Speed Matters: How Browser Use Achieves the Fastest Agent Execution

The Fastest Web Agent in the World

The Pre-Headless Era (the Dark Ages)