Closer to the Metal: Leaving Playwright for CDP
by Nick Sweeting
Goodbye Playwright, Hello CDP
Playwright and Puppeteer are great for making QA tests and automation scripts short and readable, but as AI browser companies have been learning the hard way over the last year, sometimes these adapters obscure important details about the underlying browsers.
We decided to peek behind the curtain and figure out what the browser was really doing, and it made us decide to drop playwright entirely and just speak the browser's native tongue: CDP.
By switcing to raw CDP we've massively increased the speed of element extraction, screenshots, and all our default actions. We've also managed to add new async reaction capabilities to the agent, and proper cross-origin iframe support.
Obviously we ignored the (wise) advice in the header of the getting-started-with-cdp docs 👌
The Curse of Abstraction

Building AI browser automation is like building on top of a jenga tower of complexity. Every layer presents its own leaky abstractions, its own subtle crashes, and its own resource constraints.
If you've ever heavily depended on an adapter library and build up a large codebase around it, you know the feeling that eventually comes when you realize the adapter library is no longer saving you any time by "hiding the true complexity". In our case that time has finally come for Browser-Use and playwright-python, the library that we've historically used to drive our browsers with LLM-powered tool calls like click
, input_text
, go_to_url
.
At first glance it may seem foolish to throw out such a mature adapter library and reinvent the wheel, but luckily the needs of AI browser agents are much narrower than the entire surface area that playwright provides, and we believe we can implement the calls we need with more specialized logic to better suit AI drivers.
Playwright also introduces a 2nd network hop going through a node.js playwright server websocket, which incurs a meaningful amount of latency when we do thousands of CDP calls to check for element position, opacity, paint order, JS event listeners, aria properties, etc.
📜 A Quick History of Browser Automation
To really understand why the browser automation is in the state it's in today, we have to look back at some history.
The Dark Ages
- 2011–2017 — Before headless Chrome, PhantomJS (a headless WebKit-based browser) filled the gap for scripting “like a browser,” with mixed reliability.
- 2011 — Chrome ships remote debugging; work happens “upstream” in WebKit so other ports can adopt it (post by Pavel Feldman).
- 2012 — WebKit Remote Debugging Protocol v1.0 announced; early docs/talks outline the domains/events model that CDP still uses.
- 2013–2014 — Blink forks from WebKit; the protocol solidifies on the Chromium side and becomes known as the Chrome DevTools Protocol (CDP). Extensions can tunnel it via
chrome.debugger
and--remote-debugging
flag. - 2014 — Chrome’s
chrome.automation
(accessibility/automation) extension API appears (exposes the accessibility tree; separate from CDP).
Headless Chrome & CDP Era
- Apr 2017 — Headless Chrome announced; Puppeteer introduced as a Chrome team Node library to drive Chrome (headless/full) via CDP.
- Jan 2018 — Puppeteer 1.0 ships.
- Jun 2018 — WebDriver becomes a W3C Recommendation (cross-browser standard). ChromeDriver implements W3C WebDriver (and later BiDi) and is tightly coupled to Chrome releases.
- 2019 — Google I/O talk by Andrey Lushnikov & Joel Einbinder (DevTools/Puppeteer team) popularizes modern testing with Puppeteer.
Multi-Browser Standardization Era
- 2019–early 2020 — Several core Puppeteer engineers leave Google for Microsoft and start Playwright (cross-browser automation/test framework) 🎭 (oooo drama)
- Jan 31, 2020 — Playwright 0.x public release.
- May 6, 2020 — Playwright 1.0 ships.
- Sep–Oct 2020 → — Multi-language support begins (e.g., Playwright for Python announced Sep 30, 2020).
- 2023 — ChromeDriver adds WebDriver BiDi support (alongside classic WebDriver).
- 2024 — Puppeteer adds WebDriver BiDi support; Selenium “welcomes Puppeteer to the WebDriver world.”
Modern Times: A Multitude of Choice
Now in 2025 we are lucky to have many high quality driver libraries to choose between, our favorites include:
- ⭐️
pydoll
(best python-first playwright replacement) - ⭐️
go-rod
(best CDP reference implementation),chromedp
(great CDP debug tooling) puppeteer
(best native chrome behavior),playwright
(best cross-browser support)selenium
,cypress
(automate with old-school WebDrivers)appium
automate via system-level accessibility APIs on Android, iOS, macOS, Windows
So why did we feel the need to write our own with cdp-use
? Well for all the same reasons as everyone else: everlasting desire to be closer to the metal and have more detailed control over every step.
How do Browser Drivers Work?
So what APIs does the browser actually expose anyway? What sits underneath all these "drivers"?
🔌 What are the automation APIs that Chromium actually exposes?
All these adapter libraries, drivers, and AI helper extensions really just exist to pass messages and make RPC calls to these underlying browser APIs:
-
Chrome Extension APIs
chrome.tabs.captureVisibleTab()
chrome.automation.getTree()
chrome.scripting.executeScript()
chrome.debugger.sendCommand({tabId: 123}, "Page.navigate", {url})
Chrome Extension APIs appear to be the most powerful at first glance because they encompass CDP with
chrome.debugger
, but raw CDP lets you access some calls that are not available throughchrome.debugger
, and allows parallel connections to multiple targets.
-
CDP APIs (via pure CDP Websocket or WebDriver BIDI socket)
Page.navigate({url})
Target.createTarget()
Page.handleJavaScriptDialog({accept: true})
Browser.setDownloadBehavior()
-
OS-Level Accessibility & screenreader APIs (NVDA/Voice Over/AppleScript/Appium/etc.)
- get a tree/rotor view of all elements shown to screenreaders (links, buttons, inputs, ...)
- script copy/paste, mouse, keypress, and element rotor/tab-based navigation
-
Internal Chromium C++ APIs (within Chromium source code)
- you can call arbitrary helpers
content/browser/devtools/protocol/*_handler.cc
- you can edit the CDP spec and add commands to call custom C++ APIs
third_party/blink/public/devtools_protocol/browser_protocol.pdl
- ... anything is possible when the call is coming from inside the house ...
- you can call arbitrary helpers
Ignoring These for Now
-
Classic WebDriver W3C / ChromeDriver REST APIs
/session/{id}/url
/session/{id}/element/{eid}/click
/session/{id}/actions
- ... etc. These are not actually exposed by any browser directly, rather they are the W3C standardized REST API shape recommended for drivers (ChromeDriver, GeckoDriver, WebKitDriver, selenium) to provide to clients above raw browser calls via CDP/BIDI.
-
Webdriver BIDI (websocket)
- merger of the old REST-API based WebDriver system + CDP in a single websocket, official release delayed for years now, not feature complete yet. check back in 2027
🎭 How does Playwright work?
Playwright achieves multi-languge support by using a client-server model between clients in various languages and a single core implementation that runs as a node.js websocket server.
The playwright node.js relay server accepts standardized "playwright protocol" RPC calls from playwright clients, and then sends out CDP or BIDI calls to the browser to execute them.
This API is elegant in some ways, the "playwright protocol" of commands provides a nicely typed RPC interface and standardizes behavior across languages. Playwright also nicely abstracts lower-level browser ideas like targets, frames, and sessions into simple Page
and BrowserContext
handles and (usually) manages to keep those handles in sync and not deadlocked across node.js, the browser, and python.
Unfortunately the double RPC through the node.js relay means some state inevitably drifts across the 3 places (and across three different languages and runtimes):
- live browser
- playwright node.js relay process
- python client process
When a tab crashes in the browser or some operation is performed without focusing a page correctly, there are edge cases where the node.js process can hang indefinitely waiting for a browser reply, meanwhile the python client needs to send the CDP call the browser is expecting in order to proceed. Currently we have no recourse but to kill -9 and attempt to reconnect to the browser from scratch with a new playwright instance.
There are numerous cases like that only crop up in 1% of cases with specific slow network conditions, but edge cases can quickly drag down overall success scores when we run thousands of steps per eval.
🩸 Playwright's Sharp Edges
The playwright happy paths usually work fine, but the devil is in the details:
fullPage=True
screenshot on pages longer than>16,000px
high (reliably crashes playwright)alert()
/confirm()
/onbeforeunload
handling- attempting to keyboard/mouse/dialog input without focusing a page
- file upload & download handling on remote browsers
about:*
,chrome://*
,chrome-error://
,chrome-extension://
, PDF tab handling- chrome preferences and enterprise/registry configuration management
- crashed tab handling for
DEBUG=pw:api
helps but it only goes so far, at a certain point it doens't make sense to build workarounds around a relay layer that we're fighting to customize and control anyway.
Sometimes when you are forced to thoroughly stretch every nook and cranny of an adapter layer, you start to see the ugly truths of the underlying resource, and you no longer want the "pretty version" as a veil pulled over your eyes, you'd rather see the ugly truth.
🍳 Starting From Scratch: Out of the frying pan and into the fire
Delivering a reliable experience when so many of the underlying components are inherently unreliable (or actively adversarial) is a monumental engineering challenge.
Did you know there are at least 10 different ways a tab can crash in Chrome?
- all targets start in a briefly semi-"crashed"/unresponsive state while initial requests are inflight, before the main page JS thread starts
- chrome zygote/root process can crash (slow user_data_dir/filesystem io, oom, cpu lag, etc.)
- GPU process can crash, there's even a helpful CDP call
Browser.crashGpuProcess
- page renderers can crash due to exceptions raised within chrome source (sigsev, oom, etc.)
- page renderers can crash because the page exceeds allowed resources (
Page.crash()
) - page can spinlock/oom due to infinite loops or crypto mining in its JS main thread
- scrolling/input/screenshot before
activateTarget
focus can crash targets (5sec delayed!) - handling a JS popup before activateTarget or attempting to handle it after already closing
- parent frame navigation during child
onbeforeunload
"are you sure you want to leave?" - any of the above crashes in a nested OOPIF leading to subtle issues in the parent target
Playwright handled about half of these well, and presented impassible barrier to solving the other half, so we made the call to switch. But now we're faced with the difficult challenge of solving 100% of these cases ourself.
We take on this challenge with glee, we'd rather lose sleep thinking about these things so you can build reliable apps on top of us 💪.
Case Studies: Key Changes in the Migration
New CDP-USE Library Providing Python Type Bindings
A type-safe Python client generator for the Chrome DevTools Protocol (CDP). This library automatically generates Python bindings with full TypeScript-like type safety from the official CDP protocol specifications. It's only shallow type bindings, no complex logic for session management, pages, elements, etc. just 100% direct access.
Check out the library here: github.com/browser-use/cdp-use ➡️
New Event-Driven Architecturre
We used to only update our view of the world between actions, right before sending the next state summary to the LLM. This makes sense when your assumption is that the page contents will only change as a result of actions, but this is not always true!
Take for example a slowly loading list of results that stream in, an animated carousel, or a bit of JS that runs every 3s. All of these are examples of things that can happen at any point in the agent action/runloop cycle.
We've introduced a new event-driven architecture to better fit the underlying event-driven architecture of CDP. Now we can subscribe to and respond to CDP events, which we set up in "watchdog" services that monitor for various things.
For example, our downloads_watchdog
watches for any file downloads that start spontaneously, whether triggered by a click, js executing, or any other method. crash_watchdog.py
can now watch for page crashes in a single place by just subscribing to a crash event, and we no longer have to scatter crash detection and retry logic all over the rest of the codebase.
Check out the library powering this: github.com/browser-use/bubus ➡️
New Extracted Element Handle that works across OOPIFs
A tab is not a page; it’s a constellation of targets (root + cross-origin iframes + workers), each hosting frames, each containing nodes. Abstract that away and you lose the ability to route input, correlate events, and re-find elements after DOM churn.
We now represent nodes with "super-selectors" that include targetId
, frameId
, backendNodeId
, x/y position, and fallback selectors:
@dataclass(frozen=True)
class EnhancedDOMTreeNode:
target_id: str # which DevTools target owns the renderer
frame_id: str # which frame inside that target
backend_node_id: int # renderer-local node handle
frame_path: Tuple[str, ...] # root → ... → leaf, useful for sanity checks
element_index: int # LLM-friendly stable ordinal for UX
...
Minimal routing helpers
class BrowserSession:
# caches are kept warm by watchers listening to Target.* and Page.*
def cdp_client_for_frame(self, frame_id: str):
target_id = self.target_id_by_frame_id(frame_id)
return self.cdp_clients_for_target(target_id)[0] # long-lived session
def route_to_node(self, ref: EnhancedDOMRef):
client = self.cdp_client_for_frame(ref.frame_id)
return client, {"session_id": self.session_id_by_frame_id(ref.frame_id)}
Outcome: zero guessing about who owns the node or where input should land, even with nested cross-origin iframes and DOM element shifts after actions.
Time is a Flat Circle

Back in my first startup job in 2014 we were using PhantomJS and some RPC between python and JS, and in a way it's surprising how little has changed since then. Now it's 2025 I'm still dealing with all the same issues: tab crash handling, page load retrying, JS+Python RPC translation issues, python asyncio headaches, mouse movement fuzzing, etc.
Luckily a lot has improved since 2014, and we finally have a big light at end of the tunnel leading out of the manual QA automation mines: AI.
We aim to continue solving all the complexities of browser automation and CDP for our users. Our agents shouldn't have to know the nuances of CDP Targets in order to Get Stuff Done™️, and neither should you.
Try out our new libraries and beta releases with cdp-use and let us know your feedback!