diff --git a/docs/architecture.md b/docs/architecture.md
index d9cd547..bb0b0c7 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -1,45 +1,46 @@
+---
+title: Architecture
+description: Internals of go-webview -- CDP connection, message protocol, DOM queries, console capture, action system, and Angular helpers.
+---
+
 # Architecture
 
-Module: `forge.lthn.ai/core/go-webview`
+This document describes how `go-webview` works internally. It covers the CDP connection lifecycle, message protocol, DOM query mechanics, input simulation, console capture, the action system, Angular helpers, and thread safety.
 
-## Overview
+## High-Level Data Flow
 
-go-webview is a Chrome DevTools Protocol (CDP) client for browser automation, testing, and scraping. It provides a high-level Go API over the low-level CDP WebSocket protocol, connecting to an externally managed Chrome or Chromium instance running with the remote debugging port enabled.
+```
+Application Code
+    |
+    v
+  Webview            (high-level API: Navigate, Click, Type, Screenshot, ...)
+    |
+    v
+  CDPClient          (WebSocket transport, message framing, event dispatch)
+    |
+    v
+  Chrome / Chromium  (running with --remote-debugging-port=9222)
+```
 
-The package does not launch Chrome itself. The caller is responsible for starting a Chrome process with `--remote-debugging-port=9222` before constructing a `Webview`.
-
----
-
-## Package Structure
-
-| File | Responsibility |
-|------|---------------|
-| `webview.go` | `Webview` struct, public API, navigation, DOM, screenshot, JS evaluation |
-| `cdp.go` | `CDPClient` — WebSocket transport, message framing, event dispatch |
-| `actions.go` | `Action` interface, concrete action types, `ActionSequence` builder |
-| `console.go` | `ConsoleWatcher`, `ExceptionWatcher`, log formatting |
-| `angular.go` | `AngularHelper` — SPA-specific helpers for Angular 2+ and AngularJS 1.x |
-
----
+The application interacts with `Webview` methods. Each method constructs a CDP command, passes it to `CDPClient.Call()`, which serialises it as JSON over a WebSocket connection to Chrome. Chrome processes the command and returns a JSON response. Events (console messages, exceptions, navigation state changes) flow in the opposite direction: Chrome pushes them over the WebSocket, the `CDPClient` read loop dispatches them to registered handlers.
 
 ## CDP Connection
 
 ### Initialisation
 
-`NewCDPClient(debugURL string)` connects to Chrome's HTTP endpoint:
+`NewCDPClient(debugURL string)` connects to Chrome's HTTP endpoint in four steps:
 
 1. Issues `GET {debugURL}/json` to retrieve the list of available targets (tabs/pages).
 2. Selects the first target with `type == "page"` that has a `webSocketDebuggerUrl`.
 3. If no page target exists, calls `GET {debugURL}/json/new` to create one.
-4. Upgrades the connection to WebSocket using `github.com/gorilla/websocket`.
-5. Starts a background `readLoop` goroutine on the connection.
+4. Upgrades the connection to WebSocket using `github.com/gorilla/websocket` and starts a background `readLoop` goroutine.
 
 ### Message Protocol
 
 CDP uses JSON-framed messages over WebSocket. The client distinguishes two message kinds:
 
-- **Commands** — sent by the client with an integer `id`. Chrome responds with a matching `id` and a `result` or `error` field.
-- **Events** — sent by Chrome without an `id`. They carry a `method` name and a `params` map.
+- **Commands** -- sent by the client with an integer `id`. Chrome responds with a matching `id` and a `result` or `error` field.
+- **Events** -- sent by Chrome without an `id`. They carry a `method` name and a `params` map.
 
 The `CDPClient` maintains a `pending` map of `id -> chan *cdpResponse`. When `Call()` sends a command it registers a channel, then blocks on that channel until the matching response arrives or the context expires.
 
@@ -49,25 +50,42 @@ Events are dispatched to zero or more registered handlers via `OnEvent(method, h
 
 ```
 New(WithDebugURL(...))
-  └── NewCDPClient(url)
-        ├── HTTP GET /json         (target discovery)
-        ├── websocket.Dial(wsURL)  (WebSocket upgrade)
-        └── go readLoop()          (background goroutine)
+  +-- NewCDPClient(url)
+        |-- HTTP GET /json         (target discovery)
+        |-- websocket.Dial(wsURL)  (WebSocket upgrade)
+        +-- go readLoop()          (background goroutine)
 
 wv.Close()
-  └── cancel()                     (signals readLoop to stop)
-  └── CDPClient.Close()
-        ├── <-done                 (waits for readLoop to finish)
-        └── conn.Close()           (closes WebSocket)
+  +-- cancel()                     (signals readLoop to stop)
+  +-- CDPClient.Close()
+        |-- <-done                 (waits for readLoop to finish)
+        +-- conn.Close()           (closes WebSocket)
 ```
 
----
+## Key Types
 
-## Webview Struct
+### CDPClient
+
+```go
+type CDPClient struct {
+    conn     *websocket.Conn
+    debugURL string
+    wsURL    string
+    msgID    atomic.Int64                      // monotonic command ID
+    pending  map[int64]chan *cdpResponse       // awaiting responses
+    handlers map[string][]func(map[string]any) // event subscribers
+    ctx      context.Context
+    cancel   context.CancelFunc
+    done     chan struct{}
+}
+```
+
+The core transport layer. All WebSocket reads happen in the `readLoop` goroutine. All writes are serialised through a `sync.RWMutex`. The `pending` map and `handlers` map each have their own dedicated mutexes.
+
+### Webview
 
 ```go
 type Webview struct {
-    mu           sync.RWMutex
     client       *CDPClient
     ctx          context.Context
     cancel       context.CancelFunc
@@ -77,40 +95,22 @@ type Webview struct {
 }
 ```
 
-`New()` accepts functional options:
+The high-level API surface. Constructed via `New()` with functional options. On construction, it enables three CDP domains -- `Runtime`, `Page`, and `DOM` -- and registers a handler for `Runtime.consoleAPICalled` events so console capture begins immediately.
 
-| Option | Effect |
-|--------|--------|
-| `WithDebugURL(url)` | Required. Connects to Chrome at the given HTTP debug endpoint. |
-| `WithTimeout(d)` | Overrides the default 30-second operation timeout. |
-| `WithConsoleLimit(n)` | Maximum console messages to retain in memory (default 1000). |
+### ConsoleMessage
 
-On construction, `New()` enables three CDP domains — `Runtime`, `Page`, and `DOM` — and registers a handler for `Runtime.consoleAPICalled` events to begin console capture immediately.
+```go
+type ConsoleMessage struct {
+    Type      string    // log, warn, error, info, debug
+    Text      string    // message text
+    Timestamp time.Time
+    URL       string    // source URL
+    Line      int       // source line number
+    Column    int       // source column number
+}
+```
 
----
-
-## Navigation
-
-`Navigate(url string) error` calls `Page.navigate` then polls `document.readyState` via `Runtime.evaluate` at 100 ms intervals until the value is `"complete"` or the context deadline is exceeded.
-
-`Reload()`, `GoBack()`, and `GoForward()` follow the same pattern: issue a CDP command then call `waitForLoad`.
-
-`waitForSelector(ctx, selector)` polls `document.querySelector(selector)` at 100 ms intervals.
-
----
-
-## DOM Queries
-
-DOM queries follow a two-step pattern:
-
-1. Call `DOM.getDocument` to obtain the root node ID.
-2. Call `DOM.querySelector` or `DOM.querySelectorAll` with that node ID and the CSS selector string.
-
-For each matching node, `getElementInfo` calls:
-- `DOM.describeNode` — tag name and attribute list (flat alternating key/value array)
-- `DOM.getBoxModel` — bounding rectangle from the `content` quad
-
-The returned `ElementInfo` carries:
+### ElementInfo
 
 ```go
 type ElementInfo struct {
@@ -123,7 +123,37 @@ type ElementInfo struct {
 }
 ```
 
----
+### BoundingBox
+
+```go
+type BoundingBox struct {
+    X      float64
+    Y      float64
+    Width  float64
+    Height float64
+}
+```
+
+## Navigation
+
+`Navigate(url string) error` calls `Page.navigate` then polls `document.readyState` via `Runtime.evaluate` at 100 ms intervals until the value is `"complete"` or the context deadline is exceeded.
+
+`Reload()`, `GoBack()`, and `GoForward()` follow the same pattern: issue a CDP command then call `waitForLoad`.
+
+`waitForSelector(ctx, selector)` polls `document.querySelector(selector)` at 100 ms intervals until the element exists or the context expires.
+
+## DOM Queries
+
+DOM queries follow a two-step pattern:
+
+1. Call `DOM.getDocument` to obtain the root node ID.
+2. Call `DOM.querySelector` or `DOM.querySelectorAll` with that node ID and the CSS selector string.
+
+For each matching node, `getElementInfo` calls:
+- `DOM.describeNode` -- tag name and attribute list (flat alternating key/value array)
+- `DOM.getBoxModel` -- bounding rectangle from the `content` quad
+
+`QuerySelectorAllAll(selector)` returns an `iter.Seq[*ElementInfo]` iterator for lazy consumption of results.
 
 ## Click and Type
 
@@ -137,8 +167,6 @@ type ElementInfo struct {
 
 `PressKeyAction` handles named keys (Enter, Tab, Escape, Backspace, Delete, arrow keys, Home, End, Page Up, Page Down) by mapping them to their CDP virtual key codes and code strings.
 
----
-
 ## Console Capture
 
 Console capture is enabled in `New()` by subscribing to `Runtime.consoleAPICalled` events.
@@ -148,43 +176,64 @@ Console capture is enabled in `New()` by subscribing to `Runtime.consoleAPICalle
 The `Webview` itself accumulates messages in a slice guarded by `sync.RWMutex`. When the buffer reaches `consoleLimit`, the oldest 100 messages are dropped.
 
 ```go
-msgs := wv.GetConsole()   // returns a copy
+msgs := wv.GetConsole()   // returns a collected slice
 wv.ClearConsole()
+
+// Or iterate lazily
+for msg := range wv.GetConsoleAll() {
+    fmt.Println(msg.Text)
+}
 ```
 
 ### ConsoleWatcher
 
 `ConsoleWatcher` (constructed via `NewConsoleWatcher(wv)`) registers its own handler on the same `Runtime.consoleAPICalled` event. It adds filtering and reactive capabilities:
 
-- `AddFilter(ConsoleFilter)` — filter by message type and/or text pattern
-- `AddHandler(ConsoleHandler)` — callback invoked for each incoming message (outside the write lock)
-- `WaitForMessage(ctx, filter)` — blocks until a matching message arrives
-- `WaitForError(ctx)` — convenience wrapper for `type == "error"`
+- `AddFilter(ConsoleFilter)` -- filter by message type and/or text pattern (substring match)
+- `AddHandler(ConsoleHandler)` -- callback invoked for each incoming message (outside the write lock)
+- `WaitForMessage(ctx, filter)` -- blocks until a matching message arrives
+- `WaitForError(ctx)` -- convenience wrapper for `type == "error"`
 - `Errors()`, `Warnings()`, `HasErrors()`, `ErrorCount()`
+- `FilteredMessages()` / `FilteredMessagesAll()` -- returns messages matching all active filters
 
 ### ExceptionWatcher
 
-`ExceptionWatcher` subscribes to `Runtime.exceptionThrown` events and captures unhandled JavaScript exceptions with full stack traces. It exposes the same reactive pattern as `ConsoleWatcher`: `AddHandler`, `WaitForException`, `HasExceptions`.
+`ExceptionWatcher` subscribes to `Runtime.exceptionThrown` events and captures unhandled JavaScript exceptions with full stack traces:
 
----
+```go
+type ExceptionInfo struct {
+    Text         string
+    LineNumber   int
+    ColumnNumber int
+    URL          string
+    StackTrace   string
+    Timestamp    time.Time
+}
+```
+
+It exposes the same reactive pattern as `ConsoleWatcher`: `AddHandler`, `WaitForException`, `HasExceptions`, `Count`.
+
+### FormatConsoleOutput
+
+The package-level `FormatConsoleOutput(messages)` function formats a slice of `ConsoleMessage` into human-readable lines with timestamp, level prefix (`[ERROR]`, `[WARN]`, `[INFO]`, `[DEBUG]`, `[LOG]`), and message text.
 
 ## Screenshots
 
 `Screenshot()` calls `Page.captureScreenshot` with `format: "png"`. Chrome returns the image as a base64-encoded string in the `data` field of the response. The method decodes this and returns raw PNG bytes.
 
----
-
 ## JavaScript Evaluation
 
 `evaluate(ctx, script)` calls `Runtime.evaluate` with `returnByValue: true`. The result is extracted from `result.result.value`. If `result.exceptionDetails` is present, the error description is returned as a Go error.
 
 `Evaluate(script string) (any, error)` is the public wrapper that applies the default timeout.
 
-`GetURL()` and `GetTitle()` are thin wrappers that evaluate `window.location.href` and `document.title` respectively.
+Convenience wrappers:
 
-`GetHTML(selector string)` evaluates `outerHTML` on the matched element, or `document.documentElement.outerHTML` when the selector is empty.
-
----
+| Method | JavaScript evaluated |
+|--------|---------------------|
+| `GetURL()` | `window.location.href` |
+| `GetTitle()` | `document.title` |
+| `GetHTML(selector)` | `document.querySelector(selector)?.outerHTML` (or `document.documentElement.outerHTML` when selector is empty) |
 
 ## Action System
 
@@ -196,12 +245,36 @@ type Action interface {
 }
 ```
 
-Concrete action types cover: `Click`, `Type`, `Navigate`, `Wait`, `WaitForSelector`, `Scroll`, `ScrollIntoView`, `Focus`, `Blur`, `Clear`, `Select`, `Check`, `Hover`, `DoubleClick`, `RightClick`, `PressKey`, `SetAttribute`, `RemoveAttribute`, `SetValue`.
+### Concrete Action Types
 
-`ActionSequence` provides a fluent builder:
+| Type | Description |
+|------|-------------|
+| `ClickAction` | Click an element by CSS selector |
+| `TypeAction` | Type text into a focused element |
+| `NavigateAction` | Navigate to a URL and wait for load |
+| `WaitAction` | Wait for a fixed duration |
+| `WaitForSelectorAction` | Wait for an element to appear |
+| `ScrollAction` | Scroll to absolute coordinates |
+| `ScrollIntoViewAction` | Scroll an element into view smoothly |
+| `FocusAction` | Focus an element |
+| `BlurAction` | Remove focus from an element |
+| `ClearAction` | Clear an input's value, firing `input` and `change` events |
+| `SelectAction` | Select a value in a `