go-webview/docs/architecture.md
Snider 7c46558e5b docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking (TODO.md, FINDINGS.md) with structured
documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 15:01:55 +00:00


# Architecture
Module: `forge.lthn.ai/core/go-webview`
## Overview
go-webview is a Chrome DevTools Protocol (CDP) client for browser automation, testing, and scraping. It provides a high-level Go API over the low-level CDP WebSocket protocol, connecting to an externally managed Chrome or Chromium instance running with the remote debugging port enabled.
The package does not launch Chrome itself. The caller is responsible for starting a Chrome process with `--remote-debugging-port=9222` before constructing a `Webview`.
---
## Package Structure
| File | Responsibility |
|------|---------------|
| `webview.go` | `Webview` struct, public API, navigation, DOM, screenshot, JS evaluation |
| `cdp.go` | `CDPClient` — WebSocket transport, message framing, event dispatch |
| `actions.go` | `Action` interface, concrete action types, `ActionSequence` builder |
| `console.go` | `ConsoleWatcher`, `ExceptionWatcher`, log formatting |
| `angular.go` | `AngularHelper` — SPA-specific helpers for Angular 2+ and AngularJS 1.x |
---
## CDP Connection
### Initialisation
`NewCDPClient(debugURL string)` connects to Chrome's HTTP endpoint:
1. Issues `GET {debugURL}/json` to retrieve the list of available targets (tabs/pages).
2. Selects the first target with `type == "page"` that has a `webSocketDebuggerUrl`.
3. If no page target exists, calls `GET {debugURL}/json/new` to create one.
4. Opens a WebSocket connection to the selected target's `webSocketDebuggerUrl` using `github.com/gorilla/websocket`.
5. Starts a background `readLoop` goroutine on the connection.
### Message Protocol
CDP uses JSON-framed messages over WebSocket. The client distinguishes two message kinds:
- **Commands** — sent by the client with an integer `id`. Chrome responds with a matching `id` and a `result` or `error` field.
- **Events** — sent by Chrome without an `id`. They carry a `method` name and a `params` map.
The `CDPClient` maintains a `pending` map of `id -> chan *cdpResponse`. When `Call()` sends a command it registers a channel, then blocks on that channel until the matching response arrives or the context expires.
Events are dispatched to zero or more registered handlers via `OnEvent(method, handler)`. Each handler is called in its own goroutine so it cannot block the read loop.
### Connection Lifecycle
```
New(WithDebugURL(...))
  └── NewCDPClient(url)
        ├── HTTP GET /json            (target discovery)
        ├── websocket.Dial(wsURL)     (WebSocket upgrade)
        └── go readLoop()             (background goroutine)

wv.Close()
  ├── cancel()                        (signals readLoop to stop)
  └── CDPClient.Close()
        ├── <-done                    (waits for readLoop to finish)
        └── conn.Close()              (closes WebSocket)
```
---
## Webview Struct
```go
type Webview struct {
	mu           sync.RWMutex
	client       *CDPClient
	ctx          context.Context
	cancel       context.CancelFunc
	timeout      time.Duration // default 30s
	consoleLogs  []ConsoleMessage
	consoleLimit int // default 1000
}
```
`New()` accepts functional options:
| Option | Effect |
|--------|--------|
| `WithDebugURL(url)` | Required. Connects to Chrome at the given HTTP debug endpoint. |
| `WithTimeout(d)` | Overrides the default 30-second operation timeout. |
| `WithConsoleLimit(n)` | Maximum console messages to retain in memory (default 1000). |
On construction, `New()` enables three CDP domains — `Runtime`, `Page`, and `DOM` — and registers a handler for `Runtime.consoleAPICalled` events to begin console capture immediately.
---
## Navigation
`Navigate(url string) error` calls `Page.navigate` then polls `document.readyState` via `Runtime.evaluate` at 100 ms intervals until the value is `"complete"` or the context deadline is exceeded.
`Reload()`, `GoBack()`, and `GoForward()` follow the same pattern: issue a CDP command then call `waitForLoad`.
`waitForSelector(ctx, selector)` polls `document.querySelector(selector)` at 100 ms intervals.
---
## DOM Queries
DOM queries follow a two-step pattern:
1. Call `DOM.getDocument` to obtain the root node ID.
2. Call `DOM.querySelector` or `DOM.querySelectorAll` with that node ID and the CSS selector string.
For each matching node, `getElementInfo` calls:
- `DOM.describeNode` — tag name and attribute list (flat alternating key/value array)
- `DOM.getBoxModel` — bounding rectangle from the `content` quad
The returned `ElementInfo` carries:
```go
type ElementInfo struct {
	NodeID      int
	TagName     string
	Attributes  map[string]string
	InnerHTML   string
	InnerText   string
	BoundingBox *BoundingBox // nil if element has no layout box
}
```
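
The two conversions mentioned above (flat attribute list to map, `content` quad to rectangle) can be sketched as follows. The `box` type and its field names are assumptions, since the fields of `BoundingBox` are not shown here; a CDP quad is an array of eight numbers giving four corner points.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// box uses hypothetical field names; the library's BoundingBox may differ.
type box struct{ X, Y, Width, Height float64 }

// attrsToMap converts the flat [name1, value1, name2, value2, ...] list
// into a map. A trailing unpaired name is ignored.
func attrsToMap(flat []string) map[string]string {
	m := make(map[string]string, len(flat)/2)
	for i := 0; i+1 < len(flat); i += 2 {
		m[flat[i]] = flat[i+1]
	}
	return m
}

// quadToBox derives the axis-aligned rectangle enclosing a quad of four
// (x, y) points.
func quadToBox(quad []float64) (box, error) {
	if len(quad) != 8 {
		return box{}, errors.New("quad must contain 8 numbers")
	}
	minX, maxX := quad[0], quad[0]
	minY, maxY := quad[1], quad[1]
	for i := 2; i < 8; i += 2 {
		minX, maxX = math.Min(minX, quad[i]), math.Max(maxX, quad[i])
		minY, maxY = math.Min(minY, quad[i+1]), math.Max(maxY, quad[i+1])
	}
	return box{X: minX, Y: minY, Width: maxX - minX, Height: maxY - minY}, nil
}

func main() {
	attrs := attrsToMap([]string{"id", "login", "class", "btn primary"})
	b, err := quadToBox([]float64{10, 20, 110, 20, 110, 70, 10, 70})
	fmt.Println(attrs["id"], attrs["class"], b, err)
}
```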
---
## Click and Type
### Click
`click(ctx, selector)` resolves the element's bounding box, computes the centre point, then dispatches `Input.dispatchMouseEvent` for `mousePressed` then `mouseReleased`. If the element has no bounding box (e.g. a hidden element), it falls back to evaluating `document.querySelector(selector)?.click()`.
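
The coordinate arithmetic and the two event payloads can be sketched as below. `centre` and `clickEvents` are hypothetical helpers; the parameter keys (`type`, `x`, `y`, `button`, `clickCount`) are those of `Input.dispatchMouseEvent`, and the use of the left button with a click count of 1 is an assumption about what the library sends.

```go
package main

import "fmt"

// centre returns the midpoint of a rectangle given as x, y, width, height.
func centre(x, y, w, h float64) (float64, float64) {
	return x + w/2, y + h/2
}

// clickEvents builds the parameter objects for the two
// Input.dispatchMouseEvent calls that make up a left click.
func clickEvents(cx, cy float64) []map[string]any {
	events := make([]map[string]any, 0, 2)
	for _, typ := range []string{"mousePressed", "mouseReleased"} {
		events = append(events, map[string]any{
			"type":       typ,
			"x":          cx,
			"y":          cy,
			"button":     "left",
			"clickCount": 1,
		})
	}
	return events
}

func main() {
	cx, cy := centre(10, 20, 100, 50)
	for _, ev := range clickEvents(cx, cy) {
		fmt.Println(ev["type"], ev["x"], ev["y"])
	}
}
```

Clicking at real coordinates exercises the same hit-testing path as a user click, which is why the JavaScript `click()` call is only a fallback for elements without a layout box.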
### Type
`typeText(ctx, selector, text)` first focuses the element via JavaScript. Then, for each character in the string, it dispatches a pair of `Input.dispatchKeyEvent` calls: one with `type: "keyDown"` and one with `type: "keyUp"`.
`PressKeyAction` handles named keys (Enter, Tab, Escape, Backspace, Delete, arrow keys, Home, End, Page Up, Page Down) by mapping them to their CDP virtual key codes and code strings.
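
A lookup table of this kind might look like the sketch below. The map, `keyDef`, and `keyPress` are illustrative rather than the library's own table; the numeric values are the standard Windows virtual key codes, and the parameter keys are those of `Input.dispatchKeyEvent`.

```go
package main

import "fmt"

// keyDef pairs a DOM KeyboardEvent.code string with its Windows virtual
// key code.
type keyDef struct {
	Code    string
	KeyCode int
}

// namedKeys covers the keys listed above (illustrative, not the
// library's table).
var namedKeys = map[string]keyDef{
	"Enter":      {"Enter", 13},
	"Tab":        {"Tab", 9},
	"Escape":     {"Escape", 27},
	"Backspace":  {"Backspace", 8},
	"Delete":     {"Delete", 46},
	"ArrowLeft":  {"ArrowLeft", 37},
	"ArrowUp":    {"ArrowUp", 38},
	"ArrowRight": {"ArrowRight", 39},
	"ArrowDown":  {"ArrowDown", 40},
	"Home":       {"Home", 36},
	"End":        {"End", 35},
	"PageUp":     {"PageUp", 33},
	"PageDown":   {"PageDown", 34},
}

// keyPress returns the keyDown and keyUp parameter objects for a named key.
func keyPress(name string) ([]map[string]any, error) {
	def, ok := namedKeys[name]
	if !ok {
		return nil, fmt.Errorf("unknown key %q", name)
	}
	events := make([]map[string]any, 0, 2)
	for _, typ := range []string{"keyDown", "keyUp"} {
		events = append(events, map[string]any{
			"type":                  typ,
			"key":                   name,
			"code":                  def.Code,
			"windowsVirtualKeyCode": def.KeyCode,
		})
	}
	return events, nil
}

func main() {
	events, err := keyPress("Enter")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(events), events[0]["windowsVirtualKeyCode"])
}
```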
---
## Console Capture
Console capture is enabled in `New()` by subscribing to `Runtime.consoleAPICalled` events.
### Basic Capture (Webview)
The `Webview` itself accumulates messages in a slice guarded by `sync.RWMutex`. When the buffer reaches `consoleLimit`, the oldest 100 messages are dropped.
```go
msgs := wv.GetConsole() // returns a copy
wv.ClearConsole()
```
### ConsoleWatcher
`ConsoleWatcher` (constructed via `NewConsoleWatcher(wv)`) registers its own handler on the same `Runtime.consoleAPICalled` event. It adds filtering and reactive capabilities:
- `AddFilter(ConsoleFilter)` — filter by message type and/or text pattern
- `AddHandler(ConsoleHandler)` — callback invoked for each incoming message (outside the write lock)
- `WaitForMessage(ctx, filter)` — blocks until a matching message arrives
- `WaitForError(ctx)` — convenience wrapper for `type == "error"`
- `Errors()`, `Warnings()`, `HasErrors()`, `ErrorCount()`
### ExceptionWatcher
`ExceptionWatcher` subscribes to `Runtime.exceptionThrown` events and captures unhandled JavaScript exceptions with full stack traces. It exposes the same reactive pattern as `ConsoleWatcher`: `AddHandler`, `WaitForException`, `HasExceptions`.
---
## Screenshots
`Screenshot()` calls `Page.captureScreenshot` with `format: "png"`. Chrome returns the image as a base64-encoded string in the `data` field of the response. The method decodes this and returns raw PNG bytes.
---
## JavaScript Evaluation
`evaluate(ctx, script)` calls `Runtime.evaluate` with `returnByValue: true`. The result is extracted from `result.result.value`. If `result.exceptionDetails` is present, the error description is returned as a Go error.
`Evaluate(script string) (any, error)` is the public wrapper that applies the default timeout.
`GetURL()` and `GetTitle()` are thin wrappers that evaluate `window.location.href` and `document.title` respectively.
`GetHTML(selector string)` evaluates `outerHTML` on the matched element, or `document.documentElement.outerHTML` when the selector is empty.
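
The result handling described for `evaluate` can be sketched as a pure function over the response payload. `evalResult` and `parseEval` are hypothetical names; the JSON field names follow the `Runtime.evaluate` response shape, and the exact error text is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// evalResult models the part of a Runtime.evaluate result used here.
type evalResult struct {
	Result struct {
		Type  string `json:"type"`
		Value any    `json:"value"`
	} `json:"result"`
	ExceptionDetails *struct {
		Text      string `json:"text"`
		Exception struct {
			Description string `json:"description"`
		} `json:"exception"`
	} `json:"exceptionDetails"`
}

// parseEval returns the evaluated value, or an error if the script threw.
func parseEval(payload []byte) (any, error) {
	var r evalResult
	if err := json.Unmarshal(payload, &r); err != nil {
		return nil, fmt.Errorf("decode evaluate result: %w", err)
	}
	if r.ExceptionDetails != nil {
		desc := r.ExceptionDetails.Exception.Description
		if desc == "" {
			desc = r.ExceptionDetails.Text
		}
		return nil, fmt.Errorf("javascript exception: %s", desc)
	}
	return r.Result.Value, nil
}

func main() {
	v, err := parseEval([]byte(`{"result":{"type":"string","value":"Example Domain"}}`))
	fmt.Println(v, err)

	_, err = parseEval([]byte(`{"result":{"type":"object"},"exceptionDetails":{"text":"Uncaught","exception":{"description":"ReferenceError: x is not defined"}}}`))
	fmt.Println(err)
}
```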
---
## Action System
The `Action` interface has a single method:
```go
type Action interface {
	Execute(ctx context.Context, wv *Webview) error
}
```
Concrete action types cover: `Click`, `Type`, `Navigate`, `Wait`, `WaitForSelector`, `Scroll`, `ScrollIntoView`, `Focus`, `Blur`, `Clear`, `Select`, `Check`, `Hover`, `DoubleClick`, `RightClick`, `PressKey`, `SetAttribute`, `RemoveAttribute`, `SetValue`.
`ActionSequence` provides a fluent builder:
```go
err := NewActionSequence().
	Navigate("https://example.com").
	WaitForSelector("#login-form").
	Type("#email", "user@example.com").
	Type("#password", "secret").
	Click("#submit").
	Execute(ctx, wv)
```
```
`Execute` runs the actions sequentially and stops at the first failure, returning a single error that identifies the index of the failing action and wraps its underlying error.
### File Upload and Drag-and-Drop
`UploadFile(selector, filePaths)` uses `DOM.setFileInputFiles` on the node ID of the resolved file input element.
`DragAndDrop(sourceSelector, targetSelector)` dispatches `mousePressed`, `mouseMoved`, and `mouseReleased` events between the centre points of the two elements.
---
## Angular Helpers
`AngularHelper` (constructed via `NewAngularHelper(wv)`) provides SPA-specific utilities. All methods are bounded by the `AngularHelper.timeout` deadline (default 30 s).
### Application Detection
`isAngularApp` checks for Angular 2+ via `window.getAllAngularRootElements`, the `[ng-version]` attribute, or `window.ng.probe`. It also checks for AngularJS 1.x via `window.angular.element`.
### Zone.js Stability
`WaitForAngular()` waits for Zone.js to report stability by checking `zone.isStable` and subscribing to `zone.onStable` on the zone obtained through the Angular injector. If this injector-based approach fails (for example in production builds without debug info), it falls back to polling `window.Zone.current._inner._hasPendingMicrotasks` and `_hasPendingMacrotasks` at 50 ms intervals.
### Router Integration
`NavigateByRouter(path)` obtains the `Router` service from the Angular injector and calls `router.navigateByUrl(path)`, then waits for Zone.js stability.
`GetRouterState()` returns an `AngularRouterState` with the current URL, fragment, route params, and query params.
### Component Introspection
`GetComponentProperty(selector, property)` and `SetComponentProperty(selector, property, value)` access component instances via `window.ng.probe(element).componentInstance`. After setting a property, `ApplicationRef.tick()` is called to trigger change detection.
`CallComponentMethod(selector, method, args...)` invokes a method on the component instance and triggers change detection.
`GetService(name)` retrieves a named service from the root injector and returns a JSON-serialisable representation.
### ngModel
`GetNgModel(selector)` reads the current value of an ngModel-bound input. `SetNgModel(selector, value)` writes the value, fires `input` and `change` events, and triggers `ApplicationRef.tick()`.
---
## Multi-Tab Support
`CDPClient.NewTab(url)` calls `GET {debugURL}/json/new?{url}` and returns a new `CDPClient` connected to the WebSocket of the newly created tab. Each tab has its own independent read loop and event handler registry, so console events and other notifications are tab-scoped.
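`CDPClient.CloseTab()` calls `Browser.close` on the tab's CDP session. Note that in the CDP specification `Browser.close` shuts down the whole browser rather than a single target, so other tabs are affected; `Target.closeTarget` is the per-target command.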
`ListTargets(debugURL)` and `GetVersion(debugURL)` are package-level utilities that query the HTTP endpoint without requiring an active WebSocket connection.
---
## Emulation
`SetViewport(width, height int)` calls `Emulation.setDeviceMetricsOverride` with `deviceScaleFactor: 1` and `mobile: false`.
`SetUserAgent(ua string)` calls `Emulation.setUserAgentOverride`.
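
On the wire, these two calls become ordinary CDP command frames. The sketch below builds them with `encoding/json`; `command`, `viewportCommand`, and `userAgentCommand` are hypothetical names, while the method names and parameter keys are those of the CDP `Emulation` domain.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// command is the generic CDP request envelope.
type command struct {
	ID     int64  `json:"id"`
	Method string `json:"method"`
	Params any    `json:"params,omitempty"`
}

type viewportParams struct {
	Width             int     `json:"width"`
	Height            int     `json:"height"`
	DeviceScaleFactor float64 `json:"deviceScaleFactor"`
	Mobile            bool    `json:"mobile"`
}

type userAgentParams struct {
	UserAgent string `json:"userAgent"`
}

// viewportCommand encodes a desktop viewport override with scale factor 1.
func viewportCommand(id int64, width, height int) ([]byte, error) {
	return json.Marshal(command{
		ID:     id,
		Method: "Emulation.setDeviceMetricsOverride",
		Params: viewportParams{Width: width, Height: height, DeviceScaleFactor: 1, Mobile: false},
	})
}

// userAgentCommand encodes a user agent override.
func userAgentCommand(id int64, ua string) ([]byte, error) {
	return json.Marshal(command{
		ID:     id,
		Method: "Emulation.setUserAgentOverride",
		Params: userAgentParams{UserAgent: ua},
	})
}

func main() {
	frame, _ := viewportCommand(1, 1280, 720)
	fmt.Println(string(frame))
	frame, _ = userAgentCommand(2, "ExampleBot/1.0")
	fmt.Println(string(frame))
}
```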
---
## Thread Safety
- `CDPClient` uses `sync.RWMutex` for WebSocket writes and `sync.Mutex` for the pending-response map. Event handler registration uses a separate `sync.RWMutex`.
- `Webview` uses `sync.RWMutex` for its console log slice.
- `ConsoleWatcher` and `ExceptionWatcher` use `sync.RWMutex` for their message and handler slices. Handlers are copied before being called so they execute outside the write lock.
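
The copy-then-call pattern in the last bullet is worth seeing in code, because it is what allows a handler to call back into the watcher without deadlocking. This is a minimal sketch with hypothetical names (`watcher`, `publish`, `Count`) and string messages, not the library's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// watcher stores messages and notifies registered handlers.
type watcher struct {
	mu       sync.RWMutex
	messages []string
	handlers []func(string)
}

// AddHandler registers a callback for future messages.
func (w *watcher) AddHandler(h func(string)) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.handlers = append(w.handlers, h)
}

// Count reports how many messages have been stored.
func (w *watcher) Count() int {
	w.mu.RLock()
	defer w.mu.RUnlock()
	return len(w.messages)
}

// publish stores msg, copies the handler slice under the lock, then calls
// the handlers after releasing it.
func (w *watcher) publish(msg string) {
	w.mu.Lock()
	w.messages = append(w.messages, msg)
	handlers := make([]func(string), len(w.handlers))
	copy(handlers, w.handlers)
	w.mu.Unlock()

	for _, h := range handlers {
		h(msg) // safe to call w.Count or w.AddHandler from here
	}
}

func main() {
	w := &watcher{}
	w.AddHandler(func(msg string) {
		// Re-entrant call: would deadlock if publish still held the lock.
		fmt.Printf("got %q, total %d\n", msg, w.Count())
	})
	w.publish("first")
	w.publish("second")
}
```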