From 0cf35221e6fb18b99199843a755f5bf5ac76fcfe Mon Sep 17 00:00:00 2001
From: Snider
Date: Mon, 23 Feb 2026 12:34:32 +0000
Subject: [PATCH] docs: document InspectAttention pass-through on
 InferenceAdapter

Co-Authored-By: Virgil
---
 docs/architecture.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/architecture.md b/docs/architecture.md
index 5a6ef78..552c118 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -132,6 +132,7 @@ Key behaviours:
 - `GenerateStream` and `ChatStream` forward each token's text to the provided `TokenCallback`. If the callback returns an error, iteration stops.
 - `Available()` always returns `true` — the model is already loaded when the adapter is constructed.
 - `Close()` delegates to `TextModel.Close()`, releasing GPU memory.
+- `InspectAttention()` delegates to the underlying `TextModel` via type assertion to `inference.AttentionInspector`. Returns an error if the backend doesn't support attention inspection. This enables LEM's Q/K Bone Orientation analysis through the adapter without consumers needing to unwrap the underlying model.
 
 ### MLX Backend (`backend_mlx.go`, darwin/arm64 only)
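
The pass-through pattern the new doc line describes can be sketched as follows. This is a minimal illustration, not the project's actual code: the real `inference.AttentionInspector` interface, the `TextModel` backend types, and the `InspectAttention` signature are assumptions made up for this sketch; only the names and the type-assertion delegation behaviour come from the patch.

```go
package main

import (
	"errors"
	"fmt"
)

// AttentionInspector is an optional capability interface; the real one
// lives in the inference package, and this signature is illustrative.
type AttentionInspector interface {
	InspectAttention(layer int) (string, error)
}

// TextModel is a stand-in for the backend model interface.
type TextModel interface {
	Generate(prompt string) string
}

// plainModel supports generation only — no attention inspection.
type plainModel struct{}

func (plainModel) Generate(p string) string { return "out:" + p }

// inspectableModel additionally implements AttentionInspector.
type inspectableModel struct{ plainModel }

func (inspectableModel) InspectAttention(layer int) (string, error) {
	return fmt.Sprintf("attention stats for layer %d", layer), nil
}

// InferenceAdapter wraps a TextModel and passes InspectAttention through.
type InferenceAdapter struct{ model TextModel }

// InspectAttention delegates via type assertion, mirroring the documented
// behaviour: error out when the backend lacks the capability, so callers
// never need to unwrap the underlying model themselves.
func (a *InferenceAdapter) InspectAttention(layer int) (string, error) {
	insp, ok := a.model.(AttentionInspector)
	if !ok {
		return "", errors.New("backend does not support attention inspection")
	}
	return insp.InspectAttention(layer)
}

func main() {
	withSupport := &InferenceAdapter{model: inspectableModel{}}
	out, err := withSupport.InspectAttention(3)
	fmt.Println(out, err == nil) // attention stats for layer 3 true

	withoutSupport := &InferenceAdapter{model: plainModel{}}
	_, err = withoutSupport.InspectAttention(3)
	fmt.Println(err) // backend does not support attention inspection
}
```

The design choice here is Go's standard optional-interface idiom: the adapter advertises the capability unconditionally, and the type assertion at call time decides whether the wrapped backend can actually honour it.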