diff --git a/FINDINGS.md b/FINDINGS.md
index 602ceb1..002b673 100644
--- a/FINDINGS.md
+++ b/FINDINGS.md
@@ -228,3 +228,25 @@ Design and implementation plan reviewed. The layered architecture (internal/llam
 
 ```
 3. **`//go:build rocm` for integration tests** — Good call. Keeps `go test ./...` fast on machines without GPU.
+
+---
+
+## 2026-02-19: Phase 2 Robustness (Charon)
+
+### Concurrent Requests
+
+Tested 3 goroutines calling Generate() simultaneously on the same model (Gemma3-1B, llama-server with default settings). All 3 received output (~0.9s total). llama-server handles concurrency via its slot system — default is 1 slot, so requests are serialised server-side.
+
+For true parallel inference, use the `--parallel N` flag in llama-server (not yet configurable via go-rocm). VRAM cost scales with the number of slots and the context size.
+
+### VRAM Monitoring
+
+Reading sysfs directly (`/sys/class/drm/cardN/device/mem_info_vram_*`) instead of spawning `rocm-smi`. Auto-detects the dGPU by selecting the card with the largest VRAM total:
+- card0 = iGPU (2GB) — Ryzen 9 9950X integrated
+- card1 = dGPU (16GB) — RX 7800 XT
+
+Note: sysfs reads are non-atomic. Total and Used are read separately, so transient inconsistencies are possible under heavy allocation churn. Free is clamped to prevent uint64 underflow.
+
+### lastErr Design Limitation
+
+`rocmModel.lastErr` is a single mutex-protected field shared across all callers. With concurrent Generate/Chat calls, errors can be clobbered (last writer wins). `Err()` is only reliable in single-caller scenarios. This matches the go-inference interface contract (single `Err() error` method), so it's a known limitation, not a bug. Per-call error returns would require an interface change in go-inference.
diff --git a/TODO.md b/TODO.md
index 5275df0..5dd8e1b 100644
--- a/TODO.md
+++ b/TODO.md
@@ -25,11 +25,11 @@ The Ryzen 9 9950X iGPU shows up as ROCm Device 1, reports 100GB free (system RAM
 
 ## Phase 2: Robustness
 
-- [ ] **Server crash recovery** — If llama-server dies mid-generation, detect via process exit, return error via Err(), allow re-load.
-- [ ] **Port conflict handling** — If the random port is taken, retry with a different port.
-- [ ] **Graceful shutdown** — On context cancellation, stop the current request cleanly (close SSE stream), don't kill the server. Only Close() kills the server.
-- [ ] **Memory monitoring** — Use `rocm-smi --showmeminfo vram` or HIP API to report VRAM usage. Expose via package-level functions (like go-mlx's GetActiveMemory).
-- [ ] **Concurrent requests** — llama-server supports concurrent slots. Test with multiple goroutines calling Generate() simultaneously. Document max concurrency.
+- [x] **Server crash recovery** — `server.alive()` detects process exit; Generate/Chat return error immediately if dead. Commits `2c4966e`, `c07f37a`. (Charon, 19 Feb 2026)
+- [x] **Port conflict handling** — `startServer()` retries up to 3 times with a new port on process exit. Only retries on exit, not timeout. Commits `c50a8e9`, `b7342ec`. (Charon, 19 Feb 2026)
+- [x] **Graceful shutdown** — Already worked in Phase 1. Integration test confirms the server survives context cancellation and generates again. Commit `a6e647c`. (Charon, 19 Feb 2026)
+- [x] **Memory monitoring** — `GetVRAMInfo()` reads sysfs, auto-detects the dGPU by largest VRAM. Uint64 underflow guard on Free. Commits `501de83`, `954c570`. (Charon, 19 Feb 2026)
+- [x] **Concurrent requests** — 3 goroutines calling Generate() simultaneously all get output. llama-server serialises via 1 slot (default). Commit `a6e647c`. (Charon, 19 Feb 2026)
 
 ## Phase 3: Model Support