Expose model metadata: architecture, vocab size, layer count, hidden
dimension, quantisation bits and group size.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Expose prefill/decode timing, token counts, throughput, and GPU memory
stats from the last inference operation. Same retrieval pattern as Err().
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ClassifyResult, BatchResult types and Classify/BatchGenerate methods
to TextModel for batched prefill-only and autoregressive inference.
Add WithLogits option for returning raw vocab logits.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>