go-inference

6 commits 1 branch 4 tags 146 KiB

Author	SHA1	Message	Date
Snider	28f444ced4	feat: add ModelInfo type and Info() to TextModel. Expose model metadata: architecture, vocab size, layer count, hidden dimension, quantisation bits and group size. Co-Authored-By: Virgil <virgil@lethean.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 23:36:16 +00:00
Snider	df176765e7	feat: add GenerateMetrics type and Metrics() to TextModel. Expose prefill/decode timing, token counts, throughput, and GPU memory stats from the last inference operation. Same retrieval pattern as Err(). Co-Authored-By: Virgil <virgil@lethean.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 23:34:31 +00:00
Snider	2517b774b8	feat: add batch inference API (Classify, BatchGenerate). Add ClassifyResult, BatchResult types and Classify/BatchGenerate methods to TextModel for batched prefill-only and autoregressive inference. Add WithLogits option for returning raw vocab logits. Co-Authored-By: Virgil <virgil@lethean.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 23:29:28 +00:00
Claude	3719734f56	feat: add ParallelSlots to LoadConfig for concurrent inference Co-Authored-By: Virgil <virgil@lethean.io>	2026-02-19 23:12:29 +00:00
Snider	07cd917259	feat: define shared TextModel, Backend, Token, Message interfaces. Zero-dependency interface package for the Core inference ecosystem. Backends (go-mlx, go-rocm) implement these interfaces. Consumers (go-ml, go-ai, go-i18n) import them. Includes: TextModel: Generate, Chat, Err, Close (with context.Context); Backend: named engine registry with platform preference; Functional options: WithMaxTokens, WithTemperature, WithTopK, etc.; LoadModel: auto-selects best available backend. Co-Authored-By: Virgil <virgil@lethean.io>	2026-02-19 19:37:27 +00:00
Virgil	fca0ed8e16	Initial commit	2026-02-19 19:35:54 +00:00