Snider c570f08eba docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking (TODO.md, FINDINGS.md) with structured
documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 15:01:55 +00:00


Project History — go-store

Origin

Extracted from forge.lthn.ai/core/go (pkg/store/) on 19 February 2026 by Virgil. The extraction gave the package its own module path (forge.lthn.ai/core/go-store), its own repository, and independent versioning.

At extraction the package comprised a single source file and a single test file. It provided basic CRUD with group namespacing and template rendering but had no TTL, no namespace isolation, and no event system. Test coverage was 73.1%.


Phase 0 — Hardening and Test Coverage

Agent: Charon. Completed: 2026-02-20.

Concurrency fix: SQLITE_BUSY under contention

Problem. The database/sql connection pool hands out a different physical connection for each Exec or Query call. SQLite pragmas (PRAGMA journal_mode=WAL, PRAGMA busy_timeout) are per-connection. Under concurrent write load (10 goroutines, 100 ops each), pool connections that had not received the pragmas failed immediately with SQLITE_BUSY on lock contention instead of waiting.

Fix. db.SetMaxOpenConns(1) serialises all database access through a single connection, so the pragma settings always apply and the BUSY errors disappear. Because SQLite is a single-writer database by design (it serialises writes at the file-lock level regardless of pool size), this does not reduce write throughput, though it does give up the concurrent reads that WAL would otherwise permit.

Defence in depth. PRAGMA busy_timeout=5000 was added to make the single connection wait up to 5 seconds before reporting a timeout error, providing additional resilience.

Extended test coverage

Added tests for:

  • Concurrent read/write with 10 goroutines and the race detector
  • Render() with invalid template syntax (parse error)
  • Render() with template execution error (calling a string as a function)
  • Get() on a non-existent group vs. a non-existent key
  • DeleteGroup() followed by GetAll() returning empty
  • Count() after 500 bulk inserts
  • In-memory vs. file-backed store (persistence across open/close)
  • WAL mode verification via PRAGMA journal_mode query
  • Edge cases: empty key, empty value, empty group, 10K-character key, binary-like values with null bytes, Unicode (accented, CJK, Arabic), SQL injection attempts, special characters

Coverage: 73.1% to 90.9%.

The remaining 9.1% comprised defensive error paths in New(), GetAll(), and Render() that are unreachable through integration tests against a healthy SQLite database (driver initialisation failures, scan errors on NULL columns, rows iteration errors on corrupted pages).

Benchmarks

Established baseline benchmark results:

BenchmarkSet-32               119280    10290 ns/op
BenchmarkGet-32               335707     3589 ns/op
BenchmarkGetAll-32 (10K keys)    258  4741451 ns/op
BenchmarkSet_FileBacked-32      4525   265868 ns/op

go vet ./... runs clean with no warnings.


Phase 1 — TTL Support

Agent: Charon. Completed: 2026-02-20.

Added optional time-to-live for keys.

Changes

  • expires_at INTEGER nullable column added to the kv schema.
  • SetWithTTL(group, key, value string, ttl time.Duration) stores the current time plus TTL as a Unix millisecond timestamp in expires_at.
  • Get() performs lazy deletion: if a key is found with an expires_at in the past, it is deleted and ErrNotFound is returned.
  • Count(), GetAll(), and Render() include (expires_at IS NULL OR expires_at > ?) in all queries, excluding expired keys from results.
  • PurgeExpired() public method deletes all physically stored expired rows and returns the count removed.
  • Background goroutine calls PurgeExpired() every 60 seconds, controlled by a context.WithCancel that is cancelled on Close().
  • Set() clears any existing TTL when overwriting a key (sets expires_at = NULL).
  • Schema migration: ALTER TABLE kv ADD COLUMN expires_at INTEGER runs on New(). The "duplicate column" error on already-upgraded databases is silently ignored.

Tests added

TTL functionality tests covering: normal expiry on Get, exclusion from Count/GetAll/Render, SetWithTTL upsert, plain Set clearing TTL, future TTL remaining accessible, background purge goroutine, concurrent TTL access with 10 goroutines, schema migration from a pre-TTL database (manually constructed without expires_at).

Coverage: 90.9% to 94.7%.


Phase 2 — Namespace Isolation

Agent: Charon. Completed: 2026-02-20.

Added ScopedStore for multi-tenant namespace isolation.

Changes

  • scope.go introduced with ScopedStore wrapping *Store.
  • Namespace strings validated against ^[a-zA-Z0-9-]+$.
  • NewScoped(store, namespace) constructor.
  • All Store methods delegated with group automatically prefixed as namespace + ":" + group.
  • QuotaConfig{MaxKeys, MaxGroups int} struct; zero means unlimited.
  • NewScopedWithQuota(store, namespace, quota) constructor.
  • ErrQuotaExceeded sentinel error.
  • checkQuota(group, key) internal method: skips upserts (existing key), checks CountAll(namespace+":") against MaxKeys, checks Groups(namespace+":") against MaxGroups only when the group is new.
  • CountAll(prefix string) added to Store: counts non-expired keys across all groups matching a prefix. Empty prefix counts across all groups.
  • Groups(prefix string) added to Store: returns distinct non-expired group names matching a prefix. Empty prefix returns all groups.

Tests added

ScopedStore basic CRUD, cross-namespace isolation, GetAll scoping. Quota tests: MaxKeys limit, upsert does not count, delete and re-insert stays within quota, MaxGroups limit, zero quota is unlimited, TTL-expired keys do not count towards quota. CountAll and Groups tests with mixed namespaces and expired key exclusion.

Coverage: 94.7% to 95.5% (approximate; coverage_test.go added to cover defensive paths).


Phase 3 — Event Hooks

Agent: Charon. Completed: 2026-02-20.

Added a reactive notification system for store mutations.

Changes

  • events.go introduced with EventType (EventSet, EventDelete, EventDeleteGroup), Event struct, Watcher struct, callbackEntry struct.
  • watcherBufSize = 16 constant.
  • Watch(group, key string) *Watcher: creates a buffered channel watcher. Wildcard "*" supported for both group and key. Uses atomic.AddUint64 for monotonic watcher IDs.
  • Unwatch(w *Watcher): removes watcher from the registry and closes its channel. Idempotent.
  • OnChange(fn func(Event)) func(): registers a synchronous callback. Returns an idempotent unregister function using sync.Once.
  • notify(e Event): internal dispatch. Acquires read-lock on s.mu; non-blocking send to each matching watcher channel (drop-on-full); calls each callback synchronously. Separate watcherMatches helper handles wildcard logic.
  • Set(), SetWithTTL(), Delete(), DeleteGroup() each call notify() after the successful database write.
  • Store struct extended with watchers []*Watcher, callbacks []callbackEntry, mu sync.RWMutex, nextID uint64.
  • ScopedStore mutations automatically emit events with the full prefixed group name — no extra implementation required.

Tests added

  • Specific-key watcher receives matching events and ignores non-matching keys.
  • Wildcard-key watcher receives all keys in a group.
  • Global wildcard ("*", "*") receives all mutations across all groups.
  • Unwatch stops delivery, closes the channel, and is idempotent.
  • Delete and DeleteGroup emit correct event types with correctly populated fields.
  • OnChange callback fires on Set and Delete; unregister stops future invocations and is idempotent.
  • Buffer-full (32 writes against a cap-16 channel) does not block the writer.
  • Multiple watchers on the same key receive events independently.
  • Concurrent Watch/Unwatch during concurrent writes (race test, 10 goroutines).
  • ScopedStore events carry the prefixed group name.
  • SetWithTTL emits EventSet.
  • EventType.String() returns correct labels, including "unknown" for undefined values.

Coverage: 94.7% to 95.5%.


Coverage Test Suite

coverage_test.go exercises defensive error paths that integration tests cannot reach through normal usage:

  • Schema conflict: pre-existing SQLite index named kv causes New() to return store.New: schema: ....
  • GetAll scan error: NULL key in a row (requires manually altering the schema to remove the NOT NULL constraint).
  • GetAll rows iteration error: physically corrupting database pages mid-file to trigger rows.Err() during multi-page scans.
  • Render scan error: same NULL-key technique.
  • Render rows iteration error: same corruption technique.

These tests exercise correct defensive code: they must continue to pass, but the failures they simulate are not representative of real production failure modes.


Known Limitations

Single writer. SetMaxOpenConns(1) serialises all access through one connection. Write throughput is bounded by SQLite's single-writer architecture. This is appropriate for the intended use cases (configuration storage, session state, per-tenant key-value data) but is not suitable for high-throughput append-only workloads.

File-backed write throughput. File-backed Set operations (~3,800 ops/sec on Apple M-series) are dominated by fsync. Applications writing at higher rates should use in-memory stores or consider WAL checkpoint tuning.

GetAll memory usage. Fetching a group with 10,000 keys allocates approximately 2.3 MB per call. There is no pagination API. Applications with very large groups should restructure data into smaller groups or query selectively.

No cross-group transactions. There is no API for atomic multi-group operations. Each method is individually atomic at the SQLite level, but there is no Begin/Commit exposed to callers.

No wildcard deletes. There is no DeletePrefix or pattern-based delete. To delete all groups under a namespace, callers must retrieve the group list via Groups() and delete each individually.

Callback deadlock risk. OnChange callbacks run synchronously in the writer's goroutine while s.mu is held for reading. Calling Watch, Unwatch, or OnChange from inside a callback deadlocks immediately, because those methods require the write lock. Calling a mutating Store method from inside a callback is also unsafe: notify() re-acquires the read lock, and Go's sync.RWMutex does not guarantee a recursive read acquisition succeeds while a writer is waiting. Document this constraint prominently wherever callbacks are described.

No persistence of watcher registrations. Watchers and callbacks are in-memory only. They are not persisted across Close/New cycles.


Future Considerations

These are design notes, not committed work:

  • Pagination for GetAll. A GetPage(group string, offset, limit int) method would support large groups without full in-memory materialisation.
  • Indexed prefix keys. An additional index on (grp, key) prefix would accelerate prefix scans without a full-table scan.
  • TTL background purge interval as constructor option. Currently only settable by mutating s.purgeInterval directly in tests. A WithPurgeInterval(d time.Duration) functional option would make this part of the public API.
  • Cross-group atomic operations. Exposing a Transaction(func(tx *StoreTx) error) API would allow callers to compose atomic multi-group operations.
  • DeletePrefix(prefix string) method. Would enable efficient cleanup of an entire namespace without first listing groups.