Commit graph

2 commits

Author SHA1 Message Date
Snider
62c5d65374 feat: modernise to Go 1.26 iterators and stdlib helpers
All checks were successful
Security Scan / security (push) Successful in 14s
Test / test (push) Successful in 1m44s
Add Catalog.All(), Catalog.SearchResults(), AllSections(), Tokens()
iterators. Use slices.SortFunc, slices.Clone, maps.Values, built-in
min replacing min3, strings.SplitSeq for streaming, range-over-int
in levenshtein and benchmarks.

Co-Authored-By: Gemini <noreply@google.com>
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 05:32:21 +00:00
Snider
fc758a832b feat(search): add English stemmer for improved search recall
Implement lightweight Porter-style suffix stripping in stemmer.go covering
plurals (-sses, -ies, -s), verb forms (-ed, -ing, -eed), and derivational
suffixes (-ational, -tional, -fulness, -ness, -ment, -ation, -ously,
-ively, -ably, -ally, -izer, -ingly). Words under 4 chars are unchanged
and results are guaranteed at least 2 chars.

tokenize() now emits both raw and stemmed forms so the index contains both.
Search() distinguishes stem-only matches (scoreStemWord=0.7) from exact
matches (1.0), keeping stemmed results slightly below raw hits.

22 stem unit tests, 5 search integration tests, and BenchmarkStem with
100 words. All existing tests pass with no regressions.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 08:07:50 +00:00