Commit graph

7 commits

Author SHA1 Message Date
Snider
fc758a832b feat(search): add English stemmer for improved search recall
Implement lightweight Porter-style suffix stripping in stemmer.go covering
plurals (-sses, -ies, -s), verb forms (-ed, -ing, -eed), and derivational
suffixes (-ational, -tional, -fulness, -ness, -ment, -ation, -ously,
-ively, -ably, -ally, -izer, -ingly). Words under 4 chars are unchanged
and results are guaranteed at least 2 chars.
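
A minimal sketch of this kind of suffix stripping, not the contents of stemmer.go: the rule table, the suffixRule type, and the first-match-wins ordering are assumptions made for illustration, and only a subset of the suffixes listed above is covered. The two length guards (words under 4 chars unchanged, stems at least 2 chars) follow the description above. Input is assumed to be lowercase ASCII.

```go
package main

import "strings"

// Guards taken from the commit message: short words pass through untouched,
// and no rule may shrink a word below two characters.
const (
	minWordLen = 4
	minStemLen = 2
)

// suffixRule is a hypothetical type for this sketch: strip suffix, append repl.
type suffixRule struct {
	suffix, repl string
}

// rules is an illustrative subset. Longer suffixes come first so that
// "-ational" wins over "-tional", "-eed" over "-ed", and "-sses" over "-s".
var rules = []suffixRule{
	{"fulness", ""},
	{"ational", "ate"},
	{"tional", "tion"},
	{"ness", ""},
	{"ment", ""},
	{"sses", "ss"},
	{"ies", "i"},
	{"eed", "ee"},
	{"ing", ""},
	{"ed", ""},
	{"ss", "ss"}, // keep a double s so "class" is not cut to "clas"
	{"s", ""},
}

// stem applies the first matching rule whose result is long enough.
// Unlike the full Porter algorithm, it does not check the "measure" of
// the remaining stem, so it over-strips some words.
func stem(word string) string {
	if len(word) < minWordLen {
		return word
	}
	for _, r := range rules {
		if !strings.HasSuffix(word, r.suffix) {
			continue
		}
		candidate := strings.TrimSuffix(word, r.suffix) + r.repl
		if len(candidate) < minStemLen {
			continue // result too short: try the next rule instead
		}
		return candidate
	}
	return word
}
```

Under these assumed rules, stem("relational") yields "relate" and stem("classes") yields "class", while "the" is returned unchanged because it is under 4 chars.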

tokenize() now emits both raw and stemmed forms so the index contains both.
Search() distinguishes stem-only matches (scoreStemWord=0.7) from exact
matches (1.0), keeping stemmed results slightly below raw hits.
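
The raw-plus-stemmed indexing and the two score levels could look roughly like the sketch below. It is an illustration under stated assumptions, not the package's code: the tokenize signature, the splitWords and scoreWord helpers, and the scoreExactWord name are hypothetical, and stem here is a tiny stand-in that only strips "-ing" and "-s". Only scoreStemWord = 0.7 and the exact-match value 1.0 come from the description above.

```go
package main

import (
	"strings"
	"unicode"
)

const (
	scoreExactWord = 1.0 // hypothetical name; the value 1.0 is from the commit message
	scoreStemWord  = 0.7 // name and value from the commit message
)

// stem is a stand-in for the real stemmer: it strips only "-ing" and "-s".
func stem(w string) string {
	if len(w) < 4 {
		return w
	}
	for _, suf := range []string{"ing", "s"} {
		if strings.HasSuffix(w, suf) && len(w)-len(suf) >= 2 {
			return strings.TrimSuffix(w, suf)
		}
	}
	return w
}

// splitWords lowercases text and splits it on anything that is not a
// letter or digit.
func splitWords(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// tokenize emits every raw word, followed by its stem when the stem differs.
func tokenize(text string) []string {
	words := splitWords(text)
	tokens := make([]string, 0, len(words)*2)
	for _, w := range words {
		tokens = append(tokens, w)
		if s := stem(w); s != w {
			tokens = append(tokens, s)
		}
	}
	return tokens
}

// scoreWord returns scoreExactWord when the query word occurs verbatim in
// doc, scoreStemWord when only the stemmed forms agree, and 0 otherwise.
func scoreWord(query, doc string) float64 {
	q := strings.ToLower(query)
	for _, w := range splitWords(doc) {
		if w == q {
			return scoreExactWord
		}
	}
	qs := stem(q)
	for _, t := range tokenize(doc) {
		if t == qs {
			return scoreStemWord
		}
	}
	return 0
}
```

With this sketch, a query for "servers" against the text "restart the server" scores 0.7, while a query for "server" scores 1.0, so the verbatim hit ranks first.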

22 stem unit tests, 5 search integration tests, and BenchmarkStem with
100 words. All existing tests pass with no regressions.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 08:07:50 +00:00
Snider
2cca8d5656 docs: flesh out English stemming task spec for Phase 1 completion
Detail Porter-style stemmer algorithm, tokenize() integration, search
scoring (scoreStemWord = 0.7), and comprehensive test matrix including
regression verification.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 07:56:33 +00:00
Snider
23cef8592a test: complete Phase 0 hardening — 124 tests, 94% coverage, 8 benchmarks
Expand parser tests: empty input, frontmatter-only, malformed YAML,
deeply nested headings (H4-H6), Unicode (CJK, emoji, diacritics,
mixed scripts), very long documents (10K+ lines), edge cases.

Expand search tests: empty/invalid queries, no results, case sensitivity,
multi-word queries, special characters (@, dots, underscores), overlapping
matches, scoring boundaries (title vs body), tag matching, section title
boost, tokenize/highlight edge cases, catalog integration.

Add search benchmarks: single word, multi-word, no results, partial match,
500-topic catalog, 1000-topic catalog, Add indexing, tokenize. Uses
b.Loop() (Go 1.25+) and b.ReportAllocs().

Coverage: 92.1% → 94.0% | Tests: 39 → 124 | go vet: clean | race: clean

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 04:43:48 +00:00
Claude
3e91510bcf feat(search): add fuzzy matching, phrase search, and improved scoring
Phase 0: Push test coverage from 92.1% to 100% by adding catalog_test.go
and targeted tests for all uncovered branches in search.go. Add
BenchmarkSearch with 150 topics (~745us/op baseline).

Phase 1: Implement three search improvements and one refactor:
- Levenshtein-based fuzzy matching (max distance 2, words >= 3 chars)
- Quoted phrase search via extractPhrases() with +8.0 boost
- Tag boost (+3.0) and multi-word bonus (+2.0) scoring
- Named scoring constants replacing magic numbers
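
The fuzzy-matching rule in the list above can be sketched as follows. This is a generic two-row Levenshtein implementation, not the code from search.go: the function names, the constant names, and the choice to apply the 3-character minimum to the query word are assumptions. The limits themselves (max distance 2, words >= 3 chars) are the ones stated above.

```go
package main

const (
	fuzzyMaxDistance = 2 // value from the commit message; name is hypothetical
	fuzzyMinLen      = 3 // value from the commit message; name is hypothetical
)

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the edit distance between a and b, counting
// insertions, deletions, and substitutions, using two rolling rows.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// fuzzyMatch reports whether word is within fuzzyMaxDistance edits of query.
// Queries shorter than fuzzyMinLen never match fuzzily, since almost any
// short word is within two edits of another.
func fuzzyMatch(query, word string) bool {
	if len([]rune(query)) < fuzzyMinLen {
		return false
	}
	return levenshtein(query, word) <= fuzzyMaxDistance
}
```

For example, the misspelling "serach" is two substitutions away from "search" and so matches, while "kitten" and "sitting" are three edits apart and do not.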

All changes are backward-compatible; Search() signature unchanged.
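
Quoted phrase search, as described in the list above, might be handled along these lines. The commit names extractPhrases() and the +8.0 boost; everything else here is assumed for illustration, including the function's signature, its handling of an unbalanced quote, the phraseBoost helper, and the scorePhraseBoost name.

```go
package main

import "strings"

const scorePhraseBoost = 8.0 // value from the commit message; name is hypothetical

// extractPhrases pulls double-quoted phrases out of query and returns them
// together with the remaining unquoted text. The signature is an assumption
// for this sketch. An unbalanced quote is left in the remainder as plain text.
func extractPhrases(query string) (phrases []string, remainder string) {
	var rest strings.Builder
	for {
		open := strings.IndexByte(query, '"')
		if open < 0 {
			break
		}
		closeIdx := strings.IndexByte(query[open+1:], '"')
		if closeIdx < 0 {
			break
		}
		phrase := strings.TrimSpace(query[open+1 : open+1+closeIdx])
		if phrase != "" {
			phrases = append(phrases, phrase)
		}
		rest.WriteString(query[:open])
		rest.WriteString(" ")
		query = query[open+1+closeIdx+1:]
	}
	rest.WriteString(query)
	// Collapse the whitespace left behind where phrases were removed.
	return phrases, strings.Join(strings.Fields(rest.String()), " ")
}

// phraseBoost adds scorePhraseBoost for every phrase that occurs in content,
// compared case-insensitively.
func phraseBoost(phrases []string, content string) float64 {
	lower := strings.ToLower(content)
	var boost float64
	for _, p := range phrases {
		if strings.Contains(lower, strings.ToLower(p)) {
			boost += scorePhraseBoost
		}
	}
	return boost
}
```

Because the phrases are split off before ordinary word matching, a caller can keep passing the whole query string to Search(), which is consistent with the unchanged signature noted above.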

Co-Authored-By: Charon <developers@lethean.io>
2026-02-20 01:21:35 +00:00
Snider
b6da16c717 docs: enrich TODO.md with Phase 0 hardening tasks
Add Phase 0 before existing phases: expand parser/search tests,
add benchmarks, vet clean. Specific test cases listed.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 00:28:43 +00:00
Virgil
2c4afaef7a docs: add TODO.md and FINDINGS.md for fleet delegation
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 21:36:09 +00:00
Snider
ad5e70937b feat: extract go-help from core/go pkg/help
YAML-based help catalog with topic search.
Single external dependency: gopkg.in/yaml.v3
Module: forge.lthn.ai/core/go-help

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 16:09:34 +00:00