go/pkg/rag/chunk_test.go
Snider 989b7e1e65
feat: wire release command, add tar.xz support, unified installers (#277)
* feat(cli): wire release command and add installer scripts

- Wire up `core build release` subcommand (was orphaned)
- Wire up `core monitor` command (missing import in full variant)
- Add installer scripts for Unix (.sh) and Windows (.bat)
  - setup: Interactive with variant selection
  - ci: Minimal for CI/CD environments
  - dev: Full development variant
  - go/php/agent: Targeted development variants
- All scripts include security hardening:
  - Secure temp directories (mktemp -d)
  - Architecture validation
  - Version validation after GitHub API call
  - Proper cleanup on exit
  - PowerShell PATH updates on Windows (avoids setx truncation)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(build): add tar.xz support and unified installer scripts

- Add tar.xz archive support using Borg's compress package
  - ArchiveXZ() and ArchiveWithFormat() for configurable compression
  - Better compression ratio than gzip for release artifacts
- Consolidate 12 installer scripts into 2 unified scripts
  - install.sh and install.bat with BunnyCDN edge variable support
  - Subdomains: setup.core.help, ci.core.help, dev.core.help, etc.
  - MODE and VARIANT transformed at edge based on subdomain
- Installers prefer tar.xz with automatic fallback to tar.gz
- Fixed CodeRabbit issues: HTTP status patterns, tar error handling,
  verify_install params, VARIANT validation, CI PATH persistence

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: add build and release config files

- .core/build.yaml - cross-platform build configuration
- .core/release.yaml - release workflow configuration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: move plans from docs/ to tasks/

Consolidate planning documents in tasks/plans/ directory.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(install): address CodeRabbit review feedback

- Add curl timeout (--max-time) to prevent hanging on slow networks
- Rename TMPDIR to WORK_DIR to avoid clobbering system env var
- Add chmod +x to ensure binary has execute permissions
- Add error propagation after subroutine calls in batch file
- Remove System32 install attempt in CI mode (use consistent INSTALL_DIR)
- Fix HTTP status regex for HTTP/2 compatibility

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(rag): add Go RAG implementation with Qdrant + Ollama

Add RAG (Retrieval Augmented Generation) tools for storing documentation
in Qdrant vector database and querying with semantic search. This replaces
the Python tools/rag implementation with a native Go solution.

New commands:
- core rag ingest [directory] - Ingest markdown files into Qdrant
- core rag query [question] - Query vector database with semantic search
- core rag collections - List and manage Qdrant collections

Features:
- Markdown chunking by sections and paragraphs with overlap
- UTF-8 safe text handling for international content
- Automatic category detection from file paths
- Multiple output formats: text, JSON, LLM context injection
- Environment variable support for host configuration

Dependencies:
- github.com/qdrant/go-client (gRPC client)
- github.com/ollama/ollama/api (embeddings API)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(deploy): add pure-Go Ansible executor and Coolify API integration

Implement infrastructure deployment system with:

- pkg/ansible: Pure Go Ansible executor
  - Playbook/inventory parsing (types.go, parser.go)
  - Full execution engine with variable templating, loops, blocks,
    conditionals, handlers, and fact gathering (executor.go)
  - SSH client with key/password auth and privilege escalation (ssh.go)
  - 35+ module implementations: shell, command, copy, template, file,
    apt, service, systemd, user, group, git, docker_compose, etc. (modules.go)

- pkg/deploy/coolify: Coolify API client wrapping Python swagger client
  - List/get servers, projects, applications, databases, services
  - Generic Call() for any OpenAPI operation

- pkg/deploy/python: Embedded Python runtime for swagger client integration

- internal/cmd/deploy: CLI commands
  - core deploy servers/projects/apps/databases/services/team
  - core deploy call <operation> [params-json]

This enables Docker-free infrastructure deployment with Ansible-compatible
playbooks executed natively in Go.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(deploy): address linter warnings and build errors

- Fix fmt.Sprintf format verb error in ssh.go (remove unused stat command)
- Fix errcheck warnings by explicitly ignoring best-effort operations
- Fix ineffassign warning in cmd_ansible.go

All golangci-lint checks now pass for deploy packages.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style(deploy): fix gofmt formatting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(deploy): use known_hosts for SSH host key verification

Address CodeQL security alert by using the user's known_hosts file
for SSH host key verification when available. Falls back to accepting
any key only when known_hosts doesn't exist (common in containerized
or ephemeral environments).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(ai,security,ide): add agentic MVP, security jobs, and Core IDE desktop app

Wire up AI infrastructure with unified pkg/ai package (metrics JSONL,
RAG integration), move RAG under `core ai rag`, add `core ai metrics`
command, and enrich task context with Qdrant documentation.

Add `--target` flag to all security commands for external repo scanning,
`core security jobs` for distributing findings as GitHub Issues, and
consistent error logging across scan/deps/alerts/secrets commands.

Add Core IDE Wails v3 desktop app with Angular 20 frontend, MCP bridge
(loopback-only HTTP server), WebSocket hub, and Claude Code bridge.
Production-ready with Lethean CIC branding, macOS code signing support,
and security hardening (origin validation, body size limits, URL scheme
checks, memory leak prevention, XSS mitigation).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address PR review comments from CodeRabbit, Copilot, and Gemini

Fixes across 25 files addressing 46+ review comments:

- pkg/ai/metrics.go: handle error from Close() on writable file handle
- pkg/ansible: restore loop vars after loop, restore become settings,
  fix Upload with become=true and no password (use sudo -n), honour
  SSH timeout config, use E() helper for contextual errors, quote git
  refs in checkout commands
- pkg/rag: validate chunk config, guard negative-to-uint64 conversion,
  use E() helper for errors, add context timeout to Ollama HTTP calls
- pkg/deploy/python: fix exec.ExitError type assertion (was os.PathError),
  handle os.UserHomeDir() error
- pkg/build/buildcmd: use cmd.Context() instead of context.Background()
  for proper Ctrl+C cancellation
- install.bat: add curl timeouts, CRLF line endings, use --connect-timeout
  for archive downloads
- install.sh: use absolute path for version check in CI mode
- tools/rag: fix broken ingest.py function def, escape HTML in query.py,
  pin qdrant-client version, add markdown code block languages
- internal/cmd/rag: add chunk size validation, env override handling

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(build): make release dry-run by default and remove darwin/amd64 target

Replace --dry-run (default false) with --we-are-go-for-launch (default
false) so `core build release` is safe by default. Remove darwin/amd64
from default build targets (arm64 only for macOS). Fix cmd_project.go
to use command context instead of context.Background().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 00:49:57 +00:00


package rag

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

// TestChunkMarkdown_Good_SmallSection checks that a short section yields a single chunk.
func TestChunkMarkdown_Good_SmallSection(t *testing.T) {
	text := `# Title
This is a small section that fits in one chunk.
`
	chunks := ChunkMarkdown(text, DefaultChunkConfig())
	assert.Len(t, chunks, 1)
	assert.Contains(t, chunks[0].Text, "small section")
}

// TestChunkMarkdown_Good_MultipleSections checks that a document with several
// headings is split into at least two chunks.
func TestChunkMarkdown_Good_MultipleSections(t *testing.T) {
	text := `# Main Title
Introduction paragraph.
## Section One
Content for section one.
## Section Two
Content for section two.
`
	chunks := ChunkMarkdown(text, DefaultChunkConfig())
	assert.GreaterOrEqual(t, len(chunks), 2)
}

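// TestChunkMarkdown_Good_LargeSection checks that a section longer than the
// configured chunk size is split, and that every resulting chunk is non-empty
// and keeps the heading of the section it came from.
func TestChunkMarkdown_Good_LargeSection(t *testing.T) {
	// Build a section body well above the 200-unit chunk size set below.
	text := `## Large Section
` + repeatString("This is a test paragraph with some content. ", 50)
	cfg := ChunkConfig{Size: 200, Overlap: 20}
	chunks := ChunkMarkdown(text, cfg)
	assert.Greater(t, len(chunks), 1)
	for _, chunk := range chunks {
		assert.NotEmpty(t, chunk.Text)
		assert.Equal(t, "Large Section", chunk.Section)
	}
}
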
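// TestChunkMarkdown_Good_ExtractsTitle checks that the heading text is stored
// in the chunk's Section field without the leading "##" marker.
func TestChunkMarkdown_Good_ExtractsTitle(t *testing.T) {
	text := `## My Section Title
Some content here.
`
	chunks := ChunkMarkdown(text, DefaultChunkConfig())
	assert.Len(t, chunks, 1)
	assert.Equal(t, "My Section Title", chunks[0].Section)
}
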
// TestCategory_Good_UIComponent is a table-driven check of Category. Despite
// its name, the table covers every expected category, not only UI components.
func TestCategory_Good_UIComponent(t *testing.T) {
	tests := []struct {
		path     string
		expected string
	}{
		{"docs/flux/button.md", "ui-component"},
		{"ui/components/modal.md", "ui-component"},
		{"brand/vi-personality.md", "brand"},
		{"mascot/expressions.md", "brand"},
		{"product-brief.md", "product-brief"},
		{"tasks/2024-01-15-feature.md", "task"},
		{"plans/architecture.md", "task"},
		{"architecture/migration.md", "architecture"},
		{"docs/api.md", "documentation"},
	}
	for _, tc := range tests {
		// Each path runs as its own subtest so a failure names the offending path.
		t.Run(tc.path, func(t *testing.T) {
			assert.Equal(t, tc.expected, Category(tc.path))
		})
	}
}

// TestChunkID_Good_Deterministic checks that identical inputs produce the same ID.
func TestChunkID_Good_Deterministic(t *testing.T) {
	id1 := ChunkID("test.md", 0, "hello world")
	id2 := ChunkID("test.md", 0, "hello world")
	assert.Equal(t, id1, id2)
}

// TestChunkID_Good_DifferentForDifferentInputs checks that changing either the
// chunk index or the file path changes the ID.
func TestChunkID_Good_DifferentForDifferentInputs(t *testing.T) {
	id1 := ChunkID("test.md", 0, "hello world")
	id2 := ChunkID("test.md", 1, "hello world")
	id3 := ChunkID("other.md", 0, "hello world")
	assert.NotEqual(t, id1, id2)
	assert.NotEqual(t, id1, id3)
}

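// TestShouldProcess_Good_MarkdownFiles checks that .md, .markdown and .txt
// files are accepted, while source files and extensionless names are rejected.
func TestShouldProcess_Good_MarkdownFiles(t *testing.T) {
	assert.True(t, ShouldProcess("doc.md"))
	assert.True(t, ShouldProcess("doc.markdown"))
	assert.True(t, ShouldProcess("doc.txt"))
	assert.False(t, ShouldProcess("doc.go"))
	assert.False(t, ShouldProcess("doc.py"))
	assert.False(t, ShouldProcess("doc"))
}
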
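// repeatString returns s concatenated n times. It behaves like strings.Repeat
// for the non-negative counts used in these tests.
func repeatString(s string, n int) string {
	result := ""
	for i := 0; i < n; i++ {
		result += s
	}
	return result
}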
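
The tests above pin down two behaviours without showing how they could be achieved: splitting an oversized section into overlapping chunks without breaking UTF-8 characters, and deriving a deterministic ID from a path, an index and the chunk text. The following is a minimal standalone sketch of both ideas, not the pkg/rag implementation. The names `chunkRunes` and `chunkID`, the choice to measure size in runes, and the use of SHA-256 are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strconv"
)

// chunkRunes (hypothetical helper) splits text into windows of at most size
// runes. Every window after the first starts with the last overlap runes of
// the previous window. It returns nil for empty input or an invalid config.
func chunkRunes(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil // with overlap >= size the window would never advance
	}
	runes := []rune(text) // index by rune so multi-byte characters stay whole
	if len(runes) == 0 {
		return nil
	}
	step := size - overlap
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break // the final window already reached the end of the text
		}
	}
	return chunks
}

// chunkID (hypothetical helper) hashes path, index and text into a hex string.
// A zero byte separates the fields so that ("a", 1, "2x") and ("a", 12, "x")
// cannot feed identical bytes to the hash. SHA-256 is an assumption here; the
// real ChunkID may use a different scheme.
func chunkID(path string, index int, text string) string {
	h := sha256.New()
	h.Write([]byte(path))
	h.Write([]byte{0})
	h.Write([]byte(strconv.Itoa(index)))
	h.Write([]byte{0})
	h.Write([]byte(text))
	return hex.EncodeToString(h.Sum(nil))
}
```

Under these assumptions, `chunkRunes("abcdefghij", 4, 1)` advances three runes at a time and yields three windows in which each one begins with the last rune of the previous window. Calling `chunkID` twice with the same arguments returns the same string, which is the property TestChunkID_Good_Deterministic checks for the real function.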