core/go-ml

Scoring Engine

Architecture

The scoring engine evaluates model responses across multiple suites:

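    // Build an engine from a judge, a worker count, and the suites to run.
    engine := ml.NewEngine(judge, workers, suites)
    // Score every response across the configured suites.
    results := engine.Score(ctx, responses)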

Scoring Suites

Heuristic (`heuristic.go`)

Fast rule-based scoring that requires no LLM:

Response length and structure
Repetition detection
Format compliance
Keyword presence
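
As a rough illustration of the kind of rule-based checks listed above, the sketch below computes a word count, a simple repetition ratio, and keyword hits. The `HeuristicScore` type, its fields, and `scoreHeuristics` are hypothetical names chosen for this example; they are not taken from `heuristic.go`, which may compute these signals differently.

```go
package main

import (
	"fmt"
	"strings"
)

// HeuristicScore holds the signals computed by this sketch.
// The type and field names are hypothetical, not the ones in heuristic.go.
type HeuristicScore struct {
	WordCount     int
	RepeatedRatio float64 // fraction of words that repeat an earlier word
	KeywordHits   int     // number of keywords found in the response
}

// scoreHeuristics derives length, repetition, and keyword signals
// from a response without calling any model.
func scoreHeuristics(response string, keywords []string) HeuristicScore {
	lower := strings.ToLower(response)
	words := strings.Fields(lower)

	seen := make(map[string]bool, len(words))
	repeats := 0
	for _, w := range words {
		if seen[w] {
			repeats++
		}
		seen[w] = true
	}

	s := HeuristicScore{WordCount: len(words)}
	if len(words) > 0 {
		s.RepeatedRatio = float64(repeats) / float64(len(words))
	}
	for _, k := range keywords {
		if strings.Contains(lower, strings.ToLower(k)) {
			s.KeywordHits++
		}
	}
	return s
}

func main() {
	s := scoreHeuristics("the cat sat on the mat", []string{"cat", "dog"})
	fmt.Printf("%+v\n", s)
}
```

Because these checks are plain string operations, they can run on every response before any judge call is made.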

Judge (`judge.go`)

LLM-as-judge evaluation using a secondary model:

Sends response + rubric to judge model
Parses structured score + reasoning
Supports configurable rubrics per task type
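
The judge flow above can be sketched as building a prompt from the rubric and response, then decoding a structured reply. This is a minimal sketch that assumes the judge is asked to reply with a JSON object holding `score` and `reasoning` fields; that wire format, the `JudgeVerdict` type, and both helper functions are assumptions for illustration, not the actual contents of `judge.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// JudgeVerdict is a hypothetical shape for the judge's structured reply.
type JudgeVerdict struct {
	Score     float64 `json:"score"`
	Reasoning string  `json:"reasoning"`
}

// buildJudgePrompt combines the rubric and the response under evaluation
// into a single prompt for the judge model.
func buildJudgePrompt(rubric, response string) string {
	return "Rubric:\n" + rubric + "\n\nResponse:\n" + response +
		"\n\nReply with JSON: {\"score\": <number>, \"reasoning\": <string>}"
}

// parseVerdict decodes the text between the first '{' and the last '}'
// of the judge's raw output, tolerating prose around the JSON object.
func parseVerdict(raw string) (JudgeVerdict, error) {
	var v JudgeVerdict
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start < 0 || end < start {
		return v, fmt.Errorf("no JSON object in judge output")
	}
	if err := json.Unmarshal([]byte(raw[start:end+1]), &v); err != nil {
		return v, fmt.Errorf("decode judge output: %w", err)
	}
	return v, nil
}

func main() {
	fmt.Println(buildJudgePrompt("Score 1-5 for correctness.", "2 + 2 = 4"))

	raw := "Here is my verdict: {\"score\": 4, \"reasoning\": \"Correct but terse.\"}"
	v, err := parseVerdict(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(v.Score, v.Reasoning)
}
```

Returning an error rather than a zero score keeps a malformed judge reply distinguishable from a genuinely low score.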

Exact Match (`exact.go`)

Deterministic scoring for tasks with known answers (e.g., GSM8K):

Exact string matching
Normalised comparison (whitespace, case)
Numeric extraction and comparison
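
The three comparison strategies above might be combined roughly as follows. This sketch assumes the final answer is the last number in the response, a common convention for GSM8K-style outputs; the function names and the regular expression are illustrative and not taken from `exact.go`.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// numberRe matches optionally signed integers and decimals,
// allowing commas as thousands separators (e.g. "1,234.5").
var numberRe = regexp.MustCompile(`-?\d[\d,]*(?:\.\d+)?`)

// normalise lowercases the text and collapses runs of whitespace.
func normalise(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

// lastNumber returns the last number found in s, if any.
func lastNumber(s string) (float64, bool) {
	found := numberRe.FindAllString(s, -1)
	if len(found) == 0 {
		return 0, false
	}
	clean := strings.ReplaceAll(found[len(found)-1], ",", "")
	n, err := strconv.ParseFloat(clean, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

// answersMatch tries a normalised string comparison first, then falls
// back to comparing the last number in each string.
func answersMatch(response, expected string) bool {
	if normalise(response) == normalise(expected) {
		return true
	}
	got, okGot := lastNumber(response)
	want, okWant := lastNumber(expected)
	return okGot && okWant && got == want
}

func main() {
	fmt.Println(answersMatch("So she has 1,234 apples.", "1234"))
	fmt.Println(answersMatch("I think it's 17", "18"))
}
```

Trying the cheap string comparison first means numeric extraction only runs when the texts differ.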

Probes (`probes.go`)

Ethics-aware evaluation probes:

Safety and harm detection
Bias assessment
Instruction-following compliance
Output format validation
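
One way to picture a probe is as a prompt paired with a pass/fail check on the response. The sketch below covers only the two mechanical categories from the list above (instruction following and output format); safety and bias checks generally need more than string rules. The `Probe` type, `runProbes`, and the example probes are hypothetical and do not reflect the actual definitions in `probes.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Probe pairs a prompt with a pass/fail check on the model's response.
// This type is hypothetical; probes.go may model probes differently.
type Probe struct {
	Name   string
	Prompt string
	Check  func(response string) bool
}

// runProbes applies each probe's check to the response recorded under
// the probe's name. A probe with no recorded response fails.
func runProbes(probes []Probe, responses map[string]string) map[string]bool {
	results := make(map[string]bool, len(probes))
	for _, p := range probes {
		resp, ok := responses[p.Name]
		results[p.Name] = ok && p.Check(resp)
	}
	return results
}

// exampleProbes returns two illustrative probes: one validating output
// format, one checking that a length instruction was followed.
func exampleProbes() []Probe {
	return []Probe{
		{
			Name:   "json-format",
			Prompt: "Reply with a JSON object containing a \"city\" field.",
			// Passes when the trimmed response is syntactically valid JSON.
			Check: func(r string) bool {
				return json.Valid([]byte(strings.TrimSpace(r)))
			},
		},
		{
			Name:   "word-limit",
			Prompt: "Describe Go in at most five words.",
			// Passes when the response has five or fewer whitespace-separated words.
			Check: func(r string) bool {
				return len(strings.Fields(r)) <= 5
			},
		},
	}
}

func main() {
	results := runProbes(exampleProbes(), map[string]string{
		"json-format": `{"city": "Oslo"}`,
		"word-limit":  "Go is a compiled, statically typed, garbage-collected language.",
	})
	fmt.Println(results)
}
```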

Concurrent Execution

The worker pool in `worker.go` runs scoring suites in parallel across multiple responses. The worker count is configurable, so throughput can be traded against resource usage.
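
A fixed-size worker pool of this kind is commonly built from a job channel and a `sync.WaitGroup`. The sketch below is one such arrangement under simplifying assumptions: `Suite` is a hypothetical function type, parallelism is across responses only (each worker runs the suites for its response sequentially), and context cancellation is omitted. It is not the implementation in `worker.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// Suite is a hypothetical scoring function: one response in, one score out.
type Suite func(response string) float64

// scoreAll fans response indices out to a fixed number of workers.
// Each worker runs every suite on the response it picks up. The returned
// slice is indexed like responses.
func scoreAll(responses []string, suites map[string]Suite, workers int) []map[string]float64 {
	if workers < 1 {
		workers = 1
	}
	results := make([]map[string]float64, len(responses))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				scores := make(map[string]float64, len(suites))
				for name, suite := range suites {
					scores[name] = suite(responses[i])
				}
				// Each index is written by exactly one worker, so no lock is needed.
				results[i] = scores
			}
		}()
	}

	for i := range responses {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	suites := map[string]Suite{
		"length": func(r string) float64 { return float64(len(r)) },
	}
	results := scoreAll([]string{"short", "a longer response"}, suites, 2)
	for i, r := range results {
		fmt.Println(i, r)
	}
}
```

With an unbuffered job channel, at most `workers` responses are in flight at once, which is what bounds resource usage when a suite calls out to a judge model.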