[agent/claude:haiku] Review all files in pkg/core/. Report findings with file:lin... #19
1 changed files with 370 additions and 0 deletions
370
pkg/core/REVIEW.md
Normal file
370
pkg/core/REVIEW.md
Normal file
|
|
@ -0,0 +1,370 @@
|
|||
# Code Review: pkg/core/ — Comprehensive Analysis
|
||||
|
||||
**Repository:** core/go
|
||||
**Status:** Ready for fixes
|
||||
**Breaking Change Risk:** 14 consumers
|
||||
**Date:** 2026-03-18
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Comprehensive review of `pkg/core/` identified **3 critical**, **4 high**, **3 medium**, and **2 low** severity findings. The most severe issues are:
|
||||
|
||||
1. CLI subsystem is completely non-functional (nil pointers)
|
||||
2. Race condition in asset retrieval
|
||||
3. Package-level shared mutex state violates Core isolation
|
||||
4. Lock ordering hazard between service and i18n subsystems
|
||||
5. Data race in default logger replacement
|
||||
|
||||
All findings have concrete impact on the 14 consuming modules. Priority fixes are listed at the end.
|
||||
|
||||
---
|
||||
|
||||
## Critical Findings
|
||||
|
||||
### [CRITICAL] embed.go:69–79 — Concurrent Map Access in GetAsset
|
||||
|
||||
**Severity:** Crash (panic)
|
||||
**File:** pkg/core/embed.go
|
||||
**Lines:** 69–79
|
||||
**Confidence:** 92%
|
||||
|
||||
The `GetAsset` method releases the `assetGroupsMu` RLock before reading from `g.assets`, creating a race condition with concurrent `AddAsset` calls.
|
||||
|
||||
```go
|
||||
assetGroupsMu.RLock()
|
||||
g, ok := assetGroups[group]
|
||||
assetGroupsMu.RUnlock() // lock released here
|
||||
// ...
|
||||
data, ok := g.assets[name] // map read after lock released — RACE
|
||||
```
|
||||
|
||||
**Impact:** Any concurrent call to `AddAsset` on the same group can mutate `g.assets` while `GetAsset` reads it, triggering `fatal error: concurrent map read and map write`.
|
||||
|
||||
**Fix:** Hold `assetGroupsMu.RLock()` through both the group lookup and the asset read, or make a copy of the asset while locked.
|
||||
|
||||
---
|
||||
|
||||
### [CRITICAL] cli.go:* — Uninitialised CLI Subsystem (Nil Pointer Dereference)
|
||||
|
||||
**Severity:** Crash (panic)
|
||||
**File:** pkg/core/cli.go
|
||||
**Lines:** 94, 99, 105, 118, 137, 142, 157, 163, 169, 175, 181, 187, 193, 198
|
||||
**Confidence:** 95%
|
||||
|
||||
The `Core` constructor at contract.go:96 initialises `cli: &Cli{opts: &CliOpts{}}` with **no `rootCommand`** and **no `bannerFunction`**. Every public `Cli` method that accesses these fields panics:
|
||||
|
||||
- `AddCommand` (cli.go:94) — dereferences `c.rootCommand`
|
||||
- `PrintBanner` (cli.go:99) — dereferences `c.bannerFunction`
|
||||
- `Run` (cli.go:105) — dereferences `c.rootCommand`
|
||||
- `PrintHelp` (cli.go:118) — dereferences `c.rootCommand` and `c.bannerFunction`
|
||||
- All flag methods (`BoolFlag`, `StringFlag`, `IntFlag`) — dereference `c.rootCommand`
|
||||
|
||||
**Example crash path:**
|
||||
```go
|
||||
c, _ := core.New()
|
||||
c.Cli().AddCommand(...) // panic: nil pointer dereference (rootCommand)
|
||||
```
|
||||
|
||||
**Impact:** The CLI subsystem is **completely unusable** in any Core instance created via the standard `New()` constructor. Any consumer attempting to expose CLI commands via `c.Cli()` will crash immediately.
|
||||
|
||||
**Fix:** Either add a `WithCli(name, version, description)` option to initialise the CLI with a root command, or modify `New()` to create a default root command when `Cli` is instantiated. See also command.go nil check issue below.
|
||||
|
||||
---
|
||||
|
||||
### [CRITICAL] lock.go:12–16 — Package-Level Shared Lock Map Violates Core Isolation
|
||||
|
||||
**Severity:** Logical error (silent cross-instance interference)
|
||||
**File:** pkg/core/lock.go
|
||||
**Lines:** 12–16
|
||||
**Confidence:** 90%
|
||||
|
||||
The `lockMap` is a package-level global shared across all `Core` instances in the process:
|
||||
|
||||
```go
|
||||
var (
|
||||
lockMu sync.Mutex
|
||||
lockMap = make(map[string]*sync.RWMutex) // SHARED across all Core instances
|
||||
)
|
||||
```
|
||||
|
||||
Any two `Core` instances calling `c.Lock("srv")` will receive a pointer to the **same underlying `*sync.RWMutex`**. This violates the logical isolation of independent `Core` instances — service registration on one instance can block registration on another.
|
||||
|
||||
**Impact:** In a multi-Core application (realistic for test harnesses, plugin systems, or microservices deployed in the same process), acquiring a named lock on one Core blocks all other Cores sharing that name. This is a silent logical error that could manifest as mysterious hangs or deadlocks in production.
|
||||
|
||||
**Fix:** Move `lockMap` and `lockMu` into the `Core` struct (or wrap it in a per-Core map).
|
||||
|
||||
---
|
||||
|
||||
## High-Priority Findings
|
||||
|
||||
### [HIGH] service.go:46–66 — Lock-Order Violation (AB/BA Deadlock Risk)
|
||||
|
||||
**Severity:** Deadlock risk
|
||||
**File:** pkg/core/service.go
|
||||
**Lines:** 46–66
|
||||
**Confidence:** 85%
|
||||
|
||||
When registering a service with `c.Service(name, instance)`, the code acquires the `"srv"` write lock, then calls `c.i18n.AddLocales()` **while holding that lock** (line 66):
|
||||
|
||||
```go
|
||||
c.Lock("srv").Mu.Lock() // Acquire "srv" lock
|
||||
defer c.Lock("srv").Mu.Unlock()
|
||||
// ...
|
||||
c.i18n.AddLocales(lp.Locales()) // AddLocales acquires i18n.mu
|
||||
```
|
||||
|
||||
Any code path that acquires `i18n.mu` first and then tries to acquire the `"srv"` lock creates an AB/BA deadlock scenario.
|
||||
|
||||
**Impact:** Lock-order inversions are latent but often invisible until they manifest under concurrent load. This is a classic correctness issue in multi-threaded systems.
|
||||
|
||||
**Fix:** Move the `AddLocales` call outside the `"srv"` lock, or document and enforce a strict global lock ordering: always acquire `"srv"` before `i18n.mu` (or vice versa).
|
||||
|
||||
---
|
||||
|
||||
### [HIGH] log.go:308–309 — Data Race on `defaultLog` Pointer
|
||||
|
||||
**Severity:** Data race
|
||||
**File:** pkg/core/log.go
|
||||
**Lines:** 308–309
|
||||
**Confidence:** 88%
|
||||
|
||||
`SetDefault` performs an unsynchronised write to the package-level `defaultLog` variable. If any goroutine is concurrently reading `defaultLog` (e.g., via `Debug()`, `Info()`, `SetLevel()`, etc.), this is a concurrent read+write data race on the pointer itself.
|
||||
|
||||
```go
|
||||
func SetDefault(l *Log) {
|
||||
defaultLog = l // unsynchronised write to package-level pointer
|
||||
}
|
||||
```
|
||||
|
||||
**Impact:** Under high concurrency, a goroutine calling `Debug()` might read a partially-written pointer value, resulting in nil dereference, panic, or silent corruption.
|
||||
|
||||
**Fix:** Use `sync/atomic.Pointer[Log]` or protect reads and writes with a package-level RWMutex.
|
||||
|
||||
---
|
||||
|
||||
### [HIGH] error.go:390–391, 419–420 — File I/O Convention Violation & Directory Permissions
|
||||
|
||||
**Severity:** Convention violation + permission issue
|
||||
**File:** pkg/core/error.go
|
||||
**Lines:** 390–391, 406–422
|
||||
**Confidence:** 88%
|
||||
|
||||
The crash report file I/O uses `os.ReadFile` / `os.WriteFile` directly, bypassing the project convention that **all file I/O must go through `c.Fs()`** (sandboxed local filesystem). Additionally:
|
||||
|
||||
1. `os.MkdirAll(filepath.Dir(h.filePath), 0755)` at line 419 uses world-readable permissions (0755) for a directory containing crash reports. This allows other local users to enumerate and potentially read sensitive crash metadata.
|
||||
2. The file itself is written with 0600 (correct), but the directory is unprotected.
|
||||
|
||||
**Impact:** Convention violation couples the error subsystem to the OS file system, preventing testing/mocking. Security issue: crash reports (containing sensitive stack traces, system info) are readable by other local users.
|
||||
|
||||
**Fix:**
|
||||
- Replace `os.ReadFile/WriteFile` with `c.Fs().Read()/Write()`
|
||||
- Change `os.MkdirAll(..., 0755)` to `0700` or use `c.Fs().Mkdir()`
|
||||
- Consider wrapping crash reporting in the abstracted I/O layer
|
||||
|
||||
---
|
||||
|
||||
### [HIGH] contract.go:156–168 — Missing Nil Check in WithName
|
||||
|
||||
**Severity:** Silent error (panics later)
|
||||
**File:** pkg/core/contract.go
|
||||
**Lines:** 156–168
|
||||
**Confidence:** 78%
|
||||
|
||||
`WithName` does not validate that the `serviceInstance` returned by the factory is non-nil, unlike `WithService` (which guards this at lines 121–124). If the factory returns `(nil, nil)`, the nil is registered in the service map, and any type assertion or interface call on the retrieved service will panic.
|
||||
|
||||
```go
|
||||
svc, err := factory()
|
||||
if err != nil { return ... }
|
||||
// NO CHECK FOR nil svc
|
||||
result := c.Service(name, serviceInstance)
|
||||
```
|
||||
|
||||
**Fix:** Add the same nil check as `WithService`:
|
||||
```go
|
||||
if serviceInstance == nil {
|
||||
return E("core.WithName", fmt.Sprintf("service factory returned nil instance for %q", name), nil)
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Medium-Priority Findings
|
||||
|
||||
### [MEDIUM] config.go:101–106 — Nil ConfigOpts Dereference
|
||||
|
||||
**Severity:** Panic (nil pointer dereference)
|
||||
**File:** pkg/core/config.go
|
||||
**Lines:** 101–106
|
||||
**Confidence:** 85%
|
||||
|
||||
`Config.Enabled` and `Config.EnabledFeatures` read `e.Features` without checking if `e.ConfigOpts == nil`. While `New()` initialises `ConfigOpts`, `Config` is an exported struct and a zero-value `Config{}` would have nil `ConfigOpts`, causing a panic.
|
||||
|
||||
The `Get` method at line 63 has the correct guard (`if e.ConfigOpts == nil`), but `Enabled` and `EnabledFeatures` do not. **Inconsistency increases the likelihood of misuse.**
|
||||
|
||||
**Fix:** Add nil guard to `Enabled` and `EnabledFeatures`:
|
||||
```go
|
||||
func (e *Config) Enabled(feature string) bool {
|
||||
e.mu.RLock()
|
||||
defer e.mu.RUnlock()
|
||||
if e.ConfigOpts == nil || e.Features == nil {
|
||||
return false
|
||||
}
|
||||
return e.Features[feature]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### [MEDIUM] fs.go:85–86 — Logging Convention Violation & Log Injection Risk
|
||||
|
||||
**Severity:** Convention violation + log injection risk
|
||||
**File:** pkg/core/fs.go
|
||||
**Lines:** 85–86
|
||||
**Confidence:** 75%
|
||||
|
||||
The sandbox-escape security event at `validatePath` logs directly to `os.Stderr` using `fmt.Fprintf`, bypassing the structured `Log` system. The log line includes user-controlled input (the attempted path) **without sanitisation**, creating a log injection vector.
|
||||
|
||||
```go
|
||||
_, _ = fmt.Fprintf(os.Stderr, "potential sandbox escape attempt: realpath=%s input=%s user=%s\n", realNext, p, Username())
|
||||
```
|
||||
|
||||
An attacker-controlled path like `"foo\nmalicious log line\n"` would corrupt the log output. Additionally, by-passing the `Log` system violates consistency and prevents testing/log capture.
|
||||
|
||||
**Fix:**
|
||||
- Use `c.Log().Security(...)` instead of `fmt.Fprintf`
|
||||
- Sanitise the path and username before including in the log message:
|
||||
```go
|
||||
c.Log().Security("sandbox escape attempt", "path", filepath.Clean(p), "user", Username())
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### [MEDIUM] command.go:132, 152, 176, 218 — Nil `app` Dereference
|
||||
|
||||
**Severity:** Panic (nil pointer dereference)
|
||||
**File:** pkg/core/command.go
|
||||
**Lines:** 132, 152, 176, 218
|
||||
**Confidence:** 80%
|
||||
|
||||
`Command` objects can be created with the public `NewCommand()` constructor, which leaves `c.app` nil. Later, when `run()`, `PrintHelp()`, or `isDefaultCommand()` is called, they dereference `c.app` without checking:
|
||||
|
||||
```go
|
||||
// Line 132: c.app.errorHandler(...)
|
||||
// Line 152: c.app.defaultCommand
|
||||
// Line 176: c.app.PrintBanner()
|
||||
// Line 218: c.app != nil check exists but only here
|
||||
```
|
||||
|
||||
**Impact:** Any consumer manually constructing a command tree using `NewCommand` and then calling methods on it will panic. The `setApp` method (which assigns `c.app`) is unexported, making it inaccessible to callers.
|
||||
|
||||
**Fix:** Either make `setApp` public and require callers to use it, or add a lazy initialisation guard:
|
||||
```go
|
||||
func (c *Command) ensureApp() error {
|
||||
if c.app == nil {
|
||||
return E("Command.run", "Command not properly initialised; missing setApp call", nil)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Low-Priority Findings
|
||||
|
||||
### [LOW] log.go:388–397 — Convention Violation in LogPan.Recover
|
||||
|
||||
**Severity:** Lost context
|
||||
**File:** pkg/core/log.go
|
||||
**Lines:** 388–397
|
||||
**Confidence:** 65%
|
||||
|
||||
`LogPan.Recover` creates an error with `fmt.Errorf("%v", r)` for non-error panics (line 390), violating the project convention that errors must use `core.E()`. The resulting error lacks `Op` and structured context:
|
||||
|
||||
```go
|
||||
err, ok := r.(error)
|
||||
if !ok {
|
||||
err = fmt.Errorf("%v", r) // CONVENTION VIOLATION: should use core.E()
|
||||
}
|
||||
// ...
|
||||
lp.log.Error("panic recovered", "op", Op(err), "stack", FormatStackTrace(err))
|
||||
```
|
||||
|
||||
Since `fmt.Errorf` returns a plain `error`, `Op(err)` and `FormatStackTrace(err)` will return empty strings. The log loses context.
|
||||
|
||||
**Fix:**
|
||||
```go
|
||||
if !ok {
|
||||
err = core.E("panic", fmt.Sprintf("%v", r), nil)
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### [LOW] error.go:350–351 — Convention Violation in ErrPan.Recover
|
||||
|
||||
**Severity:** Lost context
|
||||
**File:** pkg/core/error.go
|
||||
**Lines:** 350–351
|
||||
**Confidence:** 60%
|
||||
|
||||
Same issue as above: `ErrPan.Recover` uses `fmt.Errorf` for non-error panics instead of `core.E()`. The crash report's `Error` field will be an unstructured string rather than a wrapped `core.Err`.
|
||||
|
||||
**Fix:**
|
||||
```go
|
||||
if !ok {
|
||||
err = core.E("panic", fmt.Sprintf("%v", r), nil)
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Summary Table
|
||||
|
||||
| Severity | Count | Files | Action Required |
|
||||
|----------|-------|-------|-----------------|
|
||||
| CRITICAL | 3 | cli.go, embed.go, lock.go | **MUST FIX** — system is unusable or crashes |
|
||||
| HIGH | 4 | service.go, log.go, error.go, contract.go | **MUST FIX** — logic errors, data races, deadlock risk |
|
||||
| MEDIUM | 3 | config.go, fs.go, command.go | **SHOULD FIX** — crashes or convention violations |
|
||||
| LOW | 2 | log.go, error.go | **CONSIDER** — lost context, non-critical |
|
||||
| **TOTAL** | **12** | | |
|
||||
|
||||
---
|
||||
|
||||
## Priority Implementation Order
|
||||
|
||||
### Phase 1: Stability (CRITICAL + HIGH)
|
||||
1. **cli.go** — Initialise `rootCommand` and `bannerFunction` in `New()` or add `WithCli` option
|
||||
2. **embed.go** — Hold `assetGroupsMu` lock through both reads in `GetAsset`
|
||||
3. **log.go** — Use `atomic.Pointer[Log]` for `defaultLog` or add RWMutex
|
||||
4. **lock.go** — Move `lockMap` into `Core` struct (per-instance isolation)
|
||||
5. **service.go** — Move `AddLocales` call outside the `"srv"` lock or document strict lock ordering
|
||||
|
||||
### Phase 2: Correctness (MEDIUM)
|
||||
6. **command.go** — Add nil check guard or make `setApp` public
|
||||
7. **config.go** — Add nil guard to `Enabled` and `EnabledFeatures`
|
||||
8. **fs.go** — Use `c.Log().Security()` and sanitise log input
|
||||
9. **contract.go** — Add nil check in `WithName` factory validation
|
||||
|
||||
### Phase 3: Polish (LOW)
|
||||
10. **error.go** — Use `core.E()` in both `ErrPan.Recover` and `LogPan.Recover`
|
||||
11. **error.go** — Use `c.Fs()` for crash report I/O; change directory mode to 0700
|
||||
|
||||
---
|
||||
|
||||
## Testing Strategy Post-Fix
|
||||
|
||||
- **Unit tests** for each module, with concurrent load testing for lock contention
|
||||
- **Integration tests** for multi-Core scenarios (lock.go fix)
|
||||
- **Race detector** (`go test -race ./...`) must pass
|
||||
- **Panic recovery tests** for nil-dereference scenarios (cli.go, command.go)
|
||||
- **Functional tests** for crash reporting (error.go)
|
||||
|
||||
---
|
||||
|
||||
## Files Affected (14 Consumers)
|
||||
|
||||
- agent, api, cli, core, config, docs, gui, ide, lint, mcp, php, ts, LEM, eaas
|
||||
|
||||
All of these should run tests and regressions after fixes are applied.
|
||||
Loading…
Add table
Reference in a new issue