go-cache/docs/architecture.md
Snider 0186ddc119 fix(docs,tests): fix stale coreerr example and add missing test coverage
- docs/architecture.md: replace fmt.Errorf with coreerr.E() to match actual implementation
- cache_test.go: add TestGitHubKeys for GitHubReposKey/GitHubRepoKey (was 0% coverage)
- cache_test.go: add TestPathTraversalRejected for Path() traversal guard

Co-Authored-By: Virgil <virgil@lethean.io>
2026-03-17 07:11:04 +00:00

6 KiB

title description
Architecture Internals of go-cache -- types, data flow, storage format, and security model.

Architecture

This document explains how go-cache works internally, covering its type system, on-disc format, data flow, and security considerations.

Core Types

Cache

type Cache struct {
    medium  io.Medium
    baseDir string
    ttl     time.Duration
}

Cache is the primary handle. It holds:

  • medium -- the storage backend (any io.Medium implementation).
  • baseDir -- the root directory under which all cache files live.
  • ttl -- how long entries remain valid after being written.

All three fields are set once during construction via cache.New() and are immutable for the lifetime of the instance.

Entry

type Entry struct {
    Data      json.RawMessage `json:"data"`
    CachedAt  time.Time       `json:"cached_at"`
    ExpiresAt time.Time       `json:"expires_at"`
}

Entry is the envelope written to storage. It wraps the caller's data as raw JSON and adds two timestamps for expiry tracking. Using json.RawMessage means the data payload is stored verbatim -- no intermediate deserialisation happens during writes.

Constructor Defaults

cache.New(medium, baseDir, ttl) applies sensible defaults when arguments are zero-valued:

Parameter Zero value Default applied
medium nil io.Local (unsandboxed local filesystem)
baseDir "" .core/cache/ relative to the working dir
ttl 0 cache.DefaultTTL (1 hour)

The constructor also calls medium.EnsureDir(baseDir) to guarantee the cache directory exists before any reads or writes.

Data Flow

Writing (Set)

caller data
    |
    v
json.Marshal(data)           -- serialise caller's value
    |
    v
wrap in Entry{               -- add timestamps
    Data:      <marshalled>,
    CachedAt:  time.Now(),
    ExpiresAt: time.Now().Add(ttl),
}
    |
    v
json.MarshalIndent(entry)    -- human-readable JSON
    |
    v
medium.Write(path, string)   -- persist via the storage backend

The resulting file on disc (or equivalent record in another medium) looks like:

{
  "data": { "foo": "bar" },
  "cached_at": "2026-03-10T14:30:00Z",
  "expires_at": "2026-03-10T15:30:00Z"
}

Parent directories for nested keys (e.g. github/host-uk/repos) are created automatically via medium.EnsureDir().

Reading (Get)

medium.Read(path)
    |
    v
json.Unmarshal -> Entry       -- parse the envelope
    |
    v
time.Now().After(ExpiresAt)?  -- check TTL
    |                |
   yes              no
    |                |
    v                v
return false    json.Unmarshal(entry.Data, dest)
(cache miss)         |
                     v
                return true
                (cache hit)

Key behaviours:

  • If the file does not exist (os.ErrNotExist), Get returns (false, nil) -- a miss, not an error.
  • If the file contains invalid JSON, it is treated as a miss (not an error). This prevents corrupted files from blocking the caller.
  • If the entry exists but has expired, it is treated as a miss. The stale file is not deleted eagerly -- it remains on disc until explicitly removed or overwritten.

Deletion

  • Delete(key) removes a single entry. If the file does not exist, the operation succeeds silently.
  • Clear() calls medium.DeleteAll(baseDir), removing the entire cache directory and all its contents.

Age Inspection

Age(key) returns the time.Duration since the entry was written (CachedAt). If the entry does not exist or cannot be parsed, it returns -1. This is useful for diagnostics without triggering the expiry check that Get performs.

Key-to-Path Mapping

Cache keys are mapped to file paths by appending .json and joining with the base directory:

key:  "github/host-uk/repos"
path: <baseDir>/github/host-uk/repos.json

Keys may contain forward slashes to create a directory hierarchy. This is how the GitHub key helpers work:

func GitHubReposKey(org string) string {
    return filepath.Join("github", org, "repos")
}

func GitHubRepoKey(org, repo string) string {
    return filepath.Join("github", org, repo, "meta")
}

Security: Path Traversal Prevention

The Path() method guards against directory traversal attacks. After computing the full path, it resolves both the base directory and the result to absolute paths, then checks that the result is still a prefix of the base:

if !strings.HasPrefix(absPath, absBase) {
    return "", coreerr.E("cache.Path", "invalid cache key: path traversal attempt", nil)
}

This means a key like ../../etc/passwd will be rejected before any I/O occurs. Every public method (Get, Set, Delete, Age) calls Path() internally, so traversal protection is always active.

Concurrency

The Cache struct does not include a mutex. Concurrent reads are safe (each call does independent file I/O), but concurrent writes to the same key may produce a race at the filesystem level. If your application writes to the same key from multiple goroutines, protect the call site with your own synchronisation.

In practice, caches in this ecosystem are typically written by a single goroutine (e.g. a CLI command fetching GitHub data) and read by others, which avoids contention.

Relationship to go-io

go-cache delegates all storage operations to the io.Medium interface from go-io. It uses only five methods:

Method Used by
EnsureDir New, Set
Read Get, Age
Write Set
Delete Delete
DeleteAll Clear

This minimal surface makes it straightforward to swap storage backends. For tests, io.NewMockMedium() provides a fully in-memory implementation with no disc access.