Borg/rfc/RFC-003-DATANODE.md

7.3 KiB

RFC-003: DataNode In-Memory Filesystem

Status: Draft Author: Snider Created: 2026-01-13 License: EUPL-1.2


Abstract

DataNode is an in-memory filesystem abstraction implementing Go's fs.FS interface. It provides the foundation for collecting, manipulating, and serializing file trees without touching disk.

1. Overview

DataNode serves as the core data structure for:

  • Collecting files from various sources (GitHub, websites, PWAs)
  • Building container filesystems (TIM rootfs)
  • Serializing to/from tar archives
  • Encrypting as TRIX format

2. Implementation

2.1 Core Type

type DataNode struct {
    files map[string]*dataFile
}

type dataFile struct {
    name    string
    content []byte
    modTime time.Time
}

Key insight: DataNode uses a flat key-value map, not a nested tree structure. Paths are stored as keys directly, and directories are implicit (derived from path prefixes).

2.2 fs.FS Implementation

DataNode implements these interfaces:

Interface Method Description
fs.FS Open(name string) Returns fs.File for path
fs.StatFS Stat(name string) Returns fs.FileInfo
fs.ReadDirFS ReadDir(name string) Lists directory contents

2.3 Internal Helper Types

// File metadata
type dataFileInfo struct {
    name    string
    size    int64
    modTime time.Time
}
func (fi *dataFileInfo) Mode() fs.FileMode { return 0444 }  // Read-only

// Directory metadata
type dirInfo struct {
    name string
}
func (di *dirInfo) Mode() fs.FileMode { return fs.ModeDir | 0555 }

// File reader (implements fs.File)
type dataFileReader struct {
    info   *dataFileInfo
    reader *bytes.Reader
}

// Directory reader (implements fs.File)
type dirFile struct {
    info    *dirInfo
    entries []fs.DirEntry
    offset  int
}

3. Operations

3.1 Construction

// Create empty DataNode
node := datanode.New()

// Returns: &DataNode{files: make(map[string]*dataFile)}

3.2 Adding Files

// Add file with content
node.AddData("path/to/file.txt", []byte("content"))

// Trailing slashes are ignored (treated as directory indicator)
node.AddData("path/to/dir/", []byte(""))  // Stored as "path/to/dir"

Note: Parent directories are NOT explicitly created. They are implicit based on path prefixes.

3.3 File Access

// Open file (fs.FS interface)
f, err := node.Open("path/to/file.txt")
if err != nil {
    // fs.ErrNotExist if not found
}
defer f.Close()
content, _ := io.ReadAll(f)

// Stat file
info, err := node.Stat("path/to/file.txt")
// info.Name(), info.Size(), info.ModTime(), info.Mode()

// Read directory
entries, err := node.ReadDir("path/to")
for _, entry := range entries {
    // entry.Name(), entry.IsDir(), entry.Type()
}

3.4 Walking

err := fs.WalkDir(node, ".", func(path string, d fs.DirEntry, err error) error {
    if err != nil {
        return err
    }
    if !d.IsDir() {
        // Process file
    }
    return nil
})

4. Path Semantics

4.1 Path Handling

  • Leading slashes stripped: /path/filepath/file
  • Trailing slashes ignored: path/dir/path/dir
  • Forward slashes only: Uses / regardless of OS
  • Case-sensitive: File.txtfile.txt
  • Direct lookup: Paths stored as flat keys

4.2 Valid Paths

file.txt              → stored as "file.txt"
dir/file.txt          → stored as "dir/file.txt"
/absolute/path        → stored as "absolute/path" (leading / stripped)
path/to/dir/          → stored as "path/to/dir" (trailing / stripped)

4.3 Directory Detection

Directories are implicit. A directory exists if:

  1. Any file path has it as a prefix
  2. Example: Adding a/b/c.txt implicitly creates directories a and a/b
// ReadDir finds directories by scanning all paths
func (dn *DataNode) ReadDir(name string) ([]fs.DirEntry, error) {
    // Scans all keys for matching prefix
    // Returns unique immediate children
}

5. Tar Serialization

5.1 ToTar

tarBytes, err := node.ToTar()

Format:

  • All files written as tar.TypeReg (regular files)
  • Header Mode: 0600 (fixed, not original mode)
  • No explicit directory entries
  • ModTime preserved from dataFile
// Serialization logic
for path, file := range dn.files {
    header := &tar.Header{
        Name:    path,
        Mode:    0600,          // Fixed mode
        Size:    int64(len(file.content)),
        ModTime: file.modTime,
        Typeflag: tar.TypeReg,
    }
    tw.WriteHeader(header)
    tw.Write(file.content)
}

5.2 FromTar

node, err := datanode.FromTar(tarBytes)

Parsing:

  • Only reads tar.TypeReg entries
  • Ignores directory entries (tar.TypeDir)
  • Stores path and content in flat map
// Deserialization logic
for {
    header, err := tr.Next()
    if header.Typeflag == tar.TypeReg {
        content, _ := io.ReadAll(tr)
        dn.files[header.Name] = &dataFile{
            name:    filepath.Base(header.Name),
            content: content,
            modTime: header.ModTime,
        }
    }
}

5.3 Compressed Variants

// gzip compressed
tarGz, err := node.ToTarGz()
node, err := datanode.FromTarGz(tarGzBytes)

// xz compressed
tarXz, err := node.ToTarXz()
node, err := datanode.FromTarXz(tarXzBytes)

6. File Modes

Context Mode Notes
File read (fs.FS) 0444 Read-only for all
Directory (fs.FS) 0555 Read+execute for all
Tar export 0600 Owner read/write only

Note: Original file modes are NOT preserved. All files get fixed modes.

7. Memory Model

  • All content held in memory as []byte
  • No lazy loading
  • No memory mapping
  • Thread-safe for concurrent reads (map is not mutated after creation)

7.1 Size Calculation

func (dn *DataNode) Size() int64 {
    var total int64
    for _, f := range dn.files {
        total += int64(len(f.content))
    }
    return total
}

8. Integration Points

8.1 TIM RootFS

tim := &tim.TIM{
    Config: configJSON,
    RootFS: datanode,  // DataNode as container filesystem
}

8.2 TRIX Encryption

// Encrypt DataNode to TRIX
encrypted, err := trix.Encrypt(datanode.ToTar(), password)

// Decrypt TRIX to DataNode
tarBytes, err := trix.Decrypt(encrypted, password)
node, err := datanode.FromTar(tarBytes)

8.3 Collectors

// GitHub collector returns DataNode
node, err := github.CollectRepo(url)

// Website collector returns DataNode
node, err := website.Collect(url, depth)

9. Implementation Reference

  • Source: pkg/datanode/datanode.go
  • Tests: pkg/datanode/datanode_test.go

10. Security Considerations

  1. Path traversal: Leading slashes stripped; no .. handling needed (flat map)
  2. Memory exhaustion: No built-in limits; caller must validate input size
  3. Tar bombs: FromTar reads all entries into memory
  4. Symlinks: Not supported (intentional - tar.TypeReg only)

11. Limitations

  • No symlink support
  • No extended attributes
  • No sparse files
  • Fixed file modes (0600 on export)
  • No streaming (full content in memory)

12. Future Work

  • Streaming tar generation for large files
  • Optional mode preservation
  • Size limits for untrusted input
  • Lazy loading for large datasets