7.3 KiB
7.3 KiB
RFC-003: DataNode In-Memory Filesystem
Status: Draft Author: Snider Created: 2026-01-13 License: EUPL-1.2
Abstract
DataNode is an in-memory filesystem abstraction implementing Go's fs.FS interface. It provides the foundation for collecting, manipulating, and serializing file trees without touching disk.
1. Overview
DataNode serves as the core data structure for:
- Collecting files from various sources (GitHub, websites, PWAs)
- Building container filesystems (TIM rootfs)
- Serializing to/from tar archives
- Encrypting as TRIX format
2. Implementation
2.1 Core Type
type DataNode struct {
files map[string]*dataFile
}
type dataFile struct {
name string
content []byte
modTime time.Time
}
Key insight: DataNode uses a flat key-value map, not a nested tree structure. Paths are stored as keys directly, and directories are implicit (derived from path prefixes).
2.2 fs.FS Implementation
DataNode implements these interfaces:
| Interface | Method | Description |
|---|---|---|
fs.FS |
Open(name string) |
Returns fs.File for path |
fs.StatFS |
Stat(name string) |
Returns fs.FileInfo |
fs.ReadDirFS |
ReadDir(name string) |
Lists directory contents |
2.3 Internal Helper Types
// File metadata
type dataFileInfo struct {
name string
size int64
modTime time.Time
}
func (fi *dataFileInfo) Mode() fs.FileMode { return 0444 } // Read-only
// Directory metadata
type dirInfo struct {
name string
}
func (di *dirInfo) Mode() fs.FileMode { return fs.ModeDir | 0555 }
// File reader (implements fs.File)
type dataFileReader struct {
info *dataFileInfo
reader *bytes.Reader
}
// Directory reader (implements fs.File)
type dirFile struct {
info *dirInfo
entries []fs.DirEntry
offset int
}
3. Operations
3.1 Construction
// Create empty DataNode
node := datanode.New()
// Returns: &DataNode{files: make(map[string]*dataFile)}
3.2 Adding Files
// Add file with content
node.AddData("path/to/file.txt", []byte("content"))
// Trailing slashes are ignored (treated as directory indicator)
node.AddData("path/to/dir/", []byte("")) // Stored as "path/to/dir"
Note: Parent directories are NOT explicitly created. They are implicit based on path prefixes.
3.3 File Access
// Open file (fs.FS interface)
f, err := node.Open("path/to/file.txt")
if err != nil {
// fs.ErrNotExist if not found
}
defer f.Close()
content, _ := io.ReadAll(f)
// Stat file
info, err := node.Stat("path/to/file.txt")
// info.Name(), info.Size(), info.ModTime(), info.Mode()
// Read directory
entries, err := node.ReadDir("path/to")
for _, entry := range entries {
// entry.Name(), entry.IsDir(), entry.Type()
}
3.4 Walking
err := fs.WalkDir(node, ".", func(path string, d fs.DirEntry, err error) error {
if err != nil {
return err
}
if !d.IsDir() {
// Process file
}
return nil
})
4. Path Semantics
4.1 Path Handling
- Leading slashes stripped:
/path/file→path/file - Trailing slashes ignored:
path/dir/→path/dir - Forward slashes only: Uses
/regardless of OS - Case-sensitive:
File.txt≠file.txt - Direct lookup: Paths stored as flat keys
4.2 Valid Paths
file.txt → stored as "file.txt"
dir/file.txt → stored as "dir/file.txt"
/absolute/path → stored as "absolute/path" (leading / stripped)
path/to/dir/ → stored as "path/to/dir" (trailing / stripped)
4.3 Directory Detection
Directories are implicit. A directory exists if:
- Any file path has it as a prefix
- Example: Adding
a/b/c.txtimplicitly creates directoriesaanda/b
// ReadDir finds directories by scanning all paths
func (dn *DataNode) ReadDir(name string) ([]fs.DirEntry, error) {
// Scans all keys for matching prefix
// Returns unique immediate children
}
5. Tar Serialization
5.1 ToTar
tarBytes, err := node.ToTar()
Format:
- All files written as
tar.TypeReg(regular files) - Header Mode: 0600 (fixed, not original mode)
- No explicit directory entries
- ModTime preserved from dataFile
// Serialization logic
for path, file := range dn.files {
header := &tar.Header{
Name: path,
Mode: 0600, // Fixed mode
Size: int64(len(file.content)),
ModTime: file.modTime,
Typeflag: tar.TypeReg,
}
tw.WriteHeader(header)
tw.Write(file.content)
}
5.2 FromTar
node, err := datanode.FromTar(tarBytes)
Parsing:
- Only reads
tar.TypeRegentries - Ignores directory entries (
tar.TypeDir) - Stores path and content in flat map
// Deserialization logic
for {
header, err := tr.Next()
if header.Typeflag == tar.TypeReg {
content, _ := io.ReadAll(tr)
dn.files[header.Name] = &dataFile{
name: filepath.Base(header.Name),
content: content,
modTime: header.ModTime,
}
}
}
5.3 Compressed Variants
// gzip compressed
tarGz, err := node.ToTarGz()
node, err := datanode.FromTarGz(tarGzBytes)
// xz compressed
tarXz, err := node.ToTarXz()
node, err := datanode.FromTarXz(tarXzBytes)
6. File Modes
| Context | Mode | Notes |
|---|---|---|
| File read (fs.FS) | 0444 | Read-only for all |
| Directory (fs.FS) | 0555 | Read+execute for all |
| Tar export | 0600 | Owner read/write only |
Note: Original file modes are NOT preserved. All files get fixed modes.
7. Memory Model
- All content held in memory as
[]byte - No lazy loading
- No memory mapping
- Thread-safe for concurrent reads (map is not mutated after creation)
7.1 Size Calculation
func (dn *DataNode) Size() int64 {
var total int64
for _, f := range dn.files {
total += int64(len(f.content))
}
return total
}
8. Integration Points
8.1 TIM RootFS
tim := &tim.TIM{
Config: configJSON,
RootFS: datanode, // DataNode as container filesystem
}
8.2 TRIX Encryption
// Encrypt DataNode to TRIX
encrypted, err := trix.Encrypt(datanode.ToTar(), password)
// Decrypt TRIX to DataNode
tarBytes, err := trix.Decrypt(encrypted, password)
node, err := datanode.FromTar(tarBytes)
8.3 Collectors
// GitHub collector returns DataNode
node, err := github.CollectRepo(url)
// Website collector returns DataNode
node, err := website.Collect(url, depth)
9. Implementation Reference
- Source:
pkg/datanode/datanode.go - Tests:
pkg/datanode/datanode_test.go
10. Security Considerations
- Path traversal: Leading slashes stripped; no
..handling needed (flat map) - Memory exhaustion: No built-in limits; caller must validate input size
- Tar bombs: FromTar reads all entries into memory
- Symlinks: Not supported (intentional - tar.TypeReg only)
11. Limitations
- No symlink support
- No extended attributes
- No sparse files
- Fixed file modes (0600 on export)
- No streaming (full content in memory)
12. Future Work
- Streaming tar generation for large files
- Optional mode preservation
- Size limits for untrusted input
- Lazy loading for large datasets