326 lines
7.3 KiB
Markdown
326 lines
7.3 KiB
Markdown
# RFC-003: DataNode In-Memory Filesystem
|
|
|
|
**Status**: Draft
|
|
**Author**: [Snider](https://github.com/Snider/)
|
|
**Created**: 2026-01-13
|
|
**License**: EUPL-1.2
|
|
|
|
---
|
|
|
|
## Abstract
|
|
|
|
DataNode is an in-memory filesystem abstraction implementing Go's `fs.FS` interface. It provides the foundation for collecting, manipulating, and serializing file trees without touching disk.
|
|
|
|
## 1. Overview
|
|
|
|
DataNode serves as the core data structure for:
|
|
- Collecting files from various sources (GitHub, websites, PWAs)
|
|
- Building container filesystems (TIM rootfs)
|
|
- Serializing to/from tar archives
|
|
- Encrypting as TRIX format
|
|
|
|
## 2. Implementation
|
|
|
|
### 2.1 Core Type
|
|
|
|
```go
|
|
type DataNode struct {
|
|
files map[string]*dataFile
|
|
}
|
|
|
|
type dataFile struct {
|
|
name string
|
|
content []byte
|
|
modTime time.Time
|
|
}
|
|
```
|
|
|
|
**Key insight**: DataNode uses a **flat key-value map**, not a nested tree structure. Paths are stored as keys directly, and directories are implicit (derived from path prefixes).
|
|
|
|
### 2.2 fs.FS Implementation
|
|
|
|
DataNode implements these interfaces:
|
|
|
|
| Interface | Method | Description |
|
|
|-----------|--------|-------------|
|
|
| `fs.FS` | `Open(name string)` | Returns fs.File for path |
|
|
| `fs.StatFS` | `Stat(name string)` | Returns fs.FileInfo |
|
|
| `fs.ReadDirFS` | `ReadDir(name string)` | Lists directory contents |
|
|
|
|
### 2.3 Internal Helper Types
|
|
|
|
```go
|
|
// File metadata
|
|
type dataFileInfo struct {
|
|
name string
|
|
size int64
|
|
modTime time.Time
|
|
}
|
|
func (fi *dataFileInfo) Mode() fs.FileMode { return 0444 } // Read-only
|
|
|
|
// Directory metadata
|
|
type dirInfo struct {
|
|
name string
|
|
}
|
|
func (di *dirInfo) Mode() fs.FileMode { return fs.ModeDir | 0555 }
|
|
|
|
// File reader (implements fs.File)
|
|
type dataFileReader struct {
|
|
info *dataFileInfo
|
|
reader *bytes.Reader
|
|
}
|
|
|
|
// Directory reader (implements fs.File)
|
|
type dirFile struct {
|
|
info *dirInfo
|
|
entries []fs.DirEntry
|
|
offset int
|
|
}
|
|
```
|
|
|
|
## 3. Operations
|
|
|
|
### 3.1 Construction
|
|
|
|
```go
|
|
// Create empty DataNode
|
|
node := datanode.New()
|
|
|
|
// Returns: &DataNode{files: make(map[string]*dataFile)}
|
|
```
|
|
|
|
### 3.2 Adding Files
|
|
|
|
```go
|
|
// Add file with content
|
|
node.AddData("path/to/file.txt", []byte("content"))
|
|
|
|
// Trailing slashes are ignored (treated as directory indicator)
|
|
node.AddData("path/to/dir/", []byte("")) // Stored as "path/to/dir"
|
|
```
|
|
|
|
**Note**: Parent directories are NOT explicitly created. They are implicit based on path prefixes.
|
|
|
|
### 3.3 File Access
|
|
|
|
```go
|
|
// Open file (fs.FS interface)
|
|
f, err := node.Open("path/to/file.txt")
|
|
if err != nil {
|
|
// fs.ErrNotExist if not found
|
|
}
|
|
defer f.Close()
|
|
content, _ := io.ReadAll(f)
|
|
|
|
// Stat file
|
|
info, err := node.Stat("path/to/file.txt")
|
|
// info.Name(), info.Size(), info.ModTime(), info.Mode()
|
|
|
|
// Read directory
|
|
entries, err := node.ReadDir("path/to")
|
|
for _, entry := range entries {
|
|
// entry.Name(), entry.IsDir(), entry.Type()
|
|
}
|
|
```
|
|
|
|
### 3.4 Walking
|
|
|
|
```go
|
|
err := fs.WalkDir(node, ".", func(path string, d fs.DirEntry, err error) error {
|
|
if err != nil {
|
|
return err
|
|
}
|
|
if !d.IsDir() {
|
|
// Process file
|
|
}
|
|
return nil
|
|
})
|
|
```
|
|
|
|
## 4. Path Semantics
|
|
|
|
### 4.1 Path Handling
|
|
|
|
- **Leading slashes stripped**: `/path/file` → `path/file`
|
|
- **Trailing slashes ignored**: `path/dir/` → `path/dir`
|
|
- **Forward slashes only**: Uses `/` regardless of OS
|
|
- **Case-sensitive**: `File.txt` ≠ `file.txt`
|
|
- **Direct lookup**: Paths stored as flat keys
|
|
|
|
### 4.2 Valid Paths
|
|
|
|
```
|
|
file.txt → stored as "file.txt"
|
|
dir/file.txt → stored as "dir/file.txt"
|
|
/absolute/path → stored as "absolute/path" (leading / stripped)
|
|
path/to/dir/ → stored as "path/to/dir" (trailing / stripped)
|
|
```
|
|
|
|
### 4.3 Directory Detection
|
|
|
|
Directories are **implicit**. A directory exists if:
|
|
1. Any file path has it as a prefix
|
|
2. Example: Adding `a/b/c.txt` implicitly creates directories `a` and `a/b`
|
|
|
|
```go
|
|
// ReadDir finds directories by scanning all paths
|
|
func (dn *DataNode) ReadDir(name string) ([]fs.DirEntry, error) {
|
|
// Scans all keys for matching prefix
|
|
// Returns unique immediate children
|
|
}
|
|
```
|
|
|
|
## 5. Tar Serialization
|
|
|
|
### 5.1 ToTar
|
|
|
|
```go
|
|
tarBytes, err := node.ToTar()
|
|
```
|
|
|
|
**Format**:
|
|
- All files written as `tar.TypeReg` (regular files)
|
|
- Header Mode: **0600** (fixed, not original mode)
|
|
- No explicit directory entries
|
|
- ModTime preserved from dataFile
|
|
|
|
```go
|
|
// Serialization logic
|
|
for path, file := range dn.files {
|
|
header := &tar.Header{
|
|
Name: path,
|
|
Mode: 0600, // Fixed mode
|
|
Size: int64(len(file.content)),
|
|
ModTime: file.modTime,
|
|
Typeflag: tar.TypeReg,
|
|
}
|
|
tw.WriteHeader(header)
|
|
tw.Write(file.content)
|
|
}
|
|
```
|
|
|
|
### 5.2 FromTar
|
|
|
|
```go
|
|
node, err := datanode.FromTar(tarBytes)
|
|
```
|
|
|
|
**Parsing**:
|
|
- Only reads `tar.TypeReg` entries
|
|
- Ignores directory entries (`tar.TypeDir`)
|
|
- Stores path and content in flat map
|
|
|
|
```go
|
|
// Deserialization logic
|
|
for {
|
|
header, err := tr.Next()
|
|
if header.Typeflag == tar.TypeReg {
|
|
content, _ := io.ReadAll(tr)
|
|
dn.files[header.Name] = &dataFile{
|
|
name: filepath.Base(header.Name),
|
|
content: content,
|
|
modTime: header.ModTime,
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
### 5.3 Compressed Variants
|
|
|
|
```go
|
|
// gzip compressed
|
|
tarGz, err := node.ToTarGz()
|
|
node, err := datanode.FromTarGz(tarGzBytes)
|
|
|
|
// xz compressed
|
|
tarXz, err := node.ToTarXz()
|
|
node, err := datanode.FromTarXz(tarXzBytes)
|
|
```
|
|
|
|
## 6. File Modes
|
|
|
|
| Context | Mode | Notes |
|
|
|---------|------|-------|
|
|
| File read (fs.FS) | 0444 | Read-only for all |
|
|
| Directory (fs.FS) | 0555 | Read+execute for all |
|
|
| Tar export | 0600 | Owner read/write only |
|
|
|
|
**Note**: Original file modes are NOT preserved. All files get fixed modes.
|
|
|
|
## 7. Memory Model
|
|
|
|
- All content held in memory as `[]byte`
|
|
- No lazy loading
|
|
- No memory mapping
|
|
- Thread-safe for concurrent reads (map is not mutated after creation)
|
|
|
|
### 7.1 Size Calculation
|
|
|
|
```go
|
|
func (dn *DataNode) Size() int64 {
|
|
var total int64
|
|
for _, f := range dn.files {
|
|
total += int64(len(f.content))
|
|
}
|
|
return total
|
|
}
|
|
```
|
|
|
|
## 8. Integration Points
|
|
|
|
### 8.1 TIM RootFS
|
|
|
|
```go
|
|
tim := &tim.TIM{
|
|
Config: configJSON,
|
|
RootFS: datanode, // DataNode as container filesystem
|
|
}
|
|
```
|
|
|
|
### 8.2 TRIX Encryption
|
|
|
|
```go
|
|
// Encrypt DataNode to TRIX
|
|
encrypted, err := trix.Encrypt(datanode.ToTar(), password)
|
|
|
|
// Decrypt TRIX to DataNode
|
|
tarBytes, err := trix.Decrypt(encrypted, password)
|
|
node, err := datanode.FromTar(tarBytes)
|
|
```
|
|
|
|
### 8.3 Collectors
|
|
|
|
```go
|
|
// GitHub collector returns DataNode
|
|
node, err := github.CollectRepo(url)
|
|
|
|
// Website collector returns DataNode
|
|
node, err := website.Collect(url, depth)
|
|
```
|
|
|
|
## 9. Implementation Reference
|
|
|
|
- Source: `pkg/datanode/datanode.go`
|
|
- Tests: `pkg/datanode/datanode_test.go`
|
|
|
|
## 10. Security Considerations
|
|
|
|
1. **Path traversal**: Leading slashes stripped; no `..` handling needed (flat map)
|
|
2. **Memory exhaustion**: No built-in limits; caller must validate input size
|
|
3. **Tar bombs**: FromTar reads all entries into memory
|
|
4. **Symlinks**: Not supported (intentional - tar.TypeReg only)
|
|
|
|
## 11. Limitations
|
|
|
|
- No symlink support
|
|
- No extended attributes
|
|
- No sparse files
|
|
- Fixed file modes (0600 on export)
|
|
- No streaming (full content in memory)
|
|
|
|
## 12. Future Work
|
|
|
|
- [ ] Streaming tar generation for large files
|
|
- [ ] Optional mode preservation
|
|
- [ ] Size limits for untrusted input
|
|
- [ ] Lazy loading for large datasets
|