# RFC-0002: TRIX Binary Container Format

**Status:** Standards Track
**Version:** 2.0
**Created:** 2025-01-13
**Author:** Snider

## Abstract

This document specifies the TRIX binary container format, a generic and extensible file format designed to store arbitrary binary payloads alongside structured JSON metadata. The format is protocol-agnostic, supporting any encryption scheme, compression algorithm, or data transformation while providing a consistent structure for metadata discovery and payload extraction.

## Table of Contents

1. [Introduction](#1-introduction)
2. [Terminology](#2-terminology)
3. [Format Specification](#3-format-specification)
4. [Header Specification](#4-header-specification)
5. [Encoding Process](#5-encoding-process)
6. [Decoding Process](#6-decoding-process)
7. [Checksum Verification](#7-checksum-verification)
8. [Magic Number Registry](#8-magic-number-registry)
9. [Security Considerations](#9-security-considerations)
10. [IANA Considerations](#10-iana-considerations)
11. [References](#11-references)

## 1. Introduction

The TRIX format addresses the need for a simple, self-describing binary container that can wrap any payload type with extensible metadata. Unlike format-specific containers (such as encrypted archive formats), TRIX separates the concerns of:

- **Container structure**: How data is organized on disk/wire
- **Payload semantics**: What the payload contains and how to process it
- **Metadata extensibility**: Application-specific attributes

### 1.1 Design Goals

- **Simplicity**: Minimal overhead, easy to implement
- **Extensibility**: JSON header allows arbitrary metadata
- **Protocol-agnostic**: No assumptions about payload encryption or encoding
- **Streaming-friendly**: Header length prefix enables streaming reads
- **Magic-number customizable**: Applications can define their own identifiers

### 1.2 Use Cases

- Encrypted data interchange
- Signed document containers
- Configuration file packaging
- Backup archive format
- Inter-service message envelopes

## 2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

- **Container**: A complete TRIX-formatted byte sequence
- **Magic Number**: A 4-byte identifier at the start of the container
- **Header**: A JSON object containing metadata about the payload
- **Payload**: The arbitrary binary data stored in the container
- **Checksum**: An optional integrity verification value

## 3. Format Specification

### 3.1 Overview

A TRIX container consists of five sequential fields:

```
+----------------+---------+---------------+----------------+-----------+
|  Magic Number  | Version | Header Length |  JSON Header   |  Payload  |
+----------------+---------+---------------+----------------+-----------+
|    4 bytes     | 1 byte  |    4 bytes    |    Variable    | Variable  |
```

Minimum total size: 9 bytes (zero-length header, empty payload)

### 3.2 Field Definitions

#### 3.2.1 Magic Number (4 bytes)

A 4-byte ASCII string identifying the file type. This field:

- MUST be exactly 4 bytes
- SHOULD contain printable ASCII characters
- Is application-defined (not mandated by this specification)

Common conventions:
- `TRIX` - Generic TRIX container
- Application-specific identifiers start with an uppercase letter

#### 3.2.2 Version (1 byte)

An unsigned 8-bit integer indicating the format version.

| Value | Description |
|-------|-------------|
| 0x00 | Reserved |
| 0x01 | Version 1.0 (deprecated) |
| 0x02 | Version 2.0 (current) |
| 0x03-0xFF | Reserved for future versions |

Implementations MUST reject containers with unrecognized versions.

#### 3.2.3 Header Length (4 bytes)

A 32-bit unsigned integer in big-endian byte order specifying the length of the JSON Header in bytes.

- Minimum value: 0 (a zero-length header is valid; an explicit empty object `{}` occupies 2 bytes)
- Maximum value: 16,777,215 (16 MiB - 1 byte)

Implementations MUST reject header lengths greater than 16,777,215 to prevent denial-of-service attacks.

```
Header Length = BigEndian32(length_of_json_header_bytes)
```

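As an illustration of the field encoding above, a minimal Python sketch might pack and unpack the length with the standard `struct` module. The names `pack_header_length`, `unpack_header_length`, and `MAX_HEADER_LENGTH` are hypothetical and not defined by this specification.

```python
import struct

MAX_HEADER_LENGTH = 16_777_215  # maximum value stated in this section


def pack_header_length(n: int) -> bytes:
    """Encode a header length as a 4-byte big-endian unsigned integer."""
    if not 0 <= n <= MAX_HEADER_LENGTH:
        raise ValueError(f"header length {n} out of range")
    return struct.pack(">I", n)


def unpack_header_length(field: bytes) -> int:
    """Decode the 4-byte field and enforce the maximum before any allocation."""
    if len(field) != 4:
        raise ValueError("header length field must be 4 bytes")
    (n,) = struct.unpack(">I", field)
    if n > MAX_HEADER_LENGTH:
        raise ValueError("header length exceeds maximum")
    return n
```

Checking the limit at unpack time lets a decoder refuse an oversized header before it reserves a buffer for it.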
#### 3.2.4 JSON Header (Variable)

A UTF-8 encoded JSON object containing metadata. The header:

- MUST be valid JSON (RFC 8259)
- MUST be a JSON object (not array, string, or primitive)
- SHOULD use UTF-8 encoding without BOM
- MAY be empty (`{}`)

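The requirements above could be checked along the following lines. This is a sketch under stated assumptions: `parse_header` is a hypothetical name, and rejecting a BOM outright is stricter than the SHOULD in the list above.

```python
import json


def parse_header(header_bytes: bytes) -> dict:
    """Parse header bytes and require a JSON object at the top level."""
    # Assumption: this sketch rejects a UTF-8 BOM, although the text only
    # says the header SHOULD be encoded without one.
    if header_bytes.startswith(b"\xef\xbb\xbf"):
        raise ValueError("header must not start with a UTF-8 BOM")
    text = header_bytes.decode("utf-8")  # UnicodeDecodeError is a ValueError
    value = json.loads(text)             # JSONDecodeError is a ValueError
    if not isinstance(value, dict):
        raise ValueError("header must be a JSON object")
    return value
```

Because both decoding errors derive from `ValueError`, a caller can treat every malformed header uniformly with a single `except ValueError`.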
#### 3.2.5 Payload (Variable)

The arbitrary binary payload. The payload:

- MAY be empty (zero bytes)
- MAY contain any binary data
- Length is implicitly determined by: `container_length - 9 - header_length`

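The implicit length rule above can be worked through in a few lines. The helper name `payload_bounds` and the constant `PREFIX_LENGTH` are illustrative only.

```python
PREFIX_LENGTH = 9  # magic (4) + version (1) + header length (4)


def payload_bounds(container_length: int, header_length: int) -> tuple[int, int]:
    """Return (start_offset, length) of the payload within the container."""
    payload_length = container_length - PREFIX_LENGTH - header_length
    if payload_length < 0:
        # The declared header would run past the end of the container.
        raise ValueError("container truncated")
    return PREFIX_LENGTH + header_length, payload_length
```

For example, a 20-byte container with a 2-byte header has its payload at offset 11 with length 9.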
## 4. Header Specification

### 4.1 Reserved Header Fields

The following header fields have defined semantics:

| Field | Type | Description |
|-------|------|-------------|
| `content_type` | string | MIME type of the payload (before any transformations) |
| `checksum` | string | Hex-encoded checksum of the payload |
| `checksum_algo` | string | Algorithm used for checksum (e.g., "sha256") |
| `created_at` | string | ISO 8601 timestamp of creation |
| `encryption_algorithm` | string | Encryption algorithm identifier |
| `compression` | string | Compression algorithm identifier |
| `sigils` | array | Ordered list of transformation sigil names |

### 4.2 Extension Fields

Applications MAY include additional fields. To avoid conflicts:

- Custom fields SHOULD use a namespace prefix (e.g., `x-myapp-field`)
- Standard field names are lowercase with underscores

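A tool that inspects headers might group keys according to these conventions. The sketch below is an assumption-laden illustration: it treats `x-` as the namespace marker because that is the only prefix the example shows, and `classify_fields` is a hypothetical helper.

```python
# Reserved field names from Section 4.1.
RESERVED_FIELDS = frozenset({
    "content_type", "checksum", "checksum_algo", "created_at",
    "encryption_algorithm", "compression", "sigils",
})

# Assumption: "x-" marks a namespaced field, following `x-myapp-field`.
EXTENSION_PREFIX = "x-"


def classify_fields(header: dict) -> dict:
    """Group header keys into reserved, namespaced extension, and other."""
    groups = {"reserved": [], "extension": [], "other": []}
    for key in sorted(header):
        if key in RESERVED_FIELDS:
            groups["reserved"].append(key)
        elif key.startswith(EXTENSION_PREFIX):
            groups["extension"].append(key)
        else:
            groups["other"].append(key)
    return groups
```

Keys that land in `other` are still permitted, since the prefix is only a SHOULD, but they are the ones most likely to collide with future reserved names.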
### 4.3 Example Headers

#### Encrypted payload:
```json
{
  "content_type": "application/octet-stream",
  "encryption_algorithm": "xchacha20poly1305",
  "created_at": "2025-01-13T12:00:00Z"
}
```

#### Compressed and encoded payload:
```json
{
  "content_type": "text/plain",
  "compression": "gzip",
  "sigils": ["gzip", "base64"],
  "checksum": "a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e",
  "checksum_algo": "sha256"
}
```

#### Minimal header:
```json
{}
```

## 5. Encoding Process

### 5.1 Algorithm

```
function Encode(payload: bytes, header: object, magic: string) -> bytes:
    // Validate magic number
    if length(magic) != 4:
        return error("magic number must be 4 bytes")

    // Serialize header to JSON
    header_bytes = JSON.serialize(header)
    header_length = length(header_bytes)

    // Validate header size
    if header_length > 16777215:
        return error("header exceeds maximum size")

    // Build container
    container = empty byte buffer

    // Write magic number (4 bytes)
    container.write(magic)

    // Write version (1 byte)
    container.write(0x02)

    // Write header length (4 bytes, big-endian)
    container.write(BigEndian32(header_length))

    // Write JSON header
    container.write(header_bytes)

    // Write payload
    container.write(payload)

    return container.bytes()
```

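For readers who prefer runnable code, the pseudocode above might translate to Python roughly as follows. This is a sketch rather than the reference implementation. The compact `separators` setting is a choice made here to keep the header small, not a requirement of the format.

```python
import json
import struct

VERSION = 0x02
MAX_HEADER_LENGTH = 16_777_215


def encode(payload: bytes, header: dict, magic: bytes = b"TRIX") -> bytes:
    """Build a version 2 container from a payload and a header object."""
    if len(magic) != 4:
        raise ValueError("magic number must be 4 bytes")

    # json.dumps emits ASCII by default, which is also valid UTF-8.
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    if len(header_bytes) > MAX_HEADER_LENGTH:
        raise ValueError("header exceeds maximum size")

    return b"".join([
        magic,                                  # 4 bytes
        bytes([VERSION]),                       # 1 byte
        struct.pack(">I", len(header_bytes)),   # 4 bytes, big-endian
        header_bytes,
        payload,
    ])
```

With an empty header and the payload `b"hi"`, the result is the 13-byte sequence `TRIX`, `0x02`, `00 00 00 02`, `{}`, `hi`.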
### 5.2 Checksum Integration

If integrity verification is required:

```
function EncodeWithChecksum(payload: bytes, header: object, magic: string, algo: string) -> bytes:
    checksum = Hash(algo, payload)
    header["checksum"] = HexEncode(checksum)
    header["checksum_algo"] = algo
    return Encode(payload, header, magic)
```

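The header-augmentation step can be sketched with Python's `hashlib`. The helper `with_checksum` is hypothetical, and it returns a new header rather than mutating its argument, which is a design choice of this sketch. It covers only algorithm IDs that coincide with `hashlib` names, such as `sha256`.

```python
import hashlib


def with_checksum(header: dict, payload: bytes, algo: str = "sha256") -> dict:
    """Return a copy of the header with checksum fields for the payload."""
    # hashlib.new raises ValueError for names it does not recognize.
    digest = hashlib.new(algo, payload).hexdigest()
    return {**header, "checksum": digest, "checksum_algo": algo}
```

The returned dictionary can then be passed to whatever encoder the application uses, so the checksum is computed over exactly the payload bytes that get written.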
## 6. Decoding Process

### 6.1 Algorithm

```
function Decode(container: bytes, expected_magic: string) -> (header: object, payload: bytes):
    // Validate minimum size
    if length(container) < 9:
        return error("container too small")

    // Read and verify magic number
    magic = container[0:4]
    if magic != expected_magic:
        return error("invalid magic number")

    // Read and verify version
    version = container[4]
    if version != 0x02:
        return error("unsupported version")

    // Read header length
    header_length = BigEndian32(container[5:9])

    // Validate header length
    if header_length > 16777215:
        return error("header length exceeds maximum")

    if length(container) < 9 + header_length:
        return error("container truncated")

    // Read and parse header (Section 3.2.4 requires a JSON object)
    header_bytes = container[9:9+header_length]
    header = JSON.parse(header_bytes)
    if header is not a JSON object:
        return error("header must be a JSON object")

    // Read payload
    payload = container[9+header_length:]

    return (header, payload)
```

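A Python rendering of the decoding steps could look like the sketch below. It is not the reference implementation. One assumption is flagged in the code: a zero-length header is treated as an empty object, since Section 3.2.3 allows a length of 0 but an empty byte string is not valid JSON.

```python
import json
import struct

PREFIX_LENGTH = 9
MAX_HEADER_LENGTH = 16_777_215


def decode(container: bytes, expected_magic: bytes = b"TRIX") -> tuple[dict, bytes]:
    """Split a version 2 container into (header, payload)."""
    if len(container) < PREFIX_LENGTH:
        raise ValueError("container too small")
    if container[0:4] != expected_magic:
        raise ValueError("invalid magic number")
    if container[4] != 0x02:
        raise ValueError("unsupported version")

    (header_length,) = struct.unpack(">I", container[5:9])
    if header_length > MAX_HEADER_LENGTH:
        raise ValueError("header length exceeds maximum")
    end = PREFIX_LENGTH + header_length
    if len(container) < end:
        raise ValueError("container truncated")

    raw = container[PREFIX_LENGTH:end]
    # Assumption: a zero-length header is read as an empty object.
    header = json.loads(raw.decode("utf-8")) if raw else {}
    if not isinstance(header, dict):
        raise ValueError("header must be a JSON object")
    return header, container[end:]
```

Every failure surfaces as `ValueError` (including malformed UTF-8 or JSON), so callers need only one exception type to reject a bad container.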
### 6.2 Streaming Decode

For large files, streaming decode is RECOMMENDED:

```
function StreamDecode(reader: Reader, expected_magic: string) -> (header: object, payload_reader: Reader):
    // Read fixed-size prefix
    prefix = reader.read(9)
    if length(prefix) < 9:
        return error("container too small")

    // Validate magic and version
    magic = prefix[0:4]
    if magic != expected_magic:
        return error("invalid magic number")
    version = prefix[4]
    if version != 0x02:
        return error("unsupported version")

    // Read and validate header length before allocating
    header_length = BigEndian32(prefix[5:9])
    if header_length > 16777215:
        return error("header length exceeds maximum")

    // Read header
    header_bytes = reader.read(header_length)
    header = JSON.parse(header_bytes)

    // Return remaining reader for payload streaming
    return (header, reader)
```

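The streaming variant might be sketched in Python as below, using a file-like object. The helper `read_exact` is hypothetical and assumes a buffered reader, where `read(n)` returns fewer than `n` bytes only at end of stream. The zero-length-header handling is the same assumption flagged in the code.

```python
import io
import json
import struct
from typing import BinaryIO

MAX_HEADER_LENGTH = 16_777_215


def read_exact(reader: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or fail; assumes a buffered reader."""
    data = reader.read(n)
    if len(data) != n:
        raise ValueError("container truncated")
    return data


def stream_decode(reader: BinaryIO, expected_magic: bytes = b"TRIX"):
    """Return (header, reader) with the reader positioned at the payload."""
    # ">4sBI" is 4 + 1 + 4 = 9 bytes: magic, version, big-endian length.
    magic, version, header_length = struct.unpack(">4sBI", read_exact(reader, 9))
    if magic != expected_magic:
        raise ValueError("invalid magic number")
    if version != 0x02:
        raise ValueError("unsupported version")
    if header_length > MAX_HEADER_LENGTH:  # checked before reading the header
        raise ValueError("header length exceeds maximum")
    raw = read_exact(reader, header_length)
    # Assumption: a zero-length header is read as an empty object.
    header = json.loads(raw.decode("utf-8")) if raw else {}
    if not isinstance(header, dict):
        raise ValueError("header must be a JSON object")
    return header, reader


# Usage: the payload is consumed from the returned reader in chunks.
stream = io.BytesIO(b"TRIX\x02\x00\x00\x00\x02{}payload")
header, rest = stream_decode(stream)
first_chunk = rest.read(4)
```

Only the 9-byte prefix and the header are held in memory; the payload is never buffered in full.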
## 7. Checksum Verification

### 7.1 Supported Algorithms

| Algorithm ID | Output Size | Notes |
|--------------|-------------|-------|
| `md5` | 16 bytes | NOT RECOMMENDED for security |
| `sha1` | 20 bytes | NOT RECOMMENDED for security |
| `sha256` | 32 bytes | RECOMMENDED |
| `sha384` | 48 bytes | |
| `sha512` | 64 bytes | |
| `blake2b-256` | 32 bytes | |
| `blake2b-512` | 64 bytes | |

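One way to map the algorithm IDs in this table onto Python's `hashlib` is sketched below. The table does not say how the BLAKE2b variants are parameterized; this sketch assumes `blake2b-256` and `blake2b-512` mean BLAKE2b with a 32-byte and 64-byte `digest_size` respectively. `HASHERS` and `compute_checksum` are illustrative names.

```python
import hashlib

# Algorithm ID -> callable that takes the data and returns a hash object.
HASHERS = {
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
    # Assumption: the suffix selects BLAKE2b's native digest size.
    "blake2b-256": lambda data=b"": hashlib.blake2b(data, digest_size=32),
    "blake2b-512": lambda data=b"": hashlib.blake2b(data, digest_size=64),
}


def compute_checksum(algo: str, payload: bytes) -> bytes:
    """Return the raw digest of the payload for a table algorithm ID."""
    try:
        factory = HASHERS[algo]
    except KeyError:
        raise ValueError(f"unsupported checksum algorithm: {algo!r}") from None
    return factory(payload).digest()
```

An explicit table like this also works as an allow-list, so an unexpected `checksum_algo` value from an untrusted header is rejected instead of being passed to a generic constructor.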
### 7.2 Verification Process

```
function VerifyChecksum(header: object, payload: bytes) -> bool:
    if "checksum" not in header:
        return true  // No checksum to verify

    algo = header["checksum_algo"]
    expected = HexDecode(header["checksum"])
    actual = Hash(algo, payload)

    return constant_time_compare(expected, actual)
```

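In Python, the constant-time comparison is available as `hmac.compare_digest`. The sketch below makes two assumptions that the pseudocode leaves open: a header with `checksum` but a missing or unknown `checksum_algo` fails verification, and a malformed hex string fails rather than raising. For brevity it handles only the IDs that match `hashlib` names directly.

```python
import hashlib
import hmac

# Subset of Section 7.1 whose IDs are also hashlib names.
HASHLIB_ALGOS = frozenset({"md5", "sha1", "sha256", "sha384", "sha512"})


def verify_checksum(header: dict, payload: bytes) -> bool:
    """Return True if the payload matches the header checksum, or none is set."""
    if "checksum" not in header:
        return True  # no checksum to verify

    algo = header.get("checksum_algo")
    if algo not in HASHLIB_ALGOS:
        return False  # assumption: unknown or missing algorithm fails closed

    try:
        expected = bytes.fromhex(header["checksum"])
    except (TypeError, ValueError):
        return False  # assumption: malformed hex fails closed

    actual = hashlib.new(algo, payload).digest()
    return hmac.compare_digest(expected, actual)
```

Failing closed on ambiguous headers keeps a corrupted or hostile header from being mistaken for a successful verification.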
## 8. Magic Number Registry

This section defines conventions for magic number allocation:

### 8.1 Reserved Magic Numbers

| Magic | Reserved For |
|-------|--------------|
| `TRIX` | Generic TRIX containers |
| `\x00\x00\x00\x00` | Reserved (null) |
| `\xFF\xFF\xFF\xFF` | Reserved (test/invalid) |

### 8.2 Allocation Guidelines

Applications SHOULD:

1. Use 4 printable ASCII characters
2. Start with an uppercase letter
3. Avoid common file format magic numbers (e.g., `%PDF`, `PK\x03\x04`)
4. Register custom magic numbers in their documentation

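The first three guidelines are mechanical enough to lint. The sketch below is one possible check. The collision list contains only the two examples named above and is not a complete registry, and `check_magic` is a hypothetical helper.

```python
# Reserved values from Section 8.1 (the generic `TRIX` magic is allowed).
RESERVED = {b"\x00\x00\x00\x00", b"\xff\xff\xff\xff"}

# Only the two examples from the guidelines; not an exhaustive list.
KNOWN_FOREIGN = {b"%PDF", b"PK\x03\x04"}


def check_magic(magic: bytes) -> list[str]:
    """Return a list of guideline violations; an empty list means none found."""
    if len(magic) != 4:
        return ["magic number must be exactly 4 bytes"]
    problems = []
    if magic in RESERVED:
        problems.append("value is reserved")
    if not all(0x20 <= b <= 0x7E for b in magic):
        problems.append("contains non-printable or non-ASCII bytes")
    if not 0x41 <= magic[0] <= 0x5A:
        problems.append("does not start with an uppercase letter")
    if magic in KNOWN_FOREIGN:
        problems.append("collides with a common file format")
    return problems
```

Returning a list instead of raising reflects that most of these rules are SHOULD-level, so a caller may choose to warn rather than reject.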
## 9. Security Considerations

### 9.1 Header Injection

The JSON header is parsed before the payload is processed. Implementations MUST:

- Validate JSON syntax strictly
- Reject headers with duplicate keys
- Not execute header field values as code

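Many JSON parsers silently keep the last value when a key repeats, so duplicate detection usually needs an explicit hook. In Python this can be sketched with the `object_pairs_hook` parameter of `json.loads`. The function names here are hypothetical.

```python
import json


def reject_duplicates(pairs):
    """Build a dict from key/value pairs, failing on any repeated key."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate header key: {key!r}")
        obj[key] = value
    return obj


def parse_header_strict(header_bytes: bytes) -> dict:
    """Parse a header, rejecting duplicate keys at every nesting level."""
    header = json.loads(
        header_bytes.decode("utf-8"),
        object_pairs_hook=reject_duplicates,  # called for each JSON object
    )
    if not isinstance(header, dict):
        raise ValueError("header must be a JSON object")
    return header
```

The hook runs for nested objects as well, so a duplicate inside a sub-object is caught in the same pass.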
### 9.2 Denial of Service

The 16 MiB header limit bounds the memory a decoder can be forced to allocate for the header. Implementations SHOULD:

- Reject a container before allocating the header buffer if the declared length exceeds the limit
- Implement timeouts for header parsing
- Limit recursion depth in JSON parsing

### 9.3 Path Traversal

Header fields such as `filename` MUST NOT be used directly for filesystem operations without sanitization.

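A minimal sanitization sketch is shown below. It reduces an untrusted name to its final path component and rejects the special names. Both `safe_filename` and the `filename` field it would be applied to are illustrative, since this specification does not define such a field. Real deployments may need stricter, platform-specific rules.

```python
def safe_filename(raw: str) -> str:
    """Reduce an untrusted name to a single, non-special path component."""
    # Treat both separator styles as separators regardless of host OS.
    name = raw.replace("\\", "/").rsplit("/", 1)[-1]
    if name in ("", ".", "..") or "\x00" in name:
        raise ValueError("unsafe filename in header")
    return name
```

The caller would then join the result onto a directory it controls, so that a value like `../../etc/passwd` can only ever produce `passwd` inside that directory.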
### 9.4 Checksum Security

- Checksums, whether MD5, SHA-1, or SHA-256, detect corruption but do not provide authenticity: anyone able to modify the payload can also rewrite the `checksum` field
- For tamper detection, use HMAC or digital signatures
- Checksum verification MUST use constant-time comparison

### 9.5 Version Negotiation

Implementations MUST NOT attempt to parse containers with unknown versions, as the format may change incompatibly.

## 10. IANA Considerations

This document does not require IANA actions. The TRIX format is application-defined and does not use IANA-managed namespaces.

Future versions may define:
- Media type registration (e.g., `application/x-trix`)
- Magic number registry

## 11. References

- [RFC 8259] The JavaScript Object Notation (JSON) Data Interchange Format
- [RFC 2119] Key words for use in RFCs to Indicate Requirement Levels
- [RFC 6838] Media Type Specifications and Registration Procedures

---

## Appendix A: Binary Layout Diagram

```
Byte offset:  0         4    5         9         9+H       9+H+P
              |---------|----|---------|---------|---------|
              | Magic   | V  | HdrLen  | Header  | Payload |
              | (4)     |(1) | (4)     | (H)     | (P)     |
              |---------|----|---------|---------|---------|

V = Version byte
H = Header length (from HdrLen field)
P = Payload length (remaining bytes)
```

## Appendix B: Reference Implementation

A reference implementation in Go is available at:
`github.com/Snider/Enchantrix/pkg/trix/trix.go`

## Appendix C: Changelog

- **2.0** (2025-01-13): Current version with JSON header
- **1.0** (deprecated): Initial version with fixed header fields