feat: add Claude Code plugin and CLAUDE.md documentation

Initial commit establishing core-agent repository with:
- Claude Code plugin hooks (safety checks, auto-formatting, context preservation)
- Collection skills for blockchain research archival (claude-cowork/)
- CLAUDE.md documenting repository structure and development patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Commit 7faa974546 by Snider, 2026-02-01 18:15:01 +00:00
81 changed files with 8845 additions and 0 deletions

CLAUDE.md (new file, 143 lines)
@@ -0,0 +1,143 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
**core-agent** contains Claude Code plugins and data collection skills for the Host UK federated monorepo. It has two main components:
1. **claude/** - Claude Code plugin with hooks, commands, and automation scripts
2. **claude-cowork/** - Data collection skills for archiving blockchain/cryptocurrency research
## Repository Structure
```
core-agent/
├── claude/ # Claude Code plugin
│ ├── hooks/hooks.json # Hook definitions
│ ├── hooks/prefer-core.sh # PreToolUse: block dangerous commands
│ ├── scripts/ # Automation scripts
│ │ ├── pre-compact.sh # Save state before compaction
│ │ ├── session-start.sh # Restore context on startup
│ │ ├── php-format.sh # Auto-format PHP after edits
│ │ ├── go-format.sh # Auto-format Go after edits
│ │ └── check-debug.sh # Warn about debug statements
│ └── commands/
│ └── remember.md # /core:remember command
└── claude-cowork/ # Data collection skills
├── hooks/ # Collection event hooks
│ ├── hooks.json # Hook registration
│ └── dispatch.sh # Hook dispatcher
└── skills/ # Collection skills
├── ledger-papers/ # Whitepaper archive (91+ papers)
├── project-archaeology/ # Dead project excavation
├── bitcointalk/ # BitcoinTalk thread collection
├── coinmarketcap/ # Market data collection
├── github-history/ # Git history preservation
└── ... # Other collectors
```
## Claude Plugin Features
### Hooks
| Hook | File | Purpose |
|------|------|---------|
| PreToolUse | `prefer-core.sh` | Block destructive commands, enforce `core` CLI |
| PostToolUse | `php-format.sh` | Auto-format PHP files after edits |
| PostToolUse | `go-format.sh` | Auto-format Go files after edits |
| PostToolUse | `check-debug.sh` | Warn about debug statements |
| PreCompact | `pre-compact.sh` | Save state before compaction |
| SessionStart | `session-start.sh` | Restore context on startup |
### Blocked Commands
The plugin blocks these patterns to prevent accidental damage:
- `rm -rf` / `rm -r` (except node_modules, vendor, .cache)
- `mv`/`cp` with wildcards
- `xargs` with rm/mv/cp
- `find -exec` with file operations
- `sed -i` (in-place editing)
- `grep -l | ...` (mass file targeting)
- Raw `go` commands (use `core go *`)
- Raw `php artisan` / `composer test` (use `core php *`)
### Commands
- `/core:remember <fact>` - Save context that persists across compaction
### Context Preservation
State is saved to `~/.claude/sessions/` before compaction:
- Working directory and branch
- Git status (modified files)
- In-progress todos
- User-saved context facts
## Data Collection Skills
### ledger-papers
Archive of 91+ distributed ledger whitepapers across 15 categories (genesis, cryptonote, MRL, privacy, smart-contracts, etc.).
```bash
./discover.sh --all # List all papers
./discover.sh --category=privacy # Filter by category
```
### project-archaeology
Excavates abandoned CryptoNote projects before data is lost.
```bash
./excavate.sh masari # Full dig
./excavate.sh masari --scan-only # Check what's accessible
```
### Other collectors
- `bitcointalk/` - BitcoinTalk thread archival
- `coinmarketcap/` - Historical price data
- `github-history/` - Repository history preservation
- `wallet-releases/` - Binary release archival
- `block-explorer/` - Blockchain data indexing
## Development
### Testing hooks locally
```bash
# Simulate PreToolUse hook input
echo '{"tool_input": {"command": "rm -rf /"}}' | bash ./claude/hooks/prefer-core.sh
```
### Adding new hooks
1. Add script to `claude/scripts/`
2. Register in `claude/hooks/hooks.json`
3. Test with simulated input
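A minimal sketch of what the script in step 1 might look like, following the hook-output convention under Coding Standards below (the script name and the blocked pattern are placeholders, not part of the plugin):

```bash
#!/usr/bin/env bash
# claude/scripts/example-hook.sh (hypothetical) - PreToolUse check sketch
# Reads the tool input JSON on stdin, emits {"decision": "approve"|"block", "message": ...}
input=$(cat)
command=$(echo "$input" | jq -r '.tool_input.command // ""')

if echo "$command" | grep -qE 'docker system prune'; then   # placeholder pattern
  jq -n '{decision: "block", message: "Blocked by example-hook (placeholder rule)"}'
else
  jq -n '{decision: "approve"}'
fi
```

Register it in `claude/hooks/hooks.json` (step 2), then re-run it against the simulated input shown above under "Testing hooks locally".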
### Collection skill structure
Each skill follows this pattern:
```
skills/<name>/
├── SKILL.md # Documentation
├── discover.sh # Job generator (outputs URL|FILENAME|TYPE|METADATA)
├── process.sh # Job processor (optional)
└── registry.json # Data registry (optional)
```
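A minimal `discover.sh` sketch illustrating the job-line contract (the URLs, filenames and metadata values here are placeholders):

```bash
#!/usr/bin/env bash
# skills/example/discover.sh (hypothetical): emit one URL|FILENAME|TYPE|METADATA line per job
echo "# Format: URL|FILENAME|TYPE|METADATA"
echo "https://example.org/paper.pdf|example-paper.pdf|whitepaper|category=uncategorized"
echo "https://example.org/thread.html|example-thread.html|forum|project=example"
```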
## Coding Standards
- **UK English**: colour, organisation, centre
- **Shell scripts**: Use `#!/usr/bin/env bash` (as the repository's scripts do), read JSON with `jq`
- **Hook output**: JSON with `decision` (approve/block) and optional `message`
- **License**: EUPL-1.2 CIC
## Integration with Host UK
This plugin is designed for use across the Host UK federated monorepo. It enforces the `core` CLI for multi-repo operations instead of raw git/go/php commands. See the parent `/Users/snider/Code/host-uk/CLAUDE.md` for full monorepo documentation.

@@ -0,0 +1,90 @@
# Collection Hooks
Event-driven hooks that trigger during data collection.
## Available Hooks
| Hook | Trigger | Purpose |
|------|---------|---------|
| `collect-whitepaper.sh` | PDF/paper URL detected | Auto-queue whitepapers |
| `on-github-release.sh` | Release found | Archive release metadata |
| `on-explorer-block.sh` | Block data fetched | Index blockchain data |
## Hook Events
### `on_url_found`
Fired when a new URL is discovered during collection.
```bash
# Pattern matching
*.pdf → collect-whitepaper.sh
*/releases/* → on-github-release.sh
*/api/block/* → on-explorer-block.sh
```
### `on_file_collected`
Fired after a file is successfully downloaded.
```bash
# Post-processing
*.json → validate-json.sh
*.html → extract-links.sh
*.pdf → extract-metadata.sh
```
### `on_collection_complete`
Fired when a job batch finishes.
```bash
# Reporting
→ generate-index.sh
→ update-registry.sh
```
## Plugin Integration
For the marketplace plugin system:
```json
{
"name": "whitepaper-collector",
"version": "1.0.0",
"hooks": {
"on_url_found": {
"pattern": "*.pdf",
"handler": "./collect-whitepaper.sh"
}
}
}
```
## Registration
Hooks register in `hooks.json`:
```json
{
"on_url_found": [
{
"pattern": "\\.pdf$",
"handler": "./hooks/collect-whitepaper.sh",
"priority": 10
}
]
}
```
## Usage in Collectors
Collectors call hooks via:
```bash
# In job-collector/process.sh
source ./hooks/dispatch.sh
# When URL found
dispatch_hook "on_url_found" "$URL"
# When file collected
dispatch_hook "on_file_collected" "$FILE" "$TYPE"
```
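`dispatch.sh` also provides a `register_hook` helper for adding handlers at runtime; a short example (the event name matches those above, while the hook name and pattern are illustrative):

```bash
source ./hooks/dispatch.sh
# Appends a new entry to hooks.json (priority defaults to 50, enabled=true)
register_hook "on_url_found" "arxiv-abstracts" "arxiv\\.org/abs/" "./collect-whitepaper.sh"
```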

@@ -0,0 +1,59 @@
#!/usr/bin/env bash
# Hook: collect-whitepaper.sh
# Called when a whitepaper URL is detected during collection
# Usage: ./collect-whitepaper.sh <URL> [destination-folder]
set -e
URL="$1"
DEST="${2:-./whitepapers}"
if [ -z "$URL" ]; then
echo "Usage: $0 <url> [destination]" >&2
exit 1
fi
# Detect paper type from URL
detect_category() {
local url="$1"
case "$url" in
*cryptonote*) echo "cryptonote" ;;
*iacr.org*|*eprint*) echo "research" ;;
*arxiv.org*) echo "research" ;;
*monero*|*getmonero*) echo "research" ;;
*lethean*|*lthn*) echo "lethean" ;;
*) echo "uncategorized" ;;
esac
}
# Generate safe filename from URL
safe_filename() {
local url="$1"
basename "$url" | sed 's/[^a-zA-Z0-9._-]/-/g'
}
CATEGORY=$(detect_category "$URL")
FILENAME=$(safe_filename "$URL")
TARGET_DIR="$DEST/$CATEGORY"
TARGET_FILE="$TARGET_DIR/$FILENAME"
mkdir -p "$TARGET_DIR"
# Check if already collected
if [ -f "$TARGET_FILE" ]; then
echo "Already collected: $TARGET_FILE"
exit 0
fi
echo "Collecting whitepaper:"
echo " URL: $URL"
echo " Category: $CATEGORY"
echo " Destination: $TARGET_FILE"
# Create job entry for proxy collection
echo "$URL|$FILENAME|whitepaper|category=$CATEGORY" >> "$DEST/.pending-jobs.txt"
echo "Job queued: $DEST/.pending-jobs.txt"
echo ""
echo "To collect immediately (if you have direct access):"
echo " curl -L -o '$TARGET_FILE' '$URL'"

claude-cowork/hooks/dispatch.sh (new executable file, 80 lines)
@@ -0,0 +1,80 @@
#!/usr/bin/env bash
# Hook dispatcher - source this in collectors
# Usage: source ./hooks/dispatch.sh
HOOKS_DIR="$(dirname "${BASH_SOURCE[0]}")"
HOOKS_JSON="$HOOKS_DIR/hooks.json"
# Dispatch a hook event
# dispatch_hook <event> <arg1> [arg2] ...
dispatch_hook() {
local event="$1"
shift
local args=("$@")
if [ ! -f "$HOOKS_JSON" ]; then
return 0
fi
# Get handlers for this event (requires jq)
if ! command -v jq &> /dev/null; then
echo "Warning: jq not installed, hooks disabled" >&2
return 0
fi
local handlers
handlers=$(jq -r ".hooks[\"$event\"][]? | select(.enabled == true) | @json" "$HOOKS_JSON" 2>/dev/null)
if [ -z "$handlers" ]; then
return 0
fi
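# NOTE: handlers run in the order they appear in hooks.json; the priority field is stored but not yet used for sorting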
echo "$handlers" | while read -r handler_json; do
local name pattern handler_script priority
name=$(echo "$handler_json" | jq -r '.name')
pattern=$(echo "$handler_json" | jq -r '.pattern // ""')
handler_script=$(echo "$handler_json" | jq -r '.handler')
# Check pattern match if pattern exists
if [ -n "$pattern" ] && [ -n "${args[0]}" ]; then
if ! echo "${args[0]}" | grep -qE "$pattern"; then
continue
fi
fi
# Execute handler
local full_path="$HOOKS_DIR/$handler_script"
if [ -x "$full_path" ]; then
echo "[hook] $name: ${args[*]}" >&2
"$full_path" "${args[@]}"
elif [ -f "$full_path" ]; then
echo "[hook] $name: ${args[*]}" >&2
bash "$full_path" "${args[@]}"
fi
done
}
# Register a new hook dynamically
# register_hook <event> <name> <pattern> <handler>
register_hook() {
local event="$1"
local name="$2"
local pattern="$3"
local handler="$4"
if ! command -v jq &> /dev/null; then
echo "Error: jq required for hook registration" >&2
return 1
fi
local new_hook
new_hook=$(jq -n \
--arg name "$name" \
--arg pattern "$pattern" \
--arg handler "$handler" \
'{name: $name, pattern: $pattern, handler: $handler, priority: 50, enabled: true}')
# Add to hooks.json
jq ".hooks[\"$event\"] += [$new_hook]" "$HOOKS_JSON" > "$HOOKS_JSON.tmp" \
&& mv "$HOOKS_JSON.tmp" "$HOOKS_JSON"
}

@@ -0,0 +1,45 @@
{
"version": "1.0.0",
"hooks": {
"on_url_found": [
{
"name": "whitepaper-collector",
"pattern": "\\.pdf$",
"handler": "./collect-whitepaper.sh",
"priority": 10,
"enabled": true
},
{
"name": "whitepaper-iacr",
"pattern": "eprint\\.iacr\\.org",
"handler": "./collect-whitepaper.sh",
"priority": 10,
"enabled": true
},
{
"name": "whitepaper-arxiv",
"pattern": "arxiv\\.org",
"handler": "./collect-whitepaper.sh",
"priority": 10,
"enabled": true
}
],
"on_file_collected": [
{
"name": "pdf-metadata",
"pattern": "\\.pdf$",
"handler": "./extract-pdf-metadata.sh",
"priority": 5,
"enabled": false
}
],
"on_collection_complete": [
{
"name": "update-index",
"handler": "./update-index.sh",
"priority": 100,
"enabled": true
}
]
}
}

@@ -0,0 +1,38 @@
#!/usr/bin/env bash
# Hook: update-index.sh
# Called after collection completes to update indexes
WHITEPAPERS_DIR="${1:-./whitepapers}"
echo "[update-index] Updating whitepaper index..."
# Count papers in each category
for category in cryptonote lethean research uncategorized; do
dir="$WHITEPAPERS_DIR/$category"
if [ -d "$dir" ]; then
count=$(find "$dir" -name "*.pdf" 2>/dev/null | wc -l | tr -d ' ')
echo " $category: $count papers"
fi
done
# Update INDEX.md with collected papers
INDEX="$WHITEPAPERS_DIR/INDEX.md"
if [ -f "$INDEX" ]; then
# Add collected papers section if not exists
if ! grep -q "## Recently Collected" "$INDEX"; then
echo "" >> "$INDEX"
echo "## Recently Collected" >> "$INDEX"
echo "" >> "$INDEX"
echo "_Last updated: $(date +%Y-%m-%d)_" >> "$INDEX"
echo "" >> "$INDEX"
fi
fi
# Process pending jobs
PENDING="$WHITEPAPERS_DIR/.pending-jobs.txt"
if [ -f "$PENDING" ]; then
count=$(wc -l < "$PENDING" | tr -d ' ')
echo "[update-index] $count papers queued for collection"
fi
echo "[update-index] Done"

@@ -0,0 +1,57 @@
# BitcoinTalk Thread Collector
Scrape and archive BitcoinTalk mega threads with author attribution and timestamps.
## Usage
```bash
# Single thread
./collect.sh https://bitcointalk.org/index.php?topic=2769739.0
# Just the topic ID
./collect.sh 2769739
# Limit pages (default: all)
./collect.sh 2769739 --pages=10
# Output to specific folder
./collect.sh 2769739 --output=./lethean-ann
```
## Output
```
bitcointalk-2769739/
├── thread.json # Full structured data
├── thread.md # Combined markdown
├── posts/
│ ├── POST-001.md # Individual posts
│ ├── POST-002.md
│ └── ...
└── INDEX.md # Thread overview + key posts
```
## Post Scoring
| Score | Meaning |
|-------|---------|
| ANN | Original announcement post |
| UPDATE | Official team update |
| QUESTION | Community question |
| ANSWER | Team response to question |
| SUPPORT | Positive community feedback |
| CONCERN | Raised issue/criticism |
| FUD | Identified as FUD/trolling |
| OFFTOPIC | Not relevant to project |
## Requirements
- `curl` or `wget`
- `python3` (the bundled collector uses the standard `re` and `html` modules; `pup` or beautifulsoup4 are optional for richer parsing)
## Notes
- Respects rate limits (1 request per 2 seconds)
- Handles pagination automatically (.0, .20, .40, etc)
- Extracts: author, date, post rank, trust score, content
- Identifies team members vs community

@@ -0,0 +1,269 @@
#!/usr/bin/env bash
# BitcoinTalk Thread Collector
# Usage: ./collect.sh <topic-id-or-url> [--pages=N] [--output=DIR]
set -e
DELAY=2 # Be respectful to BTT servers
MAX_PAGES=0 # 0 = all pages
OUTPUT_BASE="."
# Parse topic ID from URL or direct input
parse_topic_id() {
local input="$1"
if [[ "$input" =~ topic=([0-9]+) ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "$input" | grep -oE '[0-9]+'
fi
}
# Fetch a single page
fetch_page() {
local topic_id="$1"
local offset="$2"
local output_file="$3"
local url="https://bitcointalk.org/index.php?topic=${topic_id}.${offset}"
echo " Fetching: $url"
curl -s -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" \
-H "Accept: text/html" \
"$url" > "$output_file"
sleep $DELAY
}
# Check if page has posts
page_has_posts() {
local html_file="$1"
grep -q 'class="post"' "$html_file" 2>/dev/null
}
# Get last page number from first page
get_last_page() {
local html_file="$1"
# Look for navigation like "Pages: [1] 2 3 ... 50"
local max_page=$(grep -oE 'topic=[0-9]+\.[0-9]+' "$html_file" | \
sed 's/.*\.//' | sort -rn | head -1)
echo "${max_page:-0}"
}
# Extract posts from HTML (simplified - works for basic extraction)
extract_posts_simple() {
local html_file="$1"
local output_dir="$2"
local post_offset="$3"
# Use Python for reliable HTML parsing
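# The unquoted PYEOF delimiter lets bash substitute $html_file, $output_dir and $post_offset into the Python source below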
python3 << PYEOF
import re
import html
import os
from datetime import datetime
html_content = open('$html_file', 'r', encoding='utf-8', errors='ignore').read()
# Pattern to find posts - BTT structure
post_pattern = r'<td class="td_headerandpost">(.*?)</td>\s*</tr>\s*</table>\s*</td>\s*</tr>'
author_pattern = r'<a href="https://bitcointalk\.org/index\.php\?action=profile;u=\d+"[^>]*>([^<]+)</a>'
date_pattern = r'<div class="smalltext">([A-Za-z]+ \d+, \d+, \d+:\d+:\d+ [AP]M)</div>'
post_content_pattern = r'<div class="post"[^>]*>(.*?)</div>\s*(?:<div class="moderatorbar"|</td>)'
posts = re.findall(post_pattern, html_content, re.DOTALL)
post_num = $post_offset
for post_html in posts:
post_num += 1
# Extract author
author_match = re.search(author_pattern, post_html)
author = author_match.group(1) if author_match else "Unknown"
# Extract date
date_match = re.search(date_pattern, post_html)
date_str = date_match.group(1) if date_match else "Unknown date"
# Extract content
content_match = re.search(post_content_pattern, post_html, re.DOTALL)
if content_match:
content = content_match.group(1)
# Clean HTML
content = re.sub(r'<br\s*/?>', '\n', content)
content = re.sub(r'<[^>]+>', '', content)
content = html.unescape(content)
content = content.strip()
else:
content = "(Could not extract content)"
# Determine post type/score
score = "COMMUNITY"
if post_num == 1:
score = "ANN"
elif re.search(r'\[UPDATE\]|\[RELEASE\]|\[ANNOUNCEMENT\]', content, re.I):
score = "UPDATE"
elif '?' in content[:200]:
score = "QUESTION"
# Write post file
filename = f"$output_dir/POST-{post_num:04d}.md"
with open(filename, 'w') as f:
f.write(f"# Post #{post_num}\n\n")
f.write(f"## Metadata\n\n")
f.write(f"| Field | Value |\n")
f.write(f"|-------|-------|\n")
f.write(f"| Author | {author} |\n")
f.write(f"| Date | {date_str} |\n")
f.write(f"| Type | **{score}** |\n\n")
f.write(f"---\n\n")
f.write(f"## Content\n\n")
f.write(content)
f.write(f"\n")
print(f" Created POST-{post_num:04d}.md ({score}) by {author}")
print(f"EXTRACTED:{post_num}")
PYEOF
}
# Main collection function
collect_thread() {
local topic_id="$1"
local output_dir="$OUTPUT_BASE/bitcointalk-$topic_id"
mkdir -p "$output_dir/pages" "$output_dir/posts"
echo "=== Collecting BitcoinTalk Topic: $topic_id ==="
# Fetch first page to get thread info
fetch_page "$topic_id" 0 "$output_dir/pages/page-0.html"
# Extract thread title (sed keeps this portable; grep -P is unavailable in BSD/macOS grep)
local title=$(sed -n 's:.*<title>\([^<]*\)</title>.*:\1:p' "$output_dir/pages/page-0.html" | head -1)
echo "Thread: $title"
# Get total pages
local last_offset=$(get_last_page "$output_dir/pages/page-0.html")
local total_pages=$(( (last_offset / 20) + 1 ))
echo "Total pages: $total_pages"
if [ "$MAX_PAGES" -gt 0 ] && [ "$MAX_PAGES" -lt "$total_pages" ]; then
total_pages=$MAX_PAGES
echo "Limiting to: $total_pages pages"
fi
# Extract posts from first page
local post_count=0
local result=$(extract_posts_simple "$output_dir/pages/page-0.html" "$output_dir/posts" 0)
post_count=$(echo "$result" | grep "EXTRACTED:" | cut -d: -f2)
# Fetch remaining pages
for (( page=1; page<total_pages; page++ )); do
local offset=$((page * 20))
fetch_page "$topic_id" "$offset" "$output_dir/pages/page-$offset.html"
if ! page_has_posts "$output_dir/pages/page-$offset.html"; then
echo " No more posts found, stopping."
break
fi
result=$(extract_posts_simple "$output_dir/pages/page-$offset.html" "$output_dir/posts" "$post_count")
post_count=$(echo "$result" | grep "EXTRACTED:" | cut -d: -f2)
done
# Generate index
generate_index "$output_dir" "$title" "$topic_id" "$post_count"
echo ""
echo "=== Collection Complete ==="
echo "Posts: $post_count"
echo "Output: $output_dir/"
}
# Generate index file
generate_index() {
local output_dir="$1"
local title="$2"
local topic_id="$3"
local post_count="$4"
cat > "$output_dir/INDEX.md" << EOF
# BitcoinTalk Thread Archive
## Thread Info
| Field | Value |
|-------|-------|
| Title | $title |
| Topic ID | $topic_id |
| URL | https://bitcointalk.org/index.php?topic=$topic_id.0 |
| Posts Archived | $post_count |
| Collected | $(date +%Y-%m-%d) |
---
## Post Type Legend
| Type | Meaning |
|------|---------|
| ANN | Original announcement |
| UPDATE | Official team update |
| QUESTION | Community question |
| ANSWER | Team response |
| COMMUNITY | General discussion |
| CONCERN | Raised issue/criticism |
---
## Posts
| # | Author | Date | Type |
|---|--------|------|------|
EOF
for file in "$output_dir/posts/"POST-*.md; do
[ -f "$file" ] || continue
local num=$(basename "$file" .md | sed 's/POST-0*//')
local author=$(grep "| Author |" "$file" | sed 's/.*| Author | \(.*\) |/\1/')
local date=$(grep "| Date |" "$file" | sed 's/.*| Date | \(.*\) |/\1/')
local type=$(sed -n '/| Type |/s/.*\*\*\([A-Z]*\)\*\*.*/\1/p' "$file")
echo "| [$num](posts/POST-$(printf "%04d" $num).md) | $author | $date | $type |" >> "$output_dir/INDEX.md"
done
echo " Created INDEX.md"
}
# Parse arguments
main() {
local topic_input=""
for arg in "$@"; do
case "$arg" in
--pages=*) MAX_PAGES="${arg#*=}" ;;
--output=*) OUTPUT_BASE="${arg#*=}" ;;
--delay=*) DELAY="${arg#*=}" ;;
*) topic_input="$arg" ;;
esac
done
if [ -z "$topic_input" ]; then
echo "Usage: $0 <topic-id-or-url> [--pages=N] [--output=DIR] [--delay=2]"
echo ""
echo "Examples:"
echo " $0 2769739"
echo " $0 https://bitcointalk.org/index.php?topic=2769739.0"
echo " $0 2769739 --pages=5 --output=./lethean-ann"
exit 1
fi
local topic_id=$(parse_topic_id "$topic_input")
if [ -z "$topic_id" ]; then
echo "Error: Could not parse topic ID from: $topic_input"
exit 1
fi
collect_thread "$topic_id"
}
main "$@"

@@ -0,0 +1,70 @@
# Block Explorer Collector
Archive blockchain data from CryptoNote block explorers.
## Data Available
| Data Type | Notes |
|-----------|-------|
| Genesis block | First block, network params |
| Block history | Height, timestamps, difficulty |
| Network stats | Hashrate, emission, supply |
| Transaction patterns | Volume, sizes, fees |
| Top addresses | Rich list (if available) |
## Common CryptoNote Explorer APIs
Most CryptoNote explorers expose similar JSON APIs:
```
/api/info # Network stats
/api/block/[height|hash] # Block data
/api/transaction/[hash] # Transaction data
/api/mempool # Pending transactions
/api/emission # Supply data
```
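A quick spot-check that a given explorer actually exposes these endpoints before queueing a full job list (explorer URL taken from the table of known explorers below; requires `curl` and `jq`):

```bash
# Probe the info endpoint; a JSON response with network stats means the API is usable
curl -s "https://explorer.lethean.io/api/info" | jq .
```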
## Usage
```bash
# Generate jobs for known explorers
./generate-jobs.sh lethean > jobs.txt
# Custom explorer URL
./generate-jobs.sh --url=https://explorer.example.com > jobs.txt
# Get historical blocks (sampling)
./generate-jobs.sh lethean --blocks=1000 --sample=daily > jobs.txt
```
## Job Output
```
# API endpoints
https://explorer.lethean.io/api/info|explorer-lthn-info.json|explorer-api|coin=lethean
https://explorer.lethean.io/api/emission|explorer-lthn-emission.json|explorer-api|coin=lethean
https://explorer.lethean.io/api/block/1|explorer-lthn-block-1.json|explorer-api|coin=lethean,block=1
```
## Known Explorers
| Project | Explorer | API |
|---------|----------|-----|
| Lethean | explorer.lethean.io | ✅ |
| Monero | xmrchain.net | ✅ |
| Haven | explorer.havenprotocol.org | ✅ |
| Karbo | explorer.karbo.io | ✅ |
| Wownero | explore.wownero.com | ✅ |
## Archived Data
```
explorer-lethean/
├── info.json # Network summary
├── emission.json # Supply data
├── genesis.json # Block 0
├── blocks/
│ ├── monthly-samples.json # One block per month
│ └── milestones.json # Key heights
└── INDEX.md
```

View file

@ -0,0 +1,106 @@
#!/usr/bin/env bash
# Generate block explorer collection jobs
# Usage: ./generate-jobs.sh <coin> [--blocks=N] [--sample=daily|weekly|monthly]
set -e
COIN=""
EXPLORER_URL=""
SAMPLE="monthly"
BLOCK_COUNT=100
# Known explorers (associative arrays require bash 4+)
declare -A EXPLORERS=(
["lethean"]="https://explorer.lethean.io"
["monero"]="https://xmrchain.net"
["haven"]="https://explorer.havenprotocol.org"
["karbo"]="https://explorer.karbo.io"
["wownero"]="https://explore.wownero.com"
["dero"]="https://explorer.dero.io"
["masari"]="https://explorer.getmasari.org"
["turtlecoin"]="https://explorer.turtlecoin.lol"
["conceal"]="https://explorer.conceal.network"
)
# Parse args
for arg in "$@"; do
case "$arg" in
--url=*) EXPLORER_URL="${arg#*=}" ;;
--blocks=*) BLOCK_COUNT="${arg#*=}" ;;
--sample=*) SAMPLE="${arg#*=}" ;;
--*) ;;
*) COIN="$arg" ;;
esac
done
if [ -z "$COIN" ] && [ -z "$EXPLORER_URL" ]; then
echo "Usage: $0 <coin> [--url=URL] [--blocks=N] [--sample=daily|weekly|monthly]" >&2
echo "" >&2
echo "Known coins: ${!EXPLORERS[*]}" >&2
exit 1
fi
# Get explorer URL
if [ -z "$EXPLORER_URL" ]; then
EXPLORER_URL="${EXPLORERS[$COIN]}"
if [ -z "$EXPLORER_URL" ]; then
echo "# ERROR: Unknown coin '$COIN'. Use --url= to specify explorer." >&2
exit 1
fi
fi
SLUG=$(echo "$COIN" | tr '[:upper:]' '[:lower:]')
echo "# Block Explorer Jobs for $COIN"
echo "# Explorer: $EXPLORER_URL"
echo "# Sample: $SAMPLE"
echo "# Format: URL|FILENAME|TYPE|METADATA"
echo "#"
# Core API endpoints
echo "# === Core Data ==="
echo "${EXPLORER_URL}/api/info|explorer-${SLUG}-info.json|explorer-api|coin=$SLUG,type=info"
echo "${EXPLORER_URL}/api/emission|explorer-${SLUG}-emission.json|explorer-api|coin=$SLUG,type=emission"
echo "${EXPLORER_URL}/api/supply|explorer-${SLUG}-supply.json|explorer-api|coin=$SLUG,type=supply"
echo "${EXPLORER_URL}/api/mempool|explorer-${SLUG}-mempool.json|explorer-api|coin=$SLUG,type=mempool"
# Genesis block
echo "#"
echo "# === Genesis Block ==="
echo "${EXPLORER_URL}/api/block/0|explorer-${SLUG}-block-0.json|explorer-api|coin=$SLUG,block=0"
echo "${EXPLORER_URL}/api/block/1|explorer-${SLUG}-block-1.json|explorer-api|coin=$SLUG,block=1"
# Milestone blocks (if we know the heights)
echo "#"
echo "# === Milestone Blocks ==="
for height in 10000 50000 100000 500000 1000000 2000000; do
echo "${EXPLORER_URL}/api/block/${height}|explorer-${SLUG}-block-${height}.json|explorer-api|coin=$SLUG,block=$height"
done
# Sample blocks by time
echo "#"
echo "# === Sampled Blocks (estimate heights) ==="
case "$SAMPLE" in
daily)
# ~720 blocks/day for 2-min blocks
STEP=720
;;
weekly)
STEP=5040 # 720 blocks/day * 7
;;
monthly)
STEP=21600 # 720 blocks/day * 30
;;
*)
STEP=21600 # unknown --sample value: fall back to monthly spacing
;;
esac
for ((i=0; i<BLOCK_COUNT; i++)); do
height=$((i * STEP))
echo "${EXPLORER_URL}/api/block/${height}|explorer-${SLUG}-sample-${height}.json|explorer-api|coin=$SLUG,block=$height,sample=$SAMPLE"
done
# Web pages (for scraping if API fails)
echo "#"
echo "# === Web Pages (backup) ==="
echo "${EXPLORER_URL}/|explorer-${SLUG}-home.html|explorer-web|coin=$SLUG"
echo "${EXPLORER_URL}/blocks|explorer-${SLUG}-blocks.html|explorer-web|coin=$SLUG"
echo "${EXPLORER_URL}/stats|explorer-${SLUG}-stats.html|explorer-web|coin=$SLUG"

@@ -0,0 +1,64 @@
# CoinMarketCap Collector
Archive coin data, historical prices, and metadata from CoinMarketCap.
## Data Available
| Data Type | Source | Notes |
|-----------|--------|-------|
| Current price/market cap | Main page | Live data |
| Historical prices | /historical-data/ | OHLCV by date range |
| Project description | Main page | About section |
| Social links | Main page | Twitter, Discord, etc |
| Exchanges | /markets/ | Trading pairs |
| On-chain data | /onchain-analysis/ | If available |
| News mentions | /news/ | Related articles |
## Usage
### Generate Jobs
```bash
# All data for a coin
./generate-jobs.sh lethean > jobs.txt
# Just historical prices (date range)
./generate-jobs.sh lethean --historical --from=2018-01-01 --to=2024-12-31 > jobs.txt
# Multiple coins
./generate-jobs.sh lethean monero bitcoin > jobs.txt
```
### Process Downloads
```bash
./process.sh ./downloads/ --output=./cmc-archive/
```
## Output
```
cmc-lethean/
├── metadata.json # Name, symbol, links, description
├── current.json # Latest price/mcap/volume
├── historical/
│ ├── 2018.csv # OHLCV data
│ ├── 2019.csv
│ └── ...
├── markets.json # Exchange listings
└── INDEX.md # Summary
```
## Job Format
```
URL|FILENAME|TYPE|METADATA
https://coinmarketcap.com/currencies/lethean/|cmc-lethean-main.html|cmc-main|coin=lethean
https://coinmarketcap.com/currencies/lethean/historical-data/|cmc-lethean-historical.html|cmc-historical|coin=lethean
```
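A minimal sketch of a downloader loop that consumes this job format (the output directory and the 2-second delay are assumptions, chosen to match the rate-limit note below):

```bash
#!/usr/bin/env bash
# Fetch every job line, skipping comments; fields are URL|FILENAME|TYPE|METADATA
mkdir -p ./downloads
while IFS='|' read -r url filename type metadata; do
  [[ -z "$url" || "$url" == \#* ]] && continue
  echo "[$type] $filename ($metadata)"
  curl -sL -o "./downloads/$filename" "$url"
  sleep 2
done < jobs.txt
```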
## Notes
- CMC has rate limiting - use delays
- Historical data may require pagination
- Some data behind API paywall - scrape public pages

@@ -0,0 +1,89 @@
#!/usr/bin/env bash
# Generate job list for CoinMarketCap collection
# Usage: ./generate-jobs.sh <coin-slug> [options] > jobs.txt
set -e
COINS=()
HISTORICAL=0
FROM_DATE="2017-01-01"
TO_DATE=$(date +%Y-%m-%d)
# Parse args
for arg in "$@"; do
case "$arg" in
--historical) HISTORICAL=1 ;;
--from=*) FROM_DATE="${arg#*=}" ;;
--to=*) TO_DATE="${arg#*=}" ;;
--*) ;;
*) COINS+=("$arg") ;;
esac
done
if [ ${#COINS[@]} -eq 0 ]; then
echo "Usage: $0 <coin-slug> [coin-slug...] [--historical] [--from=DATE] [--to=DATE]" >&2
echo "" >&2
echo "Examples:" >&2
echo " $0 lethean" >&2
echo " $0 lethean --historical --from=2018-01-01" >&2
echo " $0 lethean monero bitcoin" >&2
exit 1
fi
# Header
echo "# CoinMarketCap job list - $(date +%Y-%m-%d)"
echo "# Coins: ${COINS[*]}"
echo "# Format: URL|FILENAME|TYPE|METADATA"
echo "#"
for COIN in "${COINS[@]}"; do
SLUG=$(echo "$COIN" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9-]/-/g')
echo "# === $SLUG ==="
# Main page (current data, description, links)
echo "https://coinmarketcap.com/currencies/${SLUG}/|cmc-${SLUG}-main.html|cmc-main|coin=$SLUG"
# Markets/exchanges
echo "https://coinmarketcap.com/currencies/${SLUG}/markets/|cmc-${SLUG}-markets.html|cmc-markets|coin=$SLUG"
# Historical data page
echo "https://coinmarketcap.com/currencies/${SLUG}/historical-data/|cmc-${SLUG}-historical.html|cmc-historical|coin=$SLUG"
# News
echo "https://coinmarketcap.com/currencies/${SLUG}/news/|cmc-${SLUG}-news.html|cmc-news|coin=$SLUG"
# API endpoints (if accessible without auth)
# These return JSON and are more reliable than scraping
echo "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/detail?slug=${SLUG}|cmc-${SLUG}-api-detail.json|cmc-api|coin=$SLUG,type=detail"
echo "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/market-pairs/latest?slug=${SLUG}&limit=100|cmc-${SLUG}-api-markets.json|cmc-api|coin=$SLUG,type=markets"
# Historical data via API (may need date chunks)
if [ "$HISTORICAL" = "1" ]; then
echo "#"
echo "# Historical data: $FROM_DATE to $TO_DATE"
# Convert dates to timestamps
FROM_TS=$(date -j -f "%Y-%m-%d" "$FROM_DATE" "+%s" 2>/dev/null || date -d "$FROM_DATE" "+%s")
TO_TS=$(date -j -f "%Y-%m-%d" "$TO_DATE" "+%s" 2>/dev/null || date -d "$TO_DATE" "+%s")
# CMC historical API (public, limited)
echo "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical?slug=${SLUG}&timeStart=${FROM_TS}&timeEnd=${TO_TS}|cmc-${SLUG}-api-historical.json|cmc-api|coin=$SLUG,type=historical"
# Also try the web scrape version with date range
echo "https://coinmarketcap.com/currencies/${SLUG}/historical-data/?start=${FROM_DATE//\-/}&end=${TO_DATE//\-/}|cmc-${SLUG}-historical-range.html|cmc-historical|coin=$SLUG,from=$FROM_DATE,to=$TO_DATE"
fi
echo "#"
done
echo "# === Additional data sources ==="
echo "#"
# CoinGecko as backup (often has more historical data)
for COIN in "${COINS[@]}"; do
SLUG=$(echo "$COIN" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9-]/-/g')
echo "https://www.coingecko.com/en/coins/${SLUG}|coingecko-${SLUG}-main.html|coingecko|coin=$SLUG"
echo "https://api.coingecko.com/api/v3/coins/${SLUG}|coingecko-${SLUG}-api.json|coingecko-api|coin=$SLUG"
echo "https://api.coingecko.com/api/v3/coins/${SLUG}/market_chart?vs_currency=usd&days=max|coingecko-${SLUG}-history.json|coingecko-api|coin=$SLUG,type=history"
done

@@ -0,0 +1,226 @@
#!/usr/bin/env bash
# Process downloaded CoinMarketCap data
# Usage: ./process.sh <downloads-dir> [--output=DIR]
set -e
DOWNLOADS="$1"
OUTPUT="./cmc-archive"
for arg in "$@"; do
case "$arg" in
--output=*) OUTPUT="${arg#*=}" ;;
esac
done
if [ -z "$DOWNLOADS" ] || [ ! -d "$DOWNLOADS" ]; then
echo "Usage: $0 <downloads-dir> [--output=DIR]" >&2
exit 1
fi
mkdir -p "$OUTPUT"
echo "=== Processing CoinMarketCap downloads ==="
# Process API JSON files first (most reliable)
for file in "$DOWNLOADS"/cmc-*-api-detail.json; do
[ -f "$file" ] || continue
COIN=$(basename "$file" | sed 's/cmc-\(.*\)-api-detail.json/\1/')
COIN_DIR="$OUTPUT/$COIN"
mkdir -p "$COIN_DIR"
echo "Processing: $COIN"
python3 << PYEOF
import json
import os
try:
data = json.load(open('$file', 'r'))
if 'data' in data:
coin = data['data']
# Extract metadata
metadata = {
'id': coin.get('id'),
'name': coin.get('name'),
'symbol': coin.get('symbol'),
'slug': coin.get('slug'),
'description': coin.get('description', ''),
'logo': coin.get('logo'),
'category': coin.get('category'),
'dateAdded': coin.get('dateAdded'),
'urls': coin.get('urls', {}),
'tags': coin.get('tags', []),
}
with open('$COIN_DIR/metadata.json', 'w') as f:
json.dump(metadata, f, indent=2)
print(f" Created metadata.json")
# Create markdown summary
with open('$COIN_DIR/INDEX.md', 'w') as f:
f.write(f"# {metadata['name']} ({metadata['symbol']})\n\n")
f.write(f"## Metadata\n\n")
f.write(f"| Field | Value |\n")
f.write(f"|-------|-------|\n")
f.write(f"| Name | {metadata['name']} |\n")
f.write(f"| Symbol | {metadata['symbol']} |\n")
f.write(f"| CMC ID | {metadata['id']} |\n")
f.write(f"| Added | {metadata['dateAdded']} |\n")
f.write(f"| Category | {metadata.get('category', 'N/A')} |\n\n")
f.write(f"## Links\n\n")
urls = metadata.get('urls', {})
for url_type, url_list in urls.items():
if url_list:
f.write(f"- **{url_type}**: {', '.join(url_list[:3])}\n")
f.write(f"\n## Description\n\n")
f.write(metadata.get('description', 'No description')[:2000])
f.write("\n")
print(f" Created INDEX.md")
except Exception as e:
print(f" Error processing: {e}")
PYEOF
done
# Process historical data
for file in "$DOWNLOADS"/cmc-*-api-historical.json; do
[ -f "$file" ] || continue
COIN=$(basename "$file" | sed 's/cmc-\(.*\)-api-historical.json/\1/')
COIN_DIR="$OUTPUT/$COIN"
mkdir -p "$COIN_DIR/historical"
echo "Processing historical: $COIN"
python3 << PYEOF
import json
import csv
from datetime import datetime
try:
data = json.load(open('$file', 'r'))
if 'data' in data and 'quotes' in data['data']:
quotes = data['data']['quotes']
# Group by year
by_year = {}
for quote in quotes:
ts = quote.get('timestamp', quote.get('time', ''))
if ts:
year = ts[:4]
if year not in by_year:
by_year[year] = []
by_year[year].append({
'date': ts[:10],
'open': quote.get('quote', {}).get('USD', {}).get('open', quote.get('open')),
'high': quote.get('quote', {}).get('USD', {}).get('high', quote.get('high')),
'low': quote.get('quote', {}).get('USD', {}).get('low', quote.get('low')),
'close': quote.get('quote', {}).get('USD', {}).get('close', quote.get('close')),
'volume': quote.get('quote', {}).get('USD', {}).get('volume', quote.get('volume')),
'market_cap': quote.get('quote', {}).get('USD', {}).get('market_cap', quote.get('market_cap')),
})
for year, rows in by_year.items():
filename = f'$COIN_DIR/historical/{year}.csv'
with open(filename, 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=['date', 'open', 'high', 'low', 'close', 'volume', 'market_cap'])
writer.writeheader()
writer.writerows(sorted(rows, key=lambda x: x['date']))
print(f" Created historical/{year}.csv ({len(rows)} rows)")
except Exception as e:
print(f" Error: {e}")
PYEOF
done
# Process CoinGecko data as backup
for file in "$DOWNLOADS"/coingecko-*-api.json; do
[ -f "$file" ] || continue
COIN=$(basename "$file" | sed 's/coingecko-\(.*\)-api.json/\1/')
COIN_DIR="$OUTPUT/$COIN"
mkdir -p "$COIN_DIR"
echo "Processing CoinGecko: $COIN"
python3 << PYEOF
import json
try:
data = json.load(open('$file', 'r'))
# Extract useful fields
gecko_data = {
'coingecko_id': data.get('id'),
'coingecko_rank': data.get('coingecko_rank'),
'genesis_date': data.get('genesis_date'),
'sentiment_up': data.get('sentiment_votes_up_percentage'),
'sentiment_down': data.get('sentiment_votes_down_percentage'),
'developer_data': data.get('developer_data', {}),
'community_data': data.get('community_data', {}),
}
with open('$COIN_DIR/coingecko.json', 'w') as f:
json.dump(gecko_data, f, indent=2)
print(f" Created coingecko.json")
except Exception as e:
print(f" Error: {e}")
PYEOF
done
# Process market/exchange data
for file in "$DOWNLOADS"/cmc-*-api-markets.json; do
[ -f "$file" ] || continue
COIN=$(basename "$file" | sed 's/cmc-\(.*\)-api-markets.json/\1/')
COIN_DIR="$OUTPUT/$COIN"
mkdir -p "$COIN_DIR"
echo "Processing markets: $COIN"
python3 << PYEOF
import json
try:
data = json.load(open('$file', 'r'))
if 'data' in data and 'marketPairs' in data['data']:
pairs = data['data']['marketPairs']
markets = []
for pair in pairs[:50]: # Top 50 markets
markets.append({
'exchange': pair.get('exchangeName'),
'pair': pair.get('marketPair'),
'price': pair.get('price'),
'volume_24h': pair.get('volumeUsd'),
'type': pair.get('marketType'),
})
with open('$COIN_DIR/markets.json', 'w') as f:
json.dump(markets, f, indent=2)
# Add to INDEX.md
with open('$COIN_DIR/INDEX.md', 'a') as f:
f.write(f"\n## Markets (Top 10)\n\n")
f.write(f"| Exchange | Pair | Volume 24h |\n")
f.write(f"|----------|------|------------|\n")
for m in markets[:10]:
vol = m.get('volume_24h', 0)
vol_str = f"\${vol:,.0f}" if vol else "N/A"  # \$ stops bash expanding this inside the unquoted heredoc
f.write(f"| {m['exchange']} | {m['pair']} | {vol_str} |\n")
print(f" Created markets.json ({len(markets)} pairs)")
except Exception as e:
print(f" Error: {e}")
PYEOF
done
echo ""
echo "=== Processing Complete ==="
echo "Output: $OUTPUT/"

@@ -0,0 +1,85 @@
# Community Chat Collector
Archive Discord and Telegram community discussions.
## Challenges
| Platform | Access | Automation |
|----------|--------|------------|
| Discord | Bot token or user export | Discord.py, DiscordChatExporter |
| Telegram | User account or bot | Telethon, telegram-export |
## Tools
### Discord
- **DiscordChatExporter**: https://github.com/Tyrrrz/DiscordChatExporter
- GUI or CLI
- Exports to HTML, JSON, TXT, CSV
- Requires bot token or user token
### Telegram
- **telegram-export**: https://github.com/expectocode/telegram-export
- Python-based
- Exports messages, media, users
- Requires API credentials
## Manual Export
### Discord Data Request
1. User Settings → Privacy & Safety
2. Request all of my Data
3. Wait for email (can take days)
4. Download and extract
### Telegram Export
1. Desktop app → Settings → Advanced
2. Export Telegram Data
3. Select chats and data types
4. Download zip
## Usage
```bash
# Generate job list for manual processing
./generate-jobs.sh lethean > jobs.txt
# Process exported Discord data
./process-discord.sh ./discord-export/ --output=./chat-archive/
# Process exported Telegram data
./process-telegram.sh ./telegram-export/ --output=./chat-archive/
```
## Output
```
chat-archive/lethean/
├── discord/
│ ├── general/
│ │ ├── 2019.json
│ │ ├── 2020.json
│ │ └── ...
│ ├── development/
│ └── channels.json
├── telegram/
│ ├── main-group/
│ └── announcements/
└── INDEX.md
```
## Known Communities
### Lethean
- Discord: https://discord.gg/lethean
- Telegram: @labormarket (historical)
### Monero
- Multiple community discords
- IRC archives (Libera.chat)
## Notes
- Respect rate limits and ToS
- Some messages may be deleted - export doesn't get them
- Media files can be large - consider text-only first
- User privacy - consider anonymization for public archive

@@ -0,0 +1,91 @@
# CryptoNote Project Discovery
Discover and catalog CryptoNote-based projects for archival.
## Known CryptoNote Forks (2014-2024)
### Still Active
| Project | Symbol | Genesis | Status | Notable Features |
|---------|--------|---------|--------|------------------|
| Monero | XMR | 2014-04 | Active | RingCT, Bulletproofs |
| Haven | XHV | 2018-04 | Active | Synthetic assets |
| Wownero | WOW | 2018-04 | Active | Meme coin, RandomX |
| Dero | DERO | 2017-12 | Active | Smart contracts |
| Lethean | LTHN | 2017-10 | Active | dVPN/Proxy services |
| Karbo | KRB | 2016-05 | Active | Ukrainian community |
### Abandoned (Salvage Candidates)
| Project | Symbol | Genesis | Death | Reason | Salvageable |
|---------|--------|---------|-------|--------|-------------|
| Bytecoin | BCN | 2012-07 | 2022 | Premine scandal | Protocol research |
| Electroneum | ETN | 2017-09 | Pivot | Went mobile-only | Mobile wallet code |
| Aeon | AEON | 2014-06 | 2021 | Dev abandoned | Lightweight client |
| Masari | MSR | 2017-09 | 2022 | Dev MIA | Uncle mining |
| Loki | LOKI | 2018-03 | Rebrand | Now Session | Service nodes |
| Sumokoin | SUMO | 2017-04 | 2021 | Drama | Privacy features |
| Ryo | RYO | 2018-07 | 2023 | Low activity | GPU algo work |
| Conceal | CCX | 2018-01 | Low | Minimal dev | Banking features |
| Qwertycoin | QWC | 2018-01 | Low | Small team | Easy mining |
| TurtleCoin | TRTL | 2017-12 | 2023 | Team burnout | Community tools |
| Nerva | XNV | 2018-05 | 2022 | Solo mining only | Anti-pool algo |
## Data Sources Per Project
```
For each CryptoNote project, collect:
1. GitHub/GitLab repos
- Core daemon
- Wallet (CLI, GUI, mobile)
- Pool software
- Block explorer
- Documentation
2. BitcoinTalk ANN thread
- Original announcement
- Updates
- Community discussion
3. Block explorer
- Genesis block
- Emission curve
- Network stats history
4. CoinMarketCap/CoinGecko
- Price history
- Description
- Social links
5. Reddit/Discord
- Archived discussions
- Feature requests
6. Wayback Machine
- Old website versions
- Documentation snapshots
```
## Usage
```bash
# Discover all sources for a project
./discover.sh monero > monero-sources.txt
./discover.sh lethean > lethean-sources.txt
# Batch discover abandoned projects
./discover.sh --abandoned > salvage-targets.txt
# Generate collection jobs for all sources
./generate-all-jobs.sh lethean > lethean-jobs.txt
```
## Project Registry
The skill maintains a registry of known CryptoNote projects with:
- GitHub org/repos
- BitcoinTalk topic IDs
- Block explorer URLs
- CMC/CoinGecko slugs
- Social links
- Status (active/abandoned/dead)
- Notable innovations worth salvaging
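For ad-hoc queries outside `discover.sh`, the registry can be filtered directly with `jq` (field names as used by `discover.sh`; the status filter is just an example):

```bash
# List dead/abandoned projects and the features flagged as worth salvaging
jq -r '.projects[]
  | select(.status == "abandoned" or .status == "dead")
  | "\(.name): \(.salvageable // [] | join(", "))"' registry.json
```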

@@ -0,0 +1,124 @@
#!/usr/bin/env bash
# Discover all collection sources for a CryptoNote project
# Usage: ./discover.sh <project-name> | ./discover.sh --abandoned | ./discover.sh --all
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REGISTRY="$SCRIPT_DIR/registry.json"
# Get project data from registry
get_project() {
local name="$1"
jq -r ".projects[] | select(.name | ascii_downcase == \"$(echo $name | tr '[:upper:]' '[:lower:]')\")" "$REGISTRY"
}
# List abandoned projects
list_abandoned() {
jq -r '.projects[] | select(.status == "abandoned" or .status == "low-activity" or .status == "dead") | .name' "$REGISTRY"
}
# List all projects
list_all() {
jq -r '.projects[].name' "$REGISTRY"
}
# Generate sources for a project
generate_sources() {
local name="$1"
local project=$(get_project "$name")
if [ -z "$project" ] || [ "$project" = "null" ]; then
echo "# ERROR: Project '$name' not found in registry" >&2
return 1
fi
local symbol=$(echo "$project" | jq -r '.symbol')
local status=$(echo "$project" | jq -r '.status')
echo "# === $name ($symbol) ==="
echo "# Status: $status"
echo "#"
# GitHub repos
echo "# GitHub Organizations:"
echo "$project" | jq -r '.github[]?' | while read org; do
[ -n "$org" ] && echo "github|https://github.com/$org|$name"
done
# BitcoinTalk
local btt=$(echo "$project" | jq -r '.bitcointalk // empty')
if [ -n "$btt" ]; then
echo "#"
echo "# BitcoinTalk:"
echo "bitcointalk|https://bitcointalk.org/index.php?topic=$btt.0|$name"
fi
# CMC/CoinGecko
local cmc=$(echo "$project" | jq -r '.cmc // empty')
local gecko=$(echo "$project" | jq -r '.coingecko // empty')
echo "#"
echo "# Market Data:"
[ -n "$cmc" ] && echo "cmc|https://coinmarketcap.com/currencies/$cmc/|$name"
[ -n "$gecko" ] && echo "coingecko|https://coingecko.com/en/coins/$gecko|$name"
# Website/Explorer
local website=$(echo "$project" | jq -r '.website // empty')
local explorer=$(echo "$project" | jq -r '.explorer // empty')
echo "#"
echo "# Web Properties:"
[ -n "$website" ] && echo "wayback|https://$website|$name"
[ -n "$explorer" ] && echo "explorer|https://$explorer|$name"
# Salvageable features
local salvage=$(echo "$project" | jq -r '.salvageable[]?' 2>/dev/null)
if [ -n "$salvage" ]; then
echo "#"
echo "# Salvageable:"
echo "$project" | jq -r '.salvageable[]?' | while read item; do
echo "# - $item"
done
fi
echo "#"
}
# Main
case "$1" in
--abandoned)
echo "# Abandoned CryptoNote Projects (Salvage Candidates)"
echo "# Format: source|url|project"
echo "#"
# Read line-by-line so multi-word names (e.g. "Haven Protocol") stay intact
list_abandoned | while IFS= read -r proj; do
generate_sources "$proj"
done
;;
--all)
echo "# All CryptoNote Projects"
echo "# Format: source|url|project"
echo "#"
list_all | while IFS= read -r proj; do
generate_sources "$proj"
done
;;
--list)
list_all
;;
--list-abandoned)
list_abandoned
;;
"")
echo "Usage: $0 <project-name> | --abandoned | --all | --list" >&2
echo "" >&2
echo "Examples:" >&2
echo " $0 lethean # Sources for Lethean" >&2
echo " $0 monero # Sources for Monero" >&2
echo " $0 --abandoned # All abandoned projects" >&2
echo " $0 --all # Everything" >&2
echo " $0 --list # Just list project names" >&2
exit 1
;;
*)
generate_sources "$1"
;;
esac

@@ -0,0 +1,365 @@
{
"projects": [
{
"name": "Lethean",
"symbol": "LTHN",
"status": "active",
"genesis": "2017-10-06",
"github": ["LetheanNetwork", "letheanVPN", "LetheanMovement"],
"bitcointalk": "2769739",
"cmc": "lethean",
"coingecko": "lethean",
"website": "lethean.io",
"explorer": "explorer.lethean.io",
"features": ["dVPN", "Proxy services", "Service marketplace"],
"notes": "Originally IntenseCoin (ITNS). Pivoted to VPN/proxy services."
},
{
"name": "Monero",
"symbol": "XMR",
"status": "active",
"genesis": "2014-04-18",
"github": ["monero-project"],
"bitcointalk": "583449",
"cmc": "monero",
"coingecko": "monero",
"website": "getmonero.org",
"explorer": "xmrchain.net",
"features": ["RingCT", "Bulletproofs", "Dandelion++", "RandomX", "Difficulty adjustment algos", "Anti-botnet protections"],
"salvageable": ["Difficulty adjustment evolution", "RandomX anti-ASIC/botnet", "Block diff calculation iterations", "Network protection patterns"],
"notes": "Fork of Bytecoin. De facto CryptoNote reference implementation. Lethean shares fork heritage. Codebase messy but operationally battle-tested. Best-in-class difficulty system and botnet protection — track their algo evolution for reference."
},
{
"name": "Haven Protocol",
"symbol": "XHV",
"status": "dead",
"genesis": "2018-04-19",
"death_year": "2024",
"death_cause": "exploit",
"github": ["haven-protocol-org"],
"bitcointalk": "3039890",
"cmc": "haven-protocol",
"coingecko": "haven",
"website": "havenprotocol.org",
"explorer": "explorer.havenprotocol.org",
"features": ["Synthetic assets", "xUSD stable", "Private DeFi", "Offshore storage", "Mint/burn mechanics"],
"salvageable": ["xAsset stablecoin system", "Mint/burn implementation", "Offshore storage patterns", "Private synthetic assets", "Collateralization logic"],
"notes": "HAD WORKING CN STABLECOIN SYSTEM. Killed by exploit - someone unstaked millions via bug. Code wasn't bad, just unlucky. Bug is fixable. Directly relevant to Lethean's new chain escrow + POS + atomic swaps + sub-assets. HIGH PRIORITY SALVAGE for stablecoin architecture."
},
{
"name": "Zano",
"symbol": "ZANO",
"status": "active",
"genesis": "2019-05-01",
"github": ["hyle-team", "zanoio"],
"bitcointalk": "5144684",
"cmc": "zano",
"coingecko": "zano",
"website": "zano.org",
"explorer": "explorer.zano.org",
"features": ["CryptoNote v2", "ETH integration", "Escrow", "Hidden amount POS", "POW/POS hybrid", "Wallet aliases"],
"salvageable": ["Alias system (Lethean uses similar)", "Escrow implementation", "Hidden POS", "Hybrid consensus", "ETH bridge patterns"],
"notes": "Built by OG CryptoNote developer. CryptoNote v2 evolution. Wallet alias system is same pattern Lethean uses for naming. Active development, high reference value."
},
{
"name": "KevaCoin",
"symbol": "KVA",
"status": "active",
"genesis": "2018-12-01",
"github": ["kevacoin-project"],
"bitcointalk": "5104726",
"cmc": "kevacoin",
"coingecko": "kevacoin",
"website": "kevacoin.org",
"explorer": "explorer.kevacoin.org",
"features": ["Key-value storage", "On-chain data", "Decentralized namespace", "Arbitrary data storage"],
"salvageable": ["KV storage implementation", "Namespace system", "On-chain data patterns"],
"notes": "CryptoNote with key-value data storage on-chain. Decentralized namespace/database. Relevant to Lethean for on-chain service discovery metadata, SDP storage patterns."
},
{
"name": "Scala",
"symbol": "XLA",
"status": "active",
"genesis": "2018-04-01",
"github": ["scala-network"],
"bitcointalk": "3260965",
"cmc": "scala",
"coingecko": "scala",
"website": "scalaproject.io",
"explorer": "explorer.scalaproject.io",
"features": ["Mobile mining", "IPFS integration", "Diardi protocol", "ARM optimization"],
"salvageable": ["Mobile/ARM mining code", "IPFS integration patterns", "Diardi DHT protocol"],
"notes": "Mobile-first CryptoNote. Strong focus on ARM/mobile mining. IPFS integration for decentralized storage. Diardi protocol for DHT-based networking. Relevant to Lethean mobile client ambitions."
},
{
"name": "Dero (Current)",
"symbol": "DERO",
"status": "active",
"genesis": "2017-12-01",
"github": ["deroproject"],
"bitcointalk": "2525360",
"cmc": "dero",
"coingecko": "dero",
"website": "dero.io",
"explorer": "explorer.dero.io",
"features": ["Smart contracts", "Homomorphic encryption", "DAG"],
"notes": "Captain rewrote from scratch in Go with DAG. NOT CryptoNote anymore. See Dero Classic for original."
},
{
"name": "Dero Classic",
"symbol": "DERO",
"status": "abandoned",
"genesis": "2017-12-01",
"death_year": "2019",
"github": ["deroproject"],
"github_branch": "master (pre-atlantis)",
"bitcointalk": "2525360",
"features": ["Original CryptoNote base", "Early smart contract experiments", "Pre-Go architecture"],
"salvageable": ["Original CN daemon", "Early SC implementation attempts", "C++ codebase before Go rewrite"],
"notes": "The ORIGINAL Dero before Captain rewrote everything in Go. This is the CryptoNote version. Need to find archived branches/tags."
},
{
"name": "Karbo",
"symbol": "KRB",
"status": "active",
"genesis": "2016-05-30",
"github": ["Karbovanets"],
"bitcointalk": "1491212",
"cmc": "karbo",
"coingecko": "karbo",
"website": "karbo.io",
"explorer": "explorer.karbo.io",
"features": ["Ukrainian focus", "Payment processor"],
"notes": "Strong Ukrainian community. Survived through wars."
},
{
"name": "Wownero",
"symbol": "WOW",
"status": "active",
"genesis": "2018-04-01",
"github": ["wownero"],
"bitcointalk": "3104527",
"cmc": "wownero",
"coingecko": "wownero",
"website": "wownero.org",
"explorer": "explore.wownero.com",
"features": ["Meme coin", "RandomX", "No premine"],
"notes": "Monero meme fork. Good testbed for new features."
},
{
"name": "TurtleCoin",
"symbol": "TRTL",
"status": "abandoned",
"genesis": "2017-12-09",
"github": ["turtlecoin"],
"bitcointalk": "2689892",
"cmc": "turtlecoin",
"coingecko": "turtlecoin",
"website": "turtlecoin.lol",
"features": ["Fast blocks", "Low fees", "Fun community", "Karai sidechain"],
"salvageable": ["Community tools", "Wallet backends", "Pool software", "Educational docs"],
"notes": "Team burned out 2023. Excellent beginner-friendly docs and tools."
},
{
"name": "Masari",
"symbol": "MSR",
"status": "abandoned",
"genesis": "2017-09-02",
"github": ["masari-project"],
"bitcointalk": "2145262",
"cmc": "masari",
"coingecko": "masari",
"website": "getmasari.org",
"features": ["Uncle mining (SECOR)", "WHM difficulty algo", "Blocktree"],
"salvageable": ["Uncle mining code", "SECOR implementation", "WHM difficulty"],
"notes": "Dev went MIA. Uncle mining was innovative - reduces orphans."
},
{
"name": "Aeon",
"symbol": "AEON",
"status": "abandoned",
"genesis": "2014-06-06",
"github": ["aeonix"],
"bitcointalk": "641696",
"cmc": "aeon",
"coingecko": "aeon",
"website": "aeon.cash",
"features": ["Lightweight", "Pruning", "Mobile-friendly"],
"salvageable": ["Lightweight sync", "Pruning code", "Mobile optimizations"],
"notes": "Aimed to be mobile Monero. Dev abandoned. Pruning work valuable."
},
{
"name": "Loki",
"symbol": "LOKI",
"status": "rebranded",
"new_name": "Oxen/Session",
"genesis": "2018-03-20",
"github": ["oxen-io", "loki-project"],
"bitcointalk": "3073073",
"cmc": "oxen",
"coingecko": "loki-network",
"website": "oxen.io",
"features": ["Service nodes", "Staking", "Lokinet", "Session messenger"],
"salvageable": ["Service node architecture", "Staking implementation", "Sybil resistance", "Lokinet onion routing", "Pre-Session messenger (Loki Messenger)"],
"notes": "LOKI CODE valuable. Oxen drifted from CryptoNote - focus on pre-rebrand commits. Service node incentive model directly relevant to Lethean exit nodes. HAD MESSENGER before Session rebrand - encrypted comms over service nodes."
},
{
"name": "GraftNetwork",
"symbol": "GRFT",
"status": "abandoned",
"genesis": "2018-01-01",
"death_year": "2020",
"github": ["graft-project", "graft-community"],
"bitcointalk": "2766943",
"cmc": "graft-blockchain",
"coingecko": "graft-blockchain",
"website": "graft.network",
"features": ["Supernodes (masternodes)", "Real-time authorization", "Point-of-sale terminal", "Payment network", "Veriphone integration"],
"salvageable": ["Supernode architecture", "RTA (real-time auth) protocol", "POS terminal app", "Mesh payment routing", "Masternode incentive model"],
"notes": "HAD WORKING VERIPHONE TERMINAL APP pre-crypto winter. Distributed payment network using masternodes on CryptoNote. Mesh routing code extremely relevant to Lethean service discovery. Died in crypto winter but tech was solid."
},
{
"name": "Nerva",
"symbol": "XNV",
"status": "abandoned",
"genesis": "2018-05-01",
"github": ["nerva-project"],
"bitcointalk": "3464367",
"cmc": "nerva",
"coingecko": "nerva",
"website": "nerva.one",
"features": ["Solo mining only", "Anti-pool", "CPU only"],
"salvageable": ["Anti-pool algorithm", "Solo mining incentives"],
"notes": "Forced solo mining to decentralize. Interesting approach."
},
{
"name": "Conceal",
"symbol": "CCX",
"status": "low-activity",
"genesis": "2018-01-01",
"github": ["ConcealNetwork"],
"bitcointalk": "2779530",
"cmc": "conceal",
"coingecko": "conceal",
"website": "conceal.network",
"features": ["Banking", "Deposits", "Interest"],
"salvageable": ["Deposit/interest system", "Banking features"],
"notes": "DeFi-like features before DeFi was cool. Low activity now."
},
{
"name": "Ryo Currency",
"symbol": "RYO",
"status": "low-activity",
"genesis": "2018-07-08",
"github": ["ryo-currency"],
"bitcointalk": "4549406",
"cmc": "ryo-currency",
"coingecko": "ryo-currency",
"website": "ryo-currency.com",
"features": ["GPU algo research", "Cryptonight-GPU"],
"salvageable": ["GPU algorithm work", "Mining research"],
"notes": "Focused on GPU mining fairness research."
},
{
"name": "Sumokoin",
"symbol": "SUMO",
"status": "abandoned",
"genesis": "2017-04-25",
"github": ["sumoprojects"],
"bitcointalk": "1893253",
"cmc": "sumokoin",
"coingecko": "sumokoin",
"website": "sumokoin.org",
"features": ["Larger ring size", "More privacy"],
"salvageable": ["Larger ring research"],
"notes": "Aimed for more privacy than Monero. Team drama killed it."
},
{
"name": "Bytecoin",
"symbol": "BCN",
"status": "dead",
"genesis": "2012-07-04",
"github": ["bcndev"],
"bitcointalk": "512747",
"cmc": "bytecoin-bcn",
"coingecko": "bytecoin",
"website": "bytecoin.org",
"features": ["Original CryptoNote", "First implementation"],
"salvageable": ["Historical reference", "Original protocol docs"],
"notes": "The original. Premine scandal. Historical importance only."
},
{
"name": "Electroneum",
"symbol": "ETN",
"status": "pivoted",
"genesis": "2017-09-14",
"github": ["electroneum"],
"bitcointalk": "2098160",
"cmc": "electroneum",
"coingecko": "electroneum",
"website": "electroneum.com",
"features": ["Mobile mining", "KYC integration", "App payments"],
"salvageable": ["Mobile mining simulation", "App integration patterns"],
"notes": "Went full mobile/KYC. Not really CryptoNote anymore. ICO money."
},
{
"name": "QRL",
"symbol": "QRL",
"status": "active",
"genesis": "2018-06-26",
"github": ["theQRL"],
"bitcointalk": "1730477",
"cmc": "quantum-resistant-ledger",
"coingecko": "quantum-resistant-ledger",
"website": "theqrl.org",
"explorer": "explorer.theqrl.org",
"features": ["XMSS signatures", "Post-quantum cryptography", "Lattice-based crypto", "Future-proof addresses"],
"salvageable": ["XMSS implementation", "Post-quantum signature schemes", "Quantum-safe address formats", "PQ cryptography research"],
"cryptonote": false,
"notes": "NOT CryptoNote - but quantum resistance research is essential for future-proofing. XMSS and lattice-based cryptography. Whitepapers valuable for when quantum computing threatens current CN signature schemes."
},
{
"name": "Hyperswarm / Holepunch",
"symbol": null,
"status": "active",
"github": ["hyperswarm", "holepunchto"],
"website": "holepunch.to",
"features": ["DHT networking", "NAT hole punching", "P2P connections", "Hypercore protocol", "No token"],
"salvageable": ["DHT implementation", "Hole punching code", "P2P discovery patterns", "Decentralized networking stack"],
"cryptonote": false,
"token": false,
"notes": "PURE TECH, NO TOKEN. Mafintosh and crew. Best-in-class P2P infrastructure. DHT-based peer discovery, NAT traversal, decentralized networking. Directly relevant to Lethean service discovery and mesh networking. Reference implementation for how to do P2P right."
},
{
"name": "Hive",
"symbol": "HIVE",
"status": "active",
"genesis": "2020-03-20",
"github": ["openhive-network"],
"website": "hive.io",
"explorer": "hiveblocks.com",
"features": ["Social blockchain", "Non-mintable block types", "Prefix-based filtering", "On-chain messaging", "Custom JSON ops"],
"salvageable": ["Prefix-based message routing", "On-chain pub/sub pattern", "Encrypted namespace messaging", "Custom operation types"],
"cryptonote": false,
"notes": "Steem fork. Has non-mintable block type with prefix system — listen to your prefix, decrypt, done. Almost used for Lethean comms layer. Elegant on-chain messaging without separate infra. Reference for encrypted pub/sub patterns."
},
{
"name": "Octa.Space",
"symbol": "OCTA",
"status": "active",
"github": ["octa-space"],
"website": "octa.space",
"features": ["Decentralized compute", "VPS rental", "GPU marketplace", "Distributed cloud", "Node hosting rewards"],
"salvageable": ["VPS provisioning patterns", "Compute marketplace model", "Node incentive structure", "Resource metering"],
"cryptonote": false,
"notes": "dCloud / decentralized VPS marketplace. Relevant to Lethean for compute-as-a-service patterns beyond just VPN. Compare to Lethean exit node model but for general compute. VPS-type deals on decentralized infrastructure."
}
],
"metadata": {
"last_updated": "2026-02-01",
"maintained_by": ["Snider", "Darbs"],
"purpose": "CryptoNote ecosystem preservation",
"high_priority_salvage": ["Haven Protocol", "GraftNetwork", "Dero Classic", "Loki (pre-Oxen)", "Masari"],
"notes": "Focus on projects with service node/masternode/mesh architectures relevant to Lethean"
}
}

View file

@ -0,0 +1,137 @@
# GitHub History Collection Skill
Collect and score GitHub issues and PRs for triage analysis.
## Usage
```bash
# Single repo
./collect.sh https://github.com/LetheanNetwork/lthn-app-vpn
# Entire org (all repos)
./collect.sh https://github.com/LetheanNetwork --org
# Just issues (skip PRs)
./collect.sh https://github.com/LetheanNetwork/lthn-app-vpn --issues-only
# Just PRs (skip issues)
./collect.sh https://github.com/LetheanNetwork/lthn-app-vpn --prs-only
# Custom rate limit delay
./collect.sh https://github.com/LetheanNetwork --org --delay=0.5
```
## Output Structure
```
repo/
├── {org}/
│ └── {repo}/
│ ├── Issue/
│ │ ├── 001.md # Sequential, no gaps
│ │ ├── 002.md
│ │ ├── 003.md
│ │ └── INDEX.md # Scored index
│ ├── PR/
│ │ ├── 001.md
│ │ ├── 002.md
│ │ └── INDEX.md
│ └── .json/ # Raw API responses
│ ├── issues-list.json
│ ├── issue-{n}.json
│ ├── prs-list.json
│ └── pr-{n}.json
```
### Sequential vs GitHub Numbers
- **Filename**: `001.md`, `002.md`, etc. - sequential, no gaps
- **Inside file**: `# Issue #47: ...` - preserves original GitHub number
- **INDEX.md**: Maps both: `| 001 | #47 | Title | SCORE |`
This ensures clean sequential browsing while maintaining traceability to GitHub.
## Reception Scores
| Score | Meaning | Triage Action |
|-------|---------|---------------|
| ADDRESSED | Closed after discussion | Review if actually fixed |
| DISMISSED | Labeled wontfix/invalid | **RECLAIM candidate** |
| IGNORED | Closed, no response | **RECLAIM candidate** |
| STALE | Open, no replies | Needs attention |
| ACTIVE | Open with discussion | In progress |
| MERGED | PR accepted | Done |
| REJECTED | PR closed unmerged | Review why |
| PENDING | PR still open | Needs review |
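These scores are assigned by a simple heuristic in `collect.sh` (`convert_issue` / `convert_pr`); condensed here for reference:
```bash
# Condensed from convert_issue in collect.sh:
# state and labels come from the GitHub API, comment_count from the fetched JSON.
score_issue() {
  local state="$1" labels="$2" comment_count="$3"
  if [ "$state" = "CLOSED" ]; then
    if echo "$labels" | grep -qi "wontfix\|invalid\|duplicate"; then
      echo "DISMISSED"
    elif [ "$comment_count" -eq 0 ]; then
      echo "IGNORED"
    else
      echo "ADDRESSED"
    fi
  elif [ "$comment_count" -eq 0 ]; then
    echo "STALE"
  else
    echo "ACTIVE"
  fi
}
```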
## Requirements
- `gh` CLI authenticated (`gh auth login`)
- `jq` installed
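A quick pre-flight check before a long run (standard `gh`/`jq` invocations; the collector polls the same `rate_limit` endpoint):
```bash
gh auth status                                      # token gh will use
jq --version                                        # jq present on PATH
gh api rate_limit --jq '.resources.core.remaining'  # REST calls left this hour
```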
## Batch Collection
Supports comma-separated targets for batch runs:
```bash
# Batch orgs
./collect.sh "LetheanNetwork,graft-project,oxen-io" --org
# Batch repos
./collect.sh "LetheanNetwork/lthn-app-vpn,monero-project/monero"
```
## Full Registry List
Copy-paste ready commands for the complete CryptoNote ecosystem:
```bash
# === LETHEAN ECOSYSTEM ===
./collect.sh "LetheanNetwork,letheanVPN,LetheanMovement" --org
# === CRYPTONOTE ACTIVE ===
./collect.sh "monero-project,hyle-team,zanoio,kevacoin-project,scala-network" --org
./collect.sh "Karbovanets,wownero,ConcealNetwork,ryo-currency" --org
# === SALVAGE PRIORITY (dead/abandoned) ===
./collect.sh "haven-protocol-org,graft-project,graft-community" --org
./collect.sh "oxen-io,loki-project" --org
./collect.sh "turtlecoin,masari-project,aeonix,nerva-project,sumoprojects" --org
./collect.sh "deroproject,bcndev,electroneum" --org
# === NON-CN REFERENCE ===
./collect.sh "theQRL,hyperswarm,holepunchto,openhive-network,octa-space" --org
```
### One-liner for everything
```bash
./collect.sh "LetheanNetwork,letheanVPN,LetheanMovement,monero-project,haven-protocol-org,hyle-team,zanoio,kevacoin-project,scala-network,deroproject,Karbovanets,wownero,turtlecoin,masari-project,aeonix,oxen-io,loki-project,graft-project,graft-community,nerva-project,ConcealNetwork,ryo-currency,sumoprojects,bcndev,electroneum,theQRL,hyperswarm,holepunchto,openhive-network,octa-space" --org
```
## Example Run
```bash
$ ./collect.sh "LetheanNetwork,graft-project" --org
=== Collecting all repos from org: LetheanNetwork ===
=== Collecting: LetheanNetwork/lthn-app-vpn ===
Output: ./repo/LetheanNetwork/lthn-app-vpn/
Fetching issues...
Found 145 issues
Fetching issue #1 -> 001.md
...
Created Issue/INDEX.md
Fetching PRs...
Found 98 PRs
...
Created PR/INDEX.md
=== Collecting all repos from org: graft-project ===
=== Collecting: graft-project/graft-network ===
Output: ./repo/graft-project/graft-network/
...
=== Collection Complete ===
Output: ./repo/
```

View file

@ -0,0 +1,516 @@
#!/usr/bin/env bash
# GitHub History Collector v2
# Usage: ./collect.sh <target> [--org] [--issues-only] [--prs-only]
#
# Supports:
# Single repo: ./collect.sh LetheanNetwork/lthn-app-vpn
# Single org: ./collect.sh LetheanNetwork --org
# Batch orgs: ./collect.sh "LetheanNetwork,graft-project,oxen-io" --org
# Batch repos: ./collect.sh "owner/repo1,owner/repo2"
#
# Output structure:
# repo/{org}/{repo}/Issue/001.md, 002.md, ...
# repo/{org}/{repo}/PR/001.md, 002.md, ...
#
# Rate limiting:
# --check-rate Just show current rate limit status and exit
# Auto-pauses at 25% remaining (75% used) until reset+10s (preserves GraphQL quota)
set -e
# GitHub API allows 5000 requests/hour authenticated (~1.4 req/sec sustained)
# 0.05s only spaces out bursts; the 25% rate-limit check below guards the hourly quota.
# Bump to 0.1 (or pass --delay=) if you still hit secondary rate limits.
DELAY=0.05
OUTPUT_BASE="./repo"
# Rate limit protection - check every N calls, pause if under 25% (75% used)
API_CALL_COUNT=0
RATE_CHECK_INTERVAL=100
check_rate_limit() {
local rate_json=$(gh api rate_limit 2>/dev/null)
if [ -z "$rate_json" ]; then
echo " [Rate check failed, continuing...]"
return
fi
local remaining=$(echo "$rate_json" | jq -r '.resources.core.remaining')
local limit=$(echo "$rate_json" | jq -r '.resources.core.limit')
local reset=$(echo "$rate_json" | jq -r '.resources.core.reset')
local percent=$((remaining * 100 / limit))
echo ""
echo ">>> Rate check: ${percent}% remaining ($remaining/$limit)"
if [ "$percent" -lt 25 ]; then
local now=$(date +%s)
local wait_time=$((reset - now + 10))
if [ "$wait_time" -gt 0 ]; then
local resume_time=$(date -d "@$((reset + 10))" '+%H:%M:%S' 2>/dev/null || date -r "$((reset + 10))" '+%H:%M:%S' 2>/dev/null || echo "reset+10s")
echo ">>> Under 25% - pausing ${wait_time}s until $resume_time"
echo ">>> (GraphQL quota preserved for other tools)"
sleep "$wait_time"
echo ">>> Resuming collection..."
fi
else
echo ">>> Above 25% - continuing..."
fi
echo ""
}
track_api_call() {
API_CALL_COUNT=$((API_CALL_COUNT + 1))
if [ $((API_CALL_COUNT % RATE_CHECK_INTERVAL)) -eq 0 ]; then
check_rate_limit
fi
}
# Parse URL into org/repo
parse_github_url() {
local url="$1"
url="${url#https://github.com/}"
url="${url#http://github.com/}"
url="${url%/}"
echo "$url"
}
# Collect single repo
collect_repo() {
local repo="$1" # format: org/repo-name
local org=$(dirname "$repo")
local repo_name=$(basename "$repo")
local issue_dir="$OUTPUT_BASE/$org/$repo_name/Issue"
local pr_dir="$OUTPUT_BASE/$org/$repo_name/PR"
local json_dir="$OUTPUT_BASE/$org/$repo_name/.json"
mkdir -p "$issue_dir" "$pr_dir" "$json_dir"
echo "=== Collecting: $repo ==="
echo " Output: $OUTPUT_BASE/$org/$repo_name/"
# Collect Issues
if [ "$SKIP_ISSUES" != "1" ]; then
echo "Fetching issues..."
if ! gh issue list --repo "$repo" --state all --limit 500 \
--json number,title,state,author,labels,createdAt,closedAt,body \
> "$json_dir/issues-list.json" 2>/dev/null; then
echo " (issues disabled or not accessible)"
echo "[]" > "$json_dir/issues-list.json"
fi
track_api_call
local issue_count=$(jq length "$json_dir/issues-list.json")
echo " Found $issue_count issues"
# Fetch each issue
local seq=0
for github_num in $(jq -r '.[].number' "$json_dir/issues-list.json" | sort -n); do
seq=$((seq + 1))
local seq_padded=$(printf '%03d' $seq)
# Skip if already fetched
if [ -f "$json_dir/issue-$github_num.json" ] && [ -f "$issue_dir/$seq_padded.md" ]; then
echo " Skipping issue #$github_num (already exists)"
continue
fi
echo " Fetching issue #$github_num -> $seq_padded.md"
gh issue view "$github_num" --repo "$repo" \
--json number,title,state,author,labels,createdAt,closedAt,body,comments \
> "$json_dir/issue-$github_num.json"
track_api_call
# Convert to markdown with sequential filename
convert_issue "$json_dir/issue-$github_num.json" "$issue_dir/$seq_padded.md" "$github_num"
sleep $DELAY
done
generate_issue_index "$issue_dir"
fi
# Collect PRs
if [ "$SKIP_PRS" != "1" ]; then
echo "Fetching PRs..."
if ! gh pr list --repo "$repo" --state all --limit 500 \
--json number,title,state,author,createdAt,closedAt,mergedAt,body \
> "$json_dir/prs-list.json" 2>/dev/null; then
echo " (PRs disabled or not accessible)"
echo "[]" > "$json_dir/prs-list.json"
fi
track_api_call
local pr_count=$(jq length "$json_dir/prs-list.json")
echo " Found $pr_count PRs"
# Fetch each PR
local seq=0
for github_num in $(jq -r '.[].number' "$json_dir/prs-list.json" | sort -n); do
seq=$((seq + 1))
local seq_padded=$(printf '%03d' $seq)
# Skip if already fetched
if [ -f "$json_dir/pr-$github_num.json" ] && [ -f "$pr_dir/$seq_padded.md" ]; then
echo " Skipping PR #$github_num (already exists)"
continue
fi
echo " Fetching PR #$github_num -> $seq_padded.md"
gh pr view "$github_num" --repo "$repo" \
--json number,title,state,author,createdAt,closedAt,mergedAt,body,comments,reviews \
> "$json_dir/pr-$github_num.json" 2>/dev/null || true
track_api_call
# Convert to markdown with sequential filename
convert_pr "$json_dir/pr-$github_num.json" "$pr_dir/$seq_padded.md" "$github_num"
sleep $DELAY
done
generate_pr_index "$pr_dir"
fi
}
# Collect all repos in org
collect_org() {
local org="$1"
echo "=== Collecting all repos from org: $org ==="
# Get repo list (1 API call)
local repos
repos=$(gh repo list "$org" --limit 500 --json nameWithOwner -q '.[].nameWithOwner')
track_api_call
while read -r repo; do
[ -n "$repo" ] || continue
collect_repo "$repo"
sleep $DELAY
done <<< "$repos"
}
# Convert issue JSON to markdown
convert_issue() {
local json_file="$1"
local output_file="$2"
local github_num="$3"
local title=$(jq -r '.title' "$json_file")
local state=$(jq -r '.state' "$json_file")
local author=$(jq -r '.author.login' "$json_file")
local created=$(jq -r '.createdAt' "$json_file" | cut -d'T' -f1)
local closed=$(jq -r '.closedAt // "N/A"' "$json_file" | cut -d'T' -f1)
local body=$(jq -r '.body // "No description"' "$json_file")
local labels=$(jq -r '[.labels[].name] | join(", ")' "$json_file")
local comment_count=$(jq '.comments | length' "$json_file")
# Score reception
local score="UNKNOWN"
local reason=""
if [ "$state" = "CLOSED" ]; then
if echo "$labels" | grep -qi "wontfix\|invalid\|duplicate\|won't fix"; then
score="DISMISSED"
reason="Labeled as wontfix/invalid/duplicate"
elif [ "$comment_count" -eq 0 ]; then
score="IGNORED"
reason="Closed with no discussion"
else
score="ADDRESSED"
reason="Closed after discussion"
fi
else
if [ "$comment_count" -eq 0 ]; then
score="STALE"
reason="Open with no response"
else
score="ACTIVE"
reason="Open with discussion"
fi
fi
cat > "$output_file" << ISSUE_EOF
# Issue #$github_num: $title
## Reception Score
| Score | Reason |
|-------|--------|
| **$score** | $reason |
---
## Metadata
| Field | Value |
|-------|-------|
| GitHub # | $github_num |
| State | $state |
| Author | @$author |
| Created | $created |
| Closed | $closed |
| Labels | $labels |
| Comments | $comment_count |
---
## Original Post
**Author:** @$author
$body
---
## Discussion Thread
ISSUE_EOF
jq -r '.comments[] | "### Comment by @\(.author.login)\n\n**Date:** \(.createdAt | split("T")[0])\n\n\(.body)\n\n---\n"' "$json_file" >> "$output_file" 2>/dev/null || true
}
# Convert PR JSON to markdown
convert_pr() {
local json_file="$1"
local output_file="$2"
local github_num="$3"
[ -f "$json_file" ] || return
local title=$(jq -r '.title' "$json_file")
local state=$(jq -r '.state' "$json_file")
local author=$(jq -r '.author.login' "$json_file")
local created=$(jq -r '.createdAt' "$json_file" | cut -d'T' -f1)
local merged=$(jq -r '.mergedAt // "N/A"' "$json_file" | cut -d'T' -f1)
local body=$(jq -r '.body // "No description"' "$json_file")
local score="UNKNOWN"
local reason=""
if [ "$state" = "MERGED" ] || { [ "$merged" != "N/A" ] && [ "$merged" != "null" ]; }; then
score="MERGED"
reason="Contribution accepted"
elif [ "$state" = "CLOSED" ]; then
score="REJECTED"
reason="PR closed without merge"
else
score="PENDING"
reason="Still open"
fi
cat > "$output_file" << PR_EOF
# PR #$github_num: $title
## Reception Score
| Score | Reason |
|-------|--------|
| **$score** | $reason |
---
## Metadata
| Field | Value |
|-------|-------|
| GitHub # | $github_num |
| State | $state |
| Author | @$author |
| Created | $created |
| Merged | $merged |
---
## Description
$body
---
## Reviews & Comments
PR_EOF
jq -r '.comments[]? | "### Comment by @\(.author.login)\n\n\(.body)\n\n---\n"' "$json_file" >> "$output_file" 2>/dev/null || true
jq -r '.reviews[]? | "### Review by @\(.author.login) [\(.state)]\n\n\(.body // "No comment")\n\n---\n"' "$json_file" >> "$output_file" 2>/dev/null || true
}
# Generate Issue index
generate_issue_index() {
local dir="$1"
cat > "$dir/INDEX.md" << 'INDEX_HEADER'
# Issues Index
## Reception Score Legend
| Score | Meaning | Action |
|-------|---------|--------|
| ADDRESSED | Closed after discussion | Review if actually fixed |
| DISMISSED | Labeled wontfix/invalid | **RECLAIM candidate** |
| IGNORED | Closed, no response | **RECLAIM candidate** |
| STALE | Open, no replies | Needs attention |
| ACTIVE | Open with discussion | In progress |
---
## Issues
| Seq | GitHub # | Title | Score |
|-----|----------|-------|-------|
INDEX_HEADER
for file in "$dir"/[0-9]*.md; do
[ -f "$file" ] || continue
local seq=$(basename "$file" .md)
local github_num=$(sed -n 's/^# Issue #\([0-9]*\):.*/\1/p' "$file")
local title=$(head -1 "$file" | sed 's/^# Issue #[0-9]*: //')
local score=$(sed -n '/\*\*[A-Z]/s/.*\*\*\([A-Z]*\)\*\*.*/\1/p' "$file" | head -1)
echo "| [$seq]($seq.md) | #$github_num | $title | $score |" >> "$dir/INDEX.md"
done
echo " Created Issue/INDEX.md"
}
# Generate PR index
generate_pr_index() {
local dir="$1"
cat > "$dir/INDEX.md" << 'INDEX_HEADER'
# Pull Requests Index
## Reception Score Legend
| Score | Meaning | Action |
|-------|---------|--------|
| MERGED | PR accepted | Done |
| REJECTED | PR closed unmerged | Review why |
| PENDING | PR still open | Needs review |
---
## Pull Requests
| Seq | GitHub # | Title | Score |
|-----|----------|-------|-------|
INDEX_HEADER
for file in "$dir"/[0-9]*.md; do
[ -f "$file" ] || continue
local seq=$(basename "$file" .md)
local github_num=$(sed -n 's/^# PR #\([0-9]*\):.*/\1/p' "$file")
local title=$(head -1 "$file" | sed 's/^# PR #[0-9]*: //')
local score=$(sed -n '/\*\*[A-Z]/s/.*\*\*\([A-Z]*\)\*\*.*/\1/p' "$file" | head -1)
echo "| [$seq]($seq.md) | #$github_num | $title | $score |" >> "$dir/INDEX.md"
done
echo " Created PR/INDEX.md"
}
# Show rate limit status
show_rate_status() {
local rate_json=$(gh api rate_limit 2>/dev/null)
if [ -z "$rate_json" ]; then
echo "Failed to fetch rate limit"
exit 1
fi
echo "=== GitHub API Rate Limit Status ==="
echo ""
echo "Core (REST API):"
echo " Remaining: $(echo "$rate_json" | jq -r '.resources.core.remaining') / $(echo "$rate_json" | jq -r '.resources.core.limit')"
local core_reset=$(echo "$rate_json" | jq -r '.resources.core.reset')
echo " Reset: $(date -d "@$core_reset" '+%H:%M:%S' 2>/dev/null || date -r "$core_reset" '+%H:%M:%S' 2>/dev/null || echo "$core_reset")"
echo ""
echo "GraphQL:"
echo " Remaining: $(echo "$rate_json" | jq -r '.resources.graphql.remaining') / $(echo "$rate_json" | jq -r '.resources.graphql.limit')"
local gql_reset=$(echo "$rate_json" | jq -r '.resources.graphql.reset')
echo " Reset: $(date -d "@$gql_reset" '+%H:%M:%S' 2>/dev/null || date -r "$gql_reset" '+%H:%M:%S' 2>/dev/null || echo "$gql_reset")"
echo ""
echo "Search:"
echo " Remaining: $(echo "$rate_json" | jq -r '.resources.search.remaining') / $(echo "$rate_json" | jq -r '.resources.search.limit')"
echo ""
}
# Main
main() {
local targets=""
local is_org=0
SKIP_ISSUES=0
SKIP_PRS=0
# Parse args
for arg in "$@"; do
case "$arg" in
--org) is_org=1 ;;
--issues-only) SKIP_PRS=1 ;;
--prs-only) SKIP_ISSUES=1 ;;
--delay=*) DELAY="${arg#*=}" ;;
--check-rate) show_rate_status; exit 0 ;;
https://*|http://*) targets="$arg" ;;
-*) ;; # ignore unknown flags
*) targets="$arg" ;;
esac
done
if [ -z "$targets" ]; then
echo "Usage: $0 <target> [--org] [--issues-only] [--prs-only] [--delay=0.05] [--check-rate]"
echo ""
echo "Options:"
echo " --check-rate Show rate limit status (Core/GraphQL/Search) and exit"
echo " --delay=N Delay between requests (default: 0.05s)"
echo ""
echo "Rate limiting: Auto-pauses at 25% remaining (75% used) until reset+10s"
echo ""
echo "Target formats:"
echo " Single repo: LetheanNetwork/lthn-app-vpn"
echo " Single org: LetheanNetwork --org"
echo " Batch orgs: \"LetheanNetwork,graft-project,oxen-io\" --org"
echo " Batch repos: \"owner/repo1,owner/repo2\""
echo ""
echo "Output: repo/{org}/{repo}/Issue/ repo/{org}/{repo}/PR/"
echo ""
echo "Full registry list (copy-paste ready):"
echo ""
echo " # Lethean ecosystem"
echo " $0 \"LetheanNetwork,letheanVPN,LetheanMovement\" --org"
echo ""
echo " # CryptoNote projects"
echo " $0 \"monero-project,haven-protocol-org,hyle-team,zanoio\" --org"
echo " $0 \"kevacoin-project,scala-network,deroproject\" --org"
echo " $0 \"Karbovanets,wownero,turtlecoin\" --org"
echo " $0 \"masari-project,aeonix,nerva-project\" --org"
echo " $0 \"ConcealNetwork,ryo-currency,sumoprojects\" --org"
echo " $0 \"bcndev,electroneum\" --org"
echo ""
echo " # Dead/salvage priority"
echo " $0 \"graft-project,graft-community,oxen-io,loki-project\" --org"
echo ""
echo " # Non-CN reference projects"
echo " $0 \"theQRL,hyperswarm,holepunchto,openhive-network,octa-space\" --org"
exit 1
fi
# Handle comma-separated list
IFS=',' read -ra TARGET_LIST <<< "$targets"
for target in "${TARGET_LIST[@]}"; do
# Trim whitespace
target=$(echo "$target" | xargs)
local parsed=$(parse_github_url "$target")
if [ "$is_org" = "1" ]; then
collect_org "$parsed"
else
collect_repo "$parsed"
fi
done
echo ""
echo "=== Collection Complete ==="
echo "Output: $OUTPUT_BASE/"
}
main "$@"

View file

@ -0,0 +1,57 @@
# Job-Based Collector
Two-phase collection pattern: generate jobs, then process downloaded files.
## Workflow
```
1. Generate jobs → jobs.txt (list of URLs)
2. Feed to proxy → (your infrastructure)
3. Process results → markdown output
```
## Usage
### Phase 1: Generate Job List
```bash
# BitcoinTalk thread
./generate-jobs.sh bitcointalk 2769739 > jobs.txt
# Reddit thread/subreddit
./generate-jobs.sh reddit "r/lethean" --limit=100 > jobs.txt
# Wayback Machine snapshots
./generate-jobs.sh wayback "lethean.io" > jobs.txt
# Medium author/publication
./generate-jobs.sh medium "@lethean" > jobs.txt
```
### Phase 2: Process Downloaded Files
```bash
# After proxy fetches all URLs to ./downloads/
./process.sh bitcointalk ./downloads/ --output=./archive/
```
## Job File Format
```
# jobs.txt
URL|OUTPUT_FILENAME|TYPE|METADATA
https://bitcointalk.org/index.php?topic=2769739.0|btt-2769739-p0.html|bitcointalk|page=0
https://bitcointalk.org/index.php?topic=2769739.20|btt-2769739-p20.html|bitcointalk|page=20
```
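The "feed to proxy" step is intentionally left to your infrastructure. For plain static fetches, a minimal local loop over this format could look like the sketch below (curl-based, writes to a hypothetical `./downloads/`, skips the `#` comment lines the generators emit):
```bash
#!/usr/bin/env bash
# Minimal local fetcher for jobs.txt - a sketch, not a replacement for a proper proxy.
mkdir -p ./downloads
while IFS='|' read -r url filename type metadata; do
  # Skip blank lines and comment/header lines
  case "$url" in ''|'#'*) continue ;; esac
  echo "[$type] $url -> $filename ($metadata)"
  curl -sL --retry 3 -o "./downloads/$filename" "$url"
  sleep 1   # be polite to the upstream site
done < jobs.txt
```
Sources that need JavaScript rendering or authentication (nitter mirrors, Medium profiles) are exactly why the proxy stage exists; the loop above only covers simple static URLs.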
## Supported Sources
| Source | Job Generator | Processor |
|--------|---------------|-----------|
| bitcointalk | ✅ | ✅ |
| reddit | ✅ | ✅ |
| wayback | ✅ | ✅ |
| medium | ✅ | ✅ |
| twitter | 🔜 | 🔜 |
| discord-export | N/A | ✅ (local files) |
| telegram-export | N/A | ✅ (local files) |

View file

@ -0,0 +1,107 @@
#!/usr/bin/env bash
# Generate job list for proxy-based collection
# Usage: ./generate-jobs.sh <source> <target> [options] > jobs.txt
set -e
SOURCE="$1"
TARGET="$2"
shift 2 || true
# Defaults
LIMIT=1000
PAGES=100
# Parse options
for arg in "$@"; do
case "$arg" in
--limit=*) LIMIT="${arg#*=}" ;;
--pages=*) PAGES="${arg#*=}" ;;
esac
done
# Output header
echo "# Job list generated $(date +%Y-%m-%d\ %H:%M)"
echo "# Source: $SOURCE | Target: $TARGET"
echo "# Format: URL|FILENAME|TYPE|METADATA"
echo "#"
case "$SOURCE" in
bitcointalk|btt)
# Extract topic ID
TOPIC_ID=$(echo "$TARGET" | grep -oE '[0-9]+' | head -1)
echo "# BitcoinTalk topic: $TOPIC_ID"
echo "#"
# Generate page URLs (20 posts per page)
for ((i=0; i<PAGES*20; i+=20)); do
echo "https://bitcointalk.org/index.php?topic=${TOPIC_ID}.${i}|btt-${TOPIC_ID}-p${i}.html|bitcointalk|page=$((i/20)),offset=$i"
done
;;
reddit)
# Handle r/subreddit or full URL
SUBREDDIT=$(echo "$TARGET" | sed 's|.*/r/||' | sed 's|/.*||')
echo "# Reddit: r/$SUBREDDIT"
echo "#"
# Subreddit pages (top, new, hot)
for sort in "top" "new" "hot"; do
echo "https://old.reddit.com/r/${SUBREDDIT}/${sort}/.json?limit=100|reddit-${SUBREDDIT}-${sort}.json|reddit|sort=$sort"
done
# If it's a specific thread
if [[ "$TARGET" =~ comments/([a-z0-9]+) ]]; then
THREAD_ID="${BASH_REMATCH[1]}"
echo "https://old.reddit.com/r/${SUBREDDIT}/comments/${THREAD_ID}.json|reddit-thread-${THREAD_ID}.json|reddit|thread=$THREAD_ID"
fi
;;
wayback|archive)
# Clean domain
DOMAIN=$(echo "$TARGET" | sed 's|https\?://||' | sed 's|/.*||')
echo "# Wayback Machine: $DOMAIN"
echo "#"
# CDX API to get all snapshots
echo "https://web.archive.org/cdx/search/cdx?url=${DOMAIN}/*&output=json&limit=${LIMIT}|wayback-${DOMAIN}-cdx.json|wayback-index|domain=$DOMAIN"
# Common important pages
for path in "" "index.html" "about" "roadmap" "team" "whitepaper" "faq"; do
echo "https://web.archive.org/web/2020/${DOMAIN}/${path}|wayback-${DOMAIN}-2020-${path:-index}.html|wayback|year=2020,path=$path"
echo "https://web.archive.org/web/2021/${DOMAIN}/${path}|wayback-${DOMAIN}-2021-${path:-index}.html|wayback|year=2021,path=$path"
echo "https://web.archive.org/web/2022/${DOMAIN}/${path}|wayback-${DOMAIN}-2022-${path:-index}.html|wayback|year=2022,path=$path"
done
;;
medium)
# Handle @author or publication
AUTHOR=$(echo "$TARGET" | sed 's|.*/||' | sed 's|^@||')
echo "# Medium: @$AUTHOR"
echo "#"
# Medium RSS feed (easier to parse)
echo "https://medium.com/feed/@${AUTHOR}|medium-${AUTHOR}-feed.xml|medium-rss|author=$AUTHOR"
# Profile page
echo "https://medium.com/@${AUTHOR}|medium-${AUTHOR}-profile.html|medium|author=$AUTHOR"
;;
twitter|x)
USERNAME=$(echo "$TARGET" | sed 's|.*/||' | sed 's|^@||')
echo "# Twitter/X: @$USERNAME"
echo "# Note: Twitter requires auth - use nitter or API"
echo "#"
# Nitter instances (public, no auth)
echo "https://nitter.net/${USERNAME}|twitter-${USERNAME}.html|nitter|user=$USERNAME"
echo "https://nitter.net/${USERNAME}/with_replies|twitter-${USERNAME}-replies.html|nitter|user=$USERNAME,type=replies"
;;
*)
echo "# ERROR: Unknown source '$SOURCE'" >&2
echo "# Supported: bitcointalk, reddit, wayback, medium, twitter" >&2
exit 1
;;
esac

View file

@ -0,0 +1,242 @@
#!/usr/bin/env bash
# Process downloaded files into markdown
# Usage: ./process.sh <source> <downloads-dir> [--output=DIR]
set -e
SOURCE="$1"
DOWNLOADS="$2"
shift 2 || true
OUTPUT="./processed"
for arg in "$@"; do
case "$arg" in
--output=*) OUTPUT="${arg#*=}" ;;
esac
done
mkdir -p "$OUTPUT/posts"
echo "=== Processing $SOURCE files from $DOWNLOADS ==="
case "$SOURCE" in
bitcointalk|btt)
echo "Processing BitcoinTalk pages..."
POST_NUM=0
for file in "$DOWNLOADS"/btt-*.html; do
[ -f "$file" ] || continue
echo " Processing: $(basename "$file")"
python3 << PYEOF
import re
import html
import os
html_content = open('$file', 'r', encoding='utf-8', errors='ignore').read()
# Extract thread title from first page
title_match = re.search(r'<title>([^<]+)</title>', html_content)
title = title_match.group(1) if title_match else "Unknown Thread"
title = title.replace(' - Bitcoin Forum', '').strip()
with open('$OUTPUT/.thread_title', 'w') as f:
f.write(title)
# Continue numbering across pages by counting files already written;
# a shell variable cannot be updated from inside this heredoc.
import glob
post_num = len(glob.glob('$OUTPUT/posts/POST-*.md'))
# Pattern for posts
post_blocks = re.findall(r'<div class="post"[^>]*id="msg(\d+)"[^>]*>(.*?)</div>\s*(?:<div class="moderatorbar"|<div class="signature">)', html_content, re.DOTALL)
for msg_id, content in post_blocks:
    # Clean content
    content = re.sub(r'<br\s*/?>', '\n', content)
    content = re.sub(r'<[^>]+>', '', content)
    content = html.unescape(content).strip()
    if content:
        post_num += 1
        with open(f'$OUTPUT/posts/POST-{post_num:04d}.md', 'w') as f:
            f.write(f"# Post #{post_num}\\n\\n")
            f.write(f"Message ID: {msg_id}\\n\\n")
            f.write(f"---\\n\\n")
            f.write(content)
            f.write("\\n")
        print(f"    POST-{post_num:04d}.md")
print(f"TOTAL:{post_num}")
PYEOF
done
# Generate index
TITLE=$(cat "$OUTPUT/.thread_title" 2>/dev/null || echo "BitcoinTalk Thread")
TOTAL=$(ls "$OUTPUT/posts/"POST-*.md 2>/dev/null | wc -l)
cat > "$OUTPUT/INDEX.md" << EOF
# $TITLE
Archived from BitcoinTalk
| Field | Value |
|-------|-------|
| Posts | $TOTAL |
## Posts
EOF
for f in "$OUTPUT/posts/"POST-*.md; do
[ -f "$f" ] || continue
NUM=$(basename "$f" .md | sed 's/POST-0*//')
echo "- [Post #$NUM](posts/$(basename $f))" >> "$OUTPUT/INDEX.md"
done
;;
reddit)
echo "Processing Reddit JSON..."
for file in "$DOWNLOADS"/reddit-*.json; do
[ -f "$file" ] || continue
echo " Processing: $(basename "$file")"
python3 << PYEOF
import json
import os
data = json.load(open('$file', 'r'))
# Handle different Reddit JSON structures
posts = []
if isinstance(data, list) and len(data) > 0:
if 'data' in data[0]:
# Thread format
posts = data[0]['data']['children']
else:
posts = data
elif isinstance(data, dict) and 'data' in data:
posts = data['data']['children']
for i, post_wrapper in enumerate(posts):
post = post_wrapper.get('data', post_wrapper)
title = post.get('title', post.get('body', '')[:50])
author = post.get('author', 'unknown')
score = post.get('score', 0)
body = post.get('selftext', post.get('body', ''))
created = post.get('created_utc', 0)
filename = f'$OUTPUT/posts/REDDIT-{i+1:04d}.md'
with open(filename, 'w') as f:
f.write(f"# {title}\\n\\n")
f.write(f"| Author | u/{author} |\\n")
f.write(f"|--------|----------|\\n")
f.write(f"| Score | {score} |\\n\\n")
f.write(f"---\\n\\n")
f.write(body or "(no content)")
f.write("\\n")
print(f" REDDIT-{i+1:04d}.md - {title[:40]}...")
PYEOF
done
;;
wayback)
echo "Processing Wayback Machine files..."
for file in "$DOWNLOADS"/wayback-*.html; do
[ -f "$file" ] || continue
BASENAME=$(basename "$file" .html)
echo " Processing: $BASENAME"
# Extract text content
python3 << PYEOF
import re
import html
content = open('$file', 'r', encoding='utf-8', errors='ignore').read()
# Remove scripts and styles
content = re.sub(r'<script[^>]*>.*?</script>', '', content, flags=re.DOTALL)
content = re.sub(r'<style[^>]*>.*?</style>', '', content, flags=re.DOTALL)
# Extract title
title_match = re.search(r'<title>([^<]+)</title>', content)
title = html.unescape(title_match.group(1)) if title_match else "$BASENAME"
# Get body text
body_match = re.search(r'<body[^>]*>(.*?)</body>', content, re.DOTALL)
if body_match:
body = body_match.group(1)
body = re.sub(r'<[^>]+>', ' ', body)
body = html.unescape(body)
body = re.sub(r'\s+', ' ', body).strip()
else:
body = "(could not extract body)"
with open('$OUTPUT/posts/$BASENAME.md', 'w') as f:
f.write(f"# {title}\\n\\n")
f.write(f"Source: Wayback Machine\\n\\n")
f.write(f"---\\n\\n")
f.write(body[:5000]) # Limit length
f.write("\\n")
print(f" $BASENAME.md")
PYEOF
done
;;
medium)
echo "Processing Medium files..."
# Handle RSS feed
for file in "$DOWNLOADS"/medium-*-feed.xml; do
[ -f "$file" ] || continue
echo " Processing RSS: $(basename "$file")"
python3 << PYEOF
import xml.etree.ElementTree as ET
import html
import re
tree = ET.parse('$file')
root = tree.getroot()
channel = root.find('channel')
items = channel.findall('item') if channel is not None else root.findall('.//item')
for i, item in enumerate(items):
title = item.findtext('title', 'Untitled')
author = item.findtext('{http://purl.org/dc/elements/1.1/}creator', 'Unknown')
date = item.findtext('pubDate', '')
content = item.findtext('{http://purl.org/rss/1.0/modules/content/}encoded', '')
# Clean content
content = re.sub(r'<[^>]+>', '', content)
content = html.unescape(content)
filename = f'$OUTPUT/posts/MEDIUM-{i+1:04d}.md'
with open(filename, 'w') as f:
f.write(f"# {title}\\n\\n")
f.write(f"| Author | {author} |\\n")
f.write(f"|--------|----------|\\n")
f.write(f"| Date | {date} |\\n\\n")
f.write(f"---\\n\\n")
f.write(content[:10000])
f.write("\\n")
print(f" MEDIUM-{i+1:04d}.md - {title[:40]}...")
PYEOF
done
;;
*)
echo "ERROR: Unknown source '$SOURCE'"
echo "Supported: bitcointalk, reddit, wayback, medium"
exit 1
;;
esac
echo ""
echo "=== Processing Complete ==="
echo "Output: $OUTPUT/"

View file

@ -0,0 +1,100 @@
# Ledger Papers Archive
Comprehensive collection of distributed ledger, cryptographic protocol, and decentralized systems whitepapers.
**For the commons - EUPL-1.2 CIC**
## Stats
- **91+ papers** across **15 categories**
- Genesis to modern (1998-2024)
- Academic + project whitepapers
## Categories
| Category | Papers | Description |
|----------|--------|-------------|
| genesis | 4 | Pre-Bitcoin: b-money, hashcash, bit gold |
| cryptonote | 2 | CryptoNote v2.0 + standards (CNS001-010) |
| mrl | 11 | Monero Research Lab (MRL-0001 to MRL-0011) |
| privacy | 9 | Zcash, Dash, Mimblewimble, Lelantus, Spark |
| smart-contracts | 10 | Ethereum, Solana, Cardano, Polkadot, etc |
| layer2 | 7 | Lightning, Plasma, Rollups, zkSync |
| consensus | 7 | PBFT, Tendermint, HotStuff, Casper |
| cryptography | 10 | Bulletproofs, CLSAG, PLONK, Schnorr, BLS |
| defi | 7 | Uniswap, Aave, Compound, Curve, MakerDAO |
| storage | 5 | IPFS, Filecoin, Arweave, Sia |
| identity | 3 | DIDs, Verifiable Credentials, Semaphore |
| cryptonote-projects | 5 | Haven, Masari, TurtleCoin, Wownero, DERO |
| attacks | 5 | Selfish mining, eclipse, traceability |
| oracles | 3 | Chainlink, Band Protocol |
| bridges | 3 | Atomic swaps, XCLAIM, THORChain |
## Usage
```bash
# All papers (91+)
./discover.sh --all > jobs.txt
# By category
./discover.sh --category=cryptography > jobs.txt
./discover.sh --category=defi > jobs.txt
# By topic
./discover.sh --topic=bulletproofs > jobs.txt
./discover.sh --topic=zk-snarks > jobs.txt
# IACR search for more
./discover.sh --search-iacr > search-jobs.txt
# List categories
./discover.sh --help
```
## Output Format
```
URL|FILENAME|TYPE|METADATA
https://bitcoin.org/bitcoin.pdf|bitcoin.pdf|paper|category=genesis,title=Bitcoin...
```
## CDN Hosting Structure
```
papers.lethean.io/
├── genesis/
│ ├── bitcoin.pdf
│ ├── b-money.txt
│ └── hashcash.pdf
├── cryptonote/
│ ├── cryptonote-v2.pdf
│ └── cns/
│ ├── cns001.txt
│ └── ...
├── mrl/
│ ├── MRL-0001.pdf
│ └── ...
├── cryptography/
│ ├── bulletproofs.pdf
│ ├── clsag.pdf
│ └── ...
└── INDEX.json
```
## Adding Papers
Edit `registry.json`:
```json
{
"id": "paper-id",
"title": "Paper Title",
"year": 2024,
"url": "https://example.com/paper.pdf",
"topics": ["topic1", "topic2"]
}
```
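To sanity-check an entry after editing, a jq query over the registry works; the top-level layout of `registry.json` isn't shown here, so adjust the path to match your file (the sketch assumes a top-level `papers` array):
```bash
# List year/title/url for every paper tagged with a given topic
# (assumes entries live under a top-level "papers" array - adjust if different)
jq -r '.papers[] | select(.topics | index("privacy")) | "\(.year)  \(.title)  \(.url)"' registry.json
```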
## License Note
Papers collected for archival/educational purposes. Original copyrights remain with authors. CDN hosting as community service under CIC principles.

View file

@ -0,0 +1,10 @@
# 00-genesis
The papers that started it all (1998-2008)
| Paper | Author | Year |
|-------|--------|------|
| b-money.txt | Wei Dai | 1998 |
| hashcash.pdf | Adam Back | 2002 |
| bit-gold.html | Nick Szabo | 2005 |
| bitcoin.pdf | Satoshi Nakamoto | 2008 |

View file

@ -0,0 +1,8 @@
# 01-cryptonote
CryptoNote protocol foundation
| Paper | Notes |
|-------|-------|
| cryptonote-v2.pdf | Ring signatures, stealth addresses |
| cns/ | CNS001-CNS010 standards |

View file

@ -0,0 +1,17 @@
# 02-mrl
Monero Research Lab publications
| Paper | Topic |
|-------|-------|
| MRL-0001.pdf | Chain reaction traceability |
| MRL-0002.pdf | Merkle tree exploits |
| MRL-0003.pdf | Monero overview |
| MRL-0004.pdf | Obfuscation improvements |
| MRL-0005.pdf | RingCT |
| MRL-0006.pdf | Subaddresses |
| MRL-0007.pdf | Spent outputs |
| MRL-0008.pdf | Dual linkable ring sigs |
| MRL-0009.pdf | Thring signatures |
| MRL-0010.pdf | Triptych |
| MRL-0011.pdf | Triptych-2 |

View file

@ -0,0 +1,15 @@
# 03-privacy
Confidentiality-focused protocols
| Paper | Protocol |
|-------|----------|
| zerocoin.pdf | Zero-knowledge mixing |
| zerocash.pdf | zk-SNARKs shielded |
| zcash-protocol.pdf | Sapling, Orchard |
| dash.pdf | Masternodes, PrivateSend |
| mimblewimble.txt | Cut-through, no addresses |
| grin.md | Mimblewimble impl |
| beam.md | Lelantus-MW |
| lelantus.pdf | One-out-of-many proofs |
| spark.pdf | Lelantus v2 |

View file

@ -0,0 +1,16 @@
# 04-smart-contracts
Programmable ledger platforms
| Paper | Platform |
|-------|----------|
| ethereum.pdf | EVM, gas model |
| ethereum-yellowpaper.pdf | Formal spec |
| solana.pdf | Proof of History |
| cardano-ouroboros.pdf | PoS consensus |
| polkadot.pdf | Parachains, relay |
| cosmos.pdf | Tendermint, IBC |
| avalanche.pdf | Snowball consensus |
| near.pdf | Nightshade sharding |
| tezos.pdf | Self-amending |
| algorand.pdf | Pure PoS, VRF |

View file

@ -0,0 +1,13 @@
# 05-layer2
Scaling & off-chain solutions
| Paper | Type |
|-------|------|
| lightning.pdf | Payment channels |
| plasma.pdf | Child chains |
| rollups.html | Optimistic + ZK |
| starkware.pdf | STARKs |
| zksync.md | ZK rollup |
| optimism.md | Optimistic rollup |
| arbitrum.pdf | Interactive fraud |

View file

@ -0,0 +1,13 @@
# 06-consensus
Consensus algorithm research
| Paper | Algorithm |
|-------|-----------|
| pbft.pdf | Classic BFT (1999) |
| tendermint.pdf | BFT + PoS |
| hotstuff.pdf | Linear BFT |
| casper.pdf | Finality gadget |
| gasper.pdf | GHOST + Casper |
| raft.pdf | CFT leader election |
| nakamoto-analysis.pdf | PoW analysis |

View file

@ -0,0 +1,16 @@
# 07-cryptography
Cryptographic foundations
| Paper | Primitive |
|-------|-----------|
| bulletproofs.pdf | Range proofs |
| bulletproofs-plus.pdf | Improved range |
| clsag.pdf | Linkable ring sigs |
| triptych.pdf | Log-sized rings |
| seraphis.pdf | Next-gen Monero |
| plonk.pdf | Universal SNARKs |
| groth16.pdf | Succinct SNARKs |
| schnorr.pdf | Signatures |
| bls.pdf | Aggregated sigs |
| pedersen.pdf | Commitments |

View file

@ -0,0 +1,13 @@
# 08-defi
Decentralized finance protocols
| Paper | Protocol |
|-------|----------|
| uniswap-v2.pdf | AMM |
| uniswap-v3.pdf | Concentrated liquidity |
| compound.pdf | Lending, cTokens |
| aave.pdf | Flash loans |
| makerdao.pdf | DAI stablecoin |
| curve.pdf | StableSwap |
| balancer.pdf | Weighted pools |

View file

@ -0,0 +1,11 @@
# 09-storage
Decentralized storage networks
| Paper | Network |
|-------|---------|
| ipfs.pdf | Content addressing |
| filecoin.pdf | Proof of storage |
| arweave.pdf | Permanent storage |
| sia.pdf | File contracts |
| storj.pdf | Erasure coding |

View file

@ -0,0 +1,9 @@
# 10-identity
Decentralized identity
| Paper | Standard |
|-------|----------|
| did-spec.html | W3C DIDs |
| verifiable-credentials.html | W3C VCs |
| semaphore.md | ZK signaling |

View file

@ -0,0 +1,11 @@
# 11-dag
DAG-based ledger structures
| Paper | Structure |
|-------|-----------|
| iota-tangle.pdf | Tangle, feeless |
| nano.pdf | Block lattice |
| fantom-lachesis.pdf | aBFT DAG |
| hedera-hashgraph.pdf | Gossip DAG |
| avalanche-snowflake.pdf | Metastable |

View file

@ -0,0 +1,11 @@
# 12-mev
Maximal Extractable Value research
| Paper | Topic |
|-------|-------|
| flashboys-2.pdf | DEX frontrunning |
| flashbots-protect.md | MEV protection |
| mev-boost.md | PBS architecture |
| order-fairness.pdf | Fair ordering |
| clockwork-finance.pdf | Economic security |

View file

@ -0,0 +1,13 @@
# 13-standards-btc
Bitcoin Improvement Proposals (BIPs)
| BIP | Topic |
|-----|-------|
| BIP-0001 | Process |
| BIP-0032 | HD Wallets |
| BIP-0039 | Seed phrases |
| BIP-0141 | SegWit |
| BIP-0340 | Schnorr |
| BIP-0341 | Taproot |
| BIP-0174 | PSBT |

View file

@ -0,0 +1,13 @@
# 14-standards-eth
Ethereum Improvement Proposals (EIPs/ERCs)
| EIP/ERC | Topic |
|---------|-------|
| EIP-1 | Process |
| ERC-20 | Fungible tokens |
| ERC-721 | NFTs |
| ERC-1155 | Multi-token |
| EIP-1559 | Fee market |
| EIP-4844 | Proto-danksharding |
| ERC-4337 | Account abstraction |

View file

@ -0,0 +1,11 @@
# 15-p2p
Peer-to-peer networking
| Paper | Protocol |
|-------|----------|
| libp2p.md | Modular p2p |
| kademlia.pdf | DHT routing |
| gossipsub.md | Pub/sub |
| dandelion.pdf | TX anonymity |
| dandelion-pp.pdf | Improved |

View file

@ -0,0 +1,12 @@
# 16-zk-advanced
Next-generation ZK systems
| Paper | System |
|-------|--------|
| halo.pdf | No trusted setup |
| halo2.md | Plonkish |
| nova.pdf | Folding schemes |
| supernova.pdf | Universal folding |
| plonky2.pdf | FRI + PLONK |
| stark.pdf | Post-quantum |

View file

@ -0,0 +1,9 @@
# 17-oracles
Decentralized oracle networks
| Paper | Network |
|-------|---------|
| chainlink.pdf | Data feeds |
| chainlink-2.pdf | OCR, CCIP |
| band-protocol.pdf | Cosmos oracle |

View file

@ -0,0 +1,9 @@
# 18-bridges
Cross-chain interoperability
| Paper | Method |
|-------|--------|
| atomic-swaps.pdf | HTLC |
| xclaim.pdf | Trustless wrapped |
| thorchain.pdf | Native swaps |

View file

@ -0,0 +1,11 @@
# 19-attacks
Security research
| Paper | Attack |
|-------|--------|
| selfish-mining.pdf | Mining strategy |
| eclipse-attack.pdf | P2P isolation |
| monero-traceability.pdf | Ring analysis |
| flashboys-2.pdf | DEX frontrun |
| 51-attack.pdf | Double spend |

View file

@ -0,0 +1,11 @@
# 20-cryptonote-projects
CryptoNote ecosystem extensions
| Paper | Project |
|-------|---------|
| haven-xassets.pdf | Confidential assets |
| masari-secor.pdf | Uncle mining |
| turtle-karai.md | Sidechains |
| wownero-randomwow.md | CPU PoW |
| dero-stargate.md | Homomorphic |

View file

@ -0,0 +1,46 @@
# GraftNetwork Technical Documents
**Status:** Dead (2020)
**Salvage Priority:** HIGH
**Source:** github.com/graft-project/graft-ng
GraftNetwork was a CryptoNote-based payment network with supernode architecture for real-time authorization (RTA). The project died during crypto winter but left excellent technical documentation.
## Documents
| File | Original | Description |
|------|----------|-------------|
| RFC-001-GSD-general-supernode-design.md | Issue #187 | Supernode architecture, announce mechanism, key management |
| RFC-002-SLS-supernode-list-selection.md | Issue #185 | Auth sample selection algorithm |
| RFC-003-RTVF-rta-transaction-validation.md | Issue #191 | RTA validation flow + jagerman's security critique |
| auth-sample-selection-algorithm.md | Issue #182 | Randomness + stake weighting for sample selection |
| udht-implementation.md | Issue #341 | Unstructured DHT for supernode discovery |
| rta-double-spend-attack-vectors.md | Issue #425 | Attack matrix and solutions |
| RFC-005-DF-disqualification-flow.md | DesignDocs #2 | Disqualification scoring + jagerman critique |
| communication-options-p2p-design.md | DesignDocs #1 | 5 P2P architecture options with tradeoffs |
| blockchain-based-list-selection-analysis.md | GraftNetwork PR-225 | jagerman's 10M simulation statistical analysis |
## Key Insights
### From RFC 001 (jagerman's critique)
- Announce mechanism creates 60-144 GB/day of network traffic (jagerman's back-of-envelope: 2000 supernodes x 1000 bytes/announce x 1440 announces/day x 50 p2p connections ≈ 144 GB each way; still roughly 60 GB/day combined at the team's 200-byte estimate)
- Hop count in announcements leaks IP (not anonymous)
- Suggested fix: disqualification tx on-chain instead of gossip
### From RFC 003 (privacy analysis)
- Proxy SN sees: recipient wallet, amount, item list
- Auth sample sees: total amount
- Single point of failure in proxy design
- Solution: end-to-end encryption, zero-knowledge proofs
### From Attack Vectors
- RTA vs non-RTA: prioritize RTA, rollback conflicting blocks
- RTA vs RTA: shouldn't happen if auth sample honest
- Needs checkpoint depth limit
## Relevance to Lethean
- Service node architecture → Exit node incentives
- RTA validation → Session authorization
- Disqualification flow → Node quality enforcement
- UDHT → Decentralized service discovery

View file

@ -0,0 +1,233 @@
# Issue #187: [RFC 001 GSD] General Supernode Design
## Reception Score
| Score | Reason |
|-------|--------|
| **ACTIVE** | Open with discussion |
---
## Metadata
| Field | Value |
|-------|-------|
| State | OPEN |
| Author | @jagerman |
| Created | 2018-12-27 |
| Closed | N/A |
| Labels | RFC-draft |
| Comments | 4 |
---
## Original Post
**Author:** @jagerman
Some comments:
> The supernode charges the clients an optional fee for this activity.
Optional?
> Upon start, each supernode should be given a public wallet address that is used to collect service fees and may be a receiver of a stake transaction.
What is the point of this? That receiving wallet is already included in the registration transaction on the blockchain; I don't see why the supernode needs to have a wallet (even just the wallet address) manually configured at all rather than just picking it up from the registration transaction.
> The supernode must regenerate the key pair per each stake renewal.
This is, as I have mentioned before, a very odd requirement. It adds some (small) extra work on the part of the operator, and it would seem to make it impossible to verify when a SN is being renewed rather than newly registered (and thus not double-counted if it is both renewed and in the "overhang" period). It also means that as soon as a SN stake is renewed (thus changing the key) any RTA requests that still use the old key simply won't be received by the SN in question. In theory, you could make the SN keep both keys, but this raises the obvious question of: Why bother? In #176 you wrote:
> You asked why we did not declare permanent supernode identification keypair. The main reason was that we didn't see any reason to make it permanent. The temporal keypair is enough for our goals and regeneration of this key won't create large overwork during stake renewal. And yes, the lifespan of this key pair will be equal to the stake period and during stake renewal supernode owner also need to update it. If someone wants to build a tracking system, they can do it anyway.
I carefully counted the number of benefits of mandatory regeneration provided in this description: 0. So it has zero benefits and more than zero drawbacks. So why is it here?
> Not storing any wallet related private information on supernode is a more secure approach, but it doesn't allow automatic re-staking.
Why not? Other coins are able to implement automatic renewal without requiring a password-unprotected wallet or having the wallet on a service node; what part of the Graft design prevents Graft from doing what other coins have done?
> Stake transaction must include the following data:
> - the receiver of this transaction must be supernode's public wallet address;
> ...
> - tx_extra must contain supernode public wallet address;
This is a minor point, but it isn't entirely clear why this is required: you could simply include both a recipient wallet address and a reward recipient wallet to allow the possibility of wallet A to submit a stake with rewards going to wallet B, which seems like it could be useful.
> TRP determines the number of blocks during which supernode is allowed to participate in RTA validation even if it has no locked stake. If during TRP supernode owner doesn't renew its stake transaction, the supernode will be removed from active supernode list and will not be able to participate in RTA validation.
And how, exactly, will you determine that the SN has been renewed since it won't have the old stake's pubkey anymore?
> The mechanism of periodic announcements has, therefore, a two-fold purpose:
> 1. make the best effort to deliver current status to all supernodes in the network without releasing the sender's IP to the whole network;
Verifying uptime is fine. The design, however, of including incrementing hop counts makes it almost trivial to find the IP of any SN (or, at least, the graftnoded that the SN is connected to).
> 2. build reliable communication channels between any two active supernodes in the network without releasing IPs of the participants, while producing minimal traffic overhead.
It may reduce traffic somewhat, but at the cost of a massive increase in traffic of frequent periodic traffic expenses that is almost certain to vastly eclipse any savings. A simple back-of-the-envelope calculation:
A = 2000 active service nodes (each of which a node will received an announce for)
B = 1000 bytes per announce
R = 1440 announces per day (= 1 announce per minute)
N = 50 p2p connections typical for a mainnet node
A * B * R * N = 144 GB of traffic per day both uploaded *and* downloaded just to transmit announces across the network.
And this isn't just incurred by supernodes, this is incurred by *all network nodes*. Even if you decrease the announcement rate to 1 announce every 10 minutes you are still looking at 14GB/day of announcement traffic both uploaded and downloaded *which applies to ordinary network nodes*.
This is not a design that can be considered to incurs only "minimal traffic overhead".
> RTA validation participants may use encrypted messages.
"may"?
> ## Multiple Recipients Message Encryption
This whole feature seems rather pointless. Multicast messages are going to have to be transmitted much more broadly than unicast messages: You can't just sent it along the best three paths, which you proposed for unicast messages, because each recipient is highly likely to have a completely different best three paths. It doesn't seem like this multicast approach is going to save anything compared to simply sending 8 unicast messages (and then simplifying the code by dropping multicast support if there are no remaining cases for it). There is potential for optimization here — you could use protocol pipelining to send all the unicast messages at once — the the proposed complexity added for encrypted multicast messages seems to have little benefit.
---
## Discussion Thread
### Comment by @bitkis
**Date:** 2019-01-04
> > Upon start, each supernode should be given a public wallet address that is used to collect service fees and may be a receiver of a stake transaction.
> What is the point of this? That receiving wallet is already included in the registration transaction on the blockchain; I don't see why the supernode needs to have a wallet (even just the wallet address) manually configured at all rather than just picking it up from the registration transaction.
The wallet address can be retrieved from StakeTx but the proposed approach unifies auth and proxy supernode handling.
> > The supernode must regenerate the key pair per each stake renewal.
> This is, as I have mentioned before, a very odd requirement. It adds some (small) extra work on the part of the operator, and it would seem to make it impossible to verify when a SN is being renewed rather than newly registered (and thus not double-counted if it is both renewed and in the "overhang" period). It also means that as soon as a SN stake is renewed (thus changing the key) any RTA requests that still use the old key simply won't be received by the SN in question. In theory, you could make the SN keep both keys, but this raises the obvious question of: Why bother?
Yes, we're considering both options.
> > Not storing any wallet related private information on supernode is a more secure approach, but it doesn't allow automatic re-staking.
> Why not? Other coins are able to implement automatic renewal without requiring a password-unprotected wallet or having the wallet on a service node; what part of the Graft design prevents Graft from doing what other coins have done?
Not sure what you meant here, unless you were talking about wallet side automation. What other coins have done that otherwise?
> > TRP determines the number of blocks during which supernode is allowed to participate in RTA validation even if it has no locked stake. If during TRP supernode owner doesn't renew its stake transaction, the supernode will be removed from active supernode list and will not be able to participate in RTA validation.
> And how, exactly, will you determine that the SN has been renewed since it won't have the old stake's pubkey anymore?
We don't really need to determine. If a supernode owner submits new StakeTx, the supernode starts to send announce with the new key, and old identification key just "expires".
Downtime problem during regular stake renewal can be fixed for the temporal key in the following way:
supernode, for which StakeTx unlocked, tracks it TRP, and if supernode owner renews stake transaction with a new identification key, supernode continues to send announces with the old identification key, until new StakeTx does not pass stake validation period (during this time this supernode knows both its identification keys.)
> > The mechanism of periodic announcements has, therefore, a two-fold purpose:
> > 1. make the best effort to deliver current status to all supernodes in the network without releasing the sender's IP to the whole network;
> Verifying uptime is fine. The design, however, of including incrementing hop counts makes it almost trivial to find the IP of any SN (or, at least, the graftnoded that the SN is connected to).
Well, not so trivial for hop count h > 1, there are N^h possible peers in the h-neighborhood, where N is the "typical" number you mentioned bellow.
> > 2. build reliable communication channels between any two active supernodes in the network without releasing IPs of the participants, while producing minimal traffic overhead.
> It may reduce traffic somewhat, but at the cost of a massive increase in traffic of frequent periodic traffic expenses that is almost certain to vastly eclipse any savings. A simple back-of-the-envelope calculation:
>
> A = 2000 active service nodes (each of which a node will received an announce for)
> B = 1000 bytes per announce
> R = 1440 announces per day (= 1 announce per minute)
> N = 50 p2p connections typical for a mainnet node
>
> A * B * R * N = 144 GB of traffic per day both uploaded *and* downloaded just to transmit announces across the network.
>
> And this isn't just incurred by supernodes, this is incurred by all network nodes. Even if you decrease the announcement rate to 1 announce every 10 minutes you are still looking at 14GB/day of announcement traffic both uploaded and downloaded which applies to ordinary network nodes.
Well, in our estimate, B = ~ 200 bytes. Yes, decrease of the announcement rate is one possible optimization. Another one could be separation channel construction and state update parts, emitting the state changes only when they actually happen to a 1-hop neighbor.
Dropping the announcements at whole would leave us with no uptime verification and with need to broadcast all RTA traffic. The latter would produce much higher average load to the whole network, with no optimization options.
The only alternative we see here is building yet another p2p network, now between supernodes. Still, we'd have to fight the same issues, although on a relatively smaller domain. We want to avoid this path, at least for now, and have a fully working system, with may be a somewhat suboptimal traffic flow, fist.
> This whole feature seems rather pointless. Multicast messages are going to have to be transmitted much more broadly than unicast messages: You can't just sent it along the best three paths, which you proposed for unicast messages, because each recipient is highly likely to have a completely different best three paths [...]
In our estimate, they're not so likely different.
---
### Comment by @jagerman
**Date:** 2019-01-04
> The wallet address can be retrieved from StakeTx but the proposed approach unifies auth and proxy supernode handling.
I don't understand how there is any benefit to doing this. The auth SN simply needs an address, the proxy SN needs more than just an address.
> Not sure what you meant here, unless you were talking about wallet side automation.
I was. I don't actually think that any automation that requires a hot wallet is a good idea, but if you're going to have it, it shouldn't be an unencrypted hot wallet (or, equivalently, an encrypted hot wallet with an password stored in a config file nearby) on the SN itself.
> Well, not so trivial for hop count h > 1, there are N^h possible peers in the h-neighborhood, where N is the "typical" number you mentioned bellow.
If you didn't have the hop count included in the broadcast, this would indeed be true. With with the hop count, the maximum number of nodes you would need to check to find the source is multiplicative, not exponential, because you wouldn't check the entire neighbourhood: you would only check the immediate connections and thus ignore all of those except one lowest-hop peer at each step. The worst case is thus `Nh` connections, not `N^h`, and finding the source takes at most `h` announce cycles. Someone with a bit of Monero-based coin experience could probably write code that could identify the source of any particular SN in a couple of hours.
Since this isn't actually offering SN originator IP anonymity, it isn't clear that there is any advantage at all; it would simplify a lot, greatly reduce the traffic, and not give up any secrecy if SN IP/port info could simply be public with SNs establishing direct connections.
> Downtime problem during regular stake renewal can be fixed for the temporal key in the following way: supernode, for which StakeTx unlocked, tracks it TRP, and if supernode owner renews stake transaction with a new identification key, supernode continues to send announces with the old identification key, until new StakeTx does not pass stake validation period (during this time this supernode knows both its identification keys.)
Sure, you can solve it this way, but this appears to be adding complexity in the design without any benefit at all: I'm still missing any explanation at all as to why key regeneration on renewal is an advantage.
> Well, in our estimate, B = ~ 200 bytes.
60 GB of traffic per day *just* for passing announces is still a couple of orders of magnitude too high. This isn't optional traffic, either: every network node must pass it, not just nodes with supernodes attached.
There's also the fact that this announce mechanism *directly and independently* determines the set of active SNs in such a way that this list will often be inconsistent across nodes, as I have commented on in #185 .
The answer to *both* problems is to provide a strong incentive for SN operators to ensure that they stay online, and to unify online/offline information across the network. You do the first one (incentive) by penalizing a node that misses performance targets. You do the second one (unified information) by storing the information on active/inactive nodes in the blockchain.
So, for example, you could set a disqualification trigger at: haven't transmitted an hourly ping in >2 hours or have missed responding to >4 RTA requests. If you hit either trigger, you get disqualified for 10 days (7200 blocks). Then every period, a quorum of nodes would check a random subset of active supernodes for disqualification failures, and if a majority votes for disqualificiation, a disqualification tx would be submitted to the mempool. As soon as that tx gets mined into the chain, all nodes immediately know the node is disqualified. The SN list is the same everywhere, there's a strong incentive to ensure a reliable connection, pings can be done only hourly incurring minimal announce traffic, and you have total active SN consistency, thus allowing RTA auth sample verification.
---
### Comment by @bitkis
**Date:** 2019-01-07
> > Not sure what you meant here, unless you were talking about wallet side automation.
> I was. I don't actually think that any automation that requires a hot wallet is a good idea, but if you're going to have it, it shouldn't be an unencrypted hot wallet (or, equivalently, an encrypted hot wallet with an password stored in a config file nearby) on the SN itself.
Agree. And we actually went away from that.
> > Well, not so trivial for hop count h > 1, there are N^h possible peers in the h-neighborhood, where N is the "typical" number you mentioned bellow.
> If you didn't have the hop count included in the broadcast, this would indeed be true. With with the hop count, the maximum number of nodes you would need to check to find the source is multiplicative, not exponential, because you wouldn't check the entire neighborhood: you would only check the immediate connections and thus ignore all of those except one lowest-hop peer at each step. The worst case is thus Nh connections, not N^h, and finding the source takes at most h announce cycles.
Sorry I don't see it this way. We might be off by 1 (depending how you count, it can be `N^{h-1}`) but it's still exponential: you can check the immediate connections and ignore all of them except one lowest-hop peer _at the first step only_. You can't continue doing that unless you own the whole h-neighborhood :)
No RPC API should/will provide the neighbor-hop map. And the IP anonymity is actually there.
> > Well, in our estimate, B = ~ 200 bytes.
> 60 GB of traffic per day just for passing announces is still a couple of orders of magnitude too high. This isn't optional traffic, either: every network node must pass it, not just nodes with supernodes attached.
We do believe the traffic can be significantly reduced. Anyway, the point is taken.
> So, for example, you could set a disqualification trigger at: haven't transmitted an hourly ping in >2 hours or have missed responding to >4 RTA requests. If you hit either trigger, you get disqualified for 10 days (7200 blocks). Then every period, a quorum of nodes would check a random subset of active supernodes for disqualification failures, and if a majority votes for disqualification, a disqualification tx would be submitted to the mempool. As soon as that tx gets mined into the chain, all nodes immediately know the node is disqualified. The SN list is the same everywhere, there's a strong incentive to ensure a reliable connection, pings can be done only hourly incurring minimal announce traffic, and you have total active SN consistency, thus allowing RTA auth sample verification.
Great idea, actually. We are looking at penalization right now, and the idea of the disqualification tx may be exactly the right one.
On the other hand, I doubt the mechanism based on a disqualification tx can be a primary guard in the case of RTA: it's naturally slow. Yes, it lets us punish a "bad" node, but it doesn't help us ensure _real time_ authorization in the short run. To me, we need both to penalize nodes that miss performance targets, _and_ to minimize the possibility of RTA failure.
---
### Comment by @jagerman
**Date:** 2019-01-07
>> If you didn't have the hop count included in the broadcast, this would indeed be true. With the hop count, the maximum number of nodes you would need to check to find the source is multiplicative, not exponential, because you wouldn't check the entire neighborhood: you would only check the immediate connections and thus ignore all of those except one lowest-hop peer at each step. The worst case is thus N·h connections, not N^h, and finding the source takes at most h announce cycles.
> Sorry, I don't see it this way. We might be off by 1 (depending on how you count, it can be N^{h-1}), but it's still exponential: you can check the immediate connections and ignore all of them except one lowest-hop peer at the first step only. You can't continue doing that unless you own the whole h-neighborhood :)
> No RPC API should/will provide the neighbor-hop map. And the IP anonymity is actually there.
A remote node's peer list is literally the second thing exchanged (after the network id) when one node connects to a peer; this is a pretty fundamental part of the p2p communication layer. So you can get the lowest-hop peer of your current peer list (call it A), close all your peer connections and open new connections to all A's recent peers. Repeat `h` times; you'll now have the source node.
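For what it's worth, a rough sketch of the traversal described above, with a stubbed-out peer-list helper standing in for the real p2p exchange; the point is only that each step inspects a single node's immediate peers, so the work grows like N·h rather than N^h:

```C++
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical view of one peer's state: its address and the hop count it
// reported for the target supernode's announce.
struct PeerAnnounce {
    std::string address;
    uint32_t hop_count;
};

// Assumed helper standing in for the p2p handshake described above: connect
// to `addr` and read back its recent peer list with hop counts. Stubbed here.
std::vector<PeerAnnounce> fetch_peer_announces(const std::string& addr)
{
    (void)addr;
    return {}; // placeholder; a real implementation would speak the p2p protocol
}

// Walk toward the announce source: at each step keep only the lowest-hop
// peer and discard the rest.
std::string trace_announce_source(std::string current, uint32_t max_hops)
{
    for (uint32_t step = 0; step < max_hops; ++step) {
        auto peers = fetch_peer_announces(current);
        if (peers.empty())
            break;
        const PeerAnnounce* best = &peers.front();
        for (const auto& p : peers)
            if (p.hop_count < best->hop_count)
                best = &p;
        current = best->address;      // follow the hop-count gradient
        if (best->hop_count == 0)
            break;                    // hop 0: the node the supernode is attached to
    }
    return current;
}
```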
---

# Issue #185: [RFC-002-SLS]-Supernode-List-Selection
## Reception Score
| Score | Reason |
|-------|--------|
| **ACTIVE** | Open with discussion |
---
## Metadata
| Field | Value |
|-------|-------|
| State | OPEN |
| Author | @jagerman |
| Created | 2018-12-27 |
| Closed | N/A |
| Labels | RFC-draft |
| Comments | 4 |
---
## Original Post
**Author:** @jagerman
> This algorithm has the following advantages:
It actually doesn't appear to have any of the listed advantages:
> 1. Consistency, since it based on consistent Blockchain-based List
False. Consistency in a decentralized network means that all properly performing network nodes agree on an answer. The blockchain-based list is indeed consistent, but the sample selection doesn't only depend on that; it *also* depends on the announce-based list, and the announce system can easily differ across individual nodes. Network latency, local system clock differences, node restarts, and momentary connection losses can all contribute to such inconsistencies. Thus the algorithm is *not* consistent across the network. You even stated as much earlier:
> On this level, the [announce-based] list isn't completely consistent over the network but our chance that selected supernodes are online at that moment of time is high.
It is completely irrelevant if it is "high" because if it isn't 100% you cannot reject RTA transactions that used the wrong supernodes, and if you can't do that then you allow proxy SN operators to cheat the system by altering their proxy SN to use their own 8 RTA SNs all the time (and thus capture all of the fees of every transaction through that proxy SN).
> 4. There is a good chance two sequential sets of Auth Sample participants overlap, and hence, RTA validation becomes even more consistent.
Something either is or is not consistent. If random chance makes something "even more consistent" then it is not consistent. See point 1.
> 2. Auth Sample is unique for each payment since it depends from payment id.
This has the same cheating potential as having an inconsistent list: even if the list itself *wasn't* inconsistent, this opens up another exploit: I could simply craft a payment ID (rather than using a fully random ID) designed to choose as many of my own SNs as possible.
I'm also concerned here by the use of payment IDs: if this is a payment ID included in the transaction then it is relying on a feature that is already deprecated by Monero and on the way out (even in its encrypted form) in favour of using vastly superior one-time subaddresses. But perhaps you just mean an internal payment ID rather than a transaction payment ID?
> 3. Can be potentially restored on any graft node or supernode with the probability of supernode activity.
It is unclear to me what this means. If you mean that any supernode can obtain the same list given the same payment ID, then this is just point 1 again (and is not true because the list is not consistent). If it means that the SN sample can be verified by some other node then it is similarly wrong: there is neither the temporal data (which SNs were valid at block X?) nor the sample consistency that would be required to perform such verification.
---
## Discussion Thread
### Comment by @bitkis
**Date:** 2019-01-04
Bad wording and some inaccurate/missing explanations on our side.
We've made some modifications to the document, hoping it now explains things better. Please take another look at those.
P.S. Happy New Year Jason :)
---
### Comment by @jagerman
**Date:** 2019-01-04
The edits don't really address my concerns. To summarize:
- the list isn't completely consistent because it depends on announces being received, but announces can arrive and expire at different times on different nodes.
- The list can change *even for a single SN* during a transaction lifetime if one of the SNs selected in the auth sample reaches an expiration threshold. (For example: if you have an N-second expiration and the payment includes an auth sample node with N-2 seconds to expiry).
> RTA Payment ID is unique since PoS Proxy needs a new one-time identification key, as well as an RTA payment ID, for each RTA payment;
- because the RTA payment ID is based on a random value generated by a single component on the network (i.e. the PoS proxy), this means that network component can be modified to choose their own supernodes: you just modify the code to keep generating one until you get one that you like (i.e. one that selects several of your own supernodes). For example, when you need to generate a payment ID, spend half a second generating them and choose whichever one selects more of your own SNs.
- That issue actually doesn't even matter in the current proposal, however, because with the lack of total consistency there is no way that other graft nodes or supernodes *can* reliably verify a supernode sample: network speed differences, momentary network lapses that miss announcements, time synchronization, the passage of time, and offline supernodes coming online *all* affect the pool from which the auth sample is drawn. In order to verify an auth sample selection the verifying supernode needs to be able to ask the question "what was the correct sample at the time this payment was initiated?" but it can't ask that because there is neither a history nor a guaranteed-consistent list across the network, and so it can't verify. Since it can't verify, the POS proxy can just choose its own because the network can never prove that that *wasn't* the correct sample for that SN at that time.
Edit: another example where this inconsistency will matter is on SN restarts. If I restart my proxy SN then it will, until a full announce cycle has passed, have a very different view of active nodes on the network. Is the network just going to simply reject any POS payments that get submitted to a freshly restarted POS proxy, because they will have the wrong signatures? Or will initiated payments just fail for the first couple of minutes until the POS proxy is brought back up to the (roughly) common state? Both outcomes are *terrible*, but the only way to avoid them is either to throw away validity (in which case SNs game the system) or to use something more like the blockchain synchronization mechanism that I suggested in #187.
---
### Comment by @bitkis
**Date:** 2019-01-07
Thank you, Jason. It appears some important information was still missing from the document at the time you reviewed it. Sorry about that.
To summarize, the whole idea is to allow inconsistency such that the index of a SN (auth sample participant) varies within some known range.
> because the RTA payment ID is based on a random value generated by a single component on the network (i.e. the PoS proxy), this means that network component can be modified to choose their own supernodes: you just modify the code to keep generating one until you get one that you like (i.e. one that selects several of your own supernodes). For example, when you need to generate a payment ID, spend half a second generating them and choose whichever one selects more of your own SNs.
Hmm... half a second, really? :) We're talking about finding a strong hash collision here
Regarding the restart example: yes, your proxy SN would need to wait a full announce cycle to start processing the payments. Terrible? But wait, isn't a blockchain node useless until it completes synchronizing its blockchain? :)
---
### Comment by @jagerman
**Date:** 2019-01-07
> Hmm... half a second, really? :) We're talking about finding a strong hash collision here
There must be something else missing, then, from your description. I'm assuming that the proxy SN generates the payment ID. If I want to cheat the system, I just generate many payment IDs and the resulting hashes well in advance (e.g. using a GPU) and then, when I process an RTA transaction, I choose whichever pre-hashed value selects more of my own auth SNs. No hash collision is involved. If you move the payment ID generation to the POS terminal, instead, then the POS terminal gets to do the cheating.
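A self-contained sketch of the grinding attack described above. The `select_auth_sample` mapping here is a hypothetical stand-in (a PRNG seeded from a hash of the ID), not the project's real selection code; the attacker is assumed to own supernodes 0 through 7:

```C++
#include <cstddef>
#include <functional>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for the real payment-ID -> auth-sample mapping:
// seed a PRNG from a hash of the ID and draw 8 indexes out of n_supernodes.
std::vector<std::size_t> select_auth_sample(const std::string& payment_id, std::size_t n_supernodes)
{
    std::mt19937_64 rng(std::hash<std::string>{}(payment_id));
    std::vector<std::size_t> sample;
    for (int i = 0; i < 8; ++i)
        sample.push_back(rng() % n_supernodes);
    return sample;
}

// The grinding attack: generate many candidate IDs (this can be done well in
// advance) and keep whichever one selects the most attacker-owned supernodes.
std::string grind_payment_id(std::size_t n_supernodes, std::size_t tries)
{
    std::string best_id;
    std::size_t best_hits = 0;
    for (std::size_t i = 0; i < tries; ++i) {
        std::string id = "candidate-" + std::to_string(i);
        std::size_t hits = 0;
        for (std::size_t idx : select_auth_sample(id, n_supernodes))
            if (idx < 8)           // attacker owns SNs 0..7 in this toy setup
                ++hits;
        if (hits > best_hits) {
            best_hits = hits;
            best_id = id;
        }
    }
    return best_id;
}
```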
I'm more concerned, now that you point it out, about the use of a slow hash here: that's adding a huge computational load on the network for handling RTA transactions and is going to cut the maximum potential RTA TPS of the network by something like 40x. It's also entirely unclear whose job it is to validate them, and what happens if they fail validation.
I'm also unclear how it will enter the network consensus rules since there will be *different* consensus rules on different nodes and at different times, and thus identical data can potentially cause a chain split. It seems as though this could be used to deliberately attack the network: create RTA transactions that use a barely-valid SN auth sample until the network splits due to slightly different visibility on different parts of the network.
I can only conclude that it *won't* be part of the network consensus rules, but that means I'm back to being able to manipulate it: i.e. have my own proxy SN use my own 8 RTA SNs which will be modified to be perfectly happy to lie about being selected into an invalid sample.
> Terrible? But wait, isn't a blockchain node useless until it complete synchronizing its blockchain? :)
A node restart takes around 5 seconds on a decent machine, and only very rarely has to resync anything (and if it does, it's typically just one block). You're talking about something that is going to take 13 (5s + 1m) to 121 (5s+10m) times as long. 5 seconds of downtime is manageable, a minute (or 10 minutes) of downtime is not even comparable.
---

# Issue #191: [RFC 003 RTVF] RTA Transaction Validation Flow
## Reception Score
| Score | Reason |
|-------|--------|
| **ACTIVE** | Open with discussion |
---
## Metadata
| Field | Value |
|-------|-------|
| State | OPEN |
| Author | @jagerman |
| Created | 2019-01-10 |
| Closed | N/A |
| Labels | |
| Comments | 8 |
---
## Original Post
**Author:** @jagerman
Comments. Two major, a few smaller issues.
# Privacy leakage.
This design leaks privacy to the PoS proxy, the auth sample, and the wallet proxy. To quote from https://www.graft.network/2018/11/21/how-graft-is-similar-to-and-at-the-same-time-different-from-visa-and-other-payment-card-networks-part-2/
> This property is **absolute privacy** provided by GRAFT Network to both buyer and merchant. Unlike plastic cards and most cryptocurrencies, GRAFT's sender address, recipient address, transaction amount, and transaction fee amount are invisible to everyone except for the sender and recipient themselves.
This design, however, does not accomplish that: the PoS proxy is able to identify all payments received by the PoS, and all SNs involved in the transaction see the amount sent (even if they can't see the recipient address).
A cryptocurrency that is only private as long as you have to trust a single party (the PoS proxy) is no longer a privacy coin.
But it gets worse: from the description in the RFC it is possible for various network participants other than the receiving and paying wallets to get "serialized payment data" which consists of "serialized payment data - list of purchased items, price and amount of each item, etc.".
So, to summarize the privacy leaks that seem to be here:
- the PoS proxy SN sees the recipient wallet address, the total amount, and individual items purchased including the amount of each item.
- auth sample SNs see the total amount including the amount received by the proxy PoS
- wallet proxy SN plus, apparently, *any* SN can get an itemized list of the transaction
# Other comments
- this design has no protection against a selfish mining double-spending attack. Unlike a double-spending attack against an exchange, double-spending here does not have to reach any minimum number of confirmations; *and* can be timed (with a little effort) to not even require 51% of the network. (I pointed this out just over two months ago in the public JIRA with details of how to carry out an attack *and a demo* but the issue has had no response).
(`4. Regular key image checking (double spent checking.)` does nothing against the above attack: the key image *isn't* spent on the network visible to the SNs until the private block is released.)
- The PoS <-> PoS proxy SN communication layer should be encrypted so that the PoS can verify it is talking to the expected party (since the PoS in this design has to be trusted with all RTA payment data). This should require HTTPS (with certificate validation enabled), or something similar, both to encrypt the data against MITM snooping, but also importantly to avoid someone spoofing the PoS proxy connection to send false authorization updates back to the PoS.
> 10. Each supernode from auth sample and PoS Proxy Supernode ...
There is a huge amount of complexity added here for little apparent reason. You set the success/failure conditions at 6/3 replies so that you can have a consistent consensus among the SNs, which I understand, but you don't *need* this success/failure consensus when you have a single party that is in charge: the PoS proxy.
If you simply changed the rules so that the PoS proxy is always the one to distribute the block, you would simplify the traffic (SN auth sample results can be unicast to the PoS proxy, and the payment success can simply be a state variable that never needs to be broadcast over the network), but more importantly you would allow a 6/1 success/failure trigger without incurring any consistency problem.
> ii. Transaction considered to be rejected in the case at least 3 out of 8 auth sample members or PoS Proxy rejected it.
Allowing 2 failures is a recipe for fee cheating: hack your wallet to reduce two of the eight SN fees to zero (or just leave them out) in every transaction to give yourself a small rebate.
> iii. When any auth sample supernode or PoS Proxy Supernode gets in:
What happens if there are 5 successes, 2 failures, and one timeout?
> Graftnode that handles RTA transaction validates:
> i. Correctness of the selected auth sample;
Which is done how, exactly? In particular, how much deviation from what it thinks is correct will it allow? This needs to be specified.
> 12. Once the graftnode accepts the transaction, supernode, which submitted it to the cryptonode, broadcasts successful pay status over the network
Why is this needed at all? Success can already be seen (and is already transmitted across the network) by the fact that the transaction enters the mempool. Can't the wallet just check for that instead?
# This design is non-trustless!
This design puts far too much centralized control in the hands of the proxy SN. The design here puts this single node as RTA transaction gatekeeper, with the possibility to lie to the PoS about transaction validity—a lie here could be deliberate, or could be because the proxy SN in use was hacked. This is not how a decentralized cryptocurrency should work: it needs to be possible to trust no one on the network and yet have the network still work.
A non-trustless design like this should be a non-starter.
---
## Discussion Thread
### Comment by @softarch24
**Date:** 2019-01-11
Regarding "Privacy leakage" and "This design is non-trustless" comments -
Yes, the proxies have some insight into the details of payments (note - we are talking about merchant payments, not regular P2P transfers). The idea behind the proxy is that it takes care of some operations that are difficult or impossible to implement on a mobile device, especially with the tough requirements of the CryptoNote protocol. The proxy is somewhat trusted; however, it can be either public (as a service provided by a trusted third-party service provider to multiple merchants) or proprietary (as a local supernode that belongs to a single merchant). For most merchants, it is more important to get the best levels of service than absolute privacy. In case absolute secrecy is required, the merchant can run its own proprietary proxy.
---
### Comment by @softarch24
**Date:** 2019-01-11
Regarding "selfish mining double-spending attack" -
This is a known attack on PoW blockchains called the "Finney attack": https://bitcoin.stackexchange.com/questions/4942/what-is-a-finney-attack
GRAFT is not the only PoW blockchain that is vulnerable to this attack.
For RTA, we are going to implement locking mechanism similar to the one implemented by DASH. Once RTA Tx is authorized by the authorization sample, the Tx is broadcasted to the entire network. If an attacker injects a block (or chain) containing Tx that conflicts with the locked Tx (i.e. trying to spend the same key images), such a block (or chain) will be rejected (see section 4.2 Finney Attacks):
https://github.com/dashpay/docs/blob/master/binary/Dash%20Whitepaper%20-%20Transaction%20Locking%20and%20Masternode%20Consensus.pdf
In addition, DASH has recently suggested another protection mechanism that mitigates 51% mining attack even on regular (non-instant) Tx, which essentially makes even a regular transfer transaction irreversible after 1 confirmation:
https://github.com/dashpay/dips/blob/master/dip-0008.md
We are weighing our options of implementing a similar mechanism in the future.
---
### Comment by @jagerman
**Date:** 2019-01-12
> Yes, the proxies have some insight on details of payments (note - we are talking about merchant payments, not regular P2P transfers).
It is unnecessary and undermines the privacy that less than two months ago [you posted about](https://www.graft.network/2018/11/21/how-graft-is-similar-to-and-at-the-same-time-different-from-visa-and-other-payment-card-networks-part-2/) as being a key difference in the GRAFT payment network:
> ### Difference #2 Privacy
> Another key difference is ... absolute privacy provided by GRAFT Network to both buyer and merchant. Unlike plastic cards and most cryptocurrencies, GRAFT's sender address, recipient address, transaction amount, and transaction fee amount are invisible to everyone except for the sender and recipient themselves. Although payment card networks do not expose the details of transaction to the public, this data is accessible by employees of multiple corporations, can be shared with governments, and can be stolen by hackers.
But now you are saying:
> For most merchants, it is more important to get best levels of service than absolute privacy.
And that merchants who actually want the proclaimed privacy will have to have the expertise to run, update and keep secure their own proxy SN.
> The idea behind proxy is that it takes care of some operations that are difficult or impossible to implement on mobile device, especially with tough requirements of CryptoNote protocol.
What operations, exactly, do you think cannot be done on mobile hardware? Are you not aware of mobile wallets for several cryptonote coins such as [monerujo (for Monero)](https://play.google.com/store/apps/details?id=com.m2049r.xmrwallet&hl=en), [Loki Wallet](https://play.google.com/store/apps/details?id=network.loki.wallet&hl=en_US), or [Haven Protocol Wallet](https://itunes.apple.com/us/app/haven-protocol-wallet/id1438566523?ls=1&mt=8), to name just a few, which are able to handle CryptoNote just fine without leaking privacy and security to a remote proxy? Or that a Raspberry Pi (which has essentially the same computational power as the slowest Verifone Carbon device) is perfectly capable of running not only dozens of CryptoNote wallets simultaneously, but also multiple whole cryptonode nodes simultaneously?
> The proxy is somewhat trusted
No, it is not "somewhat" trusted. It is entirely trusted. In this design, the proxy SN is the one that tells the merchant *without verifiable proof* that a payment has been approved by the network. It is a huge target for attacks and said attacks will be difficult to detect until long after the fact. This single point of attack effectively undermines the entire security of the RTA mechanism, to the point where you might as well not even *have* RTA: you could literally do the entire authorization in just the proxy SN and have just as much security as you are getting here because your weakest link would be the same.
The entire point of using a random sample on a decentralized network is the security it brings, because someone would have to own or compromise a very large share of the network in order to compromise the security of the network. Hacking an RTA supernode or coercing its operator would gain you absolutely nothing. The design in this RFC, however, specifies a trusted, centralized component that must exist in every single RTA transaction; a component that can be hacked or have its operator coerced to compromise the security and privacy of any and all merchants using that node.
This is not a responsible or acceptable design.
---
### Comment by @SomethingGettingWrong
**Date:** 2019-01-12
**RTA OF ANY PRIVACY CRYPTO SHOULD BE PRIVATE**
The privacy of any crypto is the number one community backed assumption and choice that a project should take the steps to complete when they support it! Otherwise you should have just forked Dash! which was based off of bitcoin.
Just because it technically works at RTA doesn't mean you will have the support of the community. If the community doesn't support it then the price will dump to the cost of mining it! Which will further go down as difficulty lowers, as miners leave, as the price drops!
*What you are trying to achieve could have been achieved while, at the same time, staying private.*
I fear that you thought privacy had to be sacrificed in order to make it compatible with merchants' terminals. When indeed that is not the case! I feel this came about from a lack of understanding the actual fundamental privacy of the Monero blockchain and from not listening to the community who was practically screaming! Please Please Please don't implement it this way!
Now you have "completed" an Alpha that, while it technically does RTA, has no privacy and is insecure, with a central failure point: the proxy supernode. Which by definition means it's not decentralized.
**You guys are busy implementing all these new features working on them all at one time! Instead of just sticking to something the community would have wanted and what we thought it was!**
**A Privacy/RTA coin.**
You guys are programming this as if no one will modify super node code for nefarious purposes! All the risk is left on the super nodes running this code! While we would be okay with that if it was all anonymous/secure. The fact of the matter is you're leaving it unprivate and insecure and leaving the burden of running the code on the users and their stake amount while telling everyone it's private!
Maybe if you had not been so in the dark about its development and decisions and had more community involvement, the project would have corrected itself!
**You had plenty of opensource developers who would have helped you if you had just listened and done it a different way. Instead you thought it could only be done this way, when we are telling you that if you do it this way you're making a mistake.**
You are running it as if it's closed-source software! That mentality has caused you to sacrifice the security and privacy when programming. Instead of actually listening to the community you pushed your community developers away. Just because you know how to program and you understand merchant terminals doesn't mean you comprehend privacy blockchains! If you do and you implemented this anyway, "SHAME ON YOU"
_All your answers are we are right you are wrong and this is why! or you say.. I don't see the issue can we close this?_
Reading this code has me baffled! It's not even the programmers. I feel it's the way the team is telling them to implement it, and I feel the team doesn't realize this is a mistake and is in denial because they have spent so much time going this direction!
It's not too late to turn around, yah know! The direction you are taking this is away from the community.. which means no one will use it! Have you not noticed the community is dissolving?
---
### Comment by @necro-nemesis
**Date:** 2019-01-13
RTA must have end to end encryption for the protection of node owners. Zero knowledge proof of knowledge. Disclosing information to a node presents unlimited liability for whomever operates it. Anyone who understands this will not operate a node since the risks greatly outweigh the benefits.
---
### Comment by @SomethingGettingWrong
**Date:** 2019-01-17
@sgomzin
Please create your own unique algo or "tweak" another algo that's lesser known like XTL or Haven.
(more GPUs can support the XTL variant) but at this point a v8 tweak would be fastest
**STOP WEIGHING YOUR OPTIONS AND PICK ONE!**
**[P2P6] INFO global src/cryptonote_core/blockchain.cpp:933 REORGANIZE SUCCESS! on height: 263338, new blockchain size: 263442**
Any top exchange would delist! It would not surprise me if Cryptopia and Tradeogre delist you guys.
You need to reevaluate your understanding of a 51 percent attack!
I warned him.. we will see how it goes. (not looking good)
The blockchain should have a checkpoint every few blocks or something when below such a hashrate. I can't think of any situation where you would need to reorganize more than 20 blocks.
![image](https://user-images.githubusercontent.com/36722911/51296184-75b9f280-19e0-11e9-9ce9-7741896a567c.png)
---
### Comment by @bitkis
**Date:** 2019-01-19
@jagerman Thanks for the valuable and constructive criticism.
> So, to summarize the privacy leaks that seem to be here:
>
> * the PoS proxy SN sees the recipient wallet address, the total amount, and individual items purchased including the amount of each item.
> * auth sample SNs see the total amount including the amount received by the proxy PoS
> * wallet proxy SN plus, apparently, any SN can get an itemized list of the transaction
The RFC is updated; we tried to address most of the concerns. Note that though the total amount is still open, no association between the transaction and the recipient wallet address can be built.
> this design has no protection against a selfish mining double-spending attack. Unlike a double-spending attack against an exchange, double-spending here does not have to reach any minimum number of confirmations; and can be timed (with a little effort) to not even require 51% of the network. (I pointed this out just over two months ago in the public JIRA with details of how to carry out an attack and a demo but the issue has had no response).
We know it's an open issue and are still weighing our options here.
> > 12. Once the graftnode accepts the transaction, supernode, which submitted it to the cryptonode, broadcasts successful pay status over the network
> Why is this needed at all? Success can already been seen (and is already transmitted across the network) by the fact that the transaction enters the mempool. Can't the wallet just check for that instead?
It's a workaround for the fact that we often observed mempool sync requiring extra time.
---
### Comment by @SomethingGettingWrong
**Date:** 2019-01-21
@bitkis What options are you weighing? Super node consensus seems to be the way dash and Loki are handling similar things. I would do something similar.
---

# Issue #2: Disqualification Flow
## Reception Score
| Score | Reason |
|-------|--------|
| **ACTIVE** | Open with discussion |
---
## Metadata
| Field | Value |
|-------|-------|
| State | OPEN |
| Author | @bitkis |
| Created | 2019-03-26 |
| Closed | N/A |
| Labels | |
| Comments | 3 |
---
## Original Post
**Author:** @bitkis
Discussion placeholder for [[RFC-005-DF]-Disqualification-Flow](https://github.com/graft-project/DesignDocuments/blob/disqualification-flow/RFCs/%5BRFC-005-DF%5D-Disqualification-Flow.md)
---
## Discussion Thread
### Comment by @jagerman
**Date:** 2019-03-29
This is an algorithm description rather than a design document.
As far as the underlying design here goes, this seems overbuilt. What is the point of a high level of complexity here? Wouldn't it be far simpler to use a random quorum that votes on a random selection of supernodes, using a very simple rejection rule such as "no more than 3 missed authorizations in the last 720 blocks", and if the threshold is hit, submits *one* signed disqualification tx that kicks out the malfunctioning SN? Why are complex scores, extra data storage lists, and loads of magic numbers in calculations (such as: `0.5 + (DTBlockNumber - BDListBlockNumber) / (2 * (BlockHeight - BDListBlockNumber))`) of any benefit to the objective here?
Some particular things that jump out at me:
> - AAoS - Accumulated Age of stake - The value determines the reliability of the stake, based on the stake amount, number of blocks, passed after stake activation (as usual AoS) and average disqualification score (ADS), AoS = StakeAmount * StakeTxBlockNumber * (1 - ADS).
First, this is nonsense: there is no reason at all to suppose that T4 is 5 times as reliable as a T1, or that someone who stakes for a month at a time is (on average) 4 times as reliable as someone who stakes for a week at a time.
Second, this significantly undermines the integrity of the system, which relies on uniform random sampling. By introducing controllable bias (i.e. use larger and longer stakes to greatly increase your chance of being selected) you weaken the security of the system.
> Gets first PBLSize bytes from the split block hash and selects PBLSize supernodes from it, using these one-byte numbers as indexes.
I honestly feel like I'm personally being trolled with this. Using 1 byte of entropy for one random value is a *horrible* solution for anything that needs to be random other than something that needs exactly the range of one byte. Please read over https://github.com/graft-project/GraftNetwork/pull/225 again.
---
### Comment by @bitkis
**Date:** 2019-04-04
@jagerman,
Let's hit on the common ground first:
> Wouldn't it be far simpler to use a random quorum that votes on a random selection of supernodes,
The quorum should be both random and verifiable, and all members of the quorum should be able to agree on the selection, correct?
> using a very simple rejection rule such as "no more than 3 missed authorizations in the last 720 blocks",
I assume you meant blockchain-based verification. So, do you suggest going through all the RTA transactions in the last 720 blocks, reconstructing authorization samples for each of those, and checking if any of the randomly selected supernodes, mentioned above, missed participation in the corresponding samples? It doesn't look very simple. Also, what if an RTA transaction didn't make it to the blockchain due to the malfunctioning supernode(s)?
> and if the threshold is hit, submits one signed disqualification tx that kicks out the malfunctioning SN?
Seems like you suggest skipping health checking ("pinging"), and kicking out the malfunctioning supernodes reactively, after the harm has already been done. Is this correct?
> Why are complex scores, extra data storage lists, and loads of magic numbers in calculations (such as: 0.5 + (DTBlockNumber - BDListBlockNumber) / (2 * (BlockHeight - BDListBlockNumber))) of any benefit to the objective here?
It was just an idea, and we are here to discuss it. In general, we are considering simplifying the process, but the current concept attempts to provide (1) an assessment of the auth sample's work, since it cannot always submit a transaction (for example, when the auth sample does not get enough approvals) and we cannot check that using the blockchain, and (2) a real-time network state estimation: "pinging" allows us to check the health of supernodes for the next blockchain-based lists.
The current score schema is more complex than we'd like it to be, but it allows us to take into consideration the age of the disqualification transaction, since historical data cannot directly define the state of a supernode but still provides important information about the supernode's behavior.
> First, this is nonsense: there is no reason at all to suppose that T4 is 5 times as reliable as a T1, or that someone who stakes for a month at a time is (on average) 4 times as reliable as someone who stakes for a week at a time.
Yes, a T4 is not more reliable than a T1, and in the process of building the blockchain-based list, different tiers form different lists (see the new revision of the document). However, we still need a verifiable order for supernodes, and age of stake is suitable for that.
> Second, this significantly undermines the integrity of the system, which relies on uniform random sampling. By introducing controllable bias (i.e. use larger and longer stakes to greatly increase your chance of being selected) you weaken the security of the system.
In our opinion, a long-term stake is more reliable for a sole reason: if the corresponding supernode misbehaved and got disqualified, the stake will stay locked for a longer time. So an owner of the longer stake will be punished worse than an owner of a shorter one.
> I honestly feel like I'm personally being trolled with this. Using 1 byte of entropy for one random value is a horrible solution for anything that needs to be random other than something that needs exactly the range of one byte. Please read over graft-project/GraftNetwork#225 again.
Sorry, we failed to update the document properly. It is updated now.
---
### Comment by @jagerman
**Date:** 2019-04-05
> The quorum should be both random and verifiable, and all members of the quorum should be able to agree on the selection, correct?
Yes. This is why you seed a common RNG using common data such as the block hash at the height being considered.
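As a sketch of what "seed a common RNG from common data" could look like (the hash folding and the selection loop are arbitrary choices for illustration, not the project's scheme):

```C++
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Fold a 32-byte block hash into a 64-bit seed. Any fixed folding works as
// long as every node applies the same one (this FNV-1a style fold is arbitrary).
uint64_t seed_from_block_hash(const std::vector<uint8_t>& block_hash)
{
    uint64_t seed = 0xcbf29ce484222325ULL;
    for (uint8_t b : block_hash)
        seed = (seed ^ b) * 0x100000001b3ULL;
    return seed;
}

// Every node that sees the same block hash and the same supernode list
// derives the same quorum, so the selection is verifiable by anyone.
std::vector<std::size_t> select_quorum(const std::vector<uint8_t>& block_hash,
                                       std::size_t supernode_count,
                                       std::size_t quorum_size)
{
    std::mt19937_64 rng(seed_from_block_hash(block_hash));
    std::vector<std::size_t> quorum;
    while (quorum.size() < quorum_size && supernode_count > 0)
        quorum.push_back(rng() % supernode_count); // a real scheme would also deduplicate
    return quorum;
}
```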
> Seems like you suggest skipping health checking ("pinging"), and kicking out the malfunctioning supernodes reactively, after harm has been already done. Is this correct?
No, I suggest it in addition to a health check (but any such health check needs to be far more reliable than the current random mess where there is a non-negligible chance of false positive failures due to the randomness of announce forwarding).
A SN could be disqualified either because it did not stay up, or because it failed to complete authorizations.
> So, do you suggest to go through all the RTA transactions in the last 720 blocks, reconstruct authorization samples for each of those, check if any of the randomly selected supernodes, mentioned above, missed participation in the corresponded samples?
Yes. Network rules must be enforced via consensus. Right now you don't have any sample enforcement of RTA signatures in the design; this seems like a logical place for it. Alternatively you could put it at the blockchain consensus layer (i.e. in graftnoded), and do active rejection of blocks with invalid samples, but that seems more complicated and would slow regular nodes down considerably.
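A simplified sketch of that enforcement scan, using hypothetical record types rather than anything from graftnoded:

```C++
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified records for the check sketched above: which SNs
// an RTA tx's auth sample was supposed to contain, and which actually signed.
struct RtaTxRecord {
    std::vector<std::string> expected_sample; // reconstructed from block hash + payment ID
    std::set<std::string> actual_signers;     // signatures present on the tx
};

struct BlockRecord {
    std::vector<RtaTxRecord> rta_txs;
};

// Apply the simple rejection rule from the earlier comment: disqualify if a
// supernode missed more than `max_misses` authorizations in the last `window` blocks.
bool violates_rule(const std::vector<BlockRecord>& chain, const std::string& sn,
                   std::size_t window = 720, std::size_t max_misses = 3)
{
    std::size_t misses = 0;
    std::size_t start = chain.size() > window ? chain.size() - window : 0;
    for (std::size_t h = start; h < chain.size(); ++h)
        for (const RtaTxRecord& tx : chain[h].rta_txs)
            for (const std::string& expected : tx.expected_sample)
                if (expected == sn && tx.actual_signers.count(sn) == 0)
                    ++misses;
    return misses > max_misses;
}
```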
> In our opinion, a long-term stake is more reliable for a sole reason: if the corresponding supernode misbehaved and got disqualified, the stake will stay locked for a longer time. So an owner of the longer stake will be punished worse than an owner of a shorter one.
So why allow shorter stakes *at all*? If longer stakes are considered in your opinion to be more reliable, why would you ever want to allow shorter stakes (i.e. less reliable nodes) on the network? Have fixed-period (e.g. 30-day), more reliable stakes for everyone, or copy Loki's infinite stakes with long penalty periods (30-day continued lockup of stake) upon disqualification.
---

# Issue #182: Authorization Sample Selection Algorithm
## Reception Score
| Score | Reason |
|-------|--------|
| **ACTIVE** | Open with discussion |
---
## Metadata
| Field | Value |
|-------|-------|
| State | OPEN |
| Author | @jagerman |
| Created | 2018-12-21 |
| Closed | N/A |
| Labels | |
| Comments | 4 |
---
## Original Post
**Author:** @jagerman
https://github.com/graft-project/graft-ng/wiki/%5BDesign%5D-Authorization-Sample-Selection-Algorithm comments on the design of the supernode sample selection. I have some comments/questions about the algorithm.
Most importantly, I have to ask: why *this* approach instead of some other approach?
I see some downsides that I'll get into, but this RFC (and the others) feel like they are simply describing what *is* being done rather than *why* it was chosen or is needed. I can guess some of that, of course, but it would be quite valuable to have it written down why this aspect of the design was chosen to be the way it is.
What the algorithm describes is effectively uniform random sampling done in a deterministic way via a recent block hash and supernode public keys (whether the wallet public keys via the wallet address, or using a separate SN-specific public key as I suggest in https://github.com/graft-project/graft-ng/issues/176#issuecomment-446060076 doesn't really matter).
The big problem I see with this approach is this:
### Uniform random sampling leads to an enormously variable distribution of SN rewards.
Assuming a (long run) 50% supernode lock-in, with about 50% of that going into T1 supernodes, we get somewhere around 9000 T1 supernodes expected on the network (once near maximum supply).
Thus, with this pure random selection formula, each T1 supernode would have a probability of `1 - (8999/9000)^2` (approximately 0.000222) of being selected in any block.
This in turn implies that there is only about a 14.7% chance of getting selected into the auth sample for at least one block in a day, and only a 67.4% chance of getting at least one auth sample entry in a week.
If your SN is online for 2 weeks, you still have slightly more than 10% chance of never being in the auth sample, and a 3.5% chance of never being in the auth sample after having your SN up for 3 weeks.
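For reference, the figures above can be reproduced with a few lines, using the 9000-node, two-slots-per-block and 720-blocks-per-day assumptions stated in this post:

```C++
#include <cmath>
#include <cstdio>

int main()
{
    const double n = 9000.0;                                   // assumed T1 supernode count
    const double p = 1.0 - std::pow((n - 1.0) / n, 2.0);       // per-block chance (2 slots per tier)
    const double blocks_per_day = 720.0;                       // 2-minute blocks

    auto at_least_once = [&](double days) {
        return 1.0 - std::pow(1.0 - p, blocks_per_day * days);
    };

    std::printf("per-block p           = %.6f\n", p);                       // ~0.000222
    std::printf("P(>=1) in a day       = %.3f\n", at_least_once(1));        // ~0.147
    std::printf("P(>=1) in a week      = %.3f\n", at_least_once(7));        // ~0.674
    std::printf("P(never) in two weeks = %.3f\n", 1.0 - at_least_once(14)); // ~0.106
    std::printf("P(never) in 3 weeks   = %.3f\n", 1.0 - at_least_once(21)); // ~0.035
    return 0;
}
```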
When considering getting into the auth sample at least twice, the numbers are worse:
- 1.1% chance of getting 2+ auth samples in a day
- 30% chance of getting 2+ auth samples in a week
- 65.5% chance of getting 2+ auth samples in 2 weeks
- 95% chance of getting 2+ auth samples in a month
When you also consider the exponential distribution of block times, things look worse still:
- 1.4% get less than 15 seconds of auth sample time per month
- 2.0% get between 15 and 60 seconds of auth sample time per month
- 3.9% get [1,2) minutes/month
- 5.1% get [2,3) minutes/month
- 6.0% get [3,4) minutes/month
- 6.6% get [4,5) minutes/month
- 7.0%, 7.0%, 6.9%, 6.6%, 6.2% get [5,6), [6,7), [7,8), [8,9), [9,10) minutes/month
- 5.7, 5.2, 4.7, 4.0, 3.6, 3.1, 2.6, 2.2, 1.9, 1.6% for [10,11) through [19,20)
- 5.9% get 20-30 minutes of auth time per month
- 0.6% get more than 30 minutes of auth time per month
If we then consider RTA earnings, the distribution becomes considerably more unequal still because of variation in the timing and amounts being spent. The above represents a "best case" distribution where RTA payment amounts are constant, very frequent, and perfectly spread out over time.
I've deliberately chosen a 30-day timescale above because I believe that it is about as far as one can reasonably go while thinking that rewards will "average out." As you can see above, though, they aren't averaging out in a reasonable time frame: even if RTA traffic was perfectly spread over time and for a constant amount, we have the top 10% of tier-1 SNs (ranking by auth sample time) earning seven times what the bottom 10% earns.
This sort of risk in reward distribution seems undesirable for potential SN operators and is likely to create a strong motivation for SN pooling--thus inducing centralization on the SN side of the network in the same way we have centralization currently among mining pool operators.
In Dash there is some randomness to MN selection, but it is strongly biased towards being a much fairer distribution: there is a random selection only from MNs that have not been one of the last 90% of MNs to earn a reward. Unlike Graft, the reward is simply a portion of the block reward, so there is no extra time-dependent or transaction volume-dependent components to further spread out the distribution. Loki is similar, but perfectly fair: SNs enter a queue and receive a payment when they reach the top.
One key distinction of Graft compared to both Dash and Loki, however, is that MN/SN sample selection in Dash/Loki is completely independent of MN/SN rewards. In Loki, for example, there are performance metrics that a SN must satisfy or risk being deregistered (and thus losing rewards until the stake expires). Dash, similarly, requires that MNs participate in network operations to stay active, foregoing any reward potential if they fail a network test and become inactive.
Neither of these are directly applicable to Graft, given the percentage nature of fees, but I feel that given the highly erratic nature of SN rewards that I laid out above this needs to be addressed. Either a change to improve the fairness of SN rewards, or at least a solid explanation of why a fairer distribution of earnings isn't feasible.
Just to throw out a couple of ideas for discussion:
- have 5 queues (one queue for each tier plus a proxy SN queue). Require that 0.5% of all RTA payments be burned, then remint some fraction (say 0.1%) of all outstanding burnt, non-reminted fees in each block and send an equal portion to the SN at top of each queue, returning that SN to the bottom of its queue. Use network-assessed performance requirements to deregister (via a quorum) any SN with poor performance.
- Use 5 queues, as above, but just drop the RTA fee entirely and instead award SNs a constant fraction of the block reward (say 50%), combined with a meaningful tail emission (this could be one that declines over time until it hits a fixed level, or just a switch to an outright fixed emission level).
---
## Discussion Thread
### Comment by @Fez29
**Date:** 2018-12-21
A more reliably consistent/fairer reward distribution is desirable and makes sense.
Potential SN operators would be much more likely to join the network if there was some sort of uniformity to rewards.
Especially if it encourages a more decentralised network and more SNs on the network.
The least complicated ways of achieving this should be seriously considered.
Regarding network-assessed SN performance requirements - I do think this has value and could be used, due to the fact that RTA is dependent on SN response time and consistent uptime, especially if placed in a queue. The Real Time Auth response time would obviously be a factor, as it would be desired to be as short as possible or within some sort of SLA. SN performance requirements should reflect this but also take into account geographical differences to try to promote an even distribution in location as well.
---
### Comment by @Swericor
**Date:** 2018-12-22
Very interesting thoughts, I share your view that a more consistent reward system is needed.
I think however that delisting SNs due to poor performance is a bit harsh, especially if the queue will be weeks long. Poor-performing SNs could be shifted back one or a few steps in the queue each time another SN has performed an auth and drops to the bottom of the queue.
---
### Comment by @jagerman
**Date:** 2018-12-23
> Require that 0.5% of all RTA payments be burned, then remint some fraction
Thinking about this some more, this really won't fly while keeping RTA amounts secret. (But on that note: a percentage-based fee for RTA payments doesn't allow for keeping RTA amounts secret in the first place).
---
### Comment by @Swericor
**Date:** 2018-12-26
Dropping a few steps in the queue (for each newly processed block) would be a better incentive to get the SN online again asap. If you're immediately delisted, the offline time doesn't really matter.
---

# PR #225: Blockchain based list implementation
## Reception Score
| Score | Reason |
|-------|--------|
| **MERGED** | Contribution accepted |
---
## Metadata
| Field | Value |
|-------|-------|
| State | MERGED |
| Author | @LenyKholodov |
| Created | 2019-02-04 |
| Merged | 2019-03-05 |
---
## Description
The blockchain-based list is used for building a list of supernodes which may be used for further authentication.
Implementation details:
* the list is built for every block based on its hash and active stake transactions;
* the block hash is used as a byte array for selecting supernodes from the active supernodes (in terms of stake validity time);
* the list is stored to a file after each update;
* the list is loaded from that file during cryptonode start (if it exists).
---
## Reviews & Comments
### Comment by @jagerman
The sample selection being done here to select a blockchain-based supernode tier subset is non-uniform, and results in relatively small samples. It is also entirely non-obvious why these lists are being reduced to a random subset in the first place.
To deal with the latter issue first: with a hard cap on the number of supernodes selected into a sample you are effectively limiting the scalability of the network. More supernodes active at a time will add no additional capability to the network because at each block you cut down the list of supernodes that are available to handle SN operations. Why is this being done? If you were to pass the entire list of active supernodes on each tier to the supernode and let it randomly sample from that list (based on the payment ID) it would be far more scalable.
Now as for the former issue. Since the source vector from which elements are sampled is itself sorted by the age of the stake, this whole process results in non-uniform selection: some supernodes have a greater chance of selection than others (and depending on the counts, some have no probability of being selected at all). For example, when you have 50 supernodes on a tier you get `PREVIOS_BLOCKCHAIN_BASED_LIST_MAX_SIZE` selected from the previous block list (why?), plus another 32 selected using the randomization algorithm (since you are using the `char` of the block hash as your RNG, and only have 32 `char`s to work with). When I use your algorithm to look at the frequency of selection of the 50 nodes, I get this:
```
Selection frequency: (uniform frequency: 0.64):
[ 0]: 0.715325
[ 1]: 0.714514
[ 2]: 0.719117
[ 3]: 0.723792
[ 4]: 0.727855
[ 5]: 0.731591
[ 6]: 0.734153
[ 7]: 0.73704
[ 8]: 0.738946
[ 9]: 0.741059
[ 10]: 0.742394
[ 11]: 0.743742
[ 12]: 0.744824
[ 13]: 0.745515
[ 14]: 0.746299
[ 15]: 0.746988
[ 16]: 0.690373
[ 17]: 0.671085
[ 18]: 0.658806
[ 19]: 0.65022
[ 20]: 0.643962
[ 21]: 0.639378
[ 22]: 0.635563
[ 23]: 0.633008
[ 24]: 0.630666
[ 25]: 0.629243
[ 26]: 0.628241
[ 27]: 0.627435
[ 28]: 0.57412
[ 29]: 0.547461
[ 30]: 0.531217
[ 31]: 0.520952
[ 32]: 0.513832
[ 33]: 0.509343
[ 34]: 0.506473
[ 35]: 0.504151
[ 36]: 0.502728
[ 37]: 0.501716
[ 38]: 0.561549
[ 39]: 0.584621
[ 40]: 0.59685
[ 41]: 0.604984
[ 42]: 0.610537
[ 43]: 0.614386
[ 44]: 0.61711
[ 45]: 0.618959
[ 46]: 0.62066
[ 47]: 0.621801
[ 48]: 0.622307
[ 49]: 0.623108
```
(These values are based on 10M repetitions of the algorithm, where each `extract_index` uses a value drawn from `static std::uniform_int_distribution<char> random_char{std::numeric_limits<char>::min(), std::numeric_limits<char>::max()};`. Typical variation across runs here is in the 4th decimal place: this is not a sampling aberration.)
This is very clearly not a uniform distribution: the 15th-oldest supernode has almost 50% higher probability of being selected compared to the 38th oldest.
For other supernode numbers things get worse; here's the sampling frequency when there are 250 supernodes on a tier:
```
[ 0]: 0.24291
[ 1]: 0.24728
[ 2]: 0.249168
[ 3]: 0.249518
[ 4]: 0.249791
[ 5]: 0.250054
[ 6]: 0.250062
[ 7]: 0.24979
[ 8]: 0.249791
[ 9]: 0.249997
[ 10]: 0.249981
[ 11]: 0.249963
[ 12]: 0.250104
[ 13]: 0.249791
[ 14]: 0.250034
[ 15]: 0.250051
[ 16]: 0.250057
[ 17]: 0.250055
[ 18]: 0.249884
[ 19]: 0.25012
[ 20]: 0.250039
[ 21]: 0.250088
[ 22]: 0.250208
[ 23]: 0.250117
[ 24]: 0.250177
[ 25]: 0.249837
[ 26]: 0.249773
[ 27]: 0.249865
[ 28]: 0.250205
[ 29]: 0.250166
[ 30]: 0.250068
[ 31]: 0.249756
[ 32]: 0.249978
[ 33]: 0.24987
[ 34]: 0.250209
[ 35]: 0.249829
[ 36]: 0.250101
[ 37]: 0.250132
[ 38]: 0.250032
[ 39]: 0.24971
[ 40]: 0.249928
[ 41]: 0.249834
[ 42]: 0.250064
[ 43]: 0.250113
[ 44]: 0.250229
[ 45]: 0.249869
[ 46]: 0.249862
[ 47]: 0.250021
[ 48]: 0.249953
[ 49]: 0.250074
[ 50]: 0.250051
[ 51]: 0.249851
[ 52]: 0.249894
[ 53]: 0.249789
[ 54]: 0.24987
[ 55]: 0.250084
[ 56]: 0.249922
[ 57]: 0.250097
[ 58]: 0.250028
[ 59]: 0.250173
[ 60]: 0.249823
[ 61]: 0.250085
[ 62]: 0.249914
[ 63]: 0.25002
[ 64]: 0.250072
[ 65]: 0.24988
[ 66]: 0.250086
[ 67]: 0.250092
[ 68]: 0.249764
[ 69]: 0.249885
[ 70]: 0.250143
[ 71]: 0.249959
[ 72]: 0.249907
[ 73]: 0.249892
[ 74]: 0.249984
[ 75]: 0.249953
[ 76]: 0.250395
[ 77]: 0.250094
[ 78]: 0.250099
[ 79]: 0.249982
[ 80]: 0.250033
[ 81]: 0.249815
[ 82]: 0.249907
[ 83]: 0.250006
[ 84]: 0.249939
[ 85]: 0.249977
[ 86]: 0.250034
[ 87]: 0.250029
[ 88]: 0.249932
[ 89]: 0.250139
[ 90]: 0.250167
[ 91]: 0.250096
[ 92]: 0.249912
[ 93]: 0.250008
[ 94]: 0.250053
[ 95]: 0.249949
[ 96]: 0.250287
[ 97]: 0.250034
[ 98]: 0.249838
[ 99]: 0.250176
[100]: 0.250165
[101]: 0.250049
[102]: 0.249944
[103]: 0.250206
[104]: 0.25
[105]: 0.250052
[106]: 0.250005
[107]: 0.250039
[108]: 0.249936
[109]: 0.250015
[110]: 0.249985
[111]: 0.249776
[112]: 0.249764
[113]: 0.250092
[114]: 0.249951
[115]: 0.24985
[116]: 0.134431
[117]: 0.126543
[118]: 0.1252
[119]: 0.125071
[120]: 0.125212
[121]: 0.124933
[122]: 0.124989
[123]: 0.124869
[124]: 0.125012
[125]: 0.125022
[126]: 0.124945
[127]: 0.124973
[128]: 0.0081291
[129]: 0.0003719
[130]: 1.37e-05
[131]: 6e-07
[132]: 0
[133]: 0
[134]: 0
[135]: 0
[136]: 0
[137]: 0
[138]: 0
[139]: 0
[140]: 0
[141]: 0
[142]: 0
[143]: 0
[144]: 0
[145]: 0
[146]: 0
[147]: 0
[148]: 0
[149]: 0
[150]: 0
[151]: 0
[152]: 0
[153]: 0
[154]: 0
[155]: 0
[156]: 0
[157]: 0
[158]: 0
[159]: 0
[160]: 0
[161]: 0
[162]: 0
[163]: 0
[164]: 0
[165]: 0
[166]: 0
[167]: 0
[168]: 0
[169]: 0
[170]: 0
[171]: 0
[172]: 0
[173]: 0
[174]: 0
[175]: 0
[176]: 0
[177]: 0
[178]: 0
[179]: 0
[180]: 0
[181]: 0
[182]: 0
[183]: 0
[184]: 0
[185]: 0
[186]: 0
[187]: 0
[188]: 0
[189]: 0
[190]: 0
[191]: 0
[192]: 0
[193]: 0
[194]: 0
[195]: 0
[196]: 0
[197]: 0
[198]: 0
[199]: 0
[200]: 0
[201]: 0
[202]: 0
[203]: 0
[204]: 0
[205]: 0
[206]: 0
[207]: 0
[208]: 0
[209]: 0
[210]: 0
[211]: 0
[212]: 0
[213]: 0
[214]: 0
[215]: 0
[216]: 0
[217]: 0
[218]: 0
[219]: 0
[220]: 0
[221]: 0
[222]: 0
[223]: 0
[224]: 0
[225]: 0
[226]: 0
[227]: 0
[228]: 0
[229]: 0
[230]: 0
[231]: 0
[232]: 0
[233]: 0
[234]: 0
[235]: 0
[236]: 0
[237]: 0
[238]: 0.117817
[239]: 0.124049
[240]: 0.124957
[241]: 0.125015
[242]: 0.125061
[243]: 0.124996
[244]: 0.125086
[245]: 0.125103
[246]: 0.124908
[247]: 0.124911
[248]: 0.125068
[249]: 0.124864
```
Another strange thing happening in this algorithm is that it never selects more than 32 supernodes for a tier (because there are only 32 `char`s in the block hash), but once there are 256 or more supernodes, you start selecting only 16 per block. (These get added to `PREVIOS_BLOCKCHAIN_BASED_LIST_MAX_SIZE` selected from the previous sample, so technically it is going to build a list of 33 SNs for a tier with up to 255 SNs on it, and 17 SNs for a tier with >= 256).
The `PREVIOS_BLOCKCHAIN_BASED_LIST_MAX_SIZE` also makes no sense here: what is gained by keeping a subset of the previous round's subset in the list of available SNs?
# Why?
I am left asking: why are you doing all of this?
This approach (combined with https://github.com/graft-project/graft-ng/pull/204) results in a non-uniform, hard-capped number of SNs to select from each tier.
You can make a simpler, far more robust, _uniform_ sampling algorithm by just giving the SN *all* of the supernodes on each tier, then using the payment ID to seed a PRNG (like `std::mt19937_64`) and using this to randomly sample from each tier.
That's not ideal, though, because it can be gamed: I could use a supernode to reroll payment IDs until I get one that favours my own SNs. You can work around that fairly easily doing something like this:
1. Don't do any sampling in GraftNetwork; instead just provide the entire list of supernodes currently active at each tier along with the relevant block hash value.
2. Inside graft-ng, generate a payment-id.
3. Hash the payment-id together with the block hash.
4. Use that resulting hashed value to seed a `std::mt19937_64`.
5. Use this RNG to sample 2 supernodes from each tier.
The harder you make step 3 the more costly it is to game the system (but also, the more costly it becomes to verify). The block hash from step 1 is needed in step 3 so that you can't pregenerate lots of payment IDs offline with known SN selection positions in advance.
And all of this is *still* going to be significantly less code than you are using now to generate a badly broken sample.
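Purely as an illustration, a sketch of steps 2 to 5 under the assumptions above; `std::hash` is only a placeholder for whatever real hash would be used in step 3:

```C++
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <string>
#include <vector>

// Steps 2-4: combine the payment ID with the block hash and seed a PRNG from
// the result. std::hash is only a stand-in for a real cryptographic hash.
std::mt19937_64 seeded_rng(const std::string& payment_id, const std::string& block_hash)
{
    const std::uint64_t seed = std::hash<std::string>{}(payment_id + block_hash);
    return std::mt19937_64(seed);
}

// Step 5: draw two distinct supernodes from the full list for one tier.
std::vector<std::string> sample_tier(std::vector<std::string> tier, std::mt19937_64& rng)
{
    std::vector<std::string> picked;
    for (int i = 0; i < 2 && !tier.empty(); ++i) {
        const std::size_t idx = rng() % tier.size(); // fine for a sketch; see the modulo-bias note later in this thread
        picked.push_back(tier[idx]);
        tier.erase(tier.begin() + idx);              // avoid picking the same SN twice
    }
    return picked;
}
```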
---
### Comment by @LenyKholodov
Jason, thank you for your feedback. We will check the results you kindly provided and return to you soon.
---
### Comment by @LenyKholodov
> Jason, thank you for your feedback. We will check the results you kindly provided and return to you soon.
@jagerman Could you please repeat your test with following fix?
```
size_t extract_index(const char* it, size_t length)
{
    size_t result = 0;
    for (; length--; it++)
        result = (result << 8) + size_t(*reinterpret_cast<const unsigned char*>(it));
    return result;
}
```
---
### Comment by @jagerman
Changing it from a signed to an unsigned char gets rid of the hole above 128, but doesn't fix the non-uniformity of the distribution; for 250 nodes it now results in the first few having these probabilities:
```
[ 0]: 0.228301
[ 1]: 0.243768
[ 2]: 0.248024
[ 3]: 0.249059
[ 4]: 0.249682
[ 5]: 0.250019
[ 6]: 0.149295
[ 7]: 0.130186
[ 8]: 0.126137
[ 9]: 0.125245
[ 10]: 0.12497
```
with the remaining 11-249 all being close to 0.125.
---
### Comment by @jagerman
The unsigned results for N=50 show the same pattern: too high selection probability on the first 10-15 elements and slightly too low on the remaining ones.
The reason is pretty simple: `random_value % N` does *not* produce a uniform distribution over [0, *N*-1], though it does get close if the range of `random_value` is larger than *N* by at least a couple of orders of magnitude.
If you absolutely need to construct a deterministic random selection here (but I really don't think you do or *should*--see my comments above) you are best off generating values from a single `std::mt19937_64` that you seed using a `std::uint_fast64_t` value constructed from the hash.
You also need to drop the `offset` addition from `(offset + random_value) % src_list_size`--this is biasing the selection probability away from the first elements (which is why in the above example you see an increase in probabilities over the first few elements).
Actually, on that note, if you absolutely must keep random sampling here (and again, I don't see any reason why you would need this!) I think you should scrap the whole thing and use this far more algorithmically efficient approach to select m of n values with linear (O(n)) complexity (your current implementation looks to me to be O(mn²)): https://stackoverflow.com/questions/136474/best-way-to-pick-a-random-subset-from-a-collection/136730#136730
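For reference, a sketch of the linear-pass selection sampling that the linked answer describes (take each element with probability `needed / remaining`), drawing from the raw `std::mt19937_64` output as suggested above; the helper name is mine, not from the PR:

```C++
#include <cstddef>
#include <random>
#include <vector>

// Single-pass selection sampling: take each element with probability
// (still needed) / (still remaining), which yields a uniformly distributed
// m-subset in O(n). Raw mt19937_64 output is used because its sequence,
// unlike the standard distributions, is reproducible across implementations.
template <typename T>
std::vector<T> sample_m_of_n(const std::vector<T>& items, std::size_t m, std::mt19937_64& rng)
{
    std::vector<T> out;
    std::size_t needed = m < items.size() ? m : items.size();
    std::size_t remaining = items.size();
    for (const T& item : items) {
        if (needed == 0)
            break;
        if (rng() % remaining < needed) { // P(select) = needed / remaining (up to negligible modulo bias)
            out.push_back(item);
            --needed;
        }
        --remaining;
    }
    return out;
}
```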
---
### Comment by @LenyKholodov
@jagerman We have prepared two tests with implementation of blockchain based list which can be run separately.
- our current implementation - https://github.com/graft-project/GraftNetwork/blob/blockchain_based_list_tests/test_blockchain_based_list.cpp - it has behavior which you have described above (first 10 nodes are elected more often than others);
- Mersenne Twister implementation - https://github.com/graft-project/GraftNetwork/blob/blockchain_based_list_tests/test_mersenne_twister.cpp - fully random, but much slower.
The Mersenne Twister provides a truly uniform distribution but has worse performance compared to the blockchain based list building implementation based on block hash indexes.
We don't set the goal to achieve theoretically uniform distribution so for balancing it's fully ok to have first 10 nodes with higher probabilities than other 200+ during selection of nodes to a blockchain based list. Also, in the test we use a static list of supernodes for selection (as we understood, you did the same). In a real environment it will be impossible to have a static list of supernodes for selection over 10M blocks, first of all because we are limiting the stake transaction lock time. So we expect randomness will be achieved by stake transaction generation and by block hashes (and then also by payment IDs during auth sample building). Also, we are running a simulation on top of the current blockchain based implementation with real block hashes to find out the values of the parameters, so their current values are not final.
In one of your previous comments you were absolutely correct that it's not acceptable to have supernodes with zero probability of being selected into a blockchain based list. This was an implementation bug related to an incorrect conversion from signed char to unsigned int.
We are discussing usage of a Mersenne Twister implementation instead of the current implementation. However, at this time we don't see any advantages to using it instead of the current model.
---
### Comment by @jagerman
First point: I never suggested using `std::uniform_int_distribution`, and in fact you should *not* use it here because it doesn't have C++-standard-guaranteed results. (It also slows things down slightly).
Second point:
> We don't set the goal to achieve theoretically uniform distribution so for balancing it's fully ok to have first 10 nodes with higher probabilities than other 200+ during selection of nodes to a blockchain based list.
is just plain wrong: it is not okay. From the whitepaper:
> Each tier participates in a random selection of 2 sample supernodes.
While a non-uniform sample that probabilistically provides higher rewards to supernodes within a tier that were registered earlier to ones registered later is still, in a technical sense, "random", it is most definitely *not* what most people would assume the whitepaper means by "random."
Third, if your code is running slowly, it's highly unlikely that `std::mt19937_64` (nor `std::mt19937` which you used instead) is the cause:
### r.cpp
```C++
#include <random>
#include <cstdint>
#include <iostream>
#include <chrono>
constexpr size_t ITERS = 100000000;
int main() {
std::mt19937_64 rng;
std::uint64_t x = 0;
auto start = std::chrono::high_resolution_clock::now();
std::uint64_t count = 250;
for (size_t i = 0; i < ITERS; i++)
x += rng() % count;
auto end = std::chrono::high_resolution_clock::now();
auto elapsed_us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
uint64_t dps = static_cast<uint64_t>(double(ITERS) / elapsed_us * 1000000);
std::cout << ITERS << " values drawn in " << elapsed_us << "µs = " << dps << " draws per second\n";
std::cout << "\n(meaningless sum of all draws = " << x << ")\n";
}
```
Results:
```
betwixt:~$ g++ -O2 r.cpp -o r
betwixt:~$ ./r
100000000 values drawn in 640173µs = 156207775 draws per second
(meaningless sum of all draws = 12450205566)
```
`std::mt19937_64` is not a performance limitation here.
---
### Comment by @jagerman
> We are discussing usage of a Mersenne Twister implementation instead of the current implementation. However, at this time we don't see any advantages to using it instead of the current model.
I actually (sort of) agree with this. You should not have any sampling *at all* in graftnoded. The entire sampling process can be done *once* in graft-ng incorporating both the entropy in the current block hash *and* the entropy in the payment id.
---
### Comment by @yidakee
@LenyKholodov - if you don't mind me saying so, please be mindful of wording.
"We don't set the goal to achieve theoretically uniform distribution so for balancing it's fully ok to have first 10 nodes with higher probabilities than other 200+ during selection of nodes to a blockchain based list."
This is the furthest thing from a fair and evenly distributed network. I think (I hope) what you mean is that, currently, an even distribution is not at the top of the development backlog (why not?), but that balancing is what is currently being worked on, and that after that a fair distribution model will be implemented.
This is 100% of the objective - to achieve an equalitarian Supernode distribution. Otherwise the system can and will be gamed, and adoption will not follow.
---
### Comment by @jagerman
> - our current implementation - https://github.com/graft-project/GraftNetwork/blob/blockchain_based_list_tests/test_blockchain_based_list.cpp - it has behavior which you have described above (first 10 nodes are elected more often than others);
> - Mersenne Twister implementation - https://github.com/graft-project/GraftNetwork/blob/blockchain_based_list_tests/test_mersenne_twister.cpp - fully random, but much slower.
Your "current implementation" selects 32 supernodes out of 250 while you make the Mersenne Twister implementation select 255 out of 255 (and in doing so you end up hitting the worst-case performance of your implementation's algorithm). The result is even apparent in your output: every index is selected with a probability of exactly 1.
Here's a proper implementation that makes a fair comparison by selecting 32/250: https://jagerman.com/test_mersenne_twister.cpp (I also increased the number of experiments back to 100k):
```
Results after 100000 experiments:
f[000]: 12748 0.127480
f[001]: 12852 0.128520
... (many more all 0.127xxx or 0.128xxx -- theoretical ideal is 0.1280000)
f[249]: 12812 0.128120
real 0m0.708s
user 0m0.707s
sys 0m0.000s
```
Here's yours:
```
Results after 100000 experiments:
f[000]: 0.227360
f[001]: 0.246580
f[002]: 0.249790
f[003]: 0.248780
f[004]: 0.248810
f[005]: 0.248990
f[006]: 0.147330
f[007]: 0.130810
f[008]: 0.126130
f[009]: 0.126050
f[010]: 0.125840
f[011]: 0.125440
... (various values between 0.123xxx and 0.126xxx; theoretical ideal is 0.128000)
f[249]: 0.124110
real 0m0.276s
user 0m0.275s
sys 0m0.000s
```
---
### Comment by @LenyKholodov
> @LenyKholodov - if you don't mind me saying so, please be mindful of wording.
>
> "We don't set the goal to achieve theoretically uniform distribution so for balancing it's fully ok to have first 10 nodes with higher probabilities than other 200+ during selection of nodes to a blockchain based list."
>
> This is the furthest thing from a fair and evenly distributed network. I think (I hope) what you mean is that, currently, an even distribution is not at the top of the development backlog (why not?), but that balancing is what is currently being worked on, and that after that a fair distribution model will be implemented.
>
> This is 100% of the objective - to achieve an equalitarian Supernode distribution. Otherwise the system can and will be gamed, and adoption will not follow.
@yidakee Thank you for your feedback. All tests being discussed in this thread assume that the list of supernodes with stakes is static during the whole test of thousands of iterations. In practice the blockchain based list is built for each block, so for example 10k iterations is equal to 10k blocks, and it is impossible to have a fully static list of staked supernodes during 10k blocks. That's why we don't see a big issue with unequal probabilities of supernodes for the blockchain based list. This is only one of three existing random layers:
1) generation of stakes and list of supernodes with stakes;
2) blockchain based list based on the result of step (1) which is discussed in this PR;
3) auth sample generation based on result of step (2).
---
### Comment by @LenyKholodov
> First point: I never suggested using `std::uniform_int_distribution`, and in fact you should _not_ use it here because it doesn't have C++-standard-guaranteed results. (It also slows things down slightly).
I didn't write that you suggested uniform_int_distribution. However, for the test it is not so important: any other uniform distribution generator may be used to check the probabilities of the generated supernode indexes, so uniform_int_distribution is only a tool.
>
> Second point:
>
> > We don't set the goal to achieve theoretically uniform distribution so for balancing it's fully ok to have first 10 nodes with higher probabilities than other 200+ during selection of nodes to a blockchain based list.
>
> is just plain wrong: it is not okay. From the whitepaper:
>
> > Each tier participates in a random selection of 2 sample supernodes.
>
> While a non-uniform sample that probabilistically provides higher rewards to supernodes within a tier that were registered earlier to ones registered later is still, in a technical sense, "random", it is most definitely _not_ what most people would assume the whitepaper means by "random."
Please keep in mind that we use three layers of randomness:
1) stakes generation;
2) blockchain based list with block hash as a random value;
3) auth sample building with payment ID as a random value.
Also, the current implementation provides only a model without configured parameters. We are testing it now and will update it with parameters which lead to a uniform distribution of the auth sample.
>
> Third, if your code is running slowly, it's highly unlikely that `std::mt19937_64` (nor `std::mt19937` which you used instead) is the cause:
>
> ### r.cpp
> ```c++
> #include <random>
> #include <cstdint>
> #include <iostream>
> #include <chrono>
>
> constexpr size_t ITERS = 100000000;
> int main() {
> std::mt19937_64 rng;
> std::uint64_t x = 0;
> auto start = std::chrono::high_resolution_clock::now();
>
> std::uint64_t count = 250;
>
> for (size_t i = 0; i < ITERS; i++)
> x += rng() % count;
>
> auto end = std::chrono::high_resolution_clock::now();
> auto elapsed_us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
> uint64_t dps = static_cast<uint64_t>(double(ITERS) / elapsed_us * 1000000);
> std::cout << ITERS << " values drawn in " << elapsed_us << "µs = " << dps << " draws per second\n";
> std::cout << "\n(meaningless sum of all draws = " << x << ")\n";
> }
> ```
>
> Results:
>
> ```
> betwixt:~$ g++ -O2 r.cpp -o r
> betwixt:~$ ./r
> 100000000 values drawn in 640173µs = 156207775 draws per second
>
> (meaningless sum of all draws = 12450205566)
> ```
> `std::mt19937_64` is not a performance limitation here.
Thank you very much for these results. We will check them.
---
### Comment by @LenyKholodov
We checked the current blockchain based list implementation and found that it may also be easily modified to achieve the uniform distribution requirement. Please find the updated source here - https://github.com/graft-project/GraftNetwork/blob/98ab487fdb7482ff6d3792e6c9df6bf0a290ddb5/test_blockchain_based_list.cpp
---
### Comment by @jagerman
> 1. stakes generation
This is not random since people can act to influence it.
> 3. auth sample building with payment ID as a random value.
It is completely irrelevant whether this stage is random or not because the step we are discussing *here* throws away elements from consideration in that stage with non-uniform probability. The fact that you later on randomize among the elements that don't get thrown away does *nothing* to change that: they don't make it to this stage at all. (They should, but you seem to prefer to simply ignore that point).
> We checked the current blockchain based list implementation and found that it may also be easily modified to achieve the uniform distribution requirement. Please find the updated source here - https://github.com/graft-project/GraftNetwork/blob/98ab487fdb7482ff6d3792e6c9df6bf0a290ddb5/test_blockchain_based_list.cpp
It is better, though there is still a significant problem with it that I mentioned earlier: it is not capable of selecting more than 32 supernodes, and worse, once the network hits 257 supernodes on a tier it actually has to *reduce* the work size sample from 32 to 16 supernodes per tier. You can probably fix it, but what's the point when you have a superior solution with known statistical properties right in front of you that *simplifies* your code?
I do not understand your resistance here: `std::mt19937_64` (or even `std::minstd_rand` if you prefer) are well understood algorithms with good performance (a bit better for `std::minstd_rand`), excellent statistic properties (much better for `std::mt19937_64`), are included in the C++ standard, are entirely deterministic for any given seed, do not impose a significant performance cost, result in simpler code, and do not impose any restriction on the number of supernodes that can be selected.
You've thrown up obstacles, you've ignored half of what I've said (most notably why you want randomness at this stage *at all*), and you produced a faulty benchmark to try to prove a technical deficit that doesn't exist.
Please start considering this issue on *technical* grounds rather than emotional ones.
---
### Comment by @LenyKholodov
> > 1. stakes generation
>
> This is not random since people can act to influence it.
>
> > 1. auth sample building with payment ID as a random value.
>
> It is completely irrelevant whether this stage is random or not because the step we are discussing _here_ throws away elements from consideration in that stage with non-uniform probability. The fact that you later on randomize among the elements that don't get thrown away does _nothing_ to change that: they don't make it to this stage at all. (They should, but you seem to prefer to simply ignore that point).
>
> > We checked the current blockchain based list implementation and found that it may also be easily modified to achieve the uniform distribution requirement. Please find the updated source here - https://github.com/graft-project/GraftNetwork/blob/98ab487fdb7482ff6d3792e6c9df6bf0a290ddb5/test_blockchain_based_list.cpp
>
> It is better, though there is still a significant problem with it that I mentioned earlier: it is not capable of selecting more than 32 supernodes, and worse, once the network hits 257 supernodes on a tier it actually has to _reduce_ the work size sample from 32 to 16 supernodes per tier. You can probably fix it, but what's the point when you have a superior solution with known statistical properties right in front of you that _simplifies_ your code?
>
> I do not understand your resistance here: `std::mt19937_64` (or even `std::minstd_rand` if you prefer) are well understood algorithms with good performance (a bit better for `std::minstd_rand`), excellent statistic properties (much better for `std::mt19937_64`), are included in the C++ standard, are entirely deterministic for any given seed, do not impose a significant performance cost, result in simpler code, and do not impose any restriction on the number of supernodes that can be selected.
>
> You've thrown up obstacles, you've ignored half of what I've said (most notably why you want randomness at this stage _at all_), and you produced a faulty benchmark to try to prove a technical deficit that doesn't exist.
>
> Please start considering this issue on _technical_ grounds rather than emotional ones.
@jagerman Thank you very much for your detailed feedback.
> Please start considering this issue on _technical_ grounds rather than emotional ones.
I believe I've been discussing technical issues throughout this whole discussion without any emotion. If you saw any emotion from my side, please forgive me; emotion is not something I usually rely on. The current implementation is based on our technical vision (https://github.com/graft-project/graft-ng/wiki/%5BRFC-002-SLS%5D-Supernode-List-Selection). We are grateful to you for your vision and proposal and are still discussing it internally, but at this time we don't see any advantages in using one of the pseudo-random implementations. Both algorithms, MT and the current supernode selection, use the same source of entropy - the block hash. As you correctly noted, the original PR had technical issues which led to a non-uniform distribution of supernode selection. We are fixing them now.
> You've thrown up obstacles, you've ignored half of what I've said (most notably why you want randomness at this stage _at all_), and you produced a faulty benchmark to try to prove a technical deficit that doesn't exist.
I'm not ignoring what you wrote here. However, at this time the main issue we're focusing on is the distribution of the blockchain based list building. That's why some questions may remain unanswered for now.
> why you want randomness at this stage _at all_
We expect to have thousands of valid stake transactions and, as a result, thousands of active supernodes. We need to select a small subset of supernodes which will potentially be used for auth samples during one block. There will be rules about connection management of supernodes in the subset which are not yet described publicly. However, the main thing here is that we want to select and fix a small subset of supernodes (16-30) for the block. Then this subset will be used as a source for selecting the auth sample during payments, with the RTA payment ID as a random source. So for each payment only several nodes from the subset will be used.
> It is better, though there is still a significant problem with it that I mentioned earlier: it is not capable of selecting more than 32 supernodes, and worse, once the network hits 257 supernodes on a tier it actually has to _reduce_ the work size sample from 32 to 16 supernodes per tier.
We don't expect to have more than 32 nodes in a blockchain based list. However, there is no problem increasing that if needed. One of the simplest solutions is to use previous block hashes in some combination with the current block hash.
> I do not understand your resistance here: `std::mt19937_64` (or even `std::minstd_rand` if you prefer) are well understood algorithms with good performance (a bit better for `std::minstd_rand`), excellent statistic properties (much better for `std::mt19937_64`), are included in the C++ standard, are entirely deterministic for any given seed, do not impose a significant performance cost, result in simpler code, and do not impose any restriction on the number of supernodes that can be selected.
It's very simple. At this time we are implementing and testing a solution which is based on the previously described technical vision (which I mentioned above in this comment). From our point of view, a comparison of random generators can only be made in terms of simplicity and distribution. There are many other well-known RNG implementations. However, as I wrote earlier, we don't see significant advantages in using them instead of selecting nodes directly based on the entropy source (the block hash). At this time we know how to achieve uniform distribution, and the current implementation uses the same entropy source as a Mersenne Twister, ISAAC64, BBS or any other RNG would. So from this point of view we don't see advantages in moving to another implementation.
---
### Comment by @LenyKholodov
@jagerman After discussing your idea about Mersenne Twister usage for blockchain based list building with the team, we decided to accept it and rework supernode selection with it. The main advantage of the Mersenne Twister is the possibility of selecting more than 32 supernodes. We don't know yet how many nodes we will select in the production environment. However, your approach is more flexible for such selection. Thank you very much again for your efforts.
---
### Comment by @jagerman
> Thank you very much again for your efforts.
I am pleased to hear it and happy to help. My apologies if discussion got a little overheated.
---
### Comment by @yidakee
Way to go team!
---
### Comment by @LenyKholodov
> > Thank you very much again for your efforts.
>
> I am pleased to hear it and happy to help. My apologies if discussion got a little overheated.
No problem. We appreciate your help and participation. It's much better to find issues with the implementation at this stage rather than in production.
---
### Review by @jagerman [COMMENTED]
---
### Review by @jagerman [COMMENTED]
---
### Review by @jagerman [COMMENTED]
---
### Review by @jagerman [COMMENTED]
---
### Review by @mbg033 [APPROVED]
---


@ -0,0 +1,277 @@
# Issue #1: Communication options
## Reception Score
| Score | Reason |
|-------|--------|
| **ACTIVE** | Open with discussion |
---
## Metadata
| Field | Value |
|-------|-------|
| State | OPEN |
| Author | @bitkis |
| Created | 2019-02-08 |
| Closed | N/A |
| Labels | Discussion |
| Comments | 6 |
---
## Original Post
**Author:** @bitkis
# Communication options
## Current state and motivation
The original P2P network is used for communication between supernodes. Announcements (messages of a special type) are periodically broadcast by every peer and are used both for keeping lists of active peers and for building paths (tunnels) between the peers. Such an approach induces a high volume of traffic in the network.
Yet another, less critical, issue is present in the current approach. Even though peers in the original P2P network have discoverable IPs, the complexity of IP discovery is exponential with respect to the number of peers in the network. However, any attempt to build a preferable path between 2 peers makes this complexity linear.
Those issues were raised by *@jagerman* (see [#187](https://github.com/graft-project/graft-ng/issues/187)). The following document lists several approaches we are considering to address those concerns.
When we first started working on the issue, we were mainly focused on _Option 1_ since it would allow us to reduce the amount of traffic without making significant changes to the current design. Options 3 and 4 were also under consideration. At the same time we started work on the disqualification transactions design -- this mechanism is meant to be used in any case. Later, however, digging into _Options 3_ and _4_ brought us to _Option 2_, which we believe is the optimal solution taking into account all practical considerations.
**By publishing this document we would like to hear the community's reaction before making the final decision.**
Since there are still a few open issues, the estimates provided below are preliminary and may change if the development scope needs to be extended.
## Optimization Options
### P2P broadcast optimization
We can reduce the amount of traffic (both keep-alive and data messages) during P2P broadcasts by
1. Making it random for a peer to re-transmit a message further to the neighbors (same messages will not be re-transmitted by that peer but may be re-transmitted by a neighbor);
2. Making it random for a peer to forward a message further to a particular neighbor (the message will be forwarded to a random subset of the neighbors);
3. Reducing the frequency of periodic broadcasts.
By reducing the frequency of announcements, however, we make both peer monitoring and tunnel building less robust.
### Disqualification transactions
A disqualification transaction is a special type of timed transaction in the blockchain, used to prevent a disqualified supernode from being selected to participate in an authorization sample. There are two mechanisms to issue a disqualification transaction:
1. Every (second?) block, a randomly selected disqualification quorum "pings" randomly selected supernodes from the set of supernodes with stake transactions in the blockchain and votes for disqualification of dead nodes.
2. After an RTA transaction verification, the authorization sample votes for disqualification of a supernode that didn't submit its vote or was late to vote during transaction verification.
Both mechanisms can be used either in conjunction or on their own.
## Development Paths
### Option 1: Keep current design and enhance it
* Current design;
* Optimized tunnel selection;
* P2P broadcast optimization;
* Announcement optimization
* Disqualification transaction mechanism
#### Announcement optimization using Blockchain-based List
1. Each supernode in an authorization sample checks if it's in the next (or few next) blockchain-based list(s). If included, it starts sending periodic announcements over the network.
2. While selecting an authorization sample, a supernode compares the blockchain-based list with the announcement list and selects only supernodes from which it receives announcements.
3. Each supernode in an authorization sample checks if its blockchain-based list is active or if the supernode is in the next blockchain-based list(s). If the blockchain-based list is found inactive and the supernode is not in the next blockchain-based list(s), the supernode stops sending announcements.
#### Tunnel selection
Currently, to build tunnels, graftnode selects only the first three tunnels from the announcement list for this supernode. However, at that moment the list of peer connections can differ from the list as it was at the moment the announcement was received. As the time delay between announcements increases, this situation becomes even more significant. To optimize this, graftnode must select only tunnels which have active connections.
#### Pros
* Easy to implement
#### Cons
* Still suboptimal traffic (**not critical**)
* Still linear complexity of IP lookups (**not critical**)
#### Open issues
* Broadcast termination
#### Estimate
~2 weeks (testing included)
### Option 2: Implement Unstructured Distributed Hash Table (DHT)
* Current design;
* No announcements
* P2P broadcast optimization;
* Disqualification transaction mechanism.
1. Upon joining the network, a supernode retrieves the list of public identification keys from the blockchain (active supernodes), encrypts its IP using keys from a randomly selected subset, and broadcasts the encrypted IP over the P2P network.
1. Every few hours the supernode checks that the selected supernodes are still active and reselects replacements for inactive ones. Then it repeats the broadcast procedure described above.
1. When sending a message, a supernode broadcasts it over the P2P network. The broadcast is limited by a maximum number of hops. When the message reaches a node that knows the recipient's IP, it is forwarded directly to the recipient.
1. The recipient receives multiple copies of the same message, and should be able to handle this situation gracefully, with no noticeable performance degradation.
![dht-p2p](https://user-images.githubusercontent.com/36085298/52471459-caffa480-2b45-11e9-8503-f21c921d9a81.png)
In the figure above, node A sends a message addressed to node B. Nodes R retransmit the message issued by A. Nodes T terminate the broadcast, assuming 2 hops are allowed. DR nodes know the IP of node B.
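A minimal sketch of the forwarding rule a node could apply under this option (all names here - `Message`, `known_ips`, `MAX_HOPS` - are illustrative assumptions, not graft-ng code):
```C++
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

constexpr int MAX_HOPS = 2;      // illustrative hop limit, matching the figure

struct Message {
    std::string msg_id;          // unique id, used to drop duplicate copies
    std::string recipient_key;   // public identification key of the addressee
    std::string payload;         // encrypted for the recipient
    int hops = 0;
};

struct Node {
    std::unordered_map<std::string, std::string> known_ips;  // key -> IP learned from announcements
    std::unordered_set<std::string> seen;                    // message ids already handled

    void handle(Message msg, const std::vector<std::string>& neighbours) {
        if (!seen.insert(msg.msg_id).second)
            return;                                  // duplicate copy: drop it
        auto it = known_ips.find(msg.recipient_key);
        if (it != known_ips.end()) {
            deliver_direct(it->second, msg);         // "DR" node: forward straight to the recipient
            return;
        }
        if (msg.hops >= MAX_HOPS)
            return;                                  // "T" node: terminate the broadcast
        ++msg.hops;
        for (const auto& peer : neighbours)          // "R" node: re-broadcast to neighbours
            forward(peer, msg);
    }

    // Transport is out of scope for this sketch.
    void deliver_direct(const std::string& /*ip*/, const Message& /*m*/) {}
    void forward(const std::string& /*peer*/, const Message& /*m*/) {}
};
```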
#### Pros
* Easy to implement
* Almost optimal traffic
* Fast communication between supernodes
#### Cons
* Not quite optimal traffic
#### Open issues
* There are several parameters that need to be selected properly.
* Some math needs to be done for proper estimations
#### Estimate
~ 2.5-3.5 weeks (testing included)
### Option 3: Supernode overlay/direct connections
We build a network overlay of supernodes, independent from the P2P network. The overlay (or its subset) forms a DHT-like cluster. The DHT cluster can consist of full supernodes only. The DHT stores key-value pairs of supernode public identification keys and IPs. Both requests to join and queries are to be signed by the private identification key and validated, upon entering the DHT, against the public identification key retrieved from the blockchain. Peers in the supernode overlay communicate directly.
The disqualification transaction mechanism is used in this case as well.
![dht-query](https://user-images.githubusercontent.com/36085298/52471458-caffa480-2b45-11e9-86ec-b51319bcb5e8.png)
In the figure above, supernode A, attempting to send a message to supernode B, queries the DHT first.
#### Pros
* Optimal traffic
* Fast communication between supernodes
#### Cons
* All IPs are open to all valid supernodes
* Requires extra development
#### Open issues
* Distributed Hash Table (DHT) selection: Pastry seems to be most attractive right now.
* DHT redundancy (most likely Pastry solves the issue)
* Bootstrapping/entry point
#### Estimate
~3.5 weeks (testing included)
### Option 4: Supernode overlay/Hop over DHT
Again, a network overlay of supernodes, independent from the P2P network. The overlay forms a DHT-like cluster, where each node knows only a small subset of the whole cluster. The DHT stores key-value pairs of supernode public identification keys and IPs. Unlike a regular DHT that provides values in response to key-based queries, here the sending peer passes the message itself to the DHT cluster. If a cluster peer knows the IP of the message's addressee, it forwards the message to it. Otherwise, the peer forwards the message to a known successor, according to the DHT algorithm.
Both requests to join and messages are to be signed by the private identification key and validated, upon entering the DHT, against the public identification key retrieved from the blockchain.
The DHT cluster can consist of full supernodes only. The number of hops required for message delivery does not exceed the number of successors.
![dht-messages](https://user-images.githubusercontent.com/36085298/52471457-caffa480-2b45-11e9-8d5e-f2e013abbe6a.png)
In the figure above, supernode A sends a message to supernode B, passing it through DHT nodes.
#### Pros
* Optimal traffic
* Fast communication between supernodes
#### Cons
* Requires extra development
#### Open issues
* Distributed Hash Table (DHT) selection: Pastry seems to be most attractive right now.
* DHT redundancy (most likely Pastry solves the issue)
* Bootstrapping/entry point
#### Estimate
~4.5 weeks (testing included)
---
## Discussion Thread
### Comment by @jagerman
**Date:** 2019-02-08
One question missing from all of this is: *Why?* Specifically, why is hiding supernode IPs particularly advantageous?
When hot wallets on the supernode were part of the design, the incentive for attack was obvious, but now that that has been eliminated, even if someone knows the IP of a supernode, there is little gain to be had from attacking it.
Without such secrecy, a much simpler alternative is:
# Option 5
Upon starting, a supernode sends an announcement to the network containing (among other things) the IP and port on which it is reachable. Ordinary nodes synchronize this list with each other. Supernodes communicate directly.
---
### Comment by @bitkis
**Date:** 2019-02-08
> One question missing from all of this is: _Why?_ Specifically, why is hiding supernode IPs particularly advantageous?
To reduce probability of a DOS attack on an RTA auth sample
---
### Comment by @jagerman
**Date:** 2019-02-08
> To reduce probability of a DOS attack on an RTA auth sample
As I understand it, the auth sample is determined on the fly as needed and selected randomly based on a generated random value which can't be predicted; the timespan from when the auth sample is generated to when it is complete is measured in milliseconds.
---
### Comment by @jagerman
**Date:** 2019-02-08
Regarding Option 2: the only way to guarantee that the message from A to B actually reaches B is to make the hop limit equal to the diameter of the network graph. To reuse your example from Option 2, here's the same graph but with some different edges:
![image](https://user-images.githubusercontent.com/4459524/52496327-69712180-2ba9-11e9-9474-910168643d9f.png)
You could increase it to a maximum of three, but then I could draw another counterexample where 3 doesn't work, and so on. I could draw a connected network in your 15 node example where it requires 12 hops to reach any of the DRs (where I include B as a DR).
It seems that, since you have no guarantee at all of how connections are established, the only provably guaranteed value of T that will reach B is one so absurdly large that it will reach every node on the network in the vast majority of cases.
---
### Comment by @bitkis
**Date:** 2019-02-08
> As I understand it, the auth sample is determined on the fly as needed and selected randomly based on a generated random value which can't be predicted; the timespan from when the auth sample is generated to when it is complete is measured in milliseconds.
An auth sample is selected from the list based on the block hash, so a DOS attack on that list can be an issue. Adding disqualification transactions makes such an attack even more profitable (you can trigger disqualification of someone else's supernodes).
---
### Comment by @bitkis
**Date:** 2019-02-08
> Regarding Option 2 [...]
There are two very relevant parameters here: the hop limit and the size of the randomly selected subset of supernodes. We believe we can find an optimal combination of those. Also, I don't think it makes sense to talk about any guarantees; rather, we are talking about maximizing probabilities.
---


@ -0,0 +1,42 @@
# Issue #425: Graft RTA Double Spend Attack Vectors and Solutions
## Reception Score
| Score | Reason |
|-------|--------|
| **STALE** | Open with no response |
---
## Metadata
| Field | Value |
|-------|-------|
| State | OPEN |
| Author | @mbg033 |
| Created | 2020-05-28 |
| Closed | N/A |
| Labels | |
| Comments | 0 |
---
## Original Post
**Author:** @mbg033
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Attack&nbsp;Vector&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;** | **How to implement attack** | **Solution** | **Comments/Questions** |
| --- | --- | --- | --- |
| **Double Spend with Non-RTA TX (RTA vs non-RTA), classic 51% attack; below are the attacks at the different states** | **two possible scenarios addressed by [Jason](https://graftnetwork.atlassian.net/browse/SUP-51)** | | |
| 1. RTA vs non-RTA tx in mempool | | Prioritize RTA over PoW. Conflicting non-RTA tx should be removed from the pool as soon as the RTA tx has been added; | |
| 2. RTA tx in mempool vs non-RTA tx in mainchain | A longer chain with a double-spending TX is published to the network right after someone completed an RTA TX (the signed RTA TX was just added to the mempool on some node and broadcast to the network) | Rollback: all blocks starting from the block containing the conflicting TX should be popped from the blockchain, returning valid transactions to the mempool; conflicting non-RTA transactions are removed from the mempool | Rollback should be (?) limited by depth. If checkpointing is implemented - until the first checkpointed (irreversible) block; if no checkpointing - N blocks max. N should be some reasonable constant |
| 3. RTA tx in mempool vs non-RTA tx in altchain | | Rollback in alt chain if applicable | Question: check whether rollbacks are applicable for alt chains and how they are implemented |
| 4. RTA txs in mainchain vs non-RTA txes in altchains | | Rollback (alt chain becomes mainchain) until an irreversible checkpoint or the max possible depth (N) is reached | |
| **Double Spend with RTA tx (RTA vs RTA)** | **Can't see how this is possible - it needs a malicious auth sample coexisting with the true auth sample** | | |
| 1. RTA tx in mempool vs RTA tx in mainchain | | In theory this shouldn't be possible: auth sample supernodes check for conflicting key images, so such a tx will never be added to a pool. Only if a malicious tx was somehow accepted by a malicious auth sample | Question: check whether (and how) it is possible to have more than one "valid" auth sample (i.e. one for the main chain, another one(s) for alt chain(s)) if the main chain for one specific node is an alt chain for another node |
| 2. RTA txs in mainchain vs RTA txes in altchain | | In theory this shouldn't be possible: auth sample supernodes check for conflicting key images, so such a tx will never be added to a pool. Only if a malicious tx was somehow accepted by a malicious auth sample | |
---
## Discussion Thread


@ -0,0 +1,37 @@
# Issue #341: Jump List Communication: Implement Unstructured Distributed Hash Table
## Reception Score
| Score | Reason |
|-------|--------|
| **STALE** | Open with no response |
---
## Metadata
| Field | Value |
|-------|-------|
| State | OPEN |
| Author | @Dju01 |
| Created | 2019-06-12 |
| Closed | N/A |
| Labels | |
| Comments | 0 |
---
## Original Post
**Author:** @Dju01
Jump List Communication: Implement Unstructured Distributed Hash Table (DHT)
- GNRTA-336
- Message Encryption functions improved:
- https://github.com/graft-project/GraftNetwork/pull/233
- https://github.com/graft-project/GraftNetwork/pull/236
---
## Discussion Thread


@ -0,0 +1,38 @@
# Ledger Papers Archive
Self-documenting folder structure for distributed ledger whitepapers.
```
archive/
├── 00-genesis/ # Pre-Bitcoin: b-money, hashcash, bit gold (1998-2008)
├── 01-cryptonote/ # CryptoNote v2.0 + CNS standards
├── 02-mrl/ # Monero Research Lab (MRL-0001 → MRL-0011)
├── 03-privacy/ # Zcash, Mimblewimble, Lelantus, Spark
├── 04-smart-contracts/ # Ethereum, Solana, Cardano, Polkadot...
├── 05-layer2/ # Lightning, Plasma, Rollups, zkSync
├── 06-consensus/ # PBFT, Tendermint, HotStuff, Casper
├── 07-cryptography/ # Bulletproofs, CLSAG, PLONK, Schnorr
├── 08-defi/ # Uniswap, Aave, Compound, MakerDAO
├── 09-storage/ # IPFS, Filecoin, Arweave, Sia
├── 10-identity/ # DIDs, Verifiable Credentials
├── 11-dag/ # IOTA Tangle, Nano, Fantom Lachesis
├── 12-mev/ # Flashbots, ordering fairness
├── 13-standards-btc/ # BIPs: HD wallets, SegWit, Taproot
├── 14-standards-eth/ # EIPs/ERCs: ERC-20, ERC-721, EIP-1559
├── 15-p2p/ # libp2p, Kademlia, GossipSub, Dandelion++
├── 16-zk-advanced/ # Halo, Nova, Plonky2, STARKs
├── 17-oracles/ # Chainlink, Band Protocol
├── 18-bridges/ # Atomic swaps, XCLAIM, THORChain
├── 19-attacks/ # Security research, attack papers
└── 20-cryptonote-projects/ # Haven, Masari, TurtleCoin, DERO
```
## Stats
- **126 papers** across **21 categories**
- Spanning 1998 → present
- Academic + project documentation
## For the Commons
EUPL-1.2 CIC - papers.lethean.io


@ -0,0 +1,132 @@
#!/usr/bin/env bash
# Discover CryptoNote extension papers
# Usage: ./discover.sh [--all] [--category=NAME] [--project=NAME] [--topic=NAME]
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REGISTRY="$SCRIPT_DIR/registry.json"
# Check for jq
if ! command -v jq &> /dev/null; then
echo "Error: jq is required" >&2
exit 1
fi
CATEGORY=""
PROJECT=""
TOPIC=""
ALL=0
SEARCH_IACR=0
# Parse args
for arg in "$@"; do
case "$arg" in
--all) ALL=1 ;;
--category=*) CATEGORY="${arg#*=}" ;;
--project=*) PROJECT="${arg#*=}" ;;
--topic=*) TOPIC="${arg#*=}" ;;
--search-iacr) SEARCH_IACR=1 ;;
--help|-h)
echo "Usage: $0 [options]"
echo ""
echo "Options:"
echo " --all All known papers"
echo " --category=NAME Filter by category (mrl, iacr, projects, attacks)"
echo " --project=NAME Filter by project (monero, haven, masari, etc)"
echo " --topic=NAME Filter by topic (bulletproofs, ringct, etc)"
echo " --search-iacr Generate IACR search jobs"
echo ""
echo "Categories:"
jq -r '.categories | keys[]' "$REGISTRY"
exit 0
;;
esac
done
echo "# Ledger Papers Archive - $(date +%Y-%m-%d)"
echo "# Format: URL|FILENAME|TYPE|METADATA"
echo "#"
emit_paper() {
local url="$1"
local id="$2"
local category="$3"
local title="$4"
local filename="${id}.pdf"
local metadata="category=$category,title=$title"
echo "${url}|${filename}|paper|${metadata}"
}
# Process categories
process_category() {
local cat_name="$1"
echo "# === $cat_name ==="
# Get papers in category
local papers
papers=$(jq -c ".categories[\"$cat_name\"].papers[]?" "$REGISTRY" 2>/dev/null)
echo "$papers" | while read -r paper; do
[ -z "$paper" ] && continue
local id title url urls
id=$(echo "$paper" | jq -r '.id')
title=$(echo "$paper" | jq -r '.title // "Unknown"')
# Check topic filter
if [ -n "$TOPIC" ]; then
if ! echo "$paper" | jq -e ".topics[]? | select(. == \"$TOPIC\")" > /dev/null 2>&1; then
continue
fi
fi
# Check project filter
if [ -n "$PROJECT" ]; then
local paper_project
paper_project=$(echo "$paper" | jq -r '.project // ""')
if [ "$paper_project" != "$PROJECT" ]; then
continue
fi
fi
# Get URL (single or first from array)
url=$(echo "$paper" | jq -r '.url // .urls[0] // ""')
if [ -n "$url" ]; then
emit_paper "$url" "$id" "$cat_name" "$title"
fi
# Also emit alternate URLs for wayback
urls=$(echo "$paper" | jq -r '.urls[]? // empty' 2>/dev/null)
echo "$urls" | while read -r alt_url; do
[ -z "$alt_url" ] && continue
[ "$alt_url" = "$url" ] && continue
echo "# alt: $alt_url"
done
done
echo "#"
}
# Main logic
if [ "$ALL" = "1" ] || [ -z "$CATEGORY" ]; then
# All categories - dynamically from registry
jq -r '.categories | keys[]' "$REGISTRY" | while read -r cat; do
process_category "$cat"
done
else
# Single category
process_category "$CATEGORY"
fi
# IACR search jobs
if [ "$SEARCH_IACR" = "1" ]; then
echo "# === IACR Search Jobs ==="
jq -r '.search_patterns.iacr[]' "$REGISTRY" | while read -r term; do
encoded=$(echo "$term" | sed 's/ /+/g')
echo "https://eprint.iacr.org/search?q=${encoded}|iacr-search-${encoded}.html|search|source=iacr,term=$term"
done
fi

File diff suppressed because it is too large


@ -0,0 +1,63 @@
# Mining Pool Collector
Archive mining pool statistics, historical hashrate, and block discovery data.
## Data Available
| Data Type | Source | Notes |
|-----------|--------|-------|
| Current hashrate | Pool API | Network stats |
| Block history | Pool API | Found blocks, rewards |
| Miner stats | Pool API | If public |
| Payment history | Pool API | Payout records |
| Pool config | Pool page | Ports, difficulty, fees |
## Known Pools by Coin
### Lethean
- https://lethean.herominers.com/
- https://lthn.pool.gntl.uk/
### Common Pool Software
- nodejs-pool (Snipa)
- cryptonote-universal-pool
- cryptonote-nodejs-pool
- xmr-node-proxy
## Usage
```bash
# Generate jobs for known pools
./generate-jobs.sh lethean > jobs.txt
# Custom pool
./generate-jobs.sh --url=https://pool.example.com --coin=example > jobs.txt
# All known pools
./generate-jobs.sh --all > jobs.txt
```
## Output
```
pool-lethean-herominers/
├── config.json # Pool configuration
├── network.json # Network stats snapshot
├── blocks.json # Found blocks
├── payments.json # Payout history
└── INDEX.md # Summary
```
## Job Format
```
URL|FILENAME|TYPE|METADATA
https://lethean.herominers.com/api/stats|pool-lthn-hero-stats.json|pool-api|coin=lethean,pool=herominers
https://lethean.herominers.com/api/pool/blocks|pool-lthn-hero-blocks.json|pool-api|coin=lethean,pool=herominers
```
## Notes
- Many pools use similar API formats (nodejs-pool standard)
- Historical data often not retained - snapshot what's available
- Pool shutdowns common - archive before they disappear


@ -0,0 +1,105 @@
#!/usr/bin/env bash
# Generate mining pool collection jobs
# Usage: ./generate-jobs.sh <coin> [--url=URL] [--all]
set -e
COIN=""
POOL_URL=""
ALL_POOLS=0
# Known pools registry
declare -A POOLS_LETHEAN=(
["herominers"]="https://lethean.herominers.com"
["gntl"]="https://lthn.pool.gntl.uk"
)
declare -A POOLS_MONERO=(
["supportxmr"]="https://supportxmr.com"
["nanopool"]="https://xmr.nanopool.org"
["hashvault"]="https://monero.hashvault.pro"
)
declare -A POOLS_WOWNERO=(
["herominers"]="https://wownero.herominers.com"
)
# Parse args
for arg in "$@"; do
case "$arg" in
--url=*) POOL_URL="${arg#*=}" ;;
--all) ALL_POOLS=1 ;;
--*) ;;
*) COIN="$arg" ;;
esac
done
emit_pool_jobs() {
local pool_name="$1"
local pool_url="$2"
local coin="$3"
local slug="${coin}-${pool_name}"
echo "# === ${pool_name} (${coin}) ==="
# Common nodejs-pool API endpoints
echo "${pool_url}/api/stats|pool-${slug}-stats.json|pool-api|coin=$coin,pool=$pool_name"
echo "${pool_url}/api/pool/blocks|pool-${slug}-blocks.json|pool-api|coin=$coin,pool=$pool_name"
echo "${pool_url}/api/pool/payments|pool-${slug}-payments.json|pool-api|coin=$coin,pool=$pool_name"
echo "${pool_url}/api/network/stats|pool-${slug}-network.json|pool-api|coin=$coin,pool=$pool_name"
echo "${pool_url}/api/config|pool-${slug}-config.json|pool-api|coin=$coin,pool=$pool_name"
# Web pages
echo "${pool_url}/|pool-${slug}-home.html|pool-web|coin=$coin,pool=$pool_name"
echo "${pool_url}/#/blocks|pool-${slug}-blocks-page.html|pool-web|coin=$coin,pool=$pool_name"
echo "#"
}
echo "# Mining Pool Jobs - $(date +%Y-%m-%d)"
echo "# Format: URL|FILENAME|TYPE|METADATA"
echo "#"
if [ "$ALL_POOLS" = "1" ]; then
for pool in "${!POOLS_LETHEAN[@]}"; do
emit_pool_jobs "$pool" "${POOLS_LETHEAN[$pool]}" "lethean"
done
for pool in "${!POOLS_MONERO[@]}"; do
emit_pool_jobs "$pool" "${POOLS_MONERO[$pool]}" "monero"
done
for pool in "${!POOLS_WOWNERO[@]}"; do
emit_pool_jobs "$pool" "${POOLS_WOWNERO[$pool]}" "wownero"
done
elif [ -n "$POOL_URL" ]; then
pool_name=$(echo "$POOL_URL" | sed 's|.*://||; s|/.*||; s|\..*||')
emit_pool_jobs "$pool_name" "$POOL_URL" "${COIN:-unknown}"
elif [ -n "$COIN" ]; then
case "$COIN" in
lethean|lthn)
for pool in "${!POOLS_LETHEAN[@]}"; do
emit_pool_jobs "$pool" "${POOLS_LETHEAN[$pool]}" "lethean"
done
;;
monero|xmr)
for pool in "${!POOLS_MONERO[@]}"; do
emit_pool_jobs "$pool" "${POOLS_MONERO[$pool]}" "monero"
done
;;
wownero|wow)
for pool in "${!POOLS_WOWNERO[@]}"; do
emit_pool_jobs "$pool" "${POOLS_WOWNERO[$pool]}" "wownero"
done
;;
*)
echo "# Unknown coin: $COIN" >&2
echo "# Use --url= to specify pool URL" >&2
exit 1
;;
esac
else
echo "Usage: $0 <coin> [--url=URL] [--all]" >&2
echo "" >&2
echo "Known coins: lethean, monero, wownero" >&2
exit 1
fi


@ -0,0 +1,126 @@
# Project Archaeology
Deep excavation of abandoned CryptoNote projects before they vanish.
## Purpose
When a CryptoNote project dies, its artifacts scatter:
- GitHub repos get deleted or archived
- BitcoinTalk threads go stale
- Websites go offline
- Block explorers shut down
- Discord servers empty out
This skill orchestrates a **full dig** on a dead project — running all collectors in sequence to preserve everything salvageable before it's gone forever.
## Usage
```bash
# Full excavation of a project
./excavate.sh masari
# Quick scan (just check what's still accessible)
./excavate.sh masari --scan-only
# Specific collectors only
./excavate.sh masari --only=github,bitcointalk
# Resume interrupted dig
./excavate.sh masari --resume
```
## What Gets Collected
| Source | Collector Used | Priority |
|--------|----------------|----------|
| GitHub repos | `github-history` | P1 - often deleted first |
| GitHub releases | `wallet-releases` | P1 - binaries disappear |
| BitcoinTalk ANN | `bitcointalk` | P2 - usually persists |
| Website (Wayback) | `job-collector wayback` | P2 - snapshots exist |
| Block explorer | `block-explorer` | P3 - chain data |
| CoinMarketCap | `coinmarketcap` | P3 - historical prices |
| Whitepapers | `whitepaper-archive` | P1 - research value |
| Reddit | `job-collector reddit` | P4 - community context |
| Medium posts | `job-collector medium` | P4 - announcements |
## Output Structure
```
digs/
└── <project-name>/
├── EXCAVATION.md # Dig log with timestamps
├── SALVAGE-REPORT.md # What's worth keeping
├── LESSONS.md # What killed it, what we learned
├── github/ # All repo history
├── releases/ # Wallet binaries, checksums
├── bitcointalk/ # Thread archive
├── website/ # Wayback snapshots
├── explorer/ # Chain data samples
├── market/ # Price history, volume
├── papers/ # Whitepapers, docs
└── community/ # Reddit, Medium, etc
```
## Report Templates
### SALVAGE-REPORT.md
What code/ideas are worth extracting:
- Unique protocol innovations
- Wallet features
- Mining algorithms
- Community tools
- Documentation patterns
### LESSONS.md
Post-mortem analysis:
- Timeline of decline
- Root causes (dev burnout, drama, funding, tech debt)
- Warning signs to watch for
- What could have saved it
## Integration with cryptonote-discovery
```bash
# Get list of abandoned projects
cd ../cryptonote-discovery
./discover.sh --list-abandoned
# Excavate all abandoned projects (batch mode)
for proj in $(./discover.sh --list-abandoned); do
../project-archaeology/excavate.sh "$proj"
done
```
## Known Dig Sites
Projects confirmed dead/dying that need excavation:
| Project | Symbol | Death Year | Urgency | Notes |
|---------|--------|------------|---------|-------|
| TurtleCoin | TRTL | 2023 | HIGH | Team burned out, great docs |
| Masari | MSR | 2022 | HIGH | Uncle mining code valuable |
| Aeon | AEON | 2021 | MEDIUM | Pruning/lightweight work |
| Nerva | XNV | 2022 | MEDIUM | Anti-pool algo interesting |
| Sumokoin | SUMO | 2021 | LOW | Drama-killed, large ring research |
| Ryo | RYO | 2023 | LOW | GPU algo work |
## Requirements
- All collector skills installed
- `gh` CLI authenticated
- `jq` installed
- Sufficient disk space for archives
- Patience (full dig can take hours)
## Adding New Dig Sites
When you discover a dead CryptoNote project:
1. Add to `../cryptonote-discovery/registry.json`
2. Include `"salvageable": [...]` field
3. Run `./excavate.sh <project> --scan-only` first
4. If sources still accessible, run full dig
---
*"The past is not dead. It's not even past." — but GitHub repos definitely are.*


@ -0,0 +1,149 @@
# Salvage Report: GraftNetwork (GRFT)
**Excavation Date:** 2026-02-01
**Excavator:** Snider + Claude
**Status:** Dead (crypto winter 2020)
---
## Executive Summary
GraftNetwork was a CryptoNote project focused on **real-time point-of-sale payments** using supernodes. They had a working Veriphone terminal app pre-crypto winter. The codebase contains valuable patterns for service node incentives, real-time authorization, and distributed hash tables. HIGH PRIORITY SALVAGE for Lethean's service discovery and payment architecture.
---
## Salvageable Assets
### Code & Algorithms
| Asset | Location | Value | Notes |
|-------|----------|-------|-------|
| RTA (Real-Time Auth) | PR-10, PR-30, PR-221 | **CRITICAL** | Payment authorization protocol |
| Supernode Architecture | PR-10, PR-177 | **CRITICAL** | Service node design |
| Stake Transactions | PR-212, PR-215, PR-303 | **HIGH** | Validator incentives |
| UDHT/DHT Implementation | PR-236, PR-321 | **HIGH** | Decentralized discovery |
| Blockchain-based List | PR-225, PR-258 | **MEDIUM** | On-chain registry |
| Disqualification System | PR-288, PR-325, PR-335 | **HIGH** | Node misbehavior handling |
| RandomX-Graft Variant | PR-366, PR-367 | **MEDIUM** | Mining algo |
| Message Encryption | PR-210, PR-233 | **MEDIUM** | Comms layer |
### Technical Innovations
| Innovation | Description | Lethean Use |
|------------|-------------|-------------|
| **RTA Flow** | Real-time auth for POS payments via supernode network | Exit node payment verification |
| **Auth Sample** | Random supernode selection for transaction validation | Service node selection |
| **Stake Validation** | On-chain proof of node commitment | Service node staking |
| **UDHT2** | Distributed hash table for supernode discovery | Service discovery |
| **Tunnel Data** | PR-156: RTA tunneling for payment routing | VPN session binding |
### Documentation
| Doc | Location | Value |
|-----|----------|-------|
| DAA Description | PR-105 | Difficulty adjustment |
| README updates | Multiple PRs | Build instructions |
---
## Extraction Priority
### P1 - Extract Immediately
- **RTA Protocol** (PR-10, PR-30, PR-221, PR-290)
- Real-time authorization flow
- Maps directly to Lethean payment dispatcher
- Risk: Complex, needs deep read
- **Supernode Architecture** (PR-10, PR-177)
- Wallet integration
- Service registration
- Maps to exit node registration
- **UDHT2** (PR-236, PR-321)
- Decentralized discovery
- Maps to SDP distribution
### P2 - Extract Soon
- **Stake Transactions** (PR-212, PR-215)
- Validator economics
- Lock/unlock patterns
- **Disqualification** (PR-288, PR-325)
- Misbehavior detection
- Slashing patterns
### P3 - Archive When Possible
- **RandomX-Graft** (PR-366, PR-367)
- Mining variant, lower priority
---
## Integration Opportunities
| Asset | Integration Path | Effort | Benefit |
|-------|-----------------|--------|---------|
| RTA Protocol | Adapt for VPN payment flow | HIGH | Real-time session auth |
| Supernode Wallet | Reference for service node wallet | MEDIUM | Staking patterns |
| UDHT2 | Evaluate for SDP distribution | HIGH | Decentralized discovery |
| Auth Sample | Adapt for exit node selection | MEDIUM | Fair selection |
| Disqualification | Model for node reputation | MEDIUM | Network health |
---
## Licensing Notes
| Asset | License | Compatible with EUPL-1.2? |
|-------|---------|---------------------------|
| GraftNetwork | BSD 3-Clause | ✅ Yes |
---
## Key Issues to Review
| Issue | Title | Why Important |
|-------|-------|---------------|
| #76 | Blockchain DAA improvement | Difficulty algo |
| #115 | Modify PoW to prevent hash attacks | Security |
| #208 | Graft under 51% attack | Post-mortem |
| #217 | Subaddresses for stake transactions | Staking patterns |
| #268 | SN auth sample distribution | Selection fairness |
| #269 | Announce broadcasting unreliable | Network reliability |
| #328 | Stake change locked same as stake | Economic design |
---
## Lessons from Death
### What Killed It
- Crypto winter 2020 killed adoption momentum
- POS terminal market timing was too early
- Team resources stretched thin
### What Was Good
- Real working terminal app (Veriphone integration)
- Solid supernode economics
- Clean CryptoNote fork with good PRs
- Active community (graft-community fork)
### Warning Signs
- #347: "Excessive bandwidth usage since 1.9.2"
- #355: "Log flooded with connection timeout"
- Multiple segfault issues late in lifecycle
---
## Action Items
- [ ] Deep-read RTA protocol PRs
- [ ] Extract UDHT2 implementation
- [ ] Compare Graft supernode to Lethean exit node
- [ ] Review stake transaction patterns
- [ ] Check graft-community fork for continued work
- [ ] Document auth sample algorithm
---
*Salvage report generated by project-archaeology*


@ -0,0 +1,311 @@
#!/bin/bash
# Project Archaeology - Deep excavation of abandoned CryptoNote projects
# Usage: ./excavate.sh <project-name> [options]
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SKILLS_DIR="$(dirname "$SCRIPT_DIR")"
REGISTRY="$SKILLS_DIR/cryptonote-discovery/registry.json"
OUTPUT_DIR="$SCRIPT_DIR/digs"
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
# Defaults
SCAN_ONLY=false
RESUME=false
ONLY_COLLECTORS=""
usage() {
echo "Usage: $0 <project-name> [options]"
echo ""
echo "Options:"
echo " --scan-only Check what's accessible without downloading"
echo " --resume Resume interrupted excavation"
echo " --only=a,b,c Run specific collectors only"
echo " --help Show this help"
echo ""
echo "Examples:"
echo " $0 masari # Full excavation"
echo " $0 masari --scan-only # Quick accessibility check"
echo " $0 masari --only=github,btt # GitHub and BitcoinTalk only"
exit 1
}
log() {
echo -e "${BLUE}[$(date '+%H:%M:%S')]${NC} $1"
}
success() {
echo -e "${GREEN}[✓]${NC} $1"
}
warn() {
echo -e "${YELLOW}[!]${NC} $1"
}
error() {
echo -e "${RED}[✗]${NC} $1"
}
# Get project data from registry
get_project() {
local name="$1"
jq -r --arg n "$name" '.projects[] | select(.name | ascii_downcase == ($n | ascii_downcase))' "$REGISTRY"
}
# Check if a collector should run
should_run() {
local collector="$1"
if [ -z "$ONLY_COLLECTORS" ]; then
return 0
fi
echo "$ONLY_COLLECTORS" | grep -q "$collector"
}
# Scan a URL to check if accessible
check_url() {
local url="$1"
local status=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$url" 2>/dev/null || echo "000")
if [ "$status" = "200" ] || [ "$status" = "301" ] || [ "$status" = "302" ]; then
return 0
fi
return 1
}
# Main excavation function
excavate() {
local project_name="$1"
local project=$(get_project "$project_name")
if [ -z "$project" ] || [ "$project" = "null" ]; then
error "Project '$project_name' not found in registry"
echo "Add it to: $REGISTRY"
exit 1
fi
# Extract project data
local name=$(echo "$project" | jq -r '.name')
local symbol=$(echo "$project" | jq -r '.symbol')
local status=$(echo "$project" | jq -r '.status')
local github_orgs=$(echo "$project" | jq -r '.github[]?' 2>/dev/null)
local btt_topic=$(echo "$project" | jq -r '.bitcointalk // empty')
local website=$(echo "$project" | jq -r '.website // empty')
local explorer=$(echo "$project" | jq -r '.explorer // empty')
local cmc=$(echo "$project" | jq -r '.cmc // empty')
echo ""
echo -e "${BLUE}════════════════════════════════════════════════════════════${NC}"
echo -e "${BLUE} PROJECT ARCHAEOLOGY: ${name} (${symbol})${NC}"
echo -e "${BLUE} Status: ${status}${NC}"
echo -e "${BLUE}════════════════════════════════════════════════════════════${NC}"
echo ""
# Create output directory
local dig_dir="$OUTPUT_DIR/$project_name"
mkdir -p "$dig_dir"/{github,releases,bitcointalk,website,explorer,market,papers,community}
# Start excavation log
local log_file="$dig_dir/EXCAVATION.md"
echo "# Excavation Log: $name ($symbol)" > "$log_file"
echo "" >> "$log_file"
echo "**Started:** $(date)" >> "$log_file"
echo "**Status at dig time:** $status" >> "$log_file"
echo "" >> "$log_file"
echo "---" >> "$log_file"
echo "" >> "$log_file"
# Phase 1: GitHub (highest priority - often deleted first)
if should_run "github"; then
echo "## GitHub Repositories" >> "$log_file"
echo "" >> "$log_file"
for org in $github_orgs; do
log "Checking GitHub org: $org"
if $SCAN_ONLY; then
if check_url "https://github.com/$org"; then
success "GitHub org accessible: $org"
echo "- [x] \`$org\` - accessible" >> "$log_file"
else
warn "GitHub org NOT accessible: $org"
echo "- [ ] \`$org\` - NOT accessible" >> "$log_file"
fi
else
log "Running github-history collector on $org..."
# Would call: $SKILLS_DIR/github-history/collect.sh "https://github.com/$org" --org
echo "- Collected: \`$org\`" >> "$log_file"
fi
done
echo "" >> "$log_file"
fi
# Phase 2: BitcoinTalk
if should_run "btt" || should_run "bitcointalk"; then
echo "## BitcoinTalk Thread" >> "$log_file"
echo "" >> "$log_file"
if [ -n "$btt_topic" ]; then
local btt_url="https://bitcointalk.org/index.php?topic=$btt_topic"
log "Checking BitcoinTalk topic: $btt_topic"
if $SCAN_ONLY; then
if check_url "$btt_url"; then
success "BitcoinTalk thread accessible"
echo "- [x] Topic $btt_topic - accessible" >> "$log_file"
else
warn "BitcoinTalk thread NOT accessible"
echo "- [ ] Topic $btt_topic - NOT accessible" >> "$log_file"
fi
else
log "Running bitcointalk collector..."
# Would call: $SKILLS_DIR/bitcointalk/collect.sh "$btt_topic"
echo "- Collected: Topic $btt_topic" >> "$log_file"
fi
else
warn "No BitcoinTalk topic ID in registry"
echo "- [ ] No topic ID recorded" >> "$log_file"
fi
echo "" >> "$log_file"
fi
# Phase 3: Website via Wayback
if should_run "wayback" || should_run "website"; then
echo "## Website (Wayback Machine)" >> "$log_file"
echo "" >> "$log_file"
if [ -n "$website" ]; then
log "Checking Wayback Machine for: $website"
local wayback_api="https://archive.org/wayback/available?url=$website"
if $SCAN_ONLY; then
local wayback_check=$(curl -s "$wayback_api" | jq -r '.archived_snapshots.closest.available // "false"')
if [ "$wayback_check" = "true" ]; then
success "Wayback snapshots available for $website"
echo "- [x] \`$website\` - snapshots available" >> "$log_file"
else
warn "No Wayback snapshots for $website"
echo "- [ ] \`$website\` - no snapshots" >> "$log_file"
fi
else
log "Running wayback collector..."
# Would call: $SKILLS_DIR/job-collector/generate-jobs.sh wayback "$website"
echo "- Collected: \`$website\`" >> "$log_file"
fi
else
warn "No website in registry"
echo "- [ ] No website recorded" >> "$log_file"
fi
echo "" >> "$log_file"
fi
# Phase 4: Block Explorer
if should_run "explorer"; then
echo "## Block Explorer" >> "$log_file"
echo "" >> "$log_file"
if [ -n "$explorer" ]; then
log "Checking block explorer: $explorer"
if $SCAN_ONLY; then
if check_url "https://$explorer"; then
success "Block explorer online: $explorer"
echo "- [x] \`$explorer\` - online" >> "$log_file"
else
warn "Block explorer OFFLINE: $explorer"
echo "- [ ] \`$explorer\` - OFFLINE" >> "$log_file"
fi
else
log "Running block-explorer collector..."
echo "- Collected: \`$explorer\`" >> "$log_file"
fi
else
warn "No explorer in registry"
echo "- [ ] No explorer recorded" >> "$log_file"
fi
echo "" >> "$log_file"
fi
# Phase 5: Market Data (CMC)
if should_run "cmc" || should_run "market"; then
echo "## Market Data" >> "$log_file"
echo "" >> "$log_file"
if [ -n "$cmc" ]; then
log "Checking CoinMarketCap: $cmc"
if $SCAN_ONLY; then
if check_url "https://coinmarketcap.com/currencies/$cmc/"; then
success "CMC page exists: $cmc"
echo "- [x] CMC: \`$cmc\` - exists" >> "$log_file"
else
warn "CMC page NOT found: $cmc"
echo "- [ ] CMC: \`$cmc\` - not found" >> "$log_file"
fi
else
log "Running coinmarketcap collector..."
echo "- Collected: \`$cmc\`" >> "$log_file"
fi
else
warn "No CMC slug in registry"
echo "- [ ] No CMC slug recorded" >> "$log_file"
fi
echo "" >> "$log_file"
fi
# Finalize log
echo "---" >> "$log_file"
echo "" >> "$log_file"
echo "**Completed:** $(date)" >> "$log_file"
if $SCAN_ONLY; then
echo ""
success "Scan complete. See: $log_file"
else
echo ""
success "Excavation complete. Output in: $dig_dir"
echo ""
log "Next steps:"
echo " 1. Review: $log_file"
echo " 2. Generate: $dig_dir/SALVAGE-REPORT.md"
echo " 3. Write: $dig_dir/LESSONS.md"
fi
}
# Parse arguments
if [ $# -lt 1 ]; then
usage
fi
PROJECT="$1"
shift
while [ $# -gt 0 ]; do
case "$1" in
--scan-only)
SCAN_ONLY=true
;;
--resume)
RESUME=true
;;
--only=*)
ONLY_COLLECTORS="${1#*=}"
;;
--help)
usage
;;
*)
error "Unknown option: $1"
usage
;;
esac
shift
done
# Run excavation
excavate "$PROJECT"

View file

@ -0,0 +1,100 @@
# Lessons Learned: {{PROJECT_NAME}} ({{SYMBOL}})
**Excavation Date:** {{DATE}}
**Post-Mortem By:** {{EXCAVATOR}}
---
## Project Timeline
| Date | Event |
|------|-------|
| {{GENESIS}} | Genesis block |
| | |
| | |
| {{DEATH_YEAR}} | Project effectively dead |
---
## What Killed It?
### Primary Cause
> The main reason this project failed
### Contributing Factors
-
-
-
### The Final Straw
> What was the last event before abandonment?
---
## Warning Signs We Saw
Signs that appeared before death (in order):
1.
2.
3.
---
## What Could Have Saved It?
| Problem | Potential Solution | Why It Didn't Happen |
|---------|-------------------|---------------------|
| | | |
---
## Patterns to Watch For
Red flags that Lethean should monitor in itself:
- [ ]
- [ ]
- [ ]
---
## What They Did Right
Not everything was a failure. Worth preserving:
-
-
-
---
## Community Sentiment
### At Peak
> How did the community feel when things were good?
### At Decline
> How did sentiment shift?
### At Death
> Final community state
---
## Quotes Worth Remembering
> "Quote from team or community"
> — Source, Date
---
## Key Takeaways for Lethean
1.
2.
3.
---
*Post-mortem generated by project-archaeology*

View file

@ -0,0 +1,88 @@
# Salvage Report: {{PROJECT_NAME}} ({{SYMBOL}})
**Excavation Date:** {{DATE}}
**Excavator:** {{EXCAVATOR}}
**Status:** {{STATUS}}
---
## Executive Summary
> One paragraph: What was this project, what's worth saving, priority level.
---
## Salvageable Assets
### Code & Algorithms
| Asset | Location | Value | Notes |
|-------|----------|-------|-------|
| | | | |
### Documentation
| Doc | Location | Value | Notes |
|-----|----------|-------|-------|
| | | | |
### Community Tools
| Tool | Location | Value | Notes |
|------|----------|-------|-------|
| | | | |
### Design Assets
| Asset | Location | Value | Notes |
|-------|----------|-------|-------|
| | | | |
---
## Extraction Priority
### P1 - Extract Immediately
> Risk of disappearing, high value
-
### P2 - Extract Soon
> Stable for now, good value
-
### P3 - Archive When Possible
> Low urgency, reference value
-
---
## Integration Opportunities
How these assets could benefit Lethean:
| Asset | Integration Path | Effort | Benefit |
|-------|-----------------|--------|---------|
| | | | |
---
## Licensing Notes
| Asset | License | Compatible with EUPL-1.2? |
|-------|---------|---------------------------|
| | | |
---
## Action Items
- [ ]
- [ ]
- [ ]
---
*Salvage report generated by project-archaeology*

View file

@ -0,0 +1,60 @@
# Wallet Releases Collector
Archive wallet software releases, changelogs, and binary checksums.
## Data Available
| Data Type | Source | Notes |
|-----------|--------|-------|
| Release binaries | GitHub releases | Preserve before deletion |
| Changelogs | Release notes | Feature history |
| Checksums | Release page | Verify integrity |
| Source tags | Git tags | Build from source |
## Usage
```bash
# Collect all releases for a project
./generate-jobs.sh LetheanNetwork/lethean > jobs.txt
# Just metadata (no binaries)
./generate-jobs.sh LetheanNetwork/lethean --metadata-only > jobs.txt
# Include pre-releases
./generate-jobs.sh LetheanNetwork/lethean --include-prereleases > jobs.txt
```
## Output
```
releases-lethean/
├── v5.0.0/
│ ├── release.json # GitHub API response
│ ├── CHANGELOG.md # Release notes
│ ├── checksums.txt # SHA256 of binaries
│ └── assets.json # Binary URLs (not downloaded)
├── v4.0.1/
│ └── ...
└── INDEX.md # Version timeline
```
## Job Format
```
URL|FILENAME|TYPE|METADATA
https://api.github.com/repos/LetheanNetwork/lethean/releases|releases-lethean-all.json|github-api|project=lethean
https://github.com/LetheanNetwork/lethean/releases/tag/v5.0.0|releases-lethean-v5.0.0.html|github-web|project=lethean,version=v5.0.0
```
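A downstream fetcher can consume these job lines with a plain `IFS` split; a minimal sketch, assuming a `jobs.txt` in the format above (the loop is illustrative, not the actual collection pipeline):

```bash
#!/bin/bash
# Illustrative consumer for URL|FILENAME|TYPE|METADATA job lines.
while IFS='|' read -r url filename type metadata; do
  [ -z "$url" ] && continue                          # skip blank lines
  echo "[$type] $url -> $filename ($metadata)"
  curl -sL --fail "$url" -o "$filename" || echo "FAILED: $url" >&2
done < jobs.txt
```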
## Preservation Priority
1. **Critical**: Changelogs, checksums, version numbers
2. **Important**: Release dates, asset lists, download counts
3. **Optional**: Binary downloads (large, reproducible from source)
## Notes
- Abandoned projects often delete their releases first
- The GitHub API is rate-limited - use authenticated requests (see the sketch below)
- Some projects use different release platforms (SourceForge, their own CDN)
- Track GPG signature files when available
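A minimal sketch of an authenticated metadata fetch, assuming a `GITHUB_TOKEN` environment variable is set (the output filename follows the job format example above):

```bash
# Fetch release metadata for one repo with an authenticated request (higher rate limit).
curl -sL --fail \
  -H "Authorization: Bearer ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/LetheanNetwork/lethean/releases?per_page=100" \
  -o releases-lethean-all.json

# Quick sanity check: list the preserved version tags.
jq -r '.[].tag_name' releases-lethean-all.json
```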

View file

@ -0,0 +1,81 @@
# Whitepaper Archive Collector
Preserve whitepapers, technical documentation, and foundational documents from crypto projects.
## Data Available
| Data Type | Source | Notes |
|-----------|--------|-------|
| Original whitepaper | Project site | PDF/HTML |
| Technical docs | GitHub wiki | Architecture details |
| Protocol specs | Docs site | Often disappear |
| Academic papers | arXiv, IACR | CryptoNote foundations |
## Known Sources
### CryptoNote Foundation
- Original CryptoNote whitepaper (van Saberhagen)
- Ring signature paper
- Stealth address paper
### Per-Project
- Monero Research Lab papers
- Haven Protocol whitepaper
- Lethean whitepaper
### Academic
- arxiv.org crypto papers
- iacr.org cryptography
## Usage
```bash
# Collect known whitepapers for a project
./generate-jobs.sh lethean > jobs.txt
# All CryptoNote foundational papers
./generate-jobs.sh --foundation > jobs.txt
# Research papers by topic
./generate-jobs.sh --topic=ring-signatures > jobs.txt
```
## Output
```
whitepapers/
├── cryptonote/
│ ├── cryptonote-v2.pdf
│ ├── ring-signatures.pdf
│ └── stealth-addresses.pdf
├── lethean/
│ ├── whitepaper-v1.pdf
│ └── technical-overview.md
└── INDEX.md
```
## Job Format
```
URL|FILENAME|TYPE|METADATA
https://cryptonote.org/whitepaper.pdf|cryptonote-v2.pdf|whitepaper|project=cryptonote,version=2
```
## Known URLs
### CryptoNote Original
- https://cryptonote.org/whitepaper.pdf (may be down)
- Archive.org backup needed
### Monero Research Lab
- https://www.getmonero.org/resources/research-lab/
### Academic
- https://eprint.iacr.org/ (IACR ePrint)
- https://arxiv.org/list/cs.CR/recent
## Notes
- Many original sites are gone - use the Wayback Machine
- PDFs should be archived with multiple checksums (see the sketch below)
- Track version history when multiple revisions exist
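A minimal sketch of the multi-checksum step, assuming the `whitepapers/` layout above and GNU coreutils (the `CHECKSUMS.*` filenames are illustrative):

```bash
# Record two independent digests for every archived PDF.
sha256sum whitepapers/*/*.pdf > whitepapers/CHECKSUMS.sha256
sha512sum whitepapers/*/*.pdf > whitepapers/CHECKSUMS.sha512

# Later, verify nothing has bit-rotted or been silently replaced.
sha256sum -c whitepapers/CHECKSUMS.sha256
```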

41
claude/README.md Normal file
View file

@ -0,0 +1,41 @@
# core-claude
Claude Code plugin for the Host UK federated monorepo.
## Installation
```bash
/plugin marketplace add host-uk/core-claude
/plugin install core@core-claude
```
## Features
### Skills
- **core** - Core CLI command reference for multi-repo management
- **core-php** - PHP module patterns for Laravel packages
- **core-go** - Go package patterns for the CLI
### Commands
- `/core:remember <fact>` - Save context facts that persist across compaction
### Hooks
**Safety hooks:**
- Blocks destructive commands (`rm -rf`, `sed -i`, mass operations)
- Enforces `core` CLI over raw `go`/`php` commands
- Prevents random .md file creation
**Context preservation:**
- Saves state before auto-compact (prevents "amnesia")
- Restores recent session context on startup
- Extracts actionables from tool output
**Auto-formatting:**
- PHP files via Pint after edits
- Go files via gofmt after edits
- Warns about debug statements
## Dependencies
- [superpowers](https://github.com/anthropics/claude-plugins-official) from claude-plugins-official

View file

@ -0,0 +1,36 @@
---
name: remember
description: Save a fact or decision to context for persistence across compacts
args: <fact to remember>
---
# Remember Context
Save the provided fact to `~/.claude/sessions/context.json`.
## Usage
```
/core:remember Use Action pattern not Service
/core:remember User prefers UK English
/core:remember RFC: minimal state in pre-compact hook
```
## Action
Run this command to save the fact:
```bash
~/.claude/plugins/cache/core/scripts/capture-context.sh "<fact>" "user"
```
Or if running from the plugin directory:
```bash
"${CLAUDE_PLUGIN_ROOT}/scripts/capture-context.sh" "<fact>" "user"
```
The fact will be (see the example entry below):
- Stored in context.json (max 20 items)
- Included in pre-compact snapshots
- Auto-cleared after 3 hours of inactivity
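For reference, a saved entry can be inspected directly; the structure below mirrors what `capture-context.sh` writes (the timestamp value is illustrative):

```bash
jq '.' ~/.claude/sessions/context.json
# => [
# =>   { "fact": "Use Action pattern not Service", "source": "user", "ts": 1738433701 }
# => ]
```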

83
claude/hooks/hooks.json Normal file
View file

@ -0,0 +1,83 @@
{
"$schema": "https://claude.ai/schemas/hooks.json",
"hooks": {
"PreToolUse": [
{
"matcher": "Bash",
"hooks": [
{
"type": "command",
"command": "bash ./hooks/prefer-core.sh"
}
],
"description": "Block destructive commands (rm -rf, sed -i, xargs rm) and enforce core CLI"
}
],
"PostToolUse": [
{
"matcher": "tool == \"Edit\" && tool_input.file_path matches \"\\.go$\"",
"hooks": [
{
"type": "command",
"command": "bash ./scripts/go-format.sh"
}
],
"description": "Auto-format Go files after edits"
},
{
"matcher": "tool == \"Edit\" && tool_input.file_path matches \"\\.php$\"",
"hooks": [
{
"type": "command",
"command": "bash ./scripts/php-format.sh"
}
],
"description": "Auto-format PHP files after edits"
},
{
"matcher": "tool == \"Edit\"",
"hooks": [
{
"type": "command",
"command": "bash ./scripts/check-debug.sh"
}
],
"description": "Warn about debug statements (dd, dump, fmt.Println)"
},
{
"matcher": "tool == \"Bash\" && tool_input.command matches \"^git commit\"",
"hooks": [
{
"type": "command",
"command": "bash ./scripts/post-commit-check.sh"
}
],
"description": "Warn about uncommitted work after git commit"
}
],
"PreCompact": [
{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "bash ./scripts/pre-compact.sh"
}
],
"description": "Save state before auto-compact to prevent amnesia"
}
],
"SessionStart": [
{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "bash ./scripts/session-start.sh"
}
],
"description": "Restore recent session context on startup"
}
]
}
}

102
claude/hooks/prefer-core.sh Executable file
View file

@ -0,0 +1,102 @@
#!/bin/bash
# PreToolUse hook: Block dangerous commands, enforce core CLI
#
# BLOCKS:
# - Raw go commands (use core go *)
# - Destructive edit patterns (sed -i, xargs rm, grep -l | ..., etc.)
# - Mass file operations (rm -rf, mv/cp with wildcards)
# - Any sed outside of safe patterns
#
# This prevents "efficient shortcuts" that nuke codebases
read -r input
command=$(echo "$input" | jq -r '.tool_input.command // empty')
# === HARD BLOCKS - Never allow these ===
# Block rm -rf, rm -r (except for known safe paths like node_modules, vendor, .cache)
if echo "$command" | grep -qE 'rm\s+(-[a-zA-Z]*r[a-zA-Z]*|-[a-zA-Z]*f[a-zA-Z]*r|--recursive)'; then
# Allow only specific safe directories
if ! echo "$command" | grep -qE 'rm\s+(-rf|-r)\s+(node_modules|vendor|\.cache|dist|build|__pycache__|\.pytest_cache|/tmp/)'; then
echo '{"decision": "block", "message": "BLOCKED: Recursive delete is not allowed. Delete files individually or ask the user to run this command."}'
exit 0
fi
fi
# Block mv/cp with wildcards (mass file moves)
if echo "$command" | grep -qE '(mv|cp)\s+.*\*'; then
echo '{"decision": "block", "message": "BLOCKED: Mass file move/copy with wildcards is not allowed. Move files individually."}'
exit 0
fi
# Block xargs with rm, mv, cp (mass operations)
if echo "$command" | grep -qE 'xargs\s+.*(rm|mv|cp)'; then
echo '{"decision": "block", "message": "BLOCKED: xargs with file operations is not allowed. Too risky for mass changes."}'
exit 0
fi
# Block find -exec with rm, mv, cp
if echo "$command" | grep -qE 'find\s+.*-exec\s+.*(rm|mv|cp)'; then
echo '{"decision": "block", "message": "BLOCKED: find -exec with file operations is not allowed. Too risky for mass changes."}'
exit 0
fi
# Block ALL sed -i (in-place editing)
if echo "$command" | grep -qE 'sed\s+(-[a-zA-Z]*i|--in-place)'; then
echo '{"decision": "block", "message": "BLOCKED: sed -i (in-place edit) is never allowed. Use the Edit tool for file changes."}'
exit 0
fi
# Block sed piped to file operations
if echo "$command" | grep -qE 'sed.*\|.*tee|sed.*>'; then
echo '{"decision": "block", "message": "BLOCKED: sed with file output is not allowed. Use the Edit tool for file changes."}'
exit 0
fi
# Block grep with -l piped to xargs/rm/sed (the classic codebase nuke pattern)
if echo "$command" | grep -qE 'grep\s+.*-l.*\|'; then
echo '{"decision": "block", "message": "BLOCKED: grep -l piped to other commands is the classic codebase nuke pattern. Not allowed."}'
exit 0
fi
# Block perl -i, awk with file redirection (sed alternatives)
if echo "$command" | grep -qE 'perl\s+-[a-zA-Z]*i|awk.*>'; then
echo '{"decision": "block", "message": "BLOCKED: In-place file editing with perl/awk is not allowed. Use the Edit tool."}'
exit 0
fi
# === REQUIRE CORE CLI ===
# Block raw go commands
case "$command" in
"go test"*|"go build"*|"go fmt"*|"go mod tidy"*|"go vet"*|"go run"*)
echo '{"decision": "block", "message": "Use `core go test`, `core build`, `core go fmt --fix`, etc. Raw go commands are not allowed."}'
exit 0
;;
"go "*)
# Other go commands - block and point to the core CLI
echo '{"decision": "block", "message": "Prefer `core go *` commands. If core does not have this command, ask the user."}'
exit 0
;;
esac
# Block raw php commands
case "$command" in
"php artisan serve"*|"./vendor/bin/pest"*|"./vendor/bin/pint"*|"./vendor/bin/phpstan"*)
echo '{"decision": "block", "message": "Use `core php dev`, `core php test`, `core php fmt`, `core php analyse`. Raw php commands are not allowed."}'
exit 0
;;
"composer test"*|"composer lint"*)
echo '{"decision": "block", "message": "Use `core php test` or `core php fmt`. Raw composer commands are not allowed."}'
exit 0
;;
esac
# Block golangci-lint directly
if echo "$command" | grep -qE '^golangci-lint'; then
echo '{"decision": "block", "message": "Use `core go lint` instead of golangci-lint directly."}'
exit 0
fi
# === APPROVED ===
echo '{"decision": "approve"}'

27
claude/scripts/block-docs.sh Executable file
View file

@ -0,0 +1,27 @@
#!/bin/bash
# Block creation of random .md files - keeps docs consolidated
read -r input
FILE_PATH=$(echo "$input" | jq -r '.tool_input.file_path // empty')
if [[ -n "$FILE_PATH" ]]; then
# Allow known documentation files
case "$FILE_PATH" in
*README.md|*CLAUDE.md|*AGENTS.md|*CONTRIBUTING.md|*CHANGELOG.md|*LICENSE.md)
echo "$input"
exit 0
;;
# Allow docs/ directory
*/docs/*.md|*/docs/**/*.md)
echo "$input"
exit 0
;;
# Block other .md files
*.md)
echo '{"decision": "block", "message": "Use README.md or docs/ for documentation. Random .md files clutter the repo."}'
exit 0
;;
esac
fi
echo "$input"

View file

@ -0,0 +1,44 @@
#!/bin/bash
# Capture context facts from tool output or conversation
# Called by PostToolUse hooks to extract actionable items
#
# Stores in ~/.claude/sessions/context.json as:
# [{"fact": "...", "source": "core go qa", "ts": 1234567890}, ...]
CONTEXT_FILE="${HOME}/.claude/sessions/context.json"
TIMESTAMP=$(date '+%s')
THREE_HOURS=10800
mkdir -p "${HOME}/.claude/sessions"
# Initialize if missing or stale
if [[ -f "$CONTEXT_FILE" ]]; then
FIRST_TS=$(jq -r '.[0].ts // 0' "$CONTEXT_FILE" 2>/dev/null)
NOW=$(date '+%s')
AGE=$((NOW - FIRST_TS))
if [[ $AGE -gt $THREE_HOURS ]]; then
echo "[]" > "$CONTEXT_FILE"
fi
else
echo "[]" > "$CONTEXT_FILE"
fi
# Read input (fact and source passed as args or stdin)
FACT="${1:-}"
SOURCE="${2:-manual}"
if [[ -z "$FACT" ]]; then
# Try reading from stdin
read -r FACT
fi
if [[ -n "$FACT" ]]; then
# Append to context (keep last 20 items)
jq --arg fact "$FACT" --arg source "$SOURCE" --argjson ts "$TIMESTAMP" \
'. + [{"fact": $fact, "source": $source, "ts": $ts}] | .[-20:]' \
"$CONTEXT_FILE" > "${CONTEXT_FILE}.tmp" && mv "${CONTEXT_FILE}.tmp" "$CONTEXT_FILE"
echo "[Context] Saved: $FACT" >&2
fi
exit 0

27
claude/scripts/check-debug.sh Executable file
View file

@ -0,0 +1,27 @@
#!/bin/bash
# Warn about debug statements left in code after edits
read -r input
FILE_PATH=$(echo "$input" | jq -r '.tool_input.file_path // empty')
if [[ -n "$FILE_PATH" && -f "$FILE_PATH" ]]; then
case "$FILE_PATH" in
*.go)
# Check for fmt.Println, log.Println debug statements
if grep -n "fmt\.Println\|log\.Println" "$FILE_PATH" 2>/dev/null | head -3 | grep -q .; then
echo "[Hook] WARNING: Debug prints found in $FILE_PATH" >&2
grep -n "fmt\.Println\|log\.Println" "$FILE_PATH" 2>/dev/null | head -3 >&2
fi
;;
*.php)
# Check for dd(), dump(), var_dump(), print_r()
if grep -n "dd(\|dump(\|var_dump(\|print_r(" "$FILE_PATH" 2>/dev/null | head -3 | grep -q .; then
echo "[Hook] WARNING: Debug statements found in $FILE_PATH" >&2
grep -n "dd(\|dump(\|var_dump(\|print_r(" "$FILE_PATH" 2>/dev/null | head -3 >&2
fi
;;
esac
fi
# Pass through the input
echo "$input"

View file

@ -0,0 +1,34 @@
#!/bin/bash
# Extract actionable items from core CLI output
# Called PostToolUse on Bash commands that run core
read -r input
COMMAND=$(echo "$input" | jq -r '.tool_input.command // empty')
OUTPUT=$(echo "$input" | jq -r '.tool_output.output // empty')
CONTEXT_SCRIPT="$(dirname "$0")/capture-context.sh"
# Extract actionables from specific core commands
case "$COMMAND" in
"core go qa"*|"core go test"*|"core go lint"*)
# Extract error/warning lines
echo "$OUTPUT" | grep -E "^(ERROR|WARN|FAIL|---)" | head -5 | while read -r line; do
"$CONTEXT_SCRIPT" "$line" "core go"
done
;;
"core php test"*|"core php analyse"*)
# Extract PHP errors
echo "$OUTPUT" | grep -E "^(FAIL|Error|×)" | head -5 | while read -r line; do
"$CONTEXT_SCRIPT" "$line" "core php"
done
;;
"core build"*)
# Extract build errors
echo "$OUTPUT" | grep -E "^(error|cannot|undefined)" | head -5 | while read -r line; do
"$CONTEXT_SCRIPT" "$line" "core build"
done
;;
esac
# Pass through
echo "$input"

19
claude/scripts/go-format.sh Executable file
View file

@ -0,0 +1,19 @@
#!/bin/bash
# Auto-format Go files after edits using core go fmt
read -r input
FILE_PATH=$(echo "$input" | jq -r '.tool_input.file_path // empty')
if [[ -n "$FILE_PATH" && -f "$FILE_PATH" ]]; then
# Run gofmt/goimports on the file silently
if command -v core &> /dev/null; then
core go fmt --fix "$FILE_PATH" 2>/dev/null || true
elif command -v goimports &> /dev/null; then
goimports -w "$FILE_PATH" 2>/dev/null || true
elif command -v gofmt &> /dev/null; then
gofmt -w "$FILE_PATH" 2>/dev/null || true
fi
fi
# Pass through the input
echo "$input"

17
claude/scripts/php-format.sh Executable file
View file

@ -0,0 +1,17 @@
#!/bin/bash
# Auto-format PHP files after edits using core php fmt
read -r input
FILE_PATH=$(echo "$input" | jq -r '.tool_input.file_path // empty')
if [[ -n "$FILE_PATH" && -f "$FILE_PATH" ]]; then
# Run Pint on the file silently
if command -v core &> /dev/null; then
core php fmt --fix "$FILE_PATH" 2>/dev/null || true
elif [[ -f "./vendor/bin/pint" ]]; then
./vendor/bin/pint "$FILE_PATH" 2>/dev/null || true
fi
fi
# Pass through the input
echo "$input"

View file

@ -0,0 +1,51 @@
#!/bin/bash
# Post-commit hook: Check for uncommitted work that might get lost
#
# After committing task-specific files, check if there's other work
# in the repo that should be committed or stashed
read -r input
COMMAND=$(echo "$input" | jq -r '.tool_input.command // empty')
# Only run after git commit
if ! echo "$COMMAND" | grep -qE '^git commit'; then
echo "$input"
exit 0
fi
# Check for remaining uncommitted changes
UNSTAGED=$(git diff --name-only 2>/dev/null | wc -l | tr -d ' ')
STAGED=$(git diff --cached --name-only 2>/dev/null | wc -l | tr -d ' ')
UNTRACKED=$(git ls-files --others --exclude-standard 2>/dev/null | wc -l | tr -d ' ')
TOTAL=$((UNSTAGED + STAGED + UNTRACKED))
if [[ $TOTAL -gt 0 ]]; then
echo "" >&2
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" >&2
echo "[PostCommit] WARNING: Uncommitted work remains" >&2
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" >&2
if [[ $UNSTAGED -gt 0 ]]; then
echo " Modified (unstaged): $UNSTAGED files" >&2
git diff --name-only 2>/dev/null | head -5 | sed 's/^/ /' >&2
[[ $UNSTAGED -gt 5 ]] && echo " ... and $((UNSTAGED - 5)) more" >&2
fi
if [[ $STAGED -gt 0 ]]; then
echo " Staged (not committed): $STAGED files" >&2
git diff --cached --name-only 2>/dev/null | head -5 | sed 's/^/ /' >&2
fi
if [[ $UNTRACKED -gt 0 ]]; then
echo " Untracked: $UNTRACKED files" >&2
git ls-files --others --exclude-standard 2>/dev/null | head -5 | sed 's/^/ /' >&2
[[ $UNTRACKED -gt 5 ]] && echo " ... and $((UNTRACKED - 5)) more" >&2
fi
echo "" >&2
echo "Consider: commit these, stash them, or confirm they're intentionally left" >&2
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" >&2
fi
echo "$input"

18
claude/scripts/pr-created.sh Executable file
View file

@ -0,0 +1,18 @@
#!/bin/bash
# Log PR URL and provide review command after PR creation
read -r input
COMMAND=$(echo "$input" | jq -r '.tool_input.command // empty')
OUTPUT=$(echo "$input" | jq -r '.tool_output.output // empty')
if [[ "$COMMAND" == *"gh pr create"* ]]; then
PR_URL=$(echo "$OUTPUT" | grep -oE 'https://github.com/[^/]+/[^/]+/pull/[0-9]+' | head -1)
if [[ -n "$PR_URL" ]]; then
REPO=$(echo "$PR_URL" | sed -E 's|https://github.com/([^/]+/[^/]+)/pull/[0-9]+|\1|')
PR_NUM=$(echo "$PR_URL" | sed -E 's|.*/pull/([0-9]+)|\1|')
echo "[Hook] PR created: $PR_URL" >&2
echo "[Hook] To review: gh pr review $PR_NUM --repo $REPO" >&2
fi
fi
echo "$input"

69
claude/scripts/pre-compact.sh Executable file
View file

@ -0,0 +1,69 @@
#!/bin/bash
# Pre-compact: Save minimal state for Claude to resume after auto-compact
#
# Captures:
# - Working directory + branch
# - Git status (files touched)
# - Todo state (in_progress items)
# - Context facts (decisions, actionables)
STATE_FILE="${HOME}/.claude/sessions/scratchpad.md"
CONTEXT_FILE="${HOME}/.claude/sessions/context.json"
TIMESTAMP=$(date '+%s')
CWD=$(pwd)
mkdir -p "${HOME}/.claude/sessions"
# Get todo state
TODOS=""
if [[ -f "${HOME}/.claude/todos/current.json" ]]; then
TODOS=$(cat "${HOME}/.claude/todos/current.json" 2>/dev/null | head -50)
fi
# Get git status
GIT_STATUS=""
BRANCH=""
if git rev-parse --git-dir > /dev/null 2>&1; then
GIT_STATUS=$(git status --short 2>/dev/null | head -15)
BRANCH=$(git branch --show-current 2>/dev/null)
fi
# Get context facts
CONTEXT=""
if [[ -f "$CONTEXT_FILE" ]]; then
CONTEXT=$(jq -r '.[] | "- [\(.source)] \(.fact)"' "$CONTEXT_FILE" 2>/dev/null | tail -10)
fi
cat > "$STATE_FILE" << EOF
---
timestamp: ${TIMESTAMP}
cwd: ${CWD}
branch: ${BRANCH:-none}
---
# Resume After Compact
You were mid-task. Do NOT assume work is complete.
## Project
\`${CWD}\` on \`${BRANCH:-no branch}\`
## Files Changed
\`\`\`
${GIT_STATUS:-none}
\`\`\`
## Todos (in_progress = NOT done)
\`\`\`json
${TODOS:-check /todos}
\`\`\`
## Context (decisions & actionables)
${CONTEXT:-none captured}
## Next
Continue the in_progress todo.
EOF
echo "[PreCompact] Snapshot saved" >&2
exit 0

34
claude/scripts/session-start.sh Executable file
View file

@ -0,0 +1,34 @@
#!/bin/bash
# Session start: Read scratchpad if recent, otherwise start fresh
# 3 hour window - if older, you've moved on mentally
STATE_FILE="${HOME}/.claude/sessions/scratchpad.md"
THREE_HOURS=10800 # seconds
if [[ -f "$STATE_FILE" ]]; then
# Get timestamp from file
FILE_TS=$(grep -E '^timestamp:' "$STATE_FILE" 2>/dev/null | cut -d' ' -f2)
NOW=$(date '+%s')
if [[ -n "$FILE_TS" ]]; then
AGE=$((NOW - FILE_TS))
if [[ $AGE -lt $THREE_HOURS ]]; then
# Recent - read it back
echo "[SessionStart] Found recent scratchpad ($(($AGE / 60)) min ago)" >&2
echo "[SessionStart] Reading previous state..." >&2
echo "" >&2
cat "$STATE_FILE" >&2
echo "" >&2
else
# Stale - delete and start fresh
rm -f "$STATE_FILE"
echo "[SessionStart] Previous session >3h old - starting fresh" >&2
fi
else
# No timestamp, delete it
rm -f "$STATE_FILE"
fi
fi
exit 0

View file

@ -0,0 +1,28 @@
#!/bin/bash
# Suggest /compact at logical intervals to manage context window
# Tracks tool calls per session, suggests compaction every 50 calls
SESSION_ID="${CLAUDE_SESSION_ID:-$$}"
COUNTER_FILE="/tmp/claude-tool-count-${SESSION_ID}"
THRESHOLD="${COMPACT_THRESHOLD:-50}"
# Read or initialize counter
if [[ -f "$COUNTER_FILE" ]]; then
COUNT=$(($(cat "$COUNTER_FILE") + 1))
else
COUNT=1
fi
echo "$COUNT" > "$COUNTER_FILE"
# Suggest compact at threshold
if [[ $COUNT -eq $THRESHOLD ]]; then
echo "[Compact] ${THRESHOLD} tool calls - consider /compact if transitioning phases" >&2
fi
# Suggest at intervals after threshold
if [[ $COUNT -gt $THRESHOLD ]] && [[ $((COUNT % 25)) -eq 0 ]]; then
echo "[Compact] ${COUNT} tool calls - good checkpoint for /compact" >&2
fi
exit 0