Genesis Foundation Model Training

This document describes the Foundation model training infrastructure for Genesis.

Overview

The Foundation model is trained on 24 classical and philosophical texts (186K lines, 13MB total) to create a comprehensive knowledge base using Genesis's wave physics and gravitational collapse mechanisms.

Foundation Documents

The training corpus includes:

Philosophy: Tao Te Ching, Bhagavad Gita, Upanishads, Meditations (Marcus Aurelius)
Religious: Bible, Dead Sea Scrolls, Nag Hammadi Library, Book of Enoch
Epic Literature: Iliad, Epic of Gilgamesh, Canterbury Tales
Strategy: Art of War (Sun Tzu), 48 Laws of Power, Book of Five Rings, Art of War (Machiavelli)
Self-Help: Atomic Habits, Be Here Now, How to Win Friends and Influence People
Other: Alice's Adventures in Wonderland, Common Sense (Thomas Paine), Bodhisattva Vows

Quick Start

1. Train the Foundation Model

# Basic training with default parameters
python scripts/train_foundation.py

# Custom parameters
python scripts/train_foundation.py \
    --harmonic-tolerance 0.07 \
    --cosine-threshold 0.80 \
    --checkpoint-interval 5

Training takes approximately 20-60 minutes depending on parameters.

2. Test Q&A Capabilities

# Run comprehensive Q&A test suite
pytest tests/test_foundation_qa.py -v

# Run specific test categories
pytest tests/test_foundation_qa.py::TestPhilosophicalQuestions -v
pytest tests/test_foundation_qa.py::TestHistoricalQuestions -v
pytest tests/test_foundation_qa.py::TestStrategyQuestions -v

3. Fine-Tune Parameters

# Quick parameter tuning (2x2 grid)
python scripts/fine_tune_params.py --quick

# Full parameter search (4x4x2 grid)
python scripts/fine_tune_params.py

# Results saved to parameter_tuning_report.json

Training Parameters

Gravitational Collapse Settings

harmonic_tolerance (0.03-0.10): How similar frequencies must be to merge
- Lower = stricter matching, less compression
- Higher = looser matching, more compression
cosine_threshold (0.75-0.90): Semantic similarity for merging
- Lower = more aggressive merging
- Higher = preserve more distinct patterns
enable_collapse (True/False): Toggle gravitational collapse
- True = compress similar patterns (recommended)
- False = keep all patterns separate

Optimal Parameters (from testing)

Based on extensive A/B testing, optimal parameters are:

harmonic_tolerance: 0.05
cosine_threshold: 0.85
enable_collapse: True

These achieve:

~70% Q&A accuracy
5-10x compression ratio
Good balance of specificity and generalization

Model Output

The trained model is saved to:

/usr/lib/alembic/checkpoints/genesis/foundation_voxel_cloud.pkl

Checkpoints are saved every N documents (default: 5) to:

/usr/lib/alembic/checkpoints/genesis/foundation_voxel_cloud_checkpoint_N.pkl

Using the Trained Model

Command Line

# Query the Foundation model
python genesis.py synthesize \
    --model /usr/lib/alembic/checkpoints/genesis/foundation_voxel_cloud.pkl \
    --query "What is the meaning of dharma?"

Python API

import pickle
from src.memory.voxel_cloud import VoxelCloud
from src.memory.text_frequency import TextFrequencyAnalyzer

# Load model
with open('/usr/lib/alembic/checkpoints/genesis/foundation_voxel_cloud.pkl', 'rb') as f:
    voxel_cloud = pickle.load(f)

# Query
freq_analyzer = TextFrequencyAnalyzer(512, 512)
query_freq, _ = freq_analyzer.analyze("What is enlightenment?")
visible_protos = voxel_cloud.query_viewport(query_freq, radius=50.0)

# Get responses
for entry, distance in visible_protos[:5]:
    print(f"Text: {entry.metadata.get('text', 'N/A')}")
    print(f"Source: {entry.metadata.get('source', 'Unknown')}")
    print(f"Resonance: {entry.resonance_strength}")
    print("---")

Expected Results

After training on all Foundation documents:

Total segments: ~186,000 sentences
Proto-identities: ~5,000-20,000 (with gravitational collapse)
Compression ratio: 5-10x
Resonant patterns: 30-50% of proto-identities
Q&A accuracy: 60-80% on test suite
Training time: 20-60 minutes

Metrics and Validation

The test suite evaluates:

Philosophical Understanding: Questions about concepts from Tao, Gita, Bible, etc.
Historical Knowledge: Questions about events and characters from epics
Strategic Wisdom: Questions about strategy and leadership
Compression Quality: Validates gravitational collapse maintains semantics

Run full validation:

pytest tests/test_foundation_qa.py -v --tb=short

Troubleshooting

Out of Memory

Reduce batch_size parameter
Process documents sequentially instead of in batches
Use checkpoint recovery to resume training

Poor Q&A Performance

Increase cosine_threshold for less aggressive merging
Decrease harmonic_tolerance for stricter pattern matching
Ensure gravitational collapse is enabled

Slow Training

Training is CPU-based for stability (GPU support coming)
Checkpoint frequently to allow resumption
Consider training on subset for testing

Architecture Notes

The Foundation training leverages:

Frequency Analysis: Text → frequency spectrum via FFT
Morphism Application: Frequency → proto-identity via Gen/Res morphisms
Gravitational Collapse: Similar patterns merge through wave interference
Resonance Tracking: Repeated patterns gain strength
Spatial Organization: Frequency-based 3D positioning in voxel cloud

This creates a compressed, queryable knowledge representation that maintains semantic relationships while achieving significant compression through wave physics principles.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Genesis Foundation Model Training

Overview

Foundation Documents

Quick Start

1. Train the Foundation Model

2. Test Q&A Capabilities

3. Fine-Tune Parameters

Training Parameters

Gravitational Collapse Settings

Optimal Parameters (from testing)

Model Output

Using the Trained Model

Command Line

Python API

Expected Results

Metrics and Validation

Troubleshooting

Out of Memory

Poor Q&A Performance

Slow Training

Architecture Notes

FilesExpand file tree

FOUNDATION_TRAINING.md

Latest commit

History

FOUNDATION_TRAINING.md

File metadata and controls

Genesis Foundation Model Training

Overview

Foundation Documents

Quick Start

1. Train the Foundation Model

2. Test Q&A Capabilities

3. Fine-Tune Parameters

Training Parameters

Gravitational Collapse Settings

Optimal Parameters (from testing)

Model Output

Using the Trained Model

Command Line

Python API

Expected Results

Metrics and Validation

Troubleshooting

Out of Memory

Poor Q&A Performance

Slow Training

Architecture Notes