iLLuMinator 4.9B - Professional Language Model Training Framework

A comprehensive, production-ready implementation for training and deploying a 4.9 billion parameter transformer language model. This framework incorporates state-of-the-art architectures, optimization techniques, and training datasets used by leading AI research organizations.

Table of Contents

  • Overview
  • Key Features
  • Architecture
  • Installation
  • Quick Start
  • Training Data
  • Model Configurations
  • Training Pipeline
  • Optimization Techniques
  • Usage
  • Project Structure
  • Troubleshooting
  • Contributing
  • License

Overview

iLLuMinator 4.9B represents a cutting-edge approach to large language model training, incorporating the latest advances in transformer architectures, optimization algorithms, and training methodologies. The framework is designed for both research and production environments, with comprehensive support for various hardware configurations and deployment scenarios.

Key Features

  • Advanced Architecture: Enhanced transformer with Grouped Query Attention (GQA), Rotary Position Embedding (RoPE), SwiGLU activation, and RMSNorm
  • Production-Ready: Comprehensive error handling, logging, monitoring, and checkpointing
  • Memory Efficient: Advanced memory optimization techniques including gradient checkpointing and mixed precision training
  • Scalable Training: Support for distributed training across multiple GPUs and nodes
  • Professional Datasets: Integration with high-quality datasets used to train leading AI models
  • Flexible Deployment: Multiple deployment options including API servers, interactive clients, and batch processing

Architecture

The iLLuMinator 4.9B model implements a state-of-the-art transformer architecture with several key innovations:

Core Architecture Components

| Component | Configuration | Description |
|---|---|---|
| Parameters | 4.9 billion | Optimized for a performance/efficiency balance |
| Layers | 32 transformer blocks | Deep architecture for complex reasoning |
| Attention Heads | 32 (8 KV heads) | Grouped Query Attention for efficiency |
| Model Dimension | 4096 | Hidden state dimension |
| Feed-Forward Dimension | 14336 | Enhanced feed-forward network |
| Context Length | 4096 tokens | Extended context for long-form content |
| Vocabulary | 65536 tokens | Enhanced tokenization coverage |

Advanced Features

  1. Grouped Query Attention (GQA)

    • Reduces memory usage during inference
    • Maintains model quality while improving efficiency
    • Based on research from Google and Meta
  2. Rotary Position Embedding (RoPE)

    • Superior handling of positional information
    • Better extrapolation to longer sequences
    • Eliminates the need for absolute position embeddings
  3. SwiGLU Activation Function

    • Improved performance over traditional GELU
    • Based on "GLU Variants Improve Transformer" research
    • Used in PaLM, LLaMA, and other leading models
  4. RMSNorm Normalization

    • More stable training than LayerNorm
    • Faster computation and better numerical stability
    • Standard in modern large language models (a minimal sketch of RMSNorm and SwiGLU follows this list)
  5. Memory Optimizations

    • Gradient checkpointing for reduced memory usage
    • Mixed precision training (FP16/BF16)
    • Efficient attention implementations
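
To make components 3 and 4 concrete, here is a minimal PyTorch sketch of RMSNorm and a SwiGLU feed-forward block. It follows the standard formulations from the papers cited above and is not the repository's exact implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Scale each token vector by the reciprocal of its RMS
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLUFeedForward(nn.Module):
    """SwiGLU FFN: down_proj(SiLU(gate_proj(x)) * up_proj(x))."""
    def __init__(self, d_model: int = 4096, d_ff: int = 14336):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_ff, bias=False)
        self.up_proj = nn.Linear(d_model, d_ff, bias=False)
        self.down_proj = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))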

Installation

Prerequisites

  • Python 3.8 or higher
  • CUDA 11.8+ (for GPU training)
  • 16GB+ RAM (32GB+ recommended)
  • 100GB+ free disk space

Environment Setup

# Clone the repository
git clone https://github.com/Anipaleja/iLLuMinator-4.9B.git
cd iLLuMinator-4.9B

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Optional: Install additional optimizations
pip install flash-attn  # For flash attention (requires compatible GPU)
pip install deepspeed   # For distributed training

Docker Installation

# Build Docker container
docker build -t illuminator-4.9b .

# Run container with GPU support
docker run --gpus all -it illuminator-4.9b

Quick Start

Basic Training

# 1. Install dependencies
pip install -r requirements.txt

# 2. Start training with default configuration
python train_comprehensive.py

# 3. Monitor training progress
tail -f training_logs/training.log

Advanced Training Options

# Training with custom configuration
python train_comprehensive.py \
    --model-size 4.9B \
    --batch-size 2 \
    --learning-rate 3e-4 \
    --epochs 10 \
    --save-every 1000

# Apple Silicon optimized training
python train_4.9b_apple_silicon.py

# Training with external datasets
python practical_model/train_enhanced.py --use-external-data

Training Data

The iLLuMinator 4.9B model is trained on a carefully curated dataset that mirrors the high-quality data used by leading AI research organizations. Our training corpus includes:

Core Datasets

  1. OpenOrca (4.5M examples)

    • High-quality instruction-following data
    • Filtered and deduplicated from FLAN collection
    • Emphasizes reasoning and complex instruction following
  2. UltraChat (1.57M examples)

    • Multi-turn conversational data
    • Covers diverse topics and conversation styles
    • Includes context-aware dialogue training
  3. WizardLM (196K examples)

    • Evolved instruction-following examples
    • Complex reasoning tasks and multi-step problems
    • Generated through Evol-Instruct methodology
  4. Stanford Alpaca (52K examples)

    • High-quality instruction-following dataset
    • Human-curated and verified examples
    • Covers broad range of tasks and domains

Specialized Training Corpora

  1. Code Datasets

    • GitHub repositories with permissive licenses
    • Stack Overflow Q&A pairs
    • Programming tutorials and documentation
    • Multiple programming languages
  2. Scientific Literature

    • ArXiv papers and abstracts
    • Wikipedia scientific articles
    • Educational content and textbooks
    • Technical documentation
  3. Web Corpus (Filtered)

    • CommonCrawl data (extensively filtered)
    • Reddit discussions (high-quality subreddits)
    • News articles and journalism
    • Reference materials and documentation

Data Processing Pipeline

# Example of our data processing pipeline
def process_training_data():
    # 1. Load raw datasets
    datasets = load_multiple_sources()
    
    # 2. Quality filtering
    filtered_data = apply_quality_filters(datasets)
    
    # 3. Deduplication
    deduplicated_data = remove_duplicates(filtered_data)
    
    # 4. Format standardization
    formatted_data = standardize_format(deduplicated_data)
    
    # 5. Tokenization and chunking
    tokenized_data = tokenize_and_chunk(formatted_data)
    
    return tokenized_data

Data Quality Assurance

  • Toxicity Filtering: Advanced toxicity detection and removal
  • PII Removal: Comprehensive personally identifiable information filtering
  • Language Detection: Multi-language support with quality thresholds
  • Duplicate Detection: Advanced near-duplicate detection algorithms (see the sketch below)
  • Content Verification: Automated fact-checking and consistency validation
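
As a rough illustration of the near-duplicate check mentioned above, the sketch below compares documents by Jaccard similarity over character shingles. It is deliberately simplified; a production pipeline would typically use MinHash/LSH to make this scale, and the 0.8 threshold is an example value, not the framework's setting:

def shingles(text: str, n: int = 5) -> set:
    """Character n-grams used as a cheap fingerprint of a document."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}

def is_near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two documents as near-duplicates if their shingle sets overlap heavily."""
    sa, sb = shingles(a), shingles(b)
    jaccard = len(sa & sb) / max(1, len(sa | sb))
    return jaccard >= threshold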

Model Configurations

The framework supports multiple model configurations for different use cases:

Available Configurations

| Configuration | Parameters | Use Case | Hardware Requirement |
|---|---|---|---|
| iLLuMinator-4.9B | 4.9B | Full model | 24GB+ VRAM |
| iLLuMinator-2.8B | 2.8B | Medium scale | 16GB+ VRAM |
| iLLuMinator-1.3B | 1.3B | Efficient | 8GB+ VRAM |
| iLLuMinator-Practical | 120M | Development/Testing | 4GB+ VRAM |

Configuration Files

# Example: 4.9B configuration
config_4_9b = {
    "vocab_size": 65536,
    "d_model": 4096,
    "n_layers": 32,
    "n_heads": 32,
    "n_kv_heads": 8,
    "d_ff": 14336,
    "max_seq_length": 4096,
    "dropout": 0.0,
    "tie_embeddings": True,
    "use_rope": True,
    "use_swiglu": True,
    "use_rmsnorm": True
}
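
The sketch below shows the kind of sanity checks this configuration implies: the model dimension must split evenly across attention heads, and GQA requires the query heads to group evenly over the KV heads. The helper name is illustrative, not part of the codebase:

def validate_config(cfg: dict) -> tuple:
    """Check the structural constraints implied by the architecture."""
    assert cfg["d_model"] % cfg["n_heads"] == 0, "d_model must divide evenly into heads"
    assert cfg["n_heads"] % cfg["n_kv_heads"] == 0, "query heads must group evenly over KV heads"
    head_dim = cfg["d_model"] // cfg["n_heads"]           # 4096 / 32 = 128
    queries_per_kv = cfg["n_heads"] // cfg["n_kv_heads"]  # 32 / 8 = 4 query heads per KV head
    return head_dim, queries_per_kv

validate_config(config_4_9b)  # -> (128, 4)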

Training Pipeline

Training Stages

  1. Data Preparation

    • Dataset loading and preprocessing
    • Tokenization and formatting
    • Data validation and quality checks
  2. Model Initialization

    • Architecture setup
    • Weight initialization
    • Optimizer configuration
  3. Training Loop

    • Forward pass computation
    • Loss calculation
    • Backward pass and optimization
    • Checkpointing and logging
  4. Evaluation and Validation

    • Perplexity calculation (see the sketch after this list)
    • Benchmark evaluations
    • Model quality assessment
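
For stage 4, perplexity is the exponential of the average per-token cross-entropy on held-out data. A minimal sketch, assuming the model returns raw next-token logits of shape (batch, seq_len, vocab):

import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, dataloader, device: str = "cuda") -> float:
    """exp(mean cross-entropy per token) over a held-out dataset."""
    total_loss, total_tokens = 0.0, 0
    for input_ids in dataloader:                    # (batch, seq_len) token IDs
        input_ids = input_ids.to(device)
        logits = model(input_ids)                   # (batch, seq_len, vocab)
        loss = F.cross_entropy(                     # predict token t+1 from tokens <= t
            logits[:, :-1].reshape(-1, logits.size(-1)),
            input_ids[:, 1:].reshape(-1),
            reduction="sum",
        )
        total_loss += loss.item()
        total_tokens += input_ids[:, 1:].numel()
    return math.exp(total_loss / total_tokens)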

Training Scripts

# Main training scripts
train_comprehensive.py      # Full training pipeline
train_4.9b_apple_silicon.py # Apple Silicon optimized
train_enhanced.py           # Enhanced with validation
train_mfu.py               # MFU optimized training

Hyperparameter Configuration

# Training hyperparameters
training_config = {
    "learning_rate": 3e-4,
    "weight_decay": 0.1,
    "beta1": 0.9,
    "beta2": 0.95,
    "eps": 1e-8,
    "warmup_steps": 2000,
    "max_steps": 100000,
    "gradient_clip": 1.0,
    "batch_size": 2,
    "gradient_accumulation_steps": 32,
    "save_every": 1000,
    "eval_every": 500
}
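
A minimal sketch of how these hyperparameters could be wired into an AdamW optimizer with linear warmup and cosine decay. The decay floor of 10% of the peak learning rate is an assumption, not something specified by the framework:

import math
import torch

def build_optimizer_and_scheduler(model, cfg: dict):
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=cfg["learning_rate"],
        betas=(cfg["beta1"], cfg["beta2"]),
        eps=cfg["eps"],
        weight_decay=cfg["weight_decay"],
    )

    def lr_lambda(step: int) -> float:
        # Linear warmup, then cosine decay to 10% of the peak learning rate
        if step < cfg["warmup_steps"]:
            return step / max(1, cfg["warmup_steps"])
        progress = (step - cfg["warmup_steps"]) / max(1, cfg["max_steps"] - cfg["warmup_steps"])
        return 0.1 + 0.45 * (1.0 + math.cos(math.pi * min(progress, 1.0)))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler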

Optimization Techniques

Memory Optimization

  1. Gradient Checkpointing

    • Trades computation for memory
    • Enables training larger models on limited hardware
    • Configurable checkpointing strategies
  2. Mixed Precision Training

    • Automatic Mixed Precision (AMP)
    • FP16 and BF16 support
    • Significant memory and speed improvements (see the combined sketch after this list)
  3. Optimizer State Sharding

    • Distributed optimizer states
    • Reduced memory footprint per GPU
    • Enabled through DeepSpeed integration
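
The sketch below combines the first two techniques with gradient accumulation: FP16 autocast with a GradScaler for dynamic loss scaling, activation checkpointing around a transformer block, and one optimizer update spread over several micro-batches. The forward signature (model returning logits) is an assumption for illustration, not the framework's exact API:

import torch
import torch.nn.functional as F
from torch.cuda.amp import GradScaler, autocast
from torch.utils.checkpoint import checkpoint

scaler = GradScaler()  # dynamic loss scaling keeps FP16 gradients from underflowing

def checkpointed_block(block, hidden_states):
    # Recompute this block's activations during backward instead of storing them
    return checkpoint(block, hidden_states, use_reentrant=False)

def accumulation_step(model, optimizer, micro_batches, accum_steps=32, clip=1.0):
    """One optimizer update spread over `accum_steps` micro-batches."""
    optimizer.zero_grad(set_to_none=True)
    for input_ids, labels in micro_batches:
        with autocast(dtype=torch.float16):
            logits = model(input_ids.cuda())          # assumed to return (B, T, V) logits
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.cuda().view(-1))
        scaler.scale(loss / accum_steps).backward()
    scaler.unscale_(optimizer)                        # so clipping sees the true gradient norm
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    scaler.step(optimizer)
    scaler.update()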

Computational Optimization

  1. Flash Attention

    • Memory-efficient attention computation
    • Significant speedup for long sequences
    • Automatic fallback for unsupported hardware (illustrated in the sketch after this list)
  2. Kernel Fusion

    • Fused operations where possible
    • Reduced memory bandwidth requirements
    • Custom CUDA kernels for critical operations
  3. Dynamic Loss Scaling

    • Prevents gradient underflow in FP16
    • Automatic scaling adjustment
    • Maintains training stability
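
One way to get flash attention with automatic fallback, without custom kernels, is PyTorch 2.x's built-in scaled_dot_product_attention, which dispatches to a FlashAttention kernel when the hardware and dtypes allow it and otherwise uses the standard implementation. This illustrates the idea and is not necessarily how the --flash-attention flag is implemented here:

import torch
import torch.nn.functional as F

def causal_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, heads, seq_len, head_dim); a flash kernel is used when available
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)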

Distributed Training

# Multi-GPU training setup
torchrun --nproc_per_node=4 train_comprehensive.py \
    --distributed \
    --backend=nccl \
    --gradient-accumulation-steps=8
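
A torchrun launch like the one above typically pairs with process-group setup inside the training script. A minimal sketch using PyTorch DistributedDataParallel (the exact structure in train_comprehensive.py may differ):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def setup_distributed(model, dataset, batch_size: int = 2):
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for every process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(model.cuda(local_rank), device_ids=[local_rank])
    sampler = DistributedSampler(dataset)     # shards the dataset across ranks
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    return model, loader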

Usage

After training, the model can be loaded and used for the following tasks (a minimal generation sketch follows the list):

  • Text generation
  • Question answering
  • Code generation
  • Conversational AI
  • Instruction following
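
As a rough illustration of serving these tasks from a trained checkpoint, the sketch below runs temperature sampling with a hypothetical model and tokenizer; the encode/decode/eos_token_id interface is assumed, and inference.py remains the actual entry point:

import torch

@torch.no_grad()
def generate(model, tokenizer, prompt: str, max_new_tokens: int = 100, temperature: float = 0.8) -> str:
    """Autoregressive temperature sampling from a trained model."""
    ids = torch.tensor([tokenizer.encode(prompt)], device="cuda")
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :]                     # next-token logits
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:      # stop at end-of-sequence
            break
    return tokenizer.decode(ids[0].tolist())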

Key Scripts

iLLuMinator-4.9B/
├── train_4_9B_professional.py    # Main training script
├── inference.py                  # Model inference script
├── legacy/                       # Original CUDA implementation
│   ├── illuminator_cuda.py       # 4.9B CUDA model
│   ├── illuminator_ai.py         # AI implementation
│   └── cuda_api_server.py        # API server
├── simple_illuminator.py         # Simple AI implementation
├── client.py                     # Client interface
├── chatbot_client.py             # Chatbot client
├── requirements.txt              # Dependencies
└── README.md                     # This file

Troubleshooting

Common Issues

  1. Out of Memory Errors

    # Reduce batch size or sequence length
    python train_comprehensive.py --batch-size 1 --max-seq-length 2048
    
    # Enable gradient checkpointing
    python train_comprehensive.py --gradient-checkpointing
  2. CUDA Out of Memory

    # Use mixed precision training
    python train_comprehensive.py --mixed-precision
    
    # Enable CPU offloading
    python train_comprehensive.py --cpu-offload
  3. Slow Training Speed

    # Enable flash attention
    pip install flash-attn
    python train_comprehensive.py --flash-attention
    
    # Use multiple GPUs
    torchrun --nproc_per_node=2 train_comprehensive.py

Performance Optimization

  1. Memory Usage Monitoring

    # Monitor GPU memory
    python mps-monitor.py --watch-memory
    
    # Memory profiling
    python train_comprehensive.py --profile-memory
  2. Training Speed Analysis

    # Analyze Model FLOPs Utilization
    python mfu_analyzer.py --model-path checkpoints/model.pt

Debug Mode

# Enable debug logging
python train_comprehensive.py --debug --log-level DEBUG

Project Structure

iLLuMinator-4.9B/
├── README.md                     # This comprehensive documentation
├── requirements.txt              # Python dependencies
├── LICENSE                       # MIT license
│
├── models/                       # Model architecture definitions
│   ├── illuminator_model.py      # Main 4.9B model implementation
│   ├── illuminator_practical.py  # Practical 120M model
│   └── config/                   # Model configuration files
│
├── training/                     # Training scripts and utilities
│   ├── train_comprehensive.py    # Main training script
│   ├── train_4.9b_apple_silicon.py  # Apple Silicon optimized
│   ├── train_enhanced.py         # Enhanced training with validation
│   ├── train_mfu.py              # MFU optimized training
│   └── optimizers/               # Custom optimization modules
│
├── data/                         # Data processing and loading
│   ├── dataset_integration.py    # Dataset loading and processing
│   ├── tokenizer/                # Custom tokenizer implementation
│   └── datasets/                 # Training datasets (created during setup)
│
├── evaluation/                   # Model evaluation and benchmarking
│   ├── evaluate.py               # Comprehensive evaluation script
│   ├── benchmarks/               # Standard benchmark implementations
│   └── metrics/                  # Custom evaluation metrics
│
├── deployment/                   # Deployment and serving
│   ├── api_server.py             # REST API server
│   ├── interactive_client.py     # Interactive chat client
│   ├── practical_api_server.py   # Lightweight API for practical model
│   └── docker/                   # Docker configurations
│
├── utils/                        # Utility scripts and tools
│   ├── mps-monitor.py            # Memory and performance monitoring
│   ├── mfu_analyzer.py           # Model FLOPs utilization analysis
│   ├── model_comparison.py       # Model architecture comparison
│   └── optimization/             # Performance optimization tools
│
├── practical_model/              # Lightweight model for development
│   ├── illuminator_practical.py  # 120M parameter model
│   ├── train_practical.py        # Training script for practical model
│   ├── simple_test.py            # Quick testing utilities
│   └── README.md                 # Practical model documentation
│
├── legacy/                       # Legacy implementations (deprecated)
│   ├── illuminator_cuda.py       # Original CUDA implementation
│   └── cuda_api_server.py        # Legacy API server
│
├── checkpoints/                  # Model checkpoints (created during training)
├── logs/                         # Training and evaluation logs
└── docs/                         # Additional documentation
    ├── TRAINING_GUIDE.md         # Detailed training guide
    ├── DEPLOYMENT_GUIDE.md       # Production deployment guide
    └── API_REFERENCE.md          # Complete API documentation

Contributing

We welcome contributions to the iLLuMinator project! Please follow the guidelines below:

Development Setup

# Fork and clone the repository
git clone https://github.com/yourusername/iLLuMinator-4.9B.git
cd iLLuMinator-4.9B

# Install development dependencies
pip install -r requirements-dev.txt

# Install pre-commit hooks
pre-commit install

Code Standards

  • Follow PEP 8 style guidelines
  • Include comprehensive docstrings
  • Add unit tests for new features
  • Ensure backwards compatibility
  • Update documentation as needed

Submitting Changes

  1. Create a feature branch: git checkout -b feature-name
  2. Make your changes and add tests
  3. Run the test suite: python -m pytest tests/
  4. Submit a pull request with a clear description

Research Citations

If you use iLLuMinator in your research, please cite:

@software{illuminator_4_9b,
  title={iLLuMinator 4.9B: Professional Language Model Training Framework},
  author={Anipaleja},
  year={2024},
  url={https://github.com/Anipaleja/iLLuMinator-4.9B}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Transformer architecture based on "Attention Is All You Need" (Vaswani et al.)
  • Grouped Query Attention from "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints" (Ainslie et al.)
  • RoPE implementation based on "RoFormer: Enhanced Transformer with Rotary Position Embedding" (Su et al.)
  • SwiGLU activation from "GLU Variants Improve Transformer" (Shazeer)
  • Training techniques inspired by PaLM, LLaMA, and GPT-4 methodologies
  • Dataset processing based on LLMDataHub recommendations

Support

For questions, issues, or feature requests, please open an issue on the GitHub repository.


Status: Production Ready | Latest Version: 1.0.0 | Last Updated: January 2024
