Getting Started with Dream Trainer¶
This guide will help you get up and running with Dream Trainer quickly. We'll cover installation and basic usage, then walk through a complete example.
Table of Contents¶
- Installation
- Basic Usage
- Your First Training Run
- Multi-GPU Training
- Logging and Monitoring
- Next Steps
Installation¶
Basic Installation¶
Install Dream Trainer using pip:
pip install dream-trainer
Optional Dependencies¶
Dream Trainer has several optional dependencies for additional features:
# For WandB logging
pip install dream-trainer[wandb]
# For rich progress bars
pip install dream-trainer[rich]
# For metric tracking
pip install dream-trainer[metrics]
# For FP8 quantization support
pip install dream-trainer[torchao]
# For fault tolerance
pip install dream-trainer[torchft]
# Install all optional dependencies
pip install dream-trainer[all]
Development Installation¶
For development or to run examples:
git clone https://github.com/dream3d/dream-trainer.git
cd dream-trainer
pip install -e ".[dev]"
Basic Usage¶
1. Create Your Trainer¶
First, create a custom trainer by extending DreamTrainer:
from dream_trainer import DreamTrainer, DreamTrainerConfig
from dream_trainer.configs import TrainingParameters, DeviceParameters
import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset
class MyTrainer(DreamTrainer):
def __init__(self, config: DreamTrainerConfig, model: nn.Module):
super().__init__(config)
self.model = model
def configure_models(self):
"""Configure your model(s) here"""
# Models are automatically moved to the correct device
# and wrapped with distributed training wrappers
pass
def configure_optimizers(self):
"""Configure optimizer(s)"""
self.optimizer = AdamW(
self.model.parameters(),
lr=1e-4,
weight_decay=0.01
)
def configure_dataloaders(self):
"""Configure train and validation dataloaders"""
# Example dummy data
train_data = TensorDataset(
torch.randn(1000, 10),
torch.randint(0, 2, (1000,))
)
val_data = TensorDataset(
torch.randn(100, 10),
torch.randint(0, 2, (100,))
)
train_loader = DataLoader(
train_data,
batch_size=32,
shuffle=True
)
val_loader = DataLoader(
val_data,
batch_size=32,
shuffle=False
)
return train_loader, val_loader
def training_step(self, batch, batch_idx):
"""Define a single training step"""
inputs, targets = batch
# Forward pass
outputs = self.model(inputs)
loss = nn.functional.cross_entropy(outputs, targets)
# Backward pass (routed through the trainer)
self.backward(loss)
# Return metrics to log
return {
"loss": loss,
"lr": self.optimizer.param_groups[0]["lr"]
}
def validation_step(self, batch, batch_idx):
"""Define a single validation step"""
inputs, targets = batch
# Forward pass
outputs = self.model(inputs)
loss = nn.functional.cross_entropy(outputs, targets)
# Calculate accuracy
preds = outputs.argmax(dim=1)
accuracy = (preds == targets).float().mean()
return {
"val_loss": loss,
"val_accuracy": accuracy
}
2. Configure Your Training¶
Create a configuration for your training run:
from dream_trainer.callbacks import (
LoggerCallback,
ProgressBar,
CallbackCollection
)
# Create configuration
config = DreamTrainerConfig(
# Project settings
project="my-ml-project",
group="classification",
experiment="baseline-v1",
# Device settings
device_parameters=DeviceParameters(
# Distributed training settings
data_parallel_size=1, # Number of GPUs for data parallelism
tensor_parallel_size=1, # Tensor parallelism degree
pipeline_parallel_size=1, # Pipeline parallelism degree
# Performance settings
compile_model=True, # Use torch.compile
param_dtype=torch.bfloat16, # Mixed precision
),
# Training settings
training_parameters=TrainingParameters(
n_epochs=10,
train_batch_size=32,
gradient_clip_val=1.0,
checkpoint_activations=False,
# Validation settings
val_frequency=0.5, # Validate every half epoch
num_sanity_val_steps=2, # Sanity check before training
),
# Callbacks
callbacks=CallbackCollection([
LoggerCallback(), # Logs metrics to console/WandB
ProgressBar(), # Shows training progress
])
)
3. Train Your Model¶
# Create model
model = nn.Sequential(
nn.Linear(10, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 2)
)
# Create trainer
trainer = MyTrainer(config, model)
if __name__ == "__main__":
from dream_trainer.utils.entrypoint import entrypoint
entrypoint(trainer.fit)
Multi-GPU Training¶
Dream Trainer makes distributed training simple. To use multiple GPUs:
Single Node, Multiple GPUs¶
# Using torchrun (recommended)
torchrun --nproc_per_node=4 train.py
# Or using the trainer directly with updated config
config = DreamTrainerConfig(
# ... other settings ...
device_parameters=DeviceParameters(
data_parallel_size=4, # Use 4 GPUs
)
)
Multiple Nodes¶
# On node 0
torchrun --nproc_per_node=8 --nnodes=2 --node_rank=0 \
--master_addr=$MASTER_ADDR --master_port=$MASTER_PORT train.py
# On node 1
torchrun --nproc_per_node=8 --nnodes=2 --node_rank=1 \
--master_addr=$MASTER_ADDR --master_port=$MASTER_PORT train.py
Logging and Monitoring¶
WandB Integration¶
To use Weights & Biases for experiment tracking:
from dream_trainer.configs import WandBParameters
config = DreamTrainerConfig(
# ... other settings ...
wandb_parameters=WandBParameters(
project="my-project",
entity="my-team",
tags=["experiment", "classification"],
notes="Initial baseline run"
)
)
Custom Logging¶
You can create custom logging callbacks:
from dream_trainer.callbacks import Callback
class CustomLogger(Callback):
def on_train_batch_end(self, trainer, outputs, batch, batch_idx):
# Log custom metrics
trainer.log("custom_metric", outputs["custom_metric"])
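To activate the callback, add it to the CallbackCollection in your config next to the built-in callbacks. A minimal sketch, reusing the configuration shown earlier:
config = DreamTrainerConfig(
    # ... other settings ...
    callbacks=CallbackCollection([
        LoggerCallback(),
        ProgressBar(),
        CustomLogger(),  # the custom callback defined above
    ])
)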
Next Steps¶
Now that you have the basics, here are some recommended next steps:
- Read the Configuration Guide to learn about all available options
- Check out the Trainer Guide for advanced trainer customization
- Explore Callbacks to extend functionality
- Try Distributed Training for multi-GPU setups
- Look at Examples for complete working code
Common Issues¶
Installation Problems¶
If you encounter installation issues:
- Make sure you have Python 3.10+ installed
- Ensure PyTorch is installed correctly for your CUDA version (see the quick check after this list)
- Try installing in a fresh virtual environment
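If you're unsure whether PyTorch can see your GPU at all, this quick check (plain PyTorch, independent of Dream Trainer) will tell you before you debug anything else:
import torch

print(torch.__version__)          # Installed PyTorch version
print(torch.version.cuda)         # CUDA version PyTorch was built for (None on CPU-only builds)
print(torch.cuda.is_available())  # True if a GPU is visible to PyTorch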
Training Issues¶
Common training problems and solutions:
- Out of Memory: Reduce the batch size, enable activation checkpointing, or use gradient accumulation (see the config sketch after this list)
- Slow Training: Enable mixed precision or model compilation (also shown in the sketch below)
- Poor Performance: Check the learning rate and optimizer settings
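As a starting point, the usual knobs are the settings already shown in the configuration example above. This is a sketch of which options to reach for, not a tuned recipe:
config = DreamTrainerConfig(
    # ... other settings ...
    training_parameters=TrainingParameters(
        train_batch_size=16,          # Smaller batches reduce memory pressure
        checkpoint_activations=True,  # Trade recompute for memory
        gradient_clip_val=1.0,        # Helps with unstable losses
    ),
    device_parameters=DeviceParameters(
        param_dtype=torch.bfloat16,   # Mixed precision: faster and lighter on memory
        compile_model=True,           # torch.compile for faster steps
    ),
)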
Getting Help¶
If you need help:
- Check the documentation
- Look at examples
- Open an issue on GitHub
- Join our community chat