.. _rl-utils:

======================
RL Utilities (utils)
======================

.. admonition:: At a Glance
   :class: tip

   :Purpose: Infrastructure utilities for RL training, callbacks, and hyperparameter management
   :Location: ``fusion/modules/rl/utils/``
   :Key Classes: ``EpisodicRewardCallback``, ``LearnRateEntCallback``, ``HyperparamConfig``
   :Integration: Used by all RL algorithms, environments, and training workflows

Overview
========

The utils module provides essential infrastructure supporting the entire RL framework. It handles everything from custom training callbacks to hyperparameter optimization, model initialization, and simulation data management.

**Key Capabilities:**

- **Custom SB3 Callbacks**: Callbacks for reward tracking and dynamic hyperparameter adjustment
- **Hyperparameter Management**: Flexible configuration with multiple decay strategies
- **Optuna Integration**: Automated hyperparameter search space generation
- **Model Setup**: Factory functions for all supported SB3 algorithms
- **GNN Caching**: Efficient caching of GNN embeddings for faster training

Custom Callbacks
================

FUSION provides custom Stable-Baselines3 callbacks that extend standard training capabilities. These callbacks integrate seamlessly with any SB3 algorithm and provide features specifically designed for optical network optimization.

EpisodicRewardCallback
----------------------

Tracks episode rewards across training and saves them periodically for analysis.

**Features:**

- Tracks cumulative rewards per episode
- Saves reward matrices at configurable intervals
- Supports multi-trial reward aggregation
- Compatible with all SB3 algorithms

.. code-block:: python

   from stable_baselines3 import PPO

   from fusion.modules.rl.utils import EpisodicRewardCallback

   # Create callback
   reward_callback = EpisodicRewardCallback(verbose=1)
   reward_callback.sim_dict = sim_dict  # Attach simulation config
   reward_callback.max_iters = 1000

   # Train with callback
   model = PPO("MultiInputPolicy", env, verbose=1)
   model.learn(total_timesteps=100_000, callback=reward_callback)

   # Access rewards after training
   episode_rewards = reward_callback.episode_rewards
   print(f"Average reward: {episode_rewards.mean():.2f}")

**Saved Output Format:**

.. code-block:: text

   logs////
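Because ``EpisodicRewardCallback`` follows the standard SB3 callback interface, it can also be chained with SB3's built-in callbacks through ``CallbackList``. The sketch below is illustrative rather than prescriptive: it pairs the reward callback with SB3's ``CheckpointCallback`` and assumes ``env`` and ``sim_dict`` are already defined, as in the example above.

.. code-block:: python

   from stable_baselines3 import PPO
   from stable_baselines3.common.callbacks import CallbackList, CheckpointCallback

   from fusion.modules.rl.utils import EpisodicRewardCallback

   # FUSION reward tracking
   reward_callback = EpisodicRewardCallback(verbose=1)
   reward_callback.sim_dict = sim_dict  # assumed to be defined, as above
   reward_callback.max_iters = 1000

   # Built-in SB3 checkpointing every 10k environment steps
   checkpoint_callback = CheckpointCallback(
       save_freq=10_000,
       save_path="./checkpoints/",
       name_prefix="ppo_fusion",
   )

   model = PPO("MultiInputPolicy", env, verbose=1)
   model.learn(
       total_timesteps=100_000,
       callback=CallbackList([reward_callback, checkpoint_callback]),
   )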
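For a quick post-training sanity check, the per-episode rewards collected by the callback can be smoothed with a moving average before comparing early and late training performance. This is a generic NumPy sketch, not part of the utils API; it assumes ``episode_rewards`` is the 1-D array obtained from the callback as shown earlier.

.. code-block:: python

   import numpy as np

   # episode_rewards: 1-D array of per-episode returns from the callback
   window = 50  # smoothing window, in episodes
   kernel = np.ones(window) / window
   smoothed = np.convolve(episode_rewards, kernel, mode="valid")

   print(f"Mean reward over first {window} episodes: {smoothed[0]:.2f}")
   print(f"Mean reward over last {window} episodes:  {smoothed[-1]:.2f}")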