.. _rl-sb3:

=======================================
Stable-Baselines3 Integration (sb3)
=======================================

.. admonition:: At a Glance
   :class: tip

   :Purpose: Environment registration and RLZoo3 integration for SB3 training
   :Location: ``fusion/modules/rl/sb3/``
   :Key Functions: ``copy_yml_file()``, ``main()``
   :External Docs: `RLZoo3 Documentation <https://rl-baselines3-zoo.readthedocs.io/>`_

Overview
========

The sb3 module provides utilities for integrating FUSION environments with the
Stable-Baselines3 (SB3) reinforcement learning framework. It handles:

1. **Environment Registration**: Register custom environments with Gymnasium
2. **RLZoo3 Integration**: Deploy hyperparameter configurations for automated training

**Why This Module?**

Stable-Baselines3 and RLZoo3 provide a powerful ecosystem for:

- Standardized RL algorithm implementations (PPO, DQN, A2C, etc.)
- Hyperparameter optimization with Optuna integration
- Experiment tracking and reproducibility
- Pre-tuned configurations for common environments

This module bridges FUSION's custom environments with this ecosystem.

RLZoo3 Integration
==================

FUSION supports automatic integration with `RLZoo3 <https://github.com/DLR-RM/rl-baselines3-zoo>`_,
a training framework built on Stable-Baselines3 that provides:

- **Hyperparameter Optimization**: Automated tuning with Optuna
- **Experiment Management**: Organized logging and model saving
- **Benchmarking**: Standardized evaluation protocols
- **Pre-tuned Configs**: Curated hyperparameters for various environments

Training Workflow
-----------------

.. code-block:: text

   +------------------+     +------------------+     +------------------+
   | 1. Register Env  |---->| 2. Deploy Config |---->| 3. Train with    |
   |    with Gymnasium|     |    to RLZoo3     |     |    RLZoo3        |
   +------------------+     +------------------+     +------------------+

**Step 1: Register Environment**

.. code-block:: bash

   python -m fusion.modules.rl.sb3.register_env --algo PPO --env-name SimEnv

**Step 2: Train with RLZoo3**

.. code-block:: bash

   # Basic training
   python -m rl_zoo3.train --algo ppo --env SimEnv

   # With hyperparameter optimization
   python -m rl_zoo3.train --algo ppo --env SimEnv -optimize --n-trials 100

   # With custom config
   python -m rl_zoo3.train --algo ppo --env SimEnv --conf-file custom_ppo.yml

**Step 3: Evaluate**

.. code-block:: bash

   python -m rl_zoo3.enjoy --algo ppo --env SimEnv --folder logs/

Environment Registration
========================

The ``main()`` function registers FUSION environments with Gymnasium's registry:

.. code-block:: python

   from gymnasium.envs.registration import register

   # Register custom environment
   register(
       id="SimEnv",
       entry_point="reinforcement_learning.gymnasium_envs.general_sim_env:SimEnv"
   )

Command-Line Interface
----------------------

.. code-block:: bash

   python -m fusion.modules.rl.sb3.register_env --algo ALGO --env-name ENV

**Arguments:**

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Argument
     - Required
     - Description
   * - ``--algo``
     - Yes
     - Algorithm name (PPO, DQN, A2C, etc.)
   * - ``--env-name``
     - Yes
     - Environment class name to register

**Example:**

.. code-block:: bash

   # Register SimEnv with PPO configuration
   python -m fusion.modules.rl.sb3.register_env --algo PPO --env-name SimEnv

   # Register with DQN
   python -m fusion.modules.rl.sb3.register_env --algo DQN --env-name SimEnv
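RLZoo3 resolves environments through Gymnasium's registry, so a quick way to confirm
that registration succeeded is to look the ID up in that registry. The snippet below
is a minimal sketch, not part of the module's API: the ``register()`` call mirrors the
one performed by ``main()``, and the registry lookup does not instantiate the
environment, so no constructor arguments are needed.

.. code-block:: python

   import gymnasium as gym
   from gymnasium.envs.registration import register

   # Same registration as performed by main()
   register(
       id="SimEnv",
       entry_point="reinforcement_learning.gymnasium_envs.general_sim_env:SimEnv",
   )

   # The ID should now appear in Gymnasium's global registry ...
   assert "SimEnv" in gym.registry

   # ... and its spec should resolve, which is what RLZoo3 relies on at training time
   print(gym.spec("SimEnv").entry_point)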
Configuration Management
========================

copy_yml_file
-------------

Deploys algorithm configuration files to RLZoo3's hyperparameters directory:

.. code-block:: python

   from fusion.modules.rl.sb3 import copy_yml_file

   # Copy PPO configuration to RLZoo3
   copy_yml_file("PPO")

**File Paths:**

.. code-block:: text

   Source:      sb3_scripts/yml/{algorithm}.yml
   Destination: venvs/.../site-packages/rl_zoo3/hyperparams/{algorithm}.yml

Configuration File Format
-------------------------

Algorithm configurations use YAML format compatible with RLZoo3:

**PPO Example:**

.. code-block:: yaml

   SimEnv:
     policy: 'MlpPolicy'
     n_timesteps: !!float 2e6
     learning_rate: lin_3e-4
     n_steps: 2048
     batch_size: 64
     n_epochs: 10
     gamma: 0.99
     gae_lambda: 0.95
     clip_range: 0.2
     ent_coef: 0.0
     vf_coef: 0.5
     max_grad_norm: 0.5

**DQN Example:**

.. code-block:: yaml

   SimEnv:
     policy: 'MlpPolicy'
     n_timesteps: !!float 1e6
     buffer_size: 1000000
     learning_rate: !!float 1e-4
     learning_starts: 50000
     batch_size: 32
     tau: 1.0
     gamma: 0.99
     train_freq: 4
     gradient_steps: 1
     target_update_interval: 10000

**Key Parameters:**

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Parameter
     - Description
   * - ``policy``
     - Policy architecture (MlpPolicy, MultiInputPolicy, CnnPolicy)
   * - ``n_timesteps``
     - Total training timesteps
   * - ``learning_rate``
     - Learning rate (can use schedules like ``lin_3e-4``)
   * - ``batch_size``
     - Minibatch size for updates
   * - ``gamma``
     - Discount factor

Directory Structure
===================

.. code-block:: text

   project/
   |-- sb3_scripts/yml/            # Source configuration files
   |   |-- PPO.yml
   |   |-- DQN.yml
   |   |-- A2C.yml
   |   `-- ...
   |
   |-- fusion/modules/rl/sb3/      # This module
   |   |-- __init__.py
   |   |-- register_env.py
   |   `-- README.md
   |
   `-- logs/                       # RLZoo3 training outputs
       `-- ppo/
           `-- SimEnv_1/
               |-- model.zip       # Trained model
               |-- config.yml      # Training config
               `-- evaluations/    # Evaluation results

Using with FUSION Environments
==============================

Standard Training Pipeline
--------------------------

.. code-block:: python

   from stable_baselines3 import PPO

   from fusion.modules.rl.gymnasium_envs import create_sim_env

   # Create FUSION environment
   config = {"k_paths": 3, "spectral_slots": 320}
   env = create_sim_env(config, env_type="unified")

   # Train with SB3 directly
   model = PPO("MultiInputPolicy", env, verbose=1)
   model.learn(total_timesteps=100_000)

   # Save model
   model.save("ppo_fusion")

With RLZoo3 Hyperparameter Optimization
---------------------------------------

.. code-block:: bash

   # 1. Register environment
   python -m fusion.modules.rl.sb3.register_env --algo PPO --env-name SimEnv

   # 2. Run hyperparameter optimization
   python -m rl_zoo3.train \
       --algo ppo \
       --env SimEnv \
       -optimize \
       --n-trials 100 \
       --sampler tpe \
       --pruner median

   # 3. Train with best hyperparameters
   python -m rl_zoo3.train \
       --algo ppo \
       --env SimEnv \
       --conf-file logs/ppo/SimEnv_1/best_model/config.yml

With Custom Feature Extractors
------------------------------

.. code-block:: python

   from stable_baselines3 import PPO

   from fusion.modules.rl.feat_extrs import PathGNN
   from fusion.modules.rl.gymnasium_envs import create_sim_env

   # Same FUSION environment config as in the standard pipeline above
   config = {"k_paths": 3, "spectral_slots": 320}
   env = create_sim_env(config, env_type="unified")

   model = PPO(
       "MultiInputPolicy",
       env,
       policy_kwargs={
           "features_extractor_class": PathGNN,
           "features_extractor_kwargs": {
               "emb_dim": 64,
               "gnn_type": "gat",
               "layers": 2,
           }
       },
       verbose=1,
   )
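The GNN-backed model trains, evaluates, and saves like any other SB3 policy. The
follow-up below is an illustrative sketch rather than part of the module's API: it
continues from the ``model`` and ``env`` objects defined above, and the step budget,
episode count, and save path are arbitrary.

.. code-block:: python

   from stable_baselines3.common.evaluation import evaluate_policy

   # Train for a short, illustrative budget
   model.learn(total_timesteps=50_000)

   # Quick sanity check with SB3's built-in evaluation helper
   mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
   print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")

   model.save("ppo_fusion_gnn")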
Error Handling
==============

The module provides clear error messages for common issues:

**FileNotFoundError:**

.. code-block:: text

   Configuration file not found: sb3_scripts/yml/PPO.yml.
   Ensure the algorithm configuration exists in sb3_scripts/yml/

**PermissionError:**

.. code-block:: text

   Cannot write to RLZoo3 directory: .../rl_zoo3/hyperparams/PPO.yml.
   Check file permissions and virtual environment access.

File Reference
==============

.. code-block:: text

   fusion/modules/rl/sb3/
   |-- __init__.py          # Public exports
   |-- register_env.py      # Registration utilities
   `-- README.md            # Module documentation

**Public API:**

.. code-block:: python

   from fusion.modules.rl.sb3 import (
       copy_yml_file,  # Deploy config to RLZoo3
       main,           # CLI entry point
   )

Related Documentation
=====================

- :ref:`rl-algorithms` - RL algorithm wrappers (PPO, DQN, A2C)
- :ref:`rl-feat-extrs` - GNN feature extractors for SB3 policies
- :ref:`rl-environments` - UnifiedSimEnv for SB3 training
- :ref:`rl-module` - Parent RL module documentation

.. seealso::

   - `Stable-Baselines3 Documentation <https://stable-baselines3.readthedocs.io/>`_
   - `RLZoo3 Documentation <https://rl-baselines3-zoo.readthedocs.io/>`_
   - `Gymnasium Documentation <https://gymnasium.farama.org/>`_