# Stable-Baselines3 Integration (sb3)

## At a Glance

- Purpose: Environment registration and RLZoo3 integration for SB3 training
- Location: `fusion/modules/rl/sb3/`
- Key Functions: `copy_yml_file()`, `main()`
- External Docs:
## Overview

The `sb3` module provides utilities for integrating FUSION environments with the Stable-Baselines3 (SB3) reinforcement learning framework. It handles:

- **Environment Registration**: Register custom environments with Gymnasium
- **RLZoo3 Integration**: Deploy hyperparameter configurations for automated training
## Why This Module?

Stable-Baselines3 and RLZoo3 provide a powerful ecosystem for:

- Standardized RL algorithm implementations (PPO, DQN, A2C, etc.)
- Hyperparameter optimization with Optuna integration
- Experiment tracking and reproducibility
- Pre-tuned configurations for common environments

This module bridges FUSION's custom environments with that ecosystem.
## RLZoo3 Integration

FUSION supports automatic integration with RLZoo3, a training framework built on Stable-Baselines3 that provides:

- Hyperparameter Optimization: Automated tuning with Optuna
- Experiment Management: Organized logging and model saving
- Benchmarking: Standardized evaluation protocols
- Pre-tuned Configs: Curated hyperparameters for various environments
### Training Workflow

```text
+------------------+     +------------------+     +------------------+
| 1. Register Env  |---->| 2. Deploy Config |---->| 3. Train with    |
|    with Gymnasium|     |    to RLZoo3     |     |    RLZoo3        |
+------------------+     +------------------+     +------------------+
```
### Step 1: Register Environment

```bash
python -m fusion.modules.rl.sb3.register_env --algo PPO --env-name SimEnv
```
### Step 2: Train with RLZoo3

```bash
# Basic training
python -m rl_zoo3.train --algo ppo --env SimEnv

# With hyperparameter optimization
python -m rl_zoo3.train --algo ppo --env SimEnv -optimize --n-trials 100

# With custom config
python -m rl_zoo3.train --algo ppo --env SimEnv --conf-file custom_ppo.yml
```
### Step 3: Evaluate

```bash
python -m rl_zoo3.enjoy --algo ppo --env SimEnv --folder logs/
```
## Environment Registration

The `main()` function registers FUSION environments with Gymnasium's registry:

```python
from gymnasium.envs.registration import register

# Register custom environment
register(
    id="SimEnv",
    entry_point="reinforcement_learning.gymnasium_envs.general_sim_env:SimEnv",
)
```
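
Once registered, the environment ID resolves through Gymnasium's standard factory. A quick check (assuming `SimEnv` can be constructed with its default arguments):

```python
import gymnasium as gym

# Resolves the entry point registered above and builds the environment.
env = gym.make("SimEnv")
obs, info = env.reset()
print(env.observation_space, env.action_space)
```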
### Command-Line Interface

```bash
python -m fusion.modules.rl.sb3.register_env --algo ALGO --env-name ENV
```

Arguments:

| Argument | Required | Description |
|---|---|---|
| `--algo` | Yes | Algorithm name (PPO, DQN, A2C, etc.) |
| `--env-name` | Yes | Environment class name to register |
Example:

```bash
# Register SimEnv with PPO configuration
python -m fusion.modules.rl.sb3.register_env --algo PPO --env-name SimEnv

# Register with DQN
python -m fusion.modules.rl.sb3.register_env --algo DQN --env-name SimEnv
```
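
For orientation, `main()` plausibly parses these arguments, registers the environment, and deploys the matching config. The following is a hypothetical sketch only (argument names follow the CLI above, the entry-point path follows the `register()` example); see `register_env.py` for the actual implementation:

```python
import argparse

from gymnasium.envs.registration import register

from fusion.modules.rl.sb3 import copy_yml_file


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Register a FUSION environment and deploy its RLZoo3 config."
    )
    parser.add_argument("--algo", required=True, help="Algorithm name (PPO, DQN, A2C, ...)")
    parser.add_argument("--env-name", required=True, help="Environment class name to register")
    args = parser.parse_args()

    # Register the environment under its class name (entry point as shown earlier).
    register(
        id=args.env_name,
        entry_point=f"reinforcement_learning.gymnasium_envs.general_sim_env:{args.env_name}",
    )

    # Deploy the algorithm's YAML config into rl_zoo3's hyperparams directory.
    copy_yml_file(args.algo)


if __name__ == "__main__":
    main()
```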
## Configuration Management

### copy_yml_file

Deploys algorithm configuration files to RLZoo3's hyperparameters directory:

```python
from fusion.modules.rl.sb3 import copy_yml_file

# Copy PPO configuration to RLZoo3
copy_yml_file("PPO")
```

File Paths:

- Source: `sb3_scripts/yml/{algorithm}.yml`
- Destination: `venvs/.../site-packages/rl_zoo3/hyperparams/{algorithm}.yml`
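
Conceptually, this is a file copy from the project into the installed `rl_zoo3` package. The sketch below is illustrative only, not the module's actual implementation; it assumes `rl_zoo3` is importable from the active virtual environment:

```python
import shutil
from pathlib import Path

import rl_zoo3


def copy_yml_file_sketch(algorithm: str) -> Path:
    """Copy sb3_scripts/yml/{algorithm}.yml into rl_zoo3's hyperparams directory."""
    source = Path("sb3_scripts") / "yml" / f"{algorithm}.yml"
    if not source.exists():
        raise FileNotFoundError(f"Configuration file not found: {source}")

    # Locate the installed rl_zoo3 package and its hyperparams directory.
    destination = Path(rl_zoo3.__file__).parent / "hyperparams" / f"{algorithm}.yml"
    shutil.copy(source, destination)
    return destination
```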
## Configuration File Format

Algorithm configurations use YAML format compatible with RLZoo3.

PPO Example:

```yaml
SimEnv:
  policy: 'MlpPolicy'
  n_timesteps: !!float 2e6
  learning_rate: lin_3e-4
  n_steps: 2048
  batch_size: 64
  n_epochs: 10
  gamma: 0.99
  gae_lambda: 0.95
  clip_range: 0.2
  ent_coef: 0.0
  vf_coef: 0.5
  max_grad_norm: 0.5
```
DQN Example:

```yaml
SimEnv:
  policy: 'MlpPolicy'
  n_timesteps: !!float 1e6
  buffer_size: 1000000
  learning_rate: !!float 1e-4
  learning_starts: 50000
  batch_size: 32
  tau: 1.0
  gamma: 0.99
  train_freq: 4
  gradient_steps: 1
  target_update_interval: 10000
```
Key Parameters:

| Parameter | Description |
|---|---|
| `policy` | Policy architecture (MlpPolicy, MultiInputPolicy, CnnPolicy) |
| `n_timesteps` | Total training timesteps |
| `learning_rate` | Learning rate (can use schedules like `lin_3e-4`) |
| `batch_size` | Minibatch size for updates |
| `gamma` | Discount factor |
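
The `lin_3e-4` value is RLZoo3's shorthand for a learning rate that decays linearly from 3e-4 to 0 over training. When using SB3 directly rather than through RLZoo3, the equivalent is a callable schedule; a minimal sketch (the environment ID here is just a stand-in):

```python
from typing import Callable

from stable_baselines3 import PPO


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Return a schedule that decays linearly from initial_value to 0.

    SB3 calls the schedule with the remaining training progress (1.0 -> 0.0).
    """
    def schedule(progress_remaining: float) -> float:
        return progress_remaining * initial_value
    return schedule


# Roughly equivalent to `learning_rate: lin_3e-4` in the YAML above.
model = PPO("MlpPolicy", "CartPole-v1", learning_rate=linear_schedule(3e-4))
```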
## Directory Structure

```text
project/
|-- sb3_scripts/yml/           # Source configuration files
|   |-- PPO.yml
|   |-- DQN.yml
|   |-- A2C.yml
|   `-- ...
|
|-- fusion/modules/rl/sb3/     # This module
|   |-- __init__.py
|   |-- register_env.py
|   `-- README.md
|
`-- logs/                      # RLZoo3 training outputs
    `-- ppo/
        `-- SimEnv_1/
            |-- model.zip      # Trained model
            |-- config.yml     # Training config
            `-- evaluations/   # Evaluation results
```
## Using with FUSION Environments

### Standard Training Pipeline

```python
from stable_baselines3 import PPO

from fusion.modules.rl.gymnasium_envs import create_sim_env

# Create FUSION environment
config = {"k_paths": 3, "spectral_slots": 320}
env = create_sim_env(config, env_type="unified")

# Train with SB3 directly
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

# Save model
model.save("ppo_fusion")
```
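
A trained model can later be reloaded for evaluation; continuing from the environment created above (standard SB3 usage):

```python
from stable_baselines3 import PPO

# Reload the saved policy and run a short deterministic rollout.
model = PPO.load("ppo_fusion")
obs, info = env.reset()
for _ in range(100):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```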
### With RLZoo3 Hyperparameter Optimization

```bash
# 1. Register environment
python -m fusion.modules.rl.sb3.register_env --algo PPO --env-name SimEnv

# 2. Run hyperparameter optimization
python -m rl_zoo3.train \
    --algo ppo \
    --env SimEnv \
    -optimize \
    --n-trials 100 \
    --sampler tpe \
    --pruner median

# 3. Train with best hyperparameters
python -m rl_zoo3.train \
    --algo ppo \
    --env SimEnv \
    --conf-file logs/ppo/SimEnv_1/best_model/config.yml
```
### With Custom Feature Extractors

```python
from stable_baselines3 import PPO

from fusion.modules.rl.feat_extrs import PathGNN
from fusion.modules.rl.gymnasium_envs import create_sim_env

# Same environment configuration as in the standard pipeline above
config = {"k_paths": 3, "spectral_slots": 320}
env = create_sim_env(config, env_type="unified")

model = PPO(
    "MultiInputPolicy",
    env,
    policy_kwargs={
        "features_extractor_class": PathGNN,
        "features_extractor_kwargs": {
            "emb_dim": 64,
            "gnn_type": "gat",
            "layers": 2,
        },
    },
    verbose=1,
)
```
## Error Handling

The module provides clear error messages for common issues.

FileNotFoundError:

```text
Configuration file not found: sb3_scripts/yml/PPO.yml.
Ensure the algorithm configuration exists in sb3_scripts/yml/
```

PermissionError:

```text
Cannot write to RLZoo3 directory: .../rl_zoo3/hyperparams/PPO.yml.
Check file permissions and virtual environment access.
```
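
In scripts it can be convenient to catch these explicitly; an illustrative wrapper (exception types follow the messages above):

```python
from fusion.modules.rl.sb3 import copy_yml_file

try:
    copy_yml_file("PPO")
except FileNotFoundError as err:
    # Missing sb3_scripts/yml/PPO.yml: create the config before deploying.
    print(f"Config missing: {err}")
except PermissionError as err:
    # No write access to rl_zoo3's hyperparams directory in the active venv.
    print(f"Cannot deploy config: {err}")
```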
## File Reference

```text
fusion/modules/rl/sb3/
|-- __init__.py       # Public exports
|-- register_env.py   # Registration utilities
`-- README.md         # Module documentation
```

Public API:

```python
from fusion.modules.rl.sb3 import (
    copy_yml_file,  # Deploy config to RLZoo3
    main,           # CLI entry point
)
```