Gymnasium Environments (Legacy)
Warning
Deprecated Module
This module contains the legacy SimEnv (GeneralSimEnv), which is
deprecated and will be removed in v6.X. For new work, use the
RL Environments Module (UnifiedSimEnv) instead.
Migration Path:
# Old (deprecated)
from fusion.modules.rl.gymnasium_envs import SimEnv
env = SimEnv(sim_dict=config)
# New (recommended)
from fusion.modules.rl.environments import UnifiedSimEnv
env = UnifiedSimEnv(config=rl_config)
# Or use the factory function
from fusion.modules.rl.gymnasium_envs import create_sim_env
env = create_sim_env(config, env_type="unified")
At a Glance
- Purpose: Legacy Gymnasium environment for RL simulation
- Location: fusion/modules/rl/gymnasium_envs/
- Key Classes: SimEnv, create_sim_env()
- Status: Deprecated - use the RL Environments Module instead
Warning
Spectral Band Limitation:
This module currently only supports C-band spectrum allocation. L-band and multi-band scenarios are not yet supported.
Overview
The gymnasium_envs module provides the original Gymnasium-compatible
environment implementation for reinforcement learning with FUSION network
simulations. It wraps the FUSION simulation engine in a standard RL interface.
Module Contents:
- SimEnv (alias GeneralSimEnv): Legacy environment class
- create_sim_env(): Factory function for environment creation with migration support
- EnvType: Environment type constants for the factory function
- Constants for configuration and spectral bands
Factory Function
The create_sim_env() factory function provides a migration path from
the legacy SimEnv to the new UnifiedSimEnv.
from fusion.modules.rl.gymnasium_envs import create_sim_env, EnvType
# Create legacy environment (default)
env = create_sim_env(config)
# Create unified environment (recommended)
env = create_sim_env(config, env_type="unified")
# Or use EnvType constants
env = create_sim_env(config, env_type=EnvType.UNIFIED)
Environment Selection
The factory function determines which environment to create by checking, in order:
1. Explicit parameter: env_type="legacy" or env_type="unified"
2. Environment variable: RL_ENV_TYPE=unified
3. Environment variable: USE_UNIFIED_ENV=1
4. Default: Legacy (for backward compatibility)
# Via environment variable
export USE_UNIFIED_ENV=1
python train.py # Will use UnifiedSimEnv
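A minimal sketch of the precedence described above, assuming a config dict as in the earlier examples: the environment variables are only consulted when env_type is None, so an explicit argument wins.
import os
from fusion.modules.rl.gymnasium_envs import create_sim_env

os.environ["RL_ENV_TYPE"] = "unified"
env = create_sim_env(config)                      # env var applies -> UnifiedSimEnv
env = create_sim_env(config, env_type="legacy")   # explicit argument wins -> SimEnv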
Function Reference
def create_sim_env(
config: dict[str, Any] | SimulationConfig,
env_type: str | None = None,
wrap_action_mask: bool = True,
**kwargs: Any,
) -> gym.Env:
"""
Create RL simulation environment.
:param config: Simulation configuration dict or SimulationConfig
:param env_type: "legacy" or "unified" (None checks env vars)
:param wrap_action_mask: Wrap unified env with ActionMaskWrapper
:param kwargs: Additional arguments for environment constructor
:return: Gymnasium environment instance
"""
Legacy SimEnv Usage
Deprecated since version 4.0: Use UnifiedSimEnv instead. SimEnv will be removed in v6.X.
Basic Setup
import os
# Suppress deprecation warning if needed
os.environ["SUPPRESS_SIMENV_DEPRECATION"] = "1"
from fusion.modules.rl.gymnasium_envs import SimEnv
# Create with configuration
sim_config = {
"s1": {
"path_algorithm": "q_learning",
"k_paths": 3,
"cores_per_link": 7,
"c_band": 320,
"erlang_start": 100,
"erlang_stop": 500,
"erlang_step": 50,
}
}
env = SimEnv(sim_dict=sim_config)
Training Loop
# Reset environment
obs, info = env.reset(seed=42)

max_steps = 1000  # example episode budget

for step in range(max_steps):
    # Select action
    action = env.action_space.sample()

    # Take step
    obs, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        obs, info = env.reset()
Integration with Stable-Baselines3
from stable_baselines3 import PPO
from fusion.modules.rl.gymnasium_envs import SimEnv
env = SimEnv(sim_dict=config)
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=10000)
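After training, the model can be rolled out on the same environment; an illustrative sketch using the standard Stable-Baselines3 predict API:
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)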
SimEnv Configuration
Required Configuration Keys
The sim_dict must contain an "s1" key with simulation parameters:
| Parameter | Type | Description |
|---|---|---|
| path_algorithm | str | RL algorithm (q_learning, dqn, ppo, etc.) |
| k_paths | int | Number of candidate paths per request |
| cores_per_link | int | Fiber cores per network link |
| c_band | int | Spectral slots in C-band (only C-band supported) |
| erlang_start | float | Starting traffic load (Erlang) |
| erlang_stop | float | Ending traffic load |
| erlang_step | float | Traffic load increment |
Optional Parameters
| Parameter | Default | Description |
|---|---|---|
| | True | Training mode (vs inference) |
| | False | Enable Optuna optimization |
| | 1.0 | Reward for successful allocation |
| | -10.0 | Penalty for blocked request |
Observation and Action Spaces
Observation Space
SimEnv provides graph-structured observations including:
- Network topology information (node features, edge connectivity)
- Current request details (bandwidth, holding time)
- Available resources and path options
- Congestion and feasibility indicators
The exact observation space depends on the configuration and is constructed
by get_obs_space() in utils/deep_rl.py.
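Because the space is configuration-dependent, the simplest way to see what your setup produces is to inspect it directly. A minimal sketch, assuming a dict-style observation (which is why the Stable-Baselines3 example above uses "MultiInputPolicy"):
env = SimEnv(sim_dict=sim_config)
print(env.observation_space)

obs, info = env.reset(seed=0)
for key, value in obs.items():                    # dict-style observation assumed
    print(key, getattr(value, "shape", value))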
Action Space
Actions represent path selection decisions:
- Discrete(k_paths): Select which candidate path to use
- Invalid actions result in blocked requests and penalties
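A small sketch of the action interface, continuing from the observation snippet above and assuming the k_paths value from the configuration example:
# With k_paths=3 the action space is Discrete(3): one action per candidate path.
assert env.action_space.n == sim_config["s1"]["k_paths"]

action = env.action_space.sample()                # pick one of the k candidate paths
obs, reward, terminated, truncated, info = env.step(action)
# An infeasible choice is treated as a blocked request and yields the penalty reward.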
Reward Structure
| Outcome | Reward | Description |
|---|---|---|
| Success | 1.0 (default) | Request allocated successfully |
| Blocked | -10.0 (default) | Request could not be allocated (negative value) |
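A rough way to track outcomes during a rollout is to count negative rewards as blocked requests; this sketch does not assume any particular keys in info:
obs, info = env.reset(seed=0)
total = blocked = 0
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    total += 1
    blocked += int(reward < 0)                    # blocked requests receive the negative penalty
print(f"Blocked {blocked}/{total} requests")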
Environment Lifecycle
__init__(sim_dict)
|
+---> Setup RLProps, helpers, agents
|---> Initial reset() to configure spaces
|---> Build observation_space, action_space
|
v
reset(seed, options)
|
+---> Initialize iteration
|---> Setup simulation engine
|---> Generate requests (Poisson arrivals)
|---> Return initial observation
|
v
step(action) [repeated]
|
+---> Process action (path selection)
|---> Attempt allocation via rl_help_obj
|---> Calculate reward
|---> Advance to next request
|---> Check termination
|---> Return (obs, reward, terminated, truncated, info)
|
v
[Episode ends when all requests processed]
Internal Components
SimEnv uses several helper classes for its operation:
| Component | Purpose |
|---|---|
| RLProps | State container for RL properties (paths, slots, etc.) |
| | Simulation setup and model loading |
| | Observation construction |
| | Step handling and termination checks |
| rl_help_obj | Allocation and resource management |
| | Path selection agent (Q-learning, bandits, etc.) |
Constants Reference
Defined in constants.py:
# Configuration keys
DEFAULT_SIMULATION_KEY = "s1"
DEFAULT_SAVE_SIMULATION = False
# Spectral bands (C-band only currently)
SUPPORTED_SPECTRAL_BANDS = ["c"]
# Arrival parameter keys
ARRIVAL_DICT_KEYS = {
"start": "erlang_start",
"stop": "erlang_stop",
"step": "erlang_step",
}
# Environment defaults
DEFAULT_ITERATION = 0
DEFAULT_ARRIVAL_COUNT = 0
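An illustrative use of these constants, pulling the Erlang sweep out of the example configuration shown earlier:
from fusion.modules.rl.gymnasium_envs import (
    ARRIVAL_DICT_KEYS,
    DEFAULT_SIMULATION_KEY,
)

params = sim_config[DEFAULT_SIMULATION_KEY]       # the "s1" block
arrival = {name: params[key] for name, key in ARRIVAL_DICT_KEYS.items()}
# arrival == {"start": 100, "stop": 500, "step": 50} for the example configuration above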
Migration to UnifiedSimEnv
Why Migrate?
- Accuracy: UnifiedSimEnv uses the same code paths as non-RL simulation
- Maintainability: Single unified codebase (no forked logic)
- Features: Better action masking, GNN observations, configurable obs spaces
- Testing: More comprehensive test coverage
Migration Steps
1. Update imports:

   # Before
   from fusion.modules.rl.gymnasium_envs import SimEnv

   # After
   from fusion.modules.rl.environments import UnifiedSimEnv

2. Update configuration:

   # Before (dict with "s1" key)
   config = {"s1": {"path_algorithm": "dqn", ...}}
   env = SimEnv(sim_dict=config)

   # After (RLConfig object)
   from fusion.modules.rl.adapter import RLConfig
   config = RLConfig(k_paths=3, obs_space="obs_8")
   env = UnifiedSimEnv(config=config)

3. Update action handling:

   # Before (no action masking)
   action = model.predict(obs)

   # After (with action masking)
   action_mask = info["action_mask"]
   action = model.predict(obs, action_masks=action_mask)

4. Test thoroughly before removing legacy code.
File Reference
fusion/modules/rl/gymnasium_envs/
|-- __init__.py # Factory function, exports
|-- general_sim_env.py # SimEnv class (deprecated)
|-- constants.py # Configuration constants
|-- README.md # Module documentation
|-- TODO.md # Development roadmap
`-- tests/
`-- ... # Unit tests
Public API:
from fusion.modules.rl.gymnasium_envs import (
# Factory function (recommended)
create_sim_env,
EnvType,
# Legacy environment (deprecated)
SimEnv,
# Constants
DEFAULT_SIMULATION_KEY,
DEFAULT_SAVE_SIMULATION,
SUPPORTED_SPECTRAL_BANDS,
ARRIVAL_DICT_KEYS,
DEFAULT_ITERATION,
DEFAULT_ARRIVAL_COUNT,
)