.. _rl-gymnasium-envs:

================================
Gymnasium Environments (Legacy)
================================

.. warning::

   **Deprecated Module**

   This module contains the legacy ``SimEnv`` (``GeneralSimEnv``), which is
   **deprecated** and will be removed in v6.X. For new work, use
   :ref:`rl-environments` (``UnifiedSimEnv``) instead.

   **Migration Path:**

   .. code-block:: python

      # Old (deprecated)
      from fusion.modules.rl.gymnasium_envs import SimEnv
      env = SimEnv(sim_dict=config)

      # New (recommended)
      from fusion.modules.rl.environments import UnifiedSimEnv
      env = UnifiedSimEnv(config=rl_config)

      # Or use the factory function
      from fusion.modules.rl.gymnasium_envs import create_sim_env
      env = create_sim_env(config, env_type="unified")

.. admonition:: At a Glance
   :class: tip

   :Purpose: Legacy Gymnasium environment for RL simulation
   :Location: ``fusion/modules/rl/gymnasium_envs/``
   :Key Classes: ``SimEnv``, ``create_sim_env()``
   :Status: Deprecated - use :ref:`rl-environments` instead

.. warning::

   **Spectral Band Limitation:** This module currently supports only
   **C-band** spectrum allocation. L-band and multi-band scenarios are not
   yet supported.

Overview
========

The ``gymnasium_envs`` module provides the original Gymnasium-compatible
environment implementation for reinforcement learning with FUSION network
simulations. It wraps the FUSION simulation engine in a standard RL
interface.

**Module Contents:**

- ``SimEnv`` (alias ``GeneralSimEnv``): Legacy environment class
- ``create_sim_env()``: Factory function for environment creation with
  migration support
- ``EnvType``: Environment type constants for the factory function
- Constants for configuration and spectral bands

Factory Function
================

The ``create_sim_env()`` factory function provides a migration path from the
legacy ``SimEnv`` to the new ``UnifiedSimEnv``.

.. code-block:: python

   from fusion.modules.rl.gymnasium_envs import create_sim_env, EnvType

   # Create legacy environment (default)
   env = create_sim_env(config)

   # Create unified environment (recommended)
   env = create_sim_env(config, env_type="unified")

   # Or use EnvType constants
   env = create_sim_env(config, env_type=EnvType.UNIFIED)

Environment Selection
---------------------

The factory function determines which environment to create based on, in
order of precedence:

1. **Explicit parameter**: ``env_type="legacy"`` or ``env_type="unified"``
2. **Environment variable**: ``RL_ENV_TYPE=unified``
3. **Environment variable**: ``USE_UNIFIED_ENV=1``
4. **Default**: Legacy (for backward compatibility)

.. code-block:: bash

   # Via environment variable
   export USE_UNIFIED_ENV=1
   python train.py  # Will use UnifiedSimEnv
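For illustration, the precedence can be sketched roughly as follows.
``resolve_env_type`` is a hypothetical helper used here only to make the
order explicit; the actual logic lives inside ``create_sim_env()``:

.. code-block:: python

   import os


   def resolve_env_type(env_type: str | None = None) -> str:
       """Illustrative sketch of the selection precedence (not the real code)."""
       if env_type is not None:
           # 1. An explicit parameter always wins
           return env_type
       if os.environ.get("RL_ENV_TYPE"):
           # 2. RL_ENV_TYPE=unified (or legacy)
           return os.environ["RL_ENV_TYPE"]
       if os.environ.get("USE_UNIFIED_ENV") == "1":
           # 3. USE_UNIFIED_ENV=1 opts in to the unified env
           return "unified"
       # 4. Default: legacy, for backward compatibility
       return "legacy"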
Function Reference
------------------

.. code-block:: python

   def create_sim_env(
       config: dict[str, Any] | SimulationConfig,
       env_type: str | None = None,
       wrap_action_mask: bool = True,
       **kwargs: Any,
   ) -> gym.Env:
       """
       Create an RL simulation environment.

       :param config: Simulation configuration dict or SimulationConfig
       :param env_type: "legacy" or "unified" (None checks env vars)
       :param wrap_action_mask: Wrap the unified env with ActionMaskWrapper
       :param kwargs: Additional arguments for the environment constructor
       :return: Gymnasium environment instance
       """

Legacy SimEnv Usage
===================

.. deprecated:: 4.0

   Use ``UnifiedSimEnv`` instead. ``SimEnv`` will be removed in v6.X.

Basic Setup
-----------

.. code-block:: python

   import os

   # Suppress the deprecation warning if needed
   os.environ["SUPPRESS_SIMENV_DEPRECATION"] = "1"

   from fusion.modules.rl.gymnasium_envs import SimEnv

   # Create with configuration
   sim_config = {
       "s1": {
           "path_algorithm": "q_learning",
           "k_paths": 3,
           "cores_per_link": 7,
           "c_band": 320,
           "erlang_start": 100,
           "erlang_stop": 500,
           "erlang_step": 50,
       }
   }

   env = SimEnv(sim_dict=sim_config)

Training Loop
-------------

.. code-block:: python

   max_steps = 10_000  # total environment steps to run

   # Reset environment
   obs, info = env.reset(seed=42)

   for step in range(max_steps):
       # Select action
       action = env.action_space.sample()

       # Take step
       obs, reward, terminated, truncated, info = env.step(action)

       if terminated or truncated:
           obs, info = env.reset()

Integration with Stable-Baselines3
----------------------------------

.. code-block:: python

   from stable_baselines3 import PPO

   from fusion.modules.rl.gymnasium_envs import SimEnv

   env = SimEnv(sim_dict=config)
   model = PPO("MultiInputPolicy", env, verbose=1)
   model.learn(total_timesteps=10000)

SimEnv Configuration
====================

Required Configuration Keys
---------------------------

The ``sim_dict`` must contain an ``"s1"`` key with simulation parameters:

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Parameter
     - Type
     - Description
   * - ``path_algorithm``
     - str
     - RL algorithm (q_learning, dqn, ppo, etc.)
   * - ``k_paths``
     - int
     - Number of candidate paths per request
   * - ``cores_per_link``
     - int
     - Fiber cores per network link
   * - ``c_band``
     - int
     - Spectral slots in C-band (only C-band supported)
   * - ``erlang_start``
     - float
     - Starting traffic load (Erlang)
   * - ``erlang_stop``
     - float
     - Ending traffic load
   * - ``erlang_step``
     - float
     - Traffic load increment

Optional Parameters
-------------------

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Parameter
     - Default
     - Description
   * - ``is_training``
     - True
     - Training mode (vs. inference)
   * - ``optimize``
     - False
     - Enable Optuna optimization
   * - ``reward``
     - 1.0
     - Reward for a successful allocation
   * - ``penalty``
     - -10.0
     - Penalty for a blocked request
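Putting the two tables together, a configuration that overrides the optional
defaults might look like the sketch below (the parameter values are
illustrative, not recommendations):

.. code-block:: python

   from fusion.modules.rl.gymnasium_envs import SimEnv

   sim_config = {
       "s1": {
           # Required keys
           "path_algorithm": "ppo",
           "k_paths": 3,
           "cores_per_link": 7,
           "c_band": 320,           # only C-band is supported
           "erlang_start": 100,
           "erlang_stop": 500,
           "erlang_step": 50,
           # Optional keys (defaults shown in the table above)
           "is_training": True,
           "optimize": False,
           "reward": 1.0,           # successful allocation
           "penalty": -10.0,        # blocked request
       }
   }

   env = SimEnv(sim_dict=sim_config)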
Observation and Action Spaces
=============================

Observation Space
-----------------

SimEnv provides graph-structured observations including:

- Network topology information (node features, edge connectivity)
- Current request details (bandwidth, holding time)
- Available resources and path options
- Congestion and feasibility indicators

The exact observation space depends on the configuration and is constructed
by ``get_obs_space()`` in ``utils/deep_rl.py``.

Action Space
------------

Actions represent path selection decisions:

- ``Discrete(k_paths)``: Select which candidate path to use
- Invalid actions result in blocked requests and penalties

Reward Structure
----------------

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - Outcome
     - Reward
     - Description
   * - Success
     - ``+reward``
     - Request allocated successfully
   * - Blocked
     - ``penalty``
     - Request could not be allocated (negative value)

Environment Lifecycle
=====================

.. code-block:: text

   __init__(sim_dict)
     |
     +---> Setup RLProps, helpers, agents
     |---> Initial reset() to configure spaces
     |---> Build observation_space, action_space
     |
     v
   reset(seed, options)
     |
     +---> Initialize iteration
     |---> Setup simulation engine
     |---> Generate requests (Poisson arrivals)
     |---> Return initial observation
     |
     v
   step(action)  [repeated]
     |
     +---> Process action (path selection)
     |---> Attempt allocation via rl_help_obj
     |---> Calculate reward
     |---> Advance to next request
     |---> Check termination
     |---> Return (obs, reward, terminated, truncated, info)
     |
     v
   [Episode ends when all requests processed]

Internal Components
===================

SimEnv uses several helper classes for its operation:

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Component
     - Purpose
   * - ``RLProps``
     - State container for RL properties (paths, slots, etc.)
   * - ``SetupHelper``
     - Simulation setup and model loading
   * - ``SimEnvObs``
     - Observation construction
   * - ``SimEnvUtils``
     - Step handling and termination checks
   * - ``CoreUtilHelpers``
     - Allocation and resource management
   * - ``PathAgent``
     - Path selection agent (Q-learning, bandits, etc.)

Constants Reference
===================

Defined in ``constants.py``:

.. code-block:: python

   # Configuration keys
   DEFAULT_SIMULATION_KEY = "s1"
   DEFAULT_SAVE_SIMULATION = False

   # Spectral bands (C-band only currently)
   SUPPORTED_SPECTRAL_BANDS = ["c"]

   # Arrival parameter keys
   ARRIVAL_DICT_KEYS = {
       "start": "erlang_start",
       "stop": "erlang_stop",
       "step": "erlang_step",
   }

   # Environment defaults
   DEFAULT_ITERATION = 0
   DEFAULT_ARRIVAL_COUNT = 0

Migration to UnifiedSimEnv
==========================

Why Migrate?
------------

- **Accuracy**: UnifiedSimEnv uses the same code paths as non-RL simulation
- **Maintainability**: Single unified codebase (no forked logic)
- **Features**: Better action masking, GNN observations, configurable
  observation spaces
- **Testing**: More comprehensive test coverage

Migration Steps
---------------

1. **Update imports:**

   .. code-block:: python

      # Before
      from fusion.modules.rl.gymnasium_envs import SimEnv

      # After
      from fusion.modules.rl.environments import UnifiedSimEnv

2. **Update configuration:**

   .. code-block:: python

      # Before (dict with "s1" key)
      config = {"s1": {"path_algorithm": "dqn", ...}}
      env = SimEnv(sim_dict=config)

      # After (RLConfig object)
      from fusion.modules.rl.adapter import RLConfig
      config = RLConfig(k_paths=3, obs_space="obs_8")
      env = UnifiedSimEnv(config=config)

3. **Update action handling:**

   .. code-block:: python

      # Before (no action masking)
      action, _ = model.predict(obs)

      # After (with action masking)
      action_mask = info["action_mask"]
      action, _ = model.predict(obs, action_masks=action_mask)

4. **Test thoroughly** before removing legacy code (a minimal smoke test is
   sketched below).
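As a rough starting point, a smoke test along the following lines can confirm
that both environments honour the same Gymnasium contract before the legacy
code is deleted. ``smoke_test`` is a hypothetical helper, and the
configurations simply mirror the examples above:

.. code-block:: python

   from fusion.modules.rl.adapter import RLConfig
   from fusion.modules.rl.environments import UnifiedSimEnv
   from fusion.modules.rl.gymnasium_envs import SimEnv

   legacy_config = {
       "s1": {
           "path_algorithm": "ppo", "k_paths": 3, "cores_per_link": 7,
           "c_band": 320, "erlang_start": 100, "erlang_stop": 500,
           "erlang_step": 50,
       }
   }


   def smoke_test(env, n_steps: int = 10) -> None:
       """Run a few random steps to exercise reset() and step()."""
       obs, info = env.reset(seed=42)
       for _ in range(n_steps):
           obs, reward, terminated, truncated, info = env.step(
               env.action_space.sample()
           )
           if terminated or truncated:
               obs, info = env.reset()


   smoke_test(SimEnv(sim_dict=legacy_config))             # old behaviour
   smoke_test(UnifiedSimEnv(config=RLConfig(k_paths=3)))  # new behaviour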
File Reference
==============

.. code-block:: text

   fusion/modules/rl/gymnasium_envs/
   |-- __init__.py          # Factory function, exports
   |-- general_sim_env.py   # SimEnv class (deprecated)
   |-- constants.py         # Configuration constants
   |-- README.md            # Module documentation
   |-- TODO.md              # Development roadmap
   `-- tests/
       `-- ...              # Unit tests

**Public API:**

.. code-block:: python

   from fusion.modules.rl.gymnasium_envs import (
       # Factory function (recommended)
       create_sim_env,
       EnvType,
       # Legacy environment (deprecated)
       SimEnv,
       # Constants
       DEFAULT_SIMULATION_KEY,
       DEFAULT_SAVE_SIMULATION,
       SUPPORTED_SPECTRAL_BANDS,
       ARRIVAL_DICT_KEYS,
       DEFAULT_ITERATION,
       DEFAULT_ARRIVAL_COUNT,
   )

Related Documentation
=====================

- :ref:`rl-environments` - UnifiedSimEnv (recommended replacement)
- :ref:`rl-adapter` - RLSimulationAdapter and RLConfig
- :ref:`rl-algorithms` - RL algorithms for training
- :ref:`rl-module` - Parent RL module documentation

.. seealso::

   - `Gymnasium Documentation <https://gymnasium.farama.org/>`_
   - `Stable-Baselines3 Documentation <https://stable-baselines3.readthedocs.io/>`_