.. _rl-agents:

=================
RL Agents Package
=================

.. admonition:: At a Glance
   :class: tip

   :Purpose: RL agents that select actions during simulation
   :Location: ``fusion/modules/rl/agents/``
   :Key Files: ``base_agent.py``, ``path_agent.py``
   :Prerequisites: Understanding of RL algorithms (Q-learning, bandits, DRL)

.. warning::

   **Legacy Path Only**

   This agents package is used by the **legacy simulation path**
   (``GeneralSimEnv``, ``SDNController``). If you're using the new orchestrator
   path with ``UnifiedSimEnv``, you don't use these agents directly - the
   environment handles action selection through the :ref:`rl-adapter`.

   - **Legacy path**: Uses ``PathAgent`` directly
   - **Orchestrator path**: Uses ``RLSimulationAdapter`` + SB3 models

What Are Agents?
================

In FUSION's RL module, an **agent** is an object that:

1. **Holds an algorithm** (Q-learning, bandit, PPO, etc.)
2. **Selects actions** based on the current state
3. **Updates** the algorithm based on rewards
4. **Manages hyperparameters** (learning rate, epsilon decay)

Think of agents as the "brain" that wraps an algorithm and provides a
consistent interface for the simulation to interact with.

.. code-block:: text

   +------------------+     +------------------+     +------------------+
   |    Simulation    |---->|      Agent       |---->|    Algorithm     |
   | (requests state) |     |  (coordinates)   |     | (does the math)  |
   +------------------+     +------------------+     +------------------+
                                      |
                                      | manages
                                      v
                             +------------------+
                             | Hyperparameters  |
                             | (alpha, epsilon) |
                             +------------------+

Current Implementation Status
-----------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - Agent
     - Status
     - Description
   * - ``PathAgent``
     - **Implemented**
     - Selects which path to use for a request
   * - ``CoreAgent``
     - Placeholder
     - Will select which fiber core to use (multi-core fibers)
   * - ``SpectrumAgent``
     - Placeholder
     - Will select spectrum slots (currently uses heuristics)

Quick Start: Using PathAgent
============================

This tutorial shows how to use ``PathAgent`` with the legacy simulation path.

Step 1: Create the Agent
------------------------

.. code-block:: python

   from fusion.modules.rl.agents import PathAgent

   # Create agent with your chosen algorithm
   path_agent = PathAgent(
       path_algorithm="q_learning",  # or "epsilon_greedy_bandit", "ppo", etc.
       rl_props=rl_props,            # RL properties object
       rl_help_obj=rl_helper,        # RL helper for utilities
   )

**Available algorithms:**

- ``q_learning`` - Tabular Q-learning (good for small state spaces)
- ``epsilon_greedy_bandit`` - Multi-armed bandit with epsilon-greedy
- ``ucb_bandit`` - Upper Confidence Bound bandit
- ``ppo`` - Proximal Policy Optimization (deep RL)
- ``a2c`` - Advantage Actor-Critic (deep RL)
- ``dqn`` - Deep Q-Network (deep RL)
- ``qr_dqn`` - Quantile Regression DQN (deep RL)

Step 2: Initialize the Environment
----------------------------------

Before using the agent, set up its environment:

.. code-block:: python

   # Set engine properties (simulation configuration)
   path_agent.engine_props = {
       "max_iters": 1000,
       "k_paths": 3,
       "reward": 1.0,
       "penalty": -1.0,
       "gamma": 0.9,
       "path_algorithm": "q_learning",
       # ... other properties
   }

   # Initialize the algorithm and hyperparameters
   path_agent.setup_env(is_path=True)

**What happens in setup_env:**

1. Creates the reward tracking array
2. Initializes hyperparameter configuration
3. Creates the algorithm object (Q-learning, bandit, etc.)

Step 3: Select a Route
----------------------

During simulation, ask the agent to select a route:

.. code-block:: python

   # For Q-learning
   path_agent.get_route()

   # For bandits
   path_agent.get_route(route_obj=route_object)

   # For deep RL (PPO, DQN, etc.)
   path_agent.get_route(route_obj=route_object, action=selected_action)

   # After get_route(), these are populated:
   chosen_path = path_agent.rl_props.chosen_path_list
   chosen_index = path_agent.rl_props.chosen_path_index

**How route selection works:**

- **Q-learning**: Uses epsilon-greedy over Q-values
- **Bandits**: Uses bandit-specific selection (epsilon-greedy or UCB)
- **Deep RL**: Uses the action provided by the SB3 model

Step 4: Update After Allocation
-------------------------------

After the simulation tries to allocate the request, update the agent:

.. code-block:: python

   path_agent.update(
       was_allocated=True,                 # Did allocation succeed?
       network_spectrum_dict=spectrum_db,  # Current spectrum state
       iteration=current_iter,             # Current iteration number
       path_length=len(chosen_path),       # Length of selected path
       trial=current_trial,                # Current trial number
   )

**What happens in update:**

1. Calculates reward based on allocation success
2. Updates the algorithm (Q-values, bandit estimates, etc.)
3. Updates hyperparameters if using per-step decay

Step 5: End Iteration
---------------------

At the end of each iteration (episode), call:

.. code-block:: python

   path_agent.end_iter()

This updates episodic hyperparameters (alpha, epsilon decay).
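Putting It All Together
-----------------------

The five steps above are usually wired into a single per-request loop. The
following is a minimal sketch of such a driver, not FUSION's actual simulation
loop: ``requests``, ``spectrum_db``, ``current_trial``, and
``simulate_request_allocation`` are hypothetical placeholders for whatever your
legacy-path simulation provides.

.. code-block:: python

   from fusion.modules.rl.agents import PathAgent

   # Steps 1-2: construct and initialize the agent
   path_agent = PathAgent(
       path_algorithm="q_learning",
       rl_props=rl_props,
       rl_help_obj=rl_helper,
   )
   path_agent.engine_props = engine_props
   path_agent.setup_env(is_path=True)

   for iteration in range(engine_props["max_iters"]):
       for request in requests:
           # Step 3: pick a path (Q-learning needs no extra arguments)
           path_agent.get_route()
           chosen_path = path_agent.rl_props.chosen_path_list

           # Hypothetical allocation step supplied by your simulation
           was_allocated = simulate_request_allocation(request, chosen_path)

           # Step 4: feed the outcome back to the agent
           path_agent.update(
               was_allocated=was_allocated,
               network_spectrum_dict=spectrum_db,
               iteration=iteration,
               path_length=len(chosen_path),
               trial=current_trial,
           )

       # Step 5: episodic hyperparameter updates
       path_agent.end_iter()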
Understanding the Class Hierarchy
=================================

.. code-block:: text

   BaseAgent
   |-- PathAgent (implemented)
   |-- CoreAgent (placeholder)
   `-- SpectrumAgent (placeholder)

BaseAgent
---------

The base class provides common functionality:

.. code-block:: python

   class BaseAgent:
       def __init__(self, algorithm, rl_props, rl_help_obj):
           self.algorithm = algorithm      # Algorithm name string
           self.rl_props = rl_props        # RL properties
           self.rl_help_obj = rl_help_obj  # Helper utilities
           self.algorithm_obj = None       # The actual algorithm instance
           self.engine_props = None        # Simulation configuration

       def setup_env(self, is_path: bool):
           """Initialize the algorithm based on self.algorithm"""

       def get_reward(self, was_allocated, dynamic, core_index, req_id):
           """Calculate reward/penalty for an action"""

       def load_model(self, model_path, file_prefix, **kwargs):
           """Load a trained model from disk"""

PathAgent
---------

Extends BaseAgent with path-specific functionality:

.. code-block:: python

   class PathAgent(BaseAgent):
       def __init__(self, path_algorithm, rl_props, rl_help_obj):
           super().__init__(path_algorithm, rl_props, rl_help_obj)
           self.iteration = None
           self.level_index = None        # For Q-learning congestion levels
           self.congestion_list = None    # Path congestion data
           self.state_action_pair = None  # (source, dest) tuple
           self.action_index = None       # Selected path index

       def get_route(self, **kwargs):
           """Select a route using the configured algorithm"""

       def update(self, was_allocated, network_spectrum_dict, ...):
           """Update agent after allocation attempt"""

       def end_iter(self):
           """End iteration and update episodic hyperparameters"""

Reward Calculation
==================

The agent calculates rewards based on allocation success:

Static Rewards
--------------

Simple success/failure rewards from configuration:

.. code-block:: python

   # In engine_props:
   # reward = 1.0, penalty = -1.0

   reward = agent.get_reward(
       was_allocated=True,
       dynamic=False,
       core_index=None,
       req_id=None,
   )
   # Returns: 1.0 (success) or -1.0 (failure)

Dynamic Rewards
---------------

Rewards that vary based on context:

.. code-block:: python

   reward = agent.get_reward(
       was_allocated=True,
       dynamic=True,
       core_index=2,  # Which core was used
       req_id=150,    # Which request number
   )
   # Returns: reward adjusted by core index and request progress

**Dynamic reward formula:**

.. code-block:: python

   # For success:
   core_decay = reward / (1 + decay_factor * core_index)
   request_ratio = (num_requests - req_id) / num_requests
   request_weight = request_ratio ** core_beta
   dynamic_reward = core_decay * request_weight

   # For failure:
   penalty_factor = 1 + gamma * core_index / req_id
   dynamic_penalty = penalty * penalty_factor
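To make the formula concrete, here is a self-contained sketch that reproduces
it with plain floats. The parameter names mirror the configuration keys implied
above, but the default values chosen for ``decay_factor`` and ``core_beta`` are
arbitrary assumptions used only for illustration.

.. code-block:: python

   def dynamic_reward_sketch(
       was_allocated: bool,
       core_index: int,
       req_id: int,
       *,
       reward: float = 1.0,
       penalty: float = -1.0,
       decay_factor: float = 0.1,   # assumed value for illustration
       core_beta: float = 2.0,      # assumed value for illustration
       gamma: float = 0.9,
       num_requests: int = 1000,
   ) -> float:
       """Reproduce the dynamic reward/penalty formula shown above."""
       if was_allocated:
           # Higher core indices earn a smaller share of the base reward ...
           core_decay = reward / (1 + decay_factor * core_index)
           # ... and rewards shrink as the trial works through its requests.
           request_ratio = (num_requests - req_id) / num_requests
           return core_decay * request_ratio ** core_beta
       # Failures on higher cores early in the trial are penalized more heavily.
       return penalty * (1 + gamma * core_index / req_id)

   # Example: outcome on core 2 after 150 of 1000 requests
   print(dynamic_reward_sketch(True, core_index=2, req_id=150))   # ~0.602
   print(dynamic_reward_sketch(False, core_index=2, req_id=150))  # ~-1.012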
Hyperparameter Management
=========================

Agents manage hyperparameters through ``HyperparamConfig``:

.. code-block:: python

   # Hyperparameters are automatically managed
   # Access current values:
   current_alpha = agent.hyperparam_obj.current_alpha      # Learning rate
   current_epsilon = agent.hyperparam_obj.current_epsilon  # Exploration rate

Decay Strategies
----------------

Hyperparameters can decay over time:

**Episodic decay** - Updates at end of each iteration:

.. code-block:: python

   # In end_iter():
   if alpha_strategy in EPISODIC_STRATEGIES:
       hyperparam_obj.update_alpha()
   if epsilon_strategy in EPISODIC_STRATEGIES:
       hyperparam_obj.update_eps()

**Per-step decay** - Updates after each action:

.. code-block:: python

   # In update():
   if alpha_strategy not in EPISODIC_STRATEGIES:
       hyperparam_obj.update_alpha()

Extending the Agents Module
===========================

Tutorial: Implementing CoreAgent
--------------------------------

Here's how you would implement ``CoreAgent`` (currently a placeholder):

**Step 1: Define the class**

.. code-block:: python

   # core_agent.py
   from typing import Any

   import numpy as np  # used by the selection helpers added in Step 3

   from fusion.modules.rl.agents.base_agent import BaseAgent
   from fusion.modules.rl.errors import InvalidActionError


   class CoreAgent(BaseAgent):
       """Agent for intelligent core assignment in multi-core fibers."""

       def __init__(
           self,
           core_algorithm: str,
           rl_props: Any,
           rl_help_obj: Any,
       ) -> None:
           super().__init__(core_algorithm, rl_props, rl_help_obj)
           self.selected_core: int | None = None

**Step 2: Add core selection method**

.. code-block:: python

   def get_core(self, available_cores: list[int], **kwargs: Any) -> int:
       """
       Select a core for the current request.

       :param available_cores: List of cores with available spectrum
       :return: Selected core index
       """
       if not available_cores:
           raise InvalidActionError("No cores available for assignment")

       if self.algorithm == "q_learning":
           return self._ql_core_selection(available_cores)
       elif self.algorithm in ("epsilon_greedy_bandit", "ucb_bandit"):
           return self._bandit_core_selection(available_cores)
       elif self.algorithm in ("ppo", "a2c", "dqn", "qr_dqn"):
           return self._drl_core_selection(available_cores, kwargs["action"])
       else:
           raise InvalidActionError(f"Algorithm '{self.algorithm}' not supported")
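For orientation, a caller in the spectrum-assignment path could use ``get_core``
roughly as follows, assuming a ``CoreAgent`` constructed as in Step 1. This is a
hypothetical call site: ``cores_with_free_spectrum``, ``try_allocate_on_core``,
``current_iteration``, and ``current_trial`` are placeholders, and the
``update`` call refers to the method added in Step 4 below.

.. code-block:: python

   # Hypothetical call site for the CoreAgent sketched above
   cores_with_free_spectrum = [0, 2, 5]        # cores your spectrum check found usable
   core = core_agent.get_core(cores_with_free_spectrum)

   was_allocated = try_allocate_on_core(core)  # placeholder allocation step
   core_agent.update(
       was_allocated=was_allocated,
       iteration=current_iteration,
       trial=current_trial,
   )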
**Step 3: Implement algorithm-specific selection**

.. code-block:: python

   def _ql_core_selection(self, available_cores: list[int]) -> int:
       """Q-learning based core selection."""
       assert self.hyperparam_obj is not None
       assert self.algorithm_obj is not None

       # Epsilon-greedy selection
       if np.random.random() < self.hyperparam_obj.current_epsilon:
           # Explore: random core
           self.selected_core = np.random.choice(available_cores)
       else:
           # Exploit: best Q-value core
           if hasattr(self.algorithm_obj, "get_best_core"):
               self.selected_core = self.algorithm_obj.get_best_core(
                   available_cores=available_cores,
                   source=self.rl_props.source,
                   dest=self.rl_props.destination,
               )
           else:
               self.selected_core = available_cores[0]

       return self.selected_core

**Step 4: Add update method**

.. code-block:: python

   def update(
       self,
       was_allocated: bool,
       iteration: int,
       trial: int,
   ) -> None:
       """Update agent after core assignment attempt."""
       self._ensure_initialized()

       reward = self.get_reward(
           was_allocated=was_allocated,
           dynamic=self.engine_props["dynamic_reward"],
           core_index=self.selected_core,
           req_id=iteration,
       )

       if self.algorithm == "q_learning":
           self.algorithm_obj.update_core_q_values(
               reward=reward,
               core=self.selected_core,
               source=self.rl_props.source,
               dest=self.rl_props.destination,
           )
       elif self.algorithm in ("epsilon_greedy_bandit", "ucb_bandit"):
           self.algorithm_obj.update(
               reward=reward,
               arm=self.selected_core,
               iteration=iteration,
               trial=trial,
           )

**Step 5: Update __init__.py**

.. code-block:: python

   # In agents/__init__.py
   from .core_agent import CoreAgent  # Now imports real implementation

Tutorial: Adding a New Algorithm to an Agent
--------------------------------------------

To add support for a new algorithm (e.g., SARSA):

**Step 1: Create the algorithm class**

First, create ``fusion/modules/rl/algorithms/sarsa.py``:

.. code-block:: python

   class SARSA:
       """SARSA (State-Action-Reward-State-Action) algorithm."""

       def __init__(self, rl_props, engine_props):
           self.rl_props = rl_props
           self.engine_props = engine_props
           self.q_table = {}
           self.learn_rate = engine_props.get("alpha_start", 0.1)

       def select_action(self, state, epsilon):
           # Epsilon-greedy selection
           pass

       def update(self, state, action, reward, next_state, next_action):
           # SARSA update rule
           pass
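The skeleton above leaves the learning logic as ``pass``. One way to fill it in
is the standard tabular SARSA rule,
``Q(s, a) += alpha * (r + gamma * Q(s', a') - Q(s, a))``. The sketch below is
illustrative only: how states are keyed and how many actions exist (``k_paths``
here) are assumptions you should adapt to ``rl_props`` and ``engine_props``.

.. code-block:: python

   import numpy as np


   class SARSA:
       """Minimal tabular SARSA sketch (illustrative, not FUSION's implementation)."""

       def __init__(self, rl_props, engine_props):
           self.rl_props = rl_props
           self.engine_props = engine_props
           self.q_table = {}  # state -> per-action value array
           self.learn_rate = engine_props.get("alpha_start", 0.1)
           self.gamma = engine_props.get("gamma", 0.9)
           self.num_actions = engine_props.get("k_paths", 3)  # assumption: one action per path

       def _q_values(self, state):
           # Lazily create a zero-initialized value row for unseen states
           return self.q_table.setdefault(state, np.zeros(self.num_actions))

       def select_action(self, state, epsilon):
           # Epsilon-greedy over the tabular Q-values
           if np.random.random() < epsilon:
               return int(np.random.randint(self.num_actions))
           return int(np.argmax(self._q_values(state)))

       def update(self, state, action, reward, next_state, next_action):
           # On-policy SARSA target uses the action actually taken next
           target = reward + self.gamma * self._q_values(next_state)[next_action]
           td_error = target - self._q_values(state)[action]
           self._q_values(state)[action] += self.learn_rate * td_error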
**Step 2: Add to BaseAgent.setup_env()**

.. code-block:: python

   # In base_agent.py, setup_env():
   elif self.algorithm == "sarsa":
       from fusion.modules.rl.algorithms.sarsa import SARSA

       self.algorithm_obj = SARSA(
           rl_props=self.rl_props,
           engine_props=self.engine_props,
       )

**Step 3: Add to PathAgent methods**

.. code-block:: python

   # In path_agent.py, get_route():
   elif self.algorithm == "sarsa":
       self._sarsa_route()

   # Add the new method:
   def _sarsa_route(self) -> None:
       """Select route using SARSA algorithm."""
       state = (self.rl_props.source, self.rl_props.destination)
       self.rl_props.chosen_path_index = self.algorithm_obj.select_action(
           state=state,
           epsilon=self.hyperparam_obj.current_epsilon,
       )
       self.rl_props.chosen_path_list = self.rl_props.paths_list[
           self.rl_props.chosen_path_index
       ]

**Step 4: Update the update() method**

.. code-block:: python

   # In path_agent.py, update():
   elif self.algorithm == "sarsa":
       self.algorithm_obj.update(
           state=self.state_action_pair,
           action=self.action_index,
           reward=reward,
           next_state=next_state,
           next_action=next_action,
       )

**Step 5: Add to valid algorithms list**

.. code-block:: python

   # In fusion/modules/rl/args/general_args.py
   VALID_ALGORITHMS = [
       "q_learning",
       "epsilon_greedy_bandit",
       "ucb_bandit",
       "sarsa",  # Add new algorithm
       # ...
   ]

Testing
=======

Running Tests
-------------

.. code-block:: bash

   # Run all agent tests (if they exist)
   pytest fusion/modules/rl/agents/tests/ -v

   # Run with coverage
   pytest fusion/modules/rl/agents/tests/ -v --cov=fusion.modules.rl.agents

Writing Tests for Agents
------------------------

.. code-block:: python

   import pytest
   from unittest.mock import MagicMock

   from fusion.modules.rl.agents import PathAgent


   @pytest.fixture
   def mock_rl_props():
       """Create mock RL properties."""
       props = MagicMock()
       props.source = 0
       props.destination = 5
       props.k_paths = 3
       props.paths_list = [["0", "1", "5"], ["0", "2", "5"], ["0", "3", "4", "5"]]
       props.chosen_path_index = None
       props.chosen_path_list = None
       return props


   @pytest.fixture
   def path_agent(mock_rl_props):
       """Create PathAgent for testing."""
       agent = PathAgent(
           path_algorithm="epsilon_greedy_bandit",
           rl_props=mock_rl_props,
           rl_help_obj=MagicMock(),
       )
       agent.engine_props = {
           "max_iters": 100,
           "k_paths": 3,
           "reward": 1.0,
           "penalty": -1.0,
           "gamma": 0.9,
           "path_algorithm": "epsilon_greedy_bandit",
       }
       return agent


   def test_setup_env_creates_algorithm(path_agent):
       """setup_env should create the algorithm object."""
       path_agent.setup_env(is_path=True)

       assert path_agent.algorithm_obj is not None
       assert path_agent.hyperparam_obj is not None
       assert path_agent.reward_penalty_list is not None


   def test_get_reward_returns_correct_values(path_agent):
       """get_reward should return configured reward/penalty."""
       path_agent.setup_env(is_path=True)

       success_reward = path_agent.get_reward(
           was_allocated=True, dynamic=False, core_index=None, req_id=None
       )
       failure_penalty = path_agent.get_reward(
           was_allocated=False, dynamic=False, core_index=None, req_id=None
       )

       assert success_reward == 1.0
       assert failure_penalty == -1.0

Common Issues
=============

**"engine_props must be set before calling setup_env"**

.. code-block:: python

   # Wrong:
   agent = PathAgent(...)
   agent.setup_env(is_path=True)  # Error!

   # Right:
   agent = PathAgent(...)
   agent.engine_props = {...}  # Set this first
   agent.setup_env(is_path=True)

**"Algorithm 'xyz' is not supported"**

Check that your algorithm is in the supported list and spelled correctly:

- ``q_learning`` (not ``qlearning`` or ``Q_learning``)
- ``epsilon_greedy_bandit`` (not ``epsilon_greedy``)
- ``ppo``, ``a2c``, ``dqn``, ``qr_dqn`` (lowercase)

**"algorithm_obj must be initialized"**

Always call ``setup_env()`` before using the agent:

.. code-block:: python

   agent.setup_env(is_path=True)  # This creates algorithm_obj

File Reference
==============

.. code-block:: text

   fusion/modules/rl/agents/
   |-- __init__.py          # Public exports
   |-- base_agent.py        # BaseAgent class
   |-- path_agent.py        # PathAgent (implemented)
   |-- core_agent.py        # CoreAgent (placeholder)
   |-- spectrum_agent.py    # SpectrumAgent (placeholder)
   `-- README.md            # Module documentation

**What to import:**

.. code-block:: python

   # Main agent class
   from fusion.modules.rl.agents import PathAgent

   # Base class (for extending)
   from fusion.modules.rl.agents import BaseAgent

   # Placeholders (will raise NotImplementedError)
   from fusion.modules.rl.agents import CoreAgent, SpectrumAgent

Related Documentation
=====================

- :ref:`rl-module` - Parent RL module documentation
- :ref:`rl-adapter` - Adapter for orchestrator path (alternative to agents)
- ``fusion/modules/rl/algorithms/`` - Algorithm implementations
- ``fusion/modules/rl/utils/hyperparams.py`` - Hyperparameter configuration