.. _policies-module:

===============
Policies Module
===============

.. tip::

   **Lost in the architecture?** This page explains where policies fit.
   Start with :ref:`the-big-picture` below before diving into details.

Overview
========

.. admonition:: At a Glance
   :class: tip

   :Purpose: Decision-making for path selection ("which path should I use?")
   :Location: ``fusion/policies/``
   :Key Files: ``heuristic_policy.py``, ``ml_policy.py``, ``rl_policy.py``, ``policy_factory.py``
   :Used By: ``SDNOrchestrator`` (new architecture)
   :Protocol: Implements ``ControlPolicy`` from ``fusion/interfaces/control_policy.py``

**What this module does:**

- Chooses which path to use when serving a network request
- Provides pluggable strategies: heuristics, ML models, RL models
- Decouples "how to decide" from "how to execute"

**What this module does NOT do:**

- Find paths (that's ``fusion/modules/routing/`` or ``fusion/pipelines/``)
- Assign spectrum (that's ``fusion/modules/spectrum/``)
- Run the simulation (that's ``fusion/core/``)

.. _the-big-picture:

The Big Picture: Where Policies Fit
===================================

.. important::

   **The core confusion:** FUSION has many components with similar-sounding names.
   This section explains what each does and how they connect.

The Request Lifecycle
---------------------

When a network request arrives, here's what happens:

.. code-block:: text

   +==========================================================================+
   |                    REQUEST LIFECYCLE (Orchestrator)                       |
   +==========================================================================+
   |                                                                           |
   |   1. REQUEST ARRIVES                                                      |
   |          |                                                                |
   |          v                                                                |
   |   2. FIND CANDIDATE PATHS                                                 |
   |      +------------------+                                                 |
   |      | RoutingPipeline  |  "Here are 5 possible paths from A to Z"        |
   |      +--------+---------+                                                 |
   |               |                                                           |
   |               v                                                           |
   |   3. CHECK FEASIBILITY                                                    |
   |      +------------------+                                                 |
   |      | SpectrumPipeline |  "Paths 1, 3, 5 have available spectrum"        |
   |      | SNRPipeline      |  "Paths 1, 5 meet SNR requirements"             |
   |      +--------+---------+                                                 |
   |               |                                                           |
   |               v                                                           |
   |   4. SELECT PATH  <-- THIS IS WHERE POLICIES COME IN                      |
   |      +------------------+                                                 |
   |      | ControlPolicy    |  "Use path 5 (it's the least congested)"        |
   |      +--------+---------+                                                 |
   |               |                                                           |
   |               v                                                           |
   |   5. ALLOCATE RESOURCES                                                   |
   |      +------------------+                                                 |
   |      | SpectrumPipeline |  "Reserved slots 10-15 on path 5"               |
   |      +--------+---------+                                                 |
   |               |                                                           |
   |               v                                                           |
   |   6. LIGHTPATH CREATED                                                    |
   |                                                                           |
   +==========================================================================+

**Policies answer ONE question:** Given multiple feasible paths, which one should we use?
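
The six lifecycle steps can be sketched in plain Python. This is a minimal, self-contained illustration, not FUSION's orchestrator: ``PathOption``, ``handle_request``, and the ``feasible_spectrum`` / ``meets_snr`` flags are hypothetical stand-ins for what the routing, spectrum, and SNR pipelines compute.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Callable, Optional


   @dataclass
   class PathOption:
       """Hypothetical candidate path, as the pipelines might annotate it."""
       nodes: list[str]
       feasible_spectrum: bool  # Step 3a: SpectrumPipeline found free slots
       meets_snr: bool  # Step 3b: SNRPipeline accepted the path

       @property
       def is_feasible(self) -> bool:
           return self.feasible_spectrum and self.meets_snr


   # In this sketch a policy is any callable mapping options to an index (or -1)
   Policy = Callable[[list[PathOption]], int]


   def handle_request(candidates: list[PathOption], policy: Policy) -> Optional[PathOption]:
       """Steps 4-6 for one request; steps 2-3 already annotated the candidates."""
       action = policy(candidates)  # Step 4: the policy picks a path
       if action < 0 or not candidates[action].is_feasible:
           return None  # Nothing usable: the request is blocked
       return candidates[action]  # Steps 5-6: allocate and create the lightpath


   def first_feasible(options: list[PathOption]) -> int:
       return next((i for i, o in enumerate(options) if o.is_feasible), -1)


   candidates = [
       PathOption(["A", "B", "Z"], feasible_spectrum=True, meets_snr=False),
       PathOption(["A", "C", "Z"], feasible_spectrum=False, meets_snr=True),
       PathOption(["A", "D", "E", "Z"], feasible_spectrum=True, meets_snr=True),
   ]
   chosen = handle_request(candidates, first_feasible)
   print(chosen.nodes if chosen else "blocked")  # ['A', 'D', 'E', 'Z']

Only step 4 changes when the policy changes; the surrounding steps stay the same.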

The Component Map
-----------------

Here's how these similarly named components relate:

.. code-block:: text

   +==========================================================================+
   |                         FUSION COMPONENT MAP                              |
   +==========================================================================+
   |                                                                           |
   |   DECISION LAYER ("What to do")                                           |
   |   +------------------------------------------------------------------+    |
   |   |  fusion/policies/           <-- YOU ARE HERE                     |    |
   |   |  - Chooses which path to use                                     |    |
   |   |  - Heuristics, ML models, RL models                              |    |
   |   +------------------------------------------------------------------+    |
   |                              |                                            |
   |                              | "Use path 3"                               |
   |                              v                                            |
   |   ORCHESTRATION LAYER ("How to coordinate")                               |
   |   +------------------------------------------------------------------+    |
   |   |  fusion/core/orchestrator.py                                     |    |
   |   |  - Coordinates the request lifecycle                             |    |
   |   |  - Calls pipelines in order                                      |    |
   |   |  - Asks policy for decisions                                     |    |
   |   +------------------------------------------------------------------+    |
   |                              |                                            |
   |                              v                                            |
   |   PIPELINE LAYER ("How to do multi-step operations")                      |
   |   +------------------------------------------------------------------+    |
   |   |  fusion/pipelines/                                               |    |
   |   |  - RoutingPipeline: find paths with protection                   |    |
   |   |  - SlicingPipeline: split large requests                         |    |
   |   |  - ProtectionPipeline: 1+1 backup allocation                     |    |
   |   +------------------------------------------------------------------+    |
   |                              |                                            |
   |                              v                                            |
   |   ALGORITHM LAYER ("How to do single operations")                         |
   |   +------------------------------------------------------------------+    |
   |   |  fusion/modules/routing/    - K-shortest path, congestion-aware  |    |
   |   |  fusion/modules/spectrum/   - First-fit, best-fit assignment     |    |
   |   |  fusion/modules/snr/        - GSNR calculation                   |    |
   |   +------------------------------------------------------------------+    |
   |                              |                                            |
   |                              v                                            |
   |   DATA LAYER ("What we're working with")                                  |
   |   +------------------------------------------------------------------+    |
   |   |  fusion/domain/             - Request, NetworkState, Lightpath   |    |
   |   |  fusion/interfaces/         - Protocols (contracts)              |    |
   |   +------------------------------------------------------------------+    |
   |                                                                           |
   +==========================================================================+

Why Do Policies Exist?
----------------------

**Without policies:** The path selection logic is hardcoded in the simulator.
Want to try a different strategy? Edit the simulator code.

**With policies:** Path selection is pluggable. The simulator asks "which path?"
and the policy answers. Want to try ML? Just swap the policy.

.. code-block:: python

   # Without policies (hardcoded in the simulator)
   def serve_request(request, paths):
       for path in paths:
           if path.is_feasible:
               return allocate(path)  # Always picks the first feasible path
       return None  # No feasible path: the request is blocked

   # With policies (pluggable)
   def serve_request(request, paths, network_state, policy):
       action = policy.select_action(request, paths, network_state)
       if action == -1:
           return None  # The policy found no usable path: the request is blocked
       return allocate(paths[action])  # The policy decides

This separation enables:

1. **Experimentation**: Compare heuristics vs. ML vs. RL without changing the simulator
2. **Research**: Train RL agents, then deploy trained policies
3. **Flexibility**: Different policies for different scenarios
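
To illustrate point 1, the sketch below runs one unchanged serving loop with two interchangeable policies over the same seeded workload. The policy functions, the ``(length_km, is_feasible)`` tuples, and the random workload are hypothetical simplifications; real FUSION policies are classes with ``select_action``.

.. code-block:: python

   import random
   from typing import Callable

   # Each option is a hypothetical (length_km, is_feasible) pair
   Option = tuple[float, bool]
   Policy = Callable[[list[Option]], int]


   def first_feasible(options: list[Option]) -> int:
       return next((i for i, (_, ok) in enumerate(options) if ok), -1)


   def shortest_feasible(options: list[Option]) -> int:
       feasible = [i for i, (_, ok) in enumerate(options) if ok]
       return min(feasible, key=lambda i: options[i][0], default=-1)


   def run_experiment(policy: Policy, workload: list[list[Option]]) -> dict[str, float]:
       """The 'simulator': identical for every policy; only the policy varies."""
       served, total_km = 0, 0.0
       for options in workload:
           action = policy(options)
           if action >= 0:
               served += 1
               total_km += options[action][0]
       return {"served": served, "mean_km": total_km / served if served else 0.0}


   rng = random.Random(42)  # Fixed seed so every policy sees the same workload
   workload = [
       [(rng.uniform(100, 2000), rng.random() < 0.5) for _ in range(5)]
       for _ in range(1000)
   ]
   for policy in (first_feasible, shortest_feasible):
       print(policy.__name__, run_experiment(policy, workload))

Both policies serve the same requests here (any request with a feasible path), so the comparison isolates the effect of the selection rule on path length.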

Legacy vs. Orchestrator Architecture
====================================

.. warning::

   FUSION has TWO architectures. Understanding which one you're working with
   is critical to understanding where policies fit.

.. code-block:: text

   +==========================================================================+
   |                    LEGACY vs ORCHESTRATOR                                 |
   +==========================================================================+
   |                                                                           |
   |   LEGACY (SDNController)              ORCHESTRATOR (SDNOrchestrator)      |
   |   ========================            ==============================      |
   |                                                                           |
   |   - Decision logic embedded           - Decision logic in policies        |
   |   - Uses modules directly             - Uses pipelines + adapters         |
   |   - No ControlPolicy protocol         - Uses ControlPolicy protocol       |
   |   - Hardcoded heuristics              - Pluggable policies                |
   |                                                                           |
   |   fusion/core/sdn_controller.py       fusion/core/orchestrator.py         |
   |            |                                      |                       |
   |            v                                      v                       |
   |   fusion/modules/routing/             fusion/pipelines/ (routing)         |
   |   fusion/modules/spectrum/            fusion/pipelines/ (slicing)         |
   |            |                                      |                       |
   |            |                                      v                       |
   |            |                          fusion/policies/  <-- NEW           |
   |            |                                      |                       |
   |            v                                      v                       |
   |   [Hardcoded: first feasible]         [Pluggable: any policy]             |
   |                                                                           |
   +==========================================================================+

**Key point:** Policies are part of the NEW orchestrator architecture. If you're
working with the legacy SDNController, policies are not used.

How Policies Work Internally
============================

The ControlPolicy Protocol
--------------------------

All policies implement this protocol (from ``fusion/interfaces/control_policy.py``):

.. code-block:: python

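   from typing import Protocol

   # Request, PathOption, and NetworkState are FUSION types; their
   # imports are omitted from this excerpt.

   class ControlPolicy(Protocol):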
       def select_action(
           self,
           request: Request,
           options: list[PathOption],
           network_state: NetworkState,
       ) -> int:
           """Return index of selected path, or -1 if none."""
           ...

       def update(
           self,
           request: Request,
           action: int,
           reward: float,
       ) -> None:
           """Update policy from experience (no-op for heuristics/deployment)."""
           ...
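
Because ``ControlPolicy`` is a ``typing.Protocol``, a class conforms by having matching methods; it never needs to inherit from the protocol. The self-contained sketch below mirrors the two methods shown above in a local protocol marked ``@runtime_checkable`` so that ``isinstance`` works. Whether FUSION's own ``ControlPolicy`` is runtime-checkable is not stated here, so treat that decorator as an assumption; ``Any`` stands in for the FUSION types.

.. code-block:: python

   from typing import Any, Protocol, runtime_checkable


   @runtime_checkable
   class ControlPolicyLike(Protocol):
       """Local mirror of the ControlPolicy methods shown above."""

       def select_action(self, request: Any, options: list[Any], network_state: Any) -> int: ...

       def update(self, request: Any, action: int, reward: float) -> None: ...


   class AlwaysFirst:
       """No inheritance: conformance is purely structural."""

       def select_action(self, request, options, network_state) -> int:
           return 0 if options else -1

       def update(self, request, action, reward) -> None:
           pass  # Deployment policy: nothing to learn


   class Incomplete:
       def select_action(self, request, options, network_state) -> int:
           return 0


   print(isinstance(AlwaysFirst(), ControlPolicyLike))  # True
   print(isinstance(Incomplete(), ControlPolicyLike))  # False: no update()

The runtime check only verifies that the methods exist, not their signatures; a static type checker such as mypy is what catches signature mismatches.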

Policy Types
------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Type
     - When to Use
     - Examples
   * - **Heuristic**
     - Baselines, simple deployments, fallbacks
     - FirstFeasible, ShortestFeasible, LeastCongested
   * - **ML**
     - Deploy pre-trained supervised models
     - PyTorch, sklearn, ONNX models
   * - **RL**
     - Deploy pre-trained RL agents
     - SB3 PPO and DQN; sb3-contrib MaskablePPO

Decision Flow
-------------

.. code-block:: text

   +===========================================================================+
   |                      POLICY DECISION FLOW                                  |
   +===========================================================================+
   |                                                                            |
   |   INPUT                                                                    |
   |   +------------------------+                                               |
   |   | Request                |  bandwidth=100Gbps, src=A, dst=Z              |
   |   | PathOptions (list)     |  [Path0, Path1, Path2, Path3, Path4]          |
   |   | NetworkState           |  current topology and spectrum state          |
   |   +------------------------+                                               |
   |              |                                                             |
   |              v                                                             |
   |   FEASIBILITY CHECK (already done by pipelines)                            |
   |   +------------------------+                                               |
   |   | PathOption.is_feasible |  [True, False, True, False, True]             |
   |   +------------------------+                                               |
   |              |                                                             |
   |              v                                                             |
   |   POLICY DECISION                                                          |
   |   +------------------------+                                               |
   |   | Heuristic: "Pick shortest feasible" -> Path 2                          |
   |   | ML Model:  "Score each, pick highest" -> Path 4                        |
   |   | RL Model:  "Predict action from obs" -> Path 0                         |
   |   +------------------------+                                               |
   |              |                                                             |
   |              v                                                             |
   |   OUTPUT                                                                   |
   |   +------------------------+                                               |
   |   | action = 2             |  (index of selected path)                     |
   |   +------------------------+                                               |
   |                                                                            |
   +===========================================================================+
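
The heuristic and ML decisions in the diagram reduce to the same step: score the options, ignore the infeasible ones, and return the best index (or -1). The sketch below reuses the diagram's feasibility mask ``[True, False, True, False, True]``; the path lengths and model scores are made-up values chosen so the outcomes match the diagram, and ``masked_argmax`` is a hypothetical helper, not a FUSION function.

.. code-block:: python

   import math


   def masked_argmax(scores: list[float], mask: list[bool]) -> int:
       """Index of the highest score among feasible options, or -1 if none."""
       best_index, best_score = -1, -math.inf
       for index, (score, feasible) in enumerate(zip(scores, mask)):
           if feasible and score > best_score:
               best_index, best_score = index, score
       return best_index


   mask = [True, False, True, False, True]  # PathOption.is_feasible per path

   # Heuristic "shortest feasible": score = negative length (hypothetical km)
   lengths_km = [1200.0, 300.0, 800.0, 500.0, 950.0]
   heuristic_action = masked_argmax([-km for km in lengths_km], mask)

   # ML "score each, pick highest": scores from a hypothetical trained model
   model_scores = [0.10, 0.95, 0.30, 0.80, 0.70]
   ml_action = masked_argmax(model_scores, mask)

   print(heuristic_action, ml_action)  # 2 4
   print(masked_argmax([1.0] * 5, [False] * 5))  # -1: nothing feasible

Path 1 is the shortest and highest-scoring overall, but the mask rules it out in both cases, which is why feasibility must be applied before the choice, not after.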

Components
==========

heuristic_policy.py
-------------------

:Purpose: Rule-based policies (deterministic, except ``RandomFeasiblePolicy``)
:Classes: ``FirstFeasiblePolicy``, ``ShortestFeasiblePolicy``, ``LeastCongestedPolicy``, ``RandomFeasiblePolicy``, ``LoadBalancedPolicy``

.. code-block:: python

   from fusion.policies import ShortestFeasiblePolicy

   policy = ShortestFeasiblePolicy()
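   # request, options, and network_state are supplied by the orchestrator
   action = policy.select_action(request, options, network_state)
   # Returns the index of the shortest feasible path, or -1 if none is feasible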
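
For intuition, here is how three of these heuristics could be written. This is an illustrative sketch, not the code in ``heuristic_policy.py``: the ``Option`` dataclass and its ``congestion`` field (fraction of occupied slots) are hypothetical, and the real classes also receive ``request`` and ``network_state``.

.. code-block:: python

   import random
   from dataclasses import dataclass
   from typing import Optional


   @dataclass
   class Option:
       """Hypothetical path option; congestion = fraction of slots in use."""
       is_feasible: bool
       congestion: float


   class FirstFeasibleSketch:
       def select_action(self, options: list[Option]) -> int:
           return next((i for i, o in enumerate(options) if o.is_feasible), -1)


   class LeastCongestedSketch:
       def select_action(self, options: list[Option]) -> int:
           feasible = [i for i, o in enumerate(options) if o.is_feasible]
           # min() keeps the first minimum, so ties resolve to the lowest index
           return min(feasible, key=lambda i: options[i].congestion, default=-1)


   class RandomFeasibleSketch:
       def __init__(self, seed: Optional[int] = None) -> None:
           self._rng = random.Random(seed)  # Seed it for reproducible experiments

       def select_action(self, options: list[Option]) -> int:
           feasible = [i for i, o in enumerate(options) if o.is_feasible]
           return self._rng.choice(feasible) if feasible else -1


   options = [Option(False, 0.1), Option(True, 0.7), Option(True, 0.4)]
   print(FirstFeasibleSketch().select_action(options))  # 1
   print(LeastCongestedSketch().select_action(options))  # 2

The random variant is the only one whose output depends on state other than its inputs, which is why a seed matters when comparing runs.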

ml_policy.py
------------

:Purpose: Deploy pre-trained ML models (PyTorch, sklearn, ONNX)
:Classes: ``MLControlPolicy``, ``FeatureBuilder``, model wrappers

.. code-block:: python

   from fusion.policies import MLControlPolicy

   policy = MLControlPolicy(
       model_path="model.pt",
       fallback_type="first_feasible",  # Fallback if model fails
   )
   action = policy.select_action(request, options, network_state)

**Key feature:** Automatic fallback to heuristic on errors.
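
The fallback idea can be sketched as a wrapper: try the model, validate its answer, and defer to a heuristic if anything goes wrong. ``with_fallback`` and the toy models below are hypothetical, and options are reduced to a feasibility mask; how ``MLControlPolicy`` implements its fallback internally may differ.

.. code-block:: python

   from typing import Callable

   Mask = list[bool]
   Selector = Callable[[Mask], int]


   def first_feasible(mask: Mask) -> int:
       return next((i for i, ok in enumerate(mask) if ok), -1)


   def with_fallback(model: Selector, fallback: Selector = first_feasible) -> Selector:
       """Wrap a model so errors or invalid picks fall back to a heuristic."""

       def select(mask: Mask) -> int:
           try:
               action = model(mask)
           except Exception:
               return fallback(mask)  # Model crashed (bad input, missing weights, ...)
           if not (0 <= action < len(mask)) or not mask[action]:
               return fallback(mask)  # Out of range or picked an infeasible path
           return action

       return select


   def broken_model(mask: Mask) -> int:
       raise RuntimeError("inference failed")


   def careless_model(mask: Mask) -> int:
       return 1  # Ignores feasibility entirely


   mask = [False, False, True, True]
   print(with_fallback(broken_model)(mask))  # 2
   print(with_fallback(careless_model)(mask))  # 2
   print(with_fallback(lambda m: 3)(mask))  # 3

Validating the output matters as much as catching exceptions: a model that runs fine but picks an infeasible path would otherwise slip through.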

rl_policy.py
------------

:Purpose: Deploy pre-trained Stable-Baselines3 models (including sb3-contrib algorithms such as ``MaskablePPO``)
:Classes: ``RLPolicy``, ``RLControlPolicy``

.. code-block:: python

   from fusion.policies import RLPolicy

   policy = RLPolicy.from_file(
       model_path="trained_ppo.zip",
       algorithm="MaskablePPO",
   )
   action = policy.select_action(request, options, network_state)

**Key feature:** Supports action masking for feasibility constraints.
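
Action masking matters because an RL agent's action space has a fixed size (say ``k`` candidate paths), while a given request may have fewer options, some of them infeasible. sb3-contrib's ``MaskablePPO`` accepts a boolean mask through the ``action_masks`` argument of ``predict()``. The sketch below shows the idea only: ``build_action_mask`` and ``masked_softmax`` are hypothetical helpers, and how ``RLPolicy`` builds its mask is an assumption.

.. code-block:: python

   import math


   def build_action_mask(feasible: list[bool], k: int) -> list[bool]:
       """Fixed-length mask: pad missing slots with False, truncate extras."""
       return (feasible + [False] * k)[:k]


   def masked_softmax(logits: list[float], mask: list[bool]) -> list[float]:
       """Action probabilities with infeasible actions forced to exactly 0."""
       if not any(mask):
           raise ValueError("no valid action: the request should be blocked upstream")
       masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
       peak = max(masked)  # Subtract the max for numerical stability
       exps = [math.exp(x - peak) for x in masked]  # exp(-inf) == 0.0
       total = sum(exps)
       return [e / total for e in exps]


   # Only 3 candidate paths for this request, but the agent was trained with k = 5
   mask = build_action_mask([True, False, True], k=5)
   print(mask)  # [True, False, True, False, False]

   probs = masked_softmax([2.0, 5.0, 2.0, 1.0, 0.5], mask)
   print(probs)  # [0.5, 0.0, 0.5, 0.0, 0.0]

Action 1 has the largest raw logit, yet it receives zero probability, so the agent can never select an infeasible path, whether it samples or acts greedily.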

policy_factory.py
-----------------

:Purpose: Create policies from configuration
:Classes: ``PolicyFactory``, ``PolicyConfig``

.. code-block:: python

   from fusion.policies import PolicyFactory, PolicyConfig

   # From config object
   config = PolicyConfig(policy_type="heuristic", policy_name="shortest")
   policy = PolicyFactory.create(config)

   # From dictionary (e.g., config file)
   policy = PolicyFactory.from_dict({"policy_type": "rl", "model_path": "model.zip"})
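
A configuration-driven factory is commonly built on a registry keyed by config fields. The sketch below is hypothetical (``PolicyConfigSketch``, ``REGISTRY``, the stand-in classes, and the name strings are not FUSION's) and illustrates the idea behind ``create`` and ``from_dict``: unknown keys are rejected early, and unknown policy names produce a clear error.

.. code-block:: python

   from dataclasses import dataclass, fields
   from typing import Any, Optional


   class FirstFeasible:
       """Stand-in heuristic; options here are plain feasibility flags."""
       def select_action(self, request, options, network_state) -> int:
           return next((i for i, ok in enumerate(options) if ok), -1)


   class LastFeasible:
       """Second stand-in, so the registry has more than one entry."""
       def select_action(self, request, options, network_state) -> int:
           feasible = [i for i, ok in enumerate(options) if ok]
           return feasible[-1] if feasible else -1


   @dataclass
   class PolicyConfigSketch:
       policy_type: str = "heuristic"
       policy_name: str = "first_feasible"
       model_path: Optional[str] = None  # Would be used by ML/RL entries


   REGISTRY: dict[tuple[str, str], type] = {
       ("heuristic", "first_feasible"): FirstFeasible,
       ("heuristic", "last_feasible"): LastFeasible,
   }


   def create(config: PolicyConfigSketch) -> Any:
       key = (config.policy_type, config.policy_name)
       if key not in REGISTRY:
           known = ", ".join(f"{t}/{n}" for t, n in REGISTRY)
           raise ValueError(f"unknown policy {key[0]}/{key[1]}; known: {known}")
       return REGISTRY[key]()


   def from_dict(raw: dict[str, Any]) -> Any:
       allowed = {f.name for f in fields(PolicyConfigSketch)}
       unknown = set(raw) - allowed
       if unknown:
           raise ValueError(f"unknown config keys: {sorted(unknown)}")
       return create(PolicyConfigSketch(**raw))


   policy = from_dict({"policy_type": "heuristic", "policy_name": "last_feasible"})
   print(type(policy).__name__)  # LastFeasible

Rejecting unknown keys catches typos in config files (for example ``policy_nme``) instead of silently falling back to a default policy.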

Frequently Asked Questions
==========================

**Q: What's the difference between policies and pipelines?**

- **Policies** = decision-making ("which path?")
- **Pipelines** = execution ("find paths", "assign spectrum", "allocate protection")

**Q: What's the difference between policies and modules/routing?**

- **Policies** = high-level decision ("use path 3")
- **modules/routing** = low-level algorithm ("here are the 5 shortest paths")

**Q: Why do ML/RL policies have update() if they don't learn?**

The ``update()`` method satisfies the ``ControlPolicy`` protocol. For deployment
policies, it's a no-op. For online RL training, use ``UnifiedSimEnv`` with SB3's
``learn()`` method instead.

**Q: When should I use policies vs. just using modules directly?**

- Use **policies** when you want pluggable path selection in the orchestrator
- Use **modules directly** for custom simulations or when working with the legacy SDNController

**Q: Can I add a new policy type?**

Yes! Implement the ``ControlPolicy`` protocol:

.. code-block:: python

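   from fusion.interfaces.control_policy import ControlPolicy

   class MyCustomPolicy:
       """Satisfies ControlPolicy structurally; no inheritance needed."""

       def select_action(self, request, options, network_state) -> int:
           # Your logic here; this example picks the last feasible option
           for index in range(len(options) - 1, -1, -1):
               if options[index].is_feasible:
                   return index
           return -1  # No feasible path

       def update(self, request, action, reward) -> None:
           pass  # No-op for deployment

       def get_name(self) -> str:
           return "MyCustomPolicy"

   policy: ControlPolicy = MyCustomPolicy()  # Type checkers verify conformance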

Testing
=======

.. code-block:: bash

   # Run all policy tests
   pytest fusion/policies/tests/ -v

   # Run with coverage
   pytest --cov=fusion.policies fusion/policies/tests/

Related Documentation
=====================

- :ref:`pipelines-module` - Multi-step provisioning operations
- :ref:`modules-directory` - Algorithm implementations
- :ref:`core-module` - SDNOrchestrator that uses policies
- :ref:`interfaces-module` - ControlPolicy protocol definition