RL Adapter Package

At a Glance

Purpose:: Bridge between RL environments and the v6.0 orchestrator stack
Location:: fusion/modules/rl/adapter/
Key Files:: rl_adapter.py, path_option.py
Prerequisites:: Basic understanding of RL concepts (actions, observations, rewards)

This guide teaches you how the adapter works and how to extend it. If you’re new to FUSION’s RL module, start here before diving into environments or algorithms.

What Problem Does This Solve?

To make a routing decision, an RL agent must:

See what paths are available (observation)
Know which paths are actually usable (action masking)
Act by selecting a path (action)
Learn from the outcome (reward)

The adapter handles all of this by sitting between your RL environment and the simulation’s orchestrator. It translates between “RL language” (observations, actions, rewards) and “simulation language” (requests, paths, allocations).

Without adapter:                    With adapter:
+--------------+                    +--------------+
| RL Agent     |                    | RL Agent     |
+--------------+                    +--------------+
      |                                   |
      | (duplicated logic,                | (clean interface)
      |  inconsistent state)              v
      v                             +--------------+
+--------------+                    | Adapter      |  <-- You are here
| Simulation   |                    +--------------+
+--------------+                          |
                                          v
                                    +--------------+
                                    | Orchestrator |
                                    +--------------+

Quick Start Tutorial

This tutorial walks through the adapter’s core workflow. By the end, you’ll understand how a single RL step flows through the system.

Step 1: Create the Adapter

The adapter wraps an orchestrator and reuses its pipeline instances rather than creating its own:

from fusion.modules.rl.adapter import RLSimulationAdapter, RLConfig

# Assume you have an orchestrator from your environment setup
# orchestrator = SDNOrchestrator(config, pipelines)

# Create adapter with custom configuration
rl_config = RLConfig(
    k_paths=3,              # Consider top 3 shortest paths
    rl_success_reward=1.0,  # Reward when allocation succeeds
    rl_block_penalty=-1.0,  # Penalty when request is blocked
)
adapter = RLSimulationAdapter(orchestrator, rl_config)

# Critical invariant: same pipeline objects, not copies
assert adapter.routing is orchestrator.routing  # Identity check
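
The identity check holds when the adapter keeps a reference to the orchestrator and hands out the orchestrator's own pipeline objects. The following is a minimal sketch of that pattern, not FUSION's actual implementation; `RoutingPipeline`, `SketchOrchestrator`, and `SketchAdapter` are hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class RoutingPipeline:
    """Stand-in for a routing pipeline (hypothetical, not FUSION's class)."""


@dataclass
class SketchOrchestrator:
    """Stand-in orchestrator that owns its pipeline instances."""

    routing: RoutingPipeline = field(default_factory=RoutingPipeline)


class SketchAdapter:
    """Holds the orchestrator and exposes its pipelines by reference."""

    def __init__(self, orchestrator: SketchOrchestrator | None) -> None:
        if orchestrator is None:
            raise ValueError("orchestrator cannot be None")
        self._orchestrator = orchestrator

    @property
    def routing(self) -> RoutingPipeline:
        # Return the orchestrator's own object, never a copy.
        return self._orchestrator.routing


orchestrator = SketchOrchestrator()
adapter = SketchAdapter(orchestrator)
assert adapter.routing is orchestrator.routing
```

Because nothing is copied, any state a pipeline accumulates through the orchestrator is the same state the adapter reads.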

Step 2: Get Path Options (Observation Data)

When a request arrives, ask the adapter what paths are available:

# request: the current network request (source, dest, bandwidth, etc.)
# network_state: current state of the network (spectrum usage, etc.)

options = adapter.get_path_options(request, network_state)

# options is a list of PathOption objects
for opt in options:
    print(f"Path {opt.path_index}: {opt.path}")
    print(f"  Length: {opt.weight_km} km, Hops: {opt.num_hops}")
    print(f"  Feasible: {opt.is_feasible}, Slots needed: {opt.slots_needed}")

What happens inside:

Adapter calls routing.find_routes() to get candidate paths
For each path, calls spectrum.find_spectrum() to check if slots are free
Packages everything into immutable PathOption objects

Key point: This is READ-ONLY. No spectrum is allocated yet.
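
The three internal steps can be pictured with a small, self-contained sketch. Everything here is a hypothetical stand-in: `find_routes` and `has_free_spectrum` are stubs written for this example, not the real pipeline calls, and `SketchOption` carries only a few of PathOption's fields.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchOption:
    """Reduced stand-in for PathOption (hypothetical)."""

    path_index: int
    path: tuple[str, ...]
    is_feasible: bool


def find_routes(source: str, dest: str, k: int) -> list[list[str]]:
    """Stub routing step: returns up to k fixed candidate paths."""
    candidates = [[source, dest], [source, "X", dest], [source, "Y", "Z", dest]]
    return candidates[:k]


def has_free_spectrum(path: list[str], busy_links: set[tuple[str, str]]) -> bool:
    """Stub spectrum check: a path is feasible if none of its links is busy."""
    links = zip(path, path[1:])
    return all(link not in busy_links for link in links)


def get_path_options(source, dest, busy_links, k=3):
    """Query routes, probe spectrum, and package results without allocating."""
    return [
        SketchOption(index, tuple(path), has_free_spectrum(path, busy_links))
        for index, path in enumerate(find_routes(source, dest, k))
    ]


busy = {("A", "X")}
options = get_path_options("A", "C", busy)
```

The sketch only reads `busy`; it never adds or removes a link, which mirrors the read-only guarantee described above.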

Step 3: Build the Action Mask

RL agents shouldn’t select infeasible paths. The action mask tells them which actions are valid:

from fusion.modules.rl.adapter import compute_action_mask

# Method 1: Use the adapter's convenience method
mask = adapter.get_action_mask(options)

# Method 2: Use the standalone function
mask = compute_action_mask(options, k_paths=3)

# mask is a numpy boolean array: [True, False, True]
# meaning: path 0 is feasible, path 1 is blocked, path 2 is feasible

# Your RL agent uses this to avoid invalid actions
# MaskablePPO example:
# action, _ = model.predict(observation, action_masks=mask)
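
To make the mask semantics concrete, here is a rough sketch of what a mask builder might do. It is not the source of `compute_action_mask`: it returns a plain list instead of a numpy array, and it assumes that action slots with no matching option stay masked out. `SketchOption` and `sketch_action_mask` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchOption:
    """Reduced stand-in for PathOption (hypothetical)."""

    path_index: int
    is_feasible: bool


def sketch_action_mask(options: list[SketchOption], k_paths: int) -> list[bool]:
    """Return one flag per action slot; slots without a feasible option stay False."""
    mask = [False] * k_paths
    for opt in options:
        if 0 <= opt.path_index < k_paths and opt.is_feasible:
            mask[opt.path_index] = True
    return mask


# Only two candidate paths were found, and the second has no free spectrum.
options = [SketchOption(0, True), SketchOption(1, False)]
mask = sketch_action_mask(options, k_paths=3)
```

Keeping the mask length fixed at `k_paths` means the agent's action space does not change size when routing returns fewer candidates.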

Step 4: Apply the Selected Action

Once your agent selects a path index, apply it through the adapter:

action = 0  # Agent selected path 0

result = adapter.apply_action(
    action=action,
    request=request,
    network_state=network_state,
    options=options,
)

# result is an AllocationResult
if result.success:
    print("Request allocated successfully!")
else:
    print(f"Request blocked: {result.block_reason}")

What happens inside:

Adapter finds the PathOption matching the action index
Calls orchestrator.handle_arrival() with the path as forced_path
Orchestrator runs full allocation logic (SNR checks, grooming, slicing)
Returns the result
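
The flow above can be sketched as follows. This is an illustration under assumptions, not the adapter's code: `StubOrchestrator`, `SketchOption`, and `SketchResult` are hypothetical, and the exact signature of `handle_arrival()` is assumed from the description (a request plus a `forced_path` keyword).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchOption:
    path_index: int
    path: tuple[str, ...]


@dataclass(frozen=True)
class SketchResult:
    success: bool
    block_reason: str | None = None


class StubOrchestrator:
    """Records the forced path and always reports success (hypothetical)."""

    def __init__(self) -> None:
        self.forced_paths: list[tuple[str, ...]] = []

    def handle_arrival(self, request: str, forced_path: tuple[str, ...]) -> SketchResult:
        self.forced_paths.append(forced_path)
        return SketchResult(success=True)


def apply_action(orchestrator, action, request, options):
    """Map an action index to its option and delegate allocation."""
    chosen = next((o for o in options if o.path_index == action), None)
    if chosen is None:
        # No option carries this index, so nothing is sent to the orchestrator.
        return SketchResult(success=False, block_reason="NO_PATH")
    return orchestrator.handle_arrival(request, forced_path=chosen.path)


orchestrator = StubOrchestrator()
options = [SketchOption(0, ("A", "B")), SketchOption(1, ("A", "C", "B"))]
ok = apply_action(orchestrator, 1, "req-1", options)
bad = apply_action(orchestrator, 5, "req-1", options)
```

An unknown index short-circuits to a blocked result, so the orchestrator only ever sees paths the adapter itself offered.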

Step 5: Compute the Reward

Convert the allocation result to an RL reward signal:

reward = adapter.compute_reward(result, request)
# reward = 1.0 if success, -1.0 if blocked (based on RLConfig)

That’s it! The complete RL step is:

# Inside your environment's step() method:
options = adapter.get_path_options(request, network_state)
mask = adapter.get_action_mask(options)
# ... agent selects action using mask ...
result = adapter.apply_action(action, request, network_state, options)
reward = adapter.compute_reward(result, request)
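
The elided "agent selects action using mask" step depends on your algorithm. As a placeholder while wiring things up, a masked random policy is often enough. The helper below is a hypothetical sketch; how your environment should react when no action is valid (here signalled by `None`) is an assumption you need to settle yourself.

```python
from __future__ import annotations

import random


def sample_masked_action(mask: list[bool], rng: random.Random) -> int | None:
    """Pick uniformly among valid actions; return None when nothing is feasible."""
    valid = [index for index, allowed in enumerate(mask) if allowed]
    if not valid:
        return None
    return rng.choice(valid)


rng = random.Random(0)  # seeded so runs are repeatable
action = sample_masked_action([True, False, True], rng)
no_action = sample_masked_action([False, False, False], rng)
```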

Understanding PathOption

PathOption is the core data structure. It’s an immutable snapshot of everything known about a candidate path.

Anatomy of a PathOption

from fusion.modules.rl.adapter import PathOption

# PathOption is a frozen dataclass - you can't modify it after creation
opt = PathOption(
    # Required fields
    path_index=0,                    # Which path (0 to k-1)
    path=("A", "B", "C"),            # Node sequence (tuple, not list!)
    weight_km=150.5,                 # Physical length
    num_hops=2,                      # Number of links
    modulation="QPSK",               # Selected modulation format
    slots_needed=4,                  # Contiguous slots required
    is_feasible=True,                # Can this path be allocated?
    congestion=0.3,                  # [0,1] how congested (0=empty)
    available_slots=0.7,             # [0,1] availability ratio

    # Optional fields (default to None or sensible values)
    spectrum_start=100,              # First slot if feasible
    spectrum_end=104,                # Last slot if feasible
    core_index=0,                    # For multi-core fibers
    band="C",                        # Spectrum band
)

# Immutable - this raises FrozenInstanceError:
# opt.is_feasible = False  # Error!

# Hashable - can use as dict key:
cache = {opt: "some_value"}

Why Immutable?

Thread safety: Can pass between threads without locks
Cache keys: Can use PathOptions as dictionary keys
No surprises: Once created, it never changes
RL pipelines: Options kept for later use (for example in a rollout or replay buffer) still describe the network as it was when the decision was made
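
These properties come from the standard library's frozen dataclasses and can be tried on a small stand-in. `SketchOption` below is hypothetical and has only three fields; the behavior shown is that of `dataclasses` itself.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class SketchOption:
    path_index: int
    path: tuple[str, ...]
    is_feasible: bool


opt = SketchOption(0, ("A", "B", "C"), True)

# Assignment after creation is rejected.
try:
    opt.is_feasible = False
    mutated = True
except FrozenInstanceError:
    mutated = False

# Equal field values give equal hashes, so an equal copy finds the cache entry.
cache = {opt: "cached"}
same = SketchOption(0, ("A", "B", "C"), True)
hit = cache[same]

# A list field would break hashing, which is why path is stored as a tuple.
try:
    hash(SketchOption(0, ["A", "B"], True))
    list_hashable = True
except TypeError:
    list_hashable = False
```

Note that `frozen=True` only blocks attribute assignment; it does not freeze mutable objects stored in fields, so keeping every field immutable matters.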

Creating PathOptions (Factory Methods)

You rarely create PathOptions directly. Use factory methods instead:

# From pipeline results (used internally by adapter)
opt = PathOption.from_pipeline_results(
    path_index=0,
    route_result=route_result,    # From routing pipeline
    spectrum_result=spectrum_result,  # From spectrum pipeline
    congestion=0.3,
    available_slots=0.7,
)

# For simple unprotected routes
opt = PathOption.from_unprotected_route(
    path_index=0,
    path=["A", "B", "C"],  # List OK here, converted to tuple
    weight_km=150.0,
    is_feasible=True,
    modulation="QPSK",
    slots_needed=4,
    congestion=0.3,
)

# For 1+1 protected routes (primary + backup)
opt = PathOption.from_protected_route(
    path_index=0,
    primary_path=["A", "B", "C"],
    backup_path=["A", "D", "E", "C"],
    primary_weight=100.0,
    backup_weight=150.0,
    primary_feasible=True,
    backup_feasible=True,
    primary_modulation="QPSK",
    backup_modulation="8-QAM",
    slots_needed=4,
    congestion=0.3,
)

# Protected paths have extra properties:
print(opt.is_protected)        # True
print(opt.both_paths_feasible) # True only if BOTH paths have spectrum
print(opt.total_weight_km)     # 250.0 (primary + backup)
print(opt.backup_hop_count)    # 3

Extending the Adapter

This section shows how to add new functionality step by step.

Tutorial: Adding a New PathOption Field

Let’s add a latency_ms field to PathOption.

Step 1: Add the field to the dataclass

Edit path_option.py:

@dataclass(frozen=True)
class PathOption:
    # ... existing fields ...

    # Add new optional field with default
    latency_ms: float | None = None

Step 2: Add validation if needed

In __post_init__:

def __post_init__(self) -> None:
    # ... existing validation ...

    if self.latency_ms is not None and self.latency_ms < 0:
        raise ValueError("latency_ms must be non-negative")

Step 3: Update factory methods

In from_pipeline_results:

@classmethod
def from_pipeline_results(
    cls,
    # ... existing parameters ...
    latency_ms: float | None = None,
) -> "PathOption":
    # ... existing code ...
    return cls(
        # ... existing fields ...
        latency_ms=latency_ms,
    )

Step 4: Populate it in the adapter

In rl_adapter.py, in get_path_options():

# After getting route_result
# (_compute_latency is a new helper you write as part of this change)
latency = self._compute_latency(path, network_state)

option = PathOption(
    # ... existing fields ...
    latency_ms=latency,
)
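
The guide does not define `_compute_latency`. One simple option is to derive propagation delay from the path length. The sketch below assumes that approach; `propagation_latency_ms` is a hypothetical helper, and the refractive index is an assumed typical value for silica fiber, not a FUSION setting.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum
ASSUMED_REFRACTIVE_INDEX = 1.468       # assumed typical value for silica fiber


def propagation_latency_ms(
    weight_km: float,
    refractive_index: float = ASSUMED_REFRACTIVE_INDEX,
) -> float:
    """Propagation delay only; ignores queuing, processing and equipment delays."""
    if weight_km < 0:
        raise ValueError("weight_km must be non-negative")
    speed_in_fiber = SPEED_OF_LIGHT_KM_PER_S / refractive_index
    return weight_km / speed_in_fiber * 1000.0


latency = propagation_latency_ms(150.5)  # roughly 0.74 ms for 150.5 km
```

This works out to a little under 5 microseconds per kilometre, so the result always satisfies the non-negative check added in Step 2.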

Step 5: Add tests

In test_path_option.py:

def test_latency_field_stored_correctly(self) -> None:
    opt = create_valid_path_option(latency_ms=5.2)
    assert opt.latency_ms == 5.2

def test_negative_latency_raises(self) -> None:
    with pytest.raises(ValueError, match="latency_ms must be non-negative"):
        create_valid_path_option(latency_ms=-1.0)

Tutorial: Adding a New Observation Feature

Let’s add network-wide utilization to the observation.

Step 1: Add to RLConfig if configurable

In rl_adapter.py:

@dataclass
class RLConfig:
    # ... existing fields ...
    include_network_util: bool = True  # New config option

Step 2: Implement the computation

Add a private method:

def _compute_network_utilization(
    self,
    network_state: NetworkState,
) -> float:
    """Compute average utilization across all links."""
    if not hasattr(network_state, "get_all_link_utilizations"):
        return 0.0

    utils = network_state.get_all_link_utilizations()
    if not utils:
        return 0.0
    return sum(utils) / len(utils)

Step 3: Add to build_observation()

def build_observation(self, request, options, network_state) -> dict[str, Any]:
    obs = {
        # ... existing observation fields ...
    }

    if self._config.include_network_util:
        obs["network_utilization"] = np.array(
            [self._compute_network_utilization(network_state)],
            dtype=np.float32,
        )

    return obs

Step 4: Update observation space in UnifiedSimEnv

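The environment's observation space must declare the new entry too. Add a "network_utilization" key whose shape (1,) and dtype float32 match the array built in Step 3; otherwise the observations returned by the adapter will not match the declared space.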

Step 5: Add tests

def test_observation_includes_network_util_when_enabled(self) -> None:
    config = RLConfig(include_network_util=True)
    adapter = RLSimulationAdapter(mock_orchestrator, config)

    obs = adapter.build_observation(request, options, network_state)

    assert "network_utilization" in obs
    assert obs["network_utilization"].shape == (1,)

Tutorial: Adding a New Reward Component

Let’s add a fragmentation penalty to the reward.

Step 1: Add config options

@dataclass
class RLConfig:
    # ... existing fields ...
    rl_fragmentation_penalty: float = -0.1
    use_fragmentation_penalty: bool = False

Step 2: Modify compute_reward()

def compute_reward(
    self,
    result: AllocationResult,
    request: Request | None = None,
    network_state: NetworkState | None = None,  # Add parameter
) -> float:
    if not result.success:
        return self._config.rl_block_penalty

    reward = self._config.rl_success_reward

    # New: fragmentation penalty
    if self._config.use_fragmentation_penalty and network_state is not None:
        frag = self._compute_fragmentation_increase(result, network_state)
        reward += frag * self._config.rl_fragmentation_penalty

    return reward

Step 3: Add helper method

def _compute_fragmentation_increase(
    self,
    result: AllocationResult,
    network_state: NetworkState,
) -> float:
    """Compute how much fragmentation increased due to this allocation."""
    # Implementation depends on your fragmentation metric
    # Return value in [0, 1] where 1 = maximum fragmentation increase
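    # Raise until implemented: returning None would break compute_reward()
    raise NotImplementedError("plug in your fragmentation metric")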
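
As one possible starting point, the sketch below uses a common external-fragmentation measure: one minus the largest free run divided by the total number of free slots. This metric, the per-link occupancy lists, and both function names are assumptions made for illustration; the guide leaves the choice of metric to you.

```python
def external_fragmentation(occupied: list[bool]) -> float:
    """1 - (largest free run / total free slots); 0.0 when no slot is free."""
    total_free = 0
    largest_run = 0
    current_run = 0
    for slot_busy in occupied:
        if slot_busy:
            current_run = 0
        else:
            total_free += 1
            current_run += 1
            largest_run = max(largest_run, current_run)
    if total_free == 0:
        return 0.0
    return 1.0 - largest_run / total_free


def fragmentation_increase(before: list[bool], after: list[bool]) -> float:
    """Clamp to [0, 1] so this term can only reduce the reward."""
    delta = external_fragmentation(after) - external_fragmentation(before)
    return min(1.0, max(0.0, delta))


# Allocating two slots in the middle of an empty 8-slot link splits the free space.
before = [False] * 8
after = [False, False, False, True, True, False, False, False]
frag = fragmentation_increase(before, after)  # 0.5
reward = 1.0 + frag * -0.1                    # success reward plus scaled penalty
```

With the default `rl_fragmentation_penalty` of -0.1, this allocation earns about 0.95 instead of 1.0, nudging the agent toward placements that keep free spectrum contiguous.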

Configuration Reference

RLConfig Fields

Field	Default	Description
`k_paths`	`3`	Number of candidate paths to consider
`rl_success_reward`	`1.0`	Reward when allocation succeeds
`rl_block_penalty`	`-1.0`	Penalty when request is blocked
`rl_grooming_bonus`	`0.1`	Bonus when request is groomed onto existing lightpath
`rl_slicing_penalty`	`-0.05`	Penalty when request requires spectrum slicing
`max_holding_time`	`100.0`	Maximum holding time for normalization
`num_nodes`	`14`	Number of nodes in network (for observation space)
`total_slots`	`320`	Total spectrum slots per link
`use_gnn_obs`	`False`	Include GNN features in observation
`obs_space`	`"obs_8"`	Observation space configuration (obs_1 through obs_8)

Testing Your Changes

The adapter has comprehensive tests. Always run them after changes.

Running Tests

# Run all adapter tests
pytest fusion/modules/rl/adapter/tests/ -v

# Run specific test file
pytest fusion/modules/rl/adapter/tests/test_path_option.py -v

# Run with coverage
pytest fusion/modules/rl/adapter/tests/ -v --cov=fusion.modules.rl.adapter

# Run a specific test class
pytest fusion/modules/rl/adapter/tests/test_path_option.py::TestPathOptionCreation -v

Test Structure

Tests follow the AAA pattern (Arrange, Act, Assert):

def test_path_option_validates_congestion_range(self) -> None:
    """Congestion outside [0,1] should raise ValueError."""
    # Arrange - set up test data
    kwargs = {"congestion": 1.5}  # Invalid: > 1.0

    # Act & Assert - verify behavior
    with pytest.raises(ValueError, match="congestion must be in"):
        create_valid_path_option(**kwargs)

Test naming convention: test_<what>_<when>_<expected>
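
The examples in this guide call a `create_valid_path_option` helper without showing it. A helper of that kind usually starts from known-good defaults and lets each test override only the field it cares about. The sketch below shows the pattern on a hypothetical, reduced `SketchOption`; the real helper's defaults and fields may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SketchOption:
    """Reduced stand-in for PathOption with one validated field."""

    path_index: int
    path: tuple[str, ...]
    congestion: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.congestion <= 1.0:
            raise ValueError("congestion must be in [0, 1]")


def create_valid_option(**overrides: Any) -> SketchOption:
    """Build an option from known-good defaults, replacing only what a test names."""
    defaults: dict[str, Any] = {
        "path_index": 0,
        "path": ("A", "B", "C"),
        "congestion": 0.3,
    }
    defaults.update(overrides)
    return SketchOption(**defaults)


default_opt = create_valid_option()
busy_opt = create_valid_option(congestion=0.9)
```

Each test then states only the value under test, which keeps the Arrange step short and makes failures easier to read.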

Writing Good Tests

Do test:

Happy path (normal operation)
Edge cases (empty lists, zero values, boundaries)
Error conditions (invalid inputs)
Invariants (immutability, pipeline identity)

Don’t test:

Implementation details that might change
External libraries (they have their own tests)

Example test for a new field:

class TestLatencyField:
    """Tests for the new latency_ms field."""

    def test_latency_stored_correctly(self) -> None:
        opt = create_valid_path_option(latency_ms=5.2)
        assert opt.latency_ms == 5.2

    def test_latency_defaults_to_none(self) -> None:
        opt = create_valid_path_option()
        assert opt.latency_ms is None

    def test_negative_latency_raises_value_error(self) -> None:
        with pytest.raises(ValueError):
            create_valid_path_option(latency_ms=-1.0)

    def test_zero_latency_is_valid(self) -> None:
        opt = create_valid_path_option(latency_ms=0.0)
        assert opt.latency_ms == 0.0

Troubleshooting

Common Issues

“orchestrator cannot be None”

# Wrong:
adapter = RLSimulationAdapter(None)

# Right:
adapter = RLSimulationAdapter(orchestrator)

The adapter requires a valid orchestrator instance.

Action index out of range

# If your agent returns action=5 but k_paths=3
result = adapter.apply_action(5, request, network_state, options)
# result.success will be False, result.block_reason = NO_PATH

The adapter handles this gracefully, but the request is still reported as blocked, so pass the action mask to your agent to keep it from choosing such actions.

PathOption is immutable

opt = create_valid_path_option()
opt.is_feasible = False  # FrozenInstanceError!

# Instead, create a new PathOption with different values
# (though you rarely need to do this manually)
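
Since PathOption is a frozen dataclass, the standard library's `dataclasses.replace` is one way to get a modified copy. The sketch uses a hypothetical two-field `SketchOption` rather than the real class.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchOption:
    path_index: int
    is_feasible: bool


original = SketchOption(path_index=0, is_feasible=True)
# replace() builds a new instance through __init__, so __post_init__ checks run again.
updated = replace(original, is_feasible=False)
```

The original object is left untouched, so anything that already holds a reference to it (a cache, a stored observation) is unaffected.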

Pipeline identity broken

# This should ALWAYS be true:
assert adapter.routing is orchestrator.routing

# If it's not, something is wrong with initialization

File Reference

fusion/modules/rl/adapter/
|-- __init__.py          # Public exports (what you import)
|-- path_option.py       # PathOption, compute_action_mask
|-- rl_adapter.py        # RLSimulationAdapter, RLConfig, DisasterState
`-- tests/
    |-- __init__.py
    |-- test_path_option.py   # PathOption tests
    `-- test_rl_adapter.py    # Adapter tests

What to import:

# Most common imports
from fusion.modules.rl.adapter import (
    RLSimulationAdapter,
    RLConfig,
    PathOption,
    compute_action_mask,
)

# For survivability scenarios
from fusion.modules.rl.adapter import (
    DisasterState,
    create_disaster_state_from_engine,
)

# For offline RL (BC, IQL)
from fusion.modules.rl.adapter import OfflinePolicyAdapter