.. _ml-module:

=======================
Machine Learning Module
=======================

.. note::

   **Status: Beta** - This module is actively used by the legacy simulation path.
   Orchestrator integration is planned for a future version.

Quick Summary: What This Module Is and Isn't
============================================

.. list-table::
   :header-rows: 1
   :widths: 50 50

   * - This Module IS
     - This Module IS NOT
   * - Utilities for supervised/unsupervised learning
     - Reinforcement learning (that's ``fusion/modules/rl/``)
   * - Feature extraction for traditional ML
     - RL feature extractors (that's ``fusion/modules/rl/feat_extrs/``)
   * - Model save/load, evaluation, visualization
     - RL policies or environments
   * - Used by legacy path (SDNController)
     - Currently used by orchestrator (planned for v6.x)

**If you want RL**, go to :ref:`modules-directory` and look at the RL section.

Overview
========

.. admonition:: At a Glance
   :class: tip

   :Purpose: Utilities for supervised/unsupervised ML in network optimization
   :Location: ``fusion/modules/ml/``
   :Status: **Beta** - Actively used by legacy path
   :Used By: Legacy path (SDNController, SimulationEngine)
   :Planned: Orchestrator integration in v6.x

The ML module provides utilities for traditional machine learning approaches to
network optimization - specifically supervised learning (SL) and unsupervised
learning (USL). It does NOT contain the models themselves; you bring your own
models (sklearn, tensorflow, pytorch) and use these utilities around them.

What This Module Provides
-------------------------

1. **Feature Engineering**: Extract features from network state for ML models
2. **Preprocessing**: Data transformation, normalization, balancing
3. **Model I/O**: Save and load trained models
4. **Evaluation**: Metrics calculation and model comparison
5. **Visualization**: Plotting for ML results (confusion matrices, feature importance)

Relationship with MLControlPolicy
==================================

There are currently **two separate** ML-related areas in FUSION:

.. list-table::
   :header-rows: 1
   :widths: 20 30 30 20

   * - Component
     - Location
     - Purpose
     - Status
   * - ML Utilities (this module)
     - ``fusion/modules/ml/``
     - Feature extraction, preprocessing, model I/O, evaluation
     - Used by legacy path
   * - ML Control Policy
     - ``fusion/policies/ml_policy.py``
     - Path selection using pre-trained models
     - Used by orchestrator

Current State (Not Yet Integrated)
----------------------------------

These two components were built independently and **do not share code**:

- ``fusion/modules/ml/`` has ``extract_ml_features()`` that works with legacy dicts (``engine_props``, ``sdn_props``)
- ``fusion/policies/ml_policy.py`` has its own ``FeatureBuilder`` that works with domain objects (``Request``, ``NetworkState``)
- Both have separate model loading implementations

.. code-block:: text

   Current Architecture (Separate):

   +---------------------------+          +---------------------------+
   | fusion/modules/ml/        |          | fusion/policies/          |
   |                           |          | ml_policy.py              |
   +---------------------------+          +---------------------------+
   | extract_ml_features()     |          | FeatureBuilder            |
   | load_model()              |          | load_model()              |
   | save_model()              |          | MLControlPolicy           |
   | evaluate_classifier()     |          |                           |
   +---------------------------+          +---------------------------+
            |                                        |
            v                                        v
   +---------------------------+          +---------------------------+
   | Legacy Path               |          | Orchestrator Path         |
   | (engine_props, sdn_props) |          | (Request, NetworkState)   |
   +---------------------------+          +---------------------------+

Future Integration (Planned for v6.x)
-------------------------------------

The goal is to unify these so that ``MLControlPolicy`` uses this module's utilities:

- ``MLControlPolicy`` would use this module's feature extraction (adapted for domain objects)
- Model I/O would be consolidated into this module
- Evaluation and visualization utilities would be available for policy analysis

.. code-block:: text

   Future Architecture (Integrated):

   +-----------------------------------------------+
   | fusion/modules/ml/                            |
   +-----------------------------------------------+
   | extract_ml_features()  <-- adapted for both   |
   | load_model()           <-- unified            |
   | save_model()                                  |
   | evaluate_classifier()                         |
   +-----------------------------------------------+
            |                     |
            v                     v
   +------------------+   +------------------+
   | Legacy Path      |   | MLControlPolicy  |
   | SDNController    |   | (orchestrator)   |
   +------------------+   +------------------+

This will eliminate duplicate code and provide a single, well-tested ML utility layer for all paths.

Understanding the Learning Landscape in FUSION
==============================================

FUSION has **two separate modules** for learning-based approaches:

.. code-block:: text

   +===========================================================================+
   |                    LEARNING IN FUSION                                     |
   +===========================================================================+
   |                                                                           |
   |   +---------------------------+     +---------------------------+         |
   |   |   fusion/modules/ml/      |     |   fusion/modules/rl/      |         |
   |   |   (THIS MODULE)           |     |                           |         |
   |   +---------------------------+     +---------------------------+         |
   |   |                           |     |                           |         |
   |   | Supervised Learning (SL)  |     | Reinforcement Learning    |         |
   |   | Unsupervised Learning(USL)|     |                           |         |
   |   |                           |     |                           |         |
   |   | - Feature extraction      |     | - Policies (BC, IQL, etc) |         |
   |   | - Preprocessing           |     | - Environments            |         |
   |   | - Model save/load         |     | - Feature extractors      |         |
   |   | - Evaluation metrics      |     | - SB3 integration         |         |
   |   | - Visualization           |     | - Offline RL              |         |
   |   |                           |     |                           |         |
   |   +---------------------------+     +---------------------------+         |
   |              |                                   |                        |
   |              v                                   v                        |
   |   +---------------------------+     +---------------------------+         |
   |   | LEGACY PATH               |     | ORCHESTRATOR PATH         |         |
   |   | use_orchestrator=False    |     | use_orchestrator=True     |         |
   |   |                           |     |                           |         |
   |   | SDNController calls       |     | RLSimulationAdapter       |         |
   |   | get_ml_obs() for features |     | uses RL policies          |         |
   |   | and loads ML models       |     |                           |         |
   |   |                           |     |                           |         |
   |   | STATUS: WORKS             |     | ML: Planned for v6.x      |         |
   |   +---------------------------+     +---------------------------+         |
   |                                                                           |
   +===========================================================================+

Key Differences
---------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Aspect
     - ML Module (this)
     - RL Module
   * - Learning Type
     - Supervised / Unsupervised
     - Reinforcement Learning
   * - Training
     - Offline on collected data
     - Online or offline from behavior
   * - Legacy Integration
     - **Works** (deploy_model=True)
     - N/A
   * - Orchestrator Integration
     - Planned for v6.x
     - **Works**
   * - Feature Extraction
     - ``get_ml_obs()`` / ``extract_ml_features()``
     - ``feat_extrs/`` (GNN, Graphormer, etc.)

Current Integration Status
==========================

Visual: How ML Module Integrates with Legacy Path
-------------------------------------------------

.. code-block:: text

   +===========================================================================+
   |                     LEGACY PATH (WORKS)                                    |
   |                  (use_orchestrator=False)                                  |
   |                  (deploy_model=True)                                       |
   +===========================================================================+
                              |
                              v
   +-----------------------------+
   | SimulationEngine            |
   | (fusion/core/simulation.py) |
   |                             |
   | At startup:                 |
   | load_model() ---------------|---> fusion/modules/ml/model_io.py
   |                             |     Loads trained ML model
   +-------------+---------------+
                 |
                 | passes ml_model to
                 v
   +---------------------------------+
   | SDNController                   |
   | (fusion/core/sdn_controller.py) |
   |                                 |
   | During routing:                 |
   | get_ml_obs() -------------------|---> fusion/modules/ml/feature_engineering.py
   |                                 |     Extracts features from network state
   |                                 |
   | ml_model.predict() -------------|---> Your trained model
   |                                 |     Makes routing decision
   +---------------------------------+

   +===========================================================================+
   |                     ORCHESTRATOR PATH                                      |
   |                  (use_orchestrator=True)                                   |
   +===========================================================================+
                              |
                              v
   +---------------------------+
   | SDNOrchestrator           |
   |                           |
   | Currently: Uses RL module |
   | Planned: ML integration   |
   |          in v6.x          |
   +---------------------------+

What Works and What's Planned
-----------------------------

.. list-table::
   :header-rows: 1
   :widths: 40 30 30

   * - Capability
     - Legacy Path
     - Orchestrator Path
   * - Feature extraction (get_ml_obs)
     - **Works**
     - Planned v6.x
   * - Model loading
     - **Works**
     - Planned v6.x
   * - Preprocessing utilities
     - Works
     - Works (standalone)
   * - Evaluation metrics
     - Works
     - Works (standalone)
   * - Visualization
     - Works
     - Works (standalone)

About the Visualization Code
============================

**Q: Why does this module have its own visualization.py?**

FUSION has a central visualization module at ``fusion/visualization/`` with a plugin
architecture. However, the ML module's visualization functions have not yet been
integrated into it.

.. code-block:: text

   Central Module:
   fusion/visualization/           - Central visualization with plugin architecture

   Module-specific (uses plugin system):
   fusion/modules/routing/visualization/   - Routing plugin
   fusion/modules/spectrum/visualization/  - Spectrum plugin
   fusion/modules/snr/visualization/       - SNR plugin
   fusion/modules/rl/visualization/        - RL plugin

   Not yet integrated:
   fusion/modules/ml/visualization.py      - ML plots (to be integrated)

The ML visualization functions (confusion matrices, feature importance, cluster plots)
are planned to be integrated into the central plugin system in a future version.

Module Components
=================

.. code-block:: text

   fusion/modules/ml/
   |-- __init__.py            # Public API exports
   |-- README.md              # Module documentation
   |-- TODO.md                # Development roadmap
   |-- constants.py           # Shared constants
   |-- feature_engineering.py # Feature extraction (get_ml_obs)
   |-- preprocessing.py       # Data transformation
   |-- model_io.py            # Model save/load
   |-- evaluation.py          # Metrics calculation
   |-- visualization.py       # Plotting utilities
   `-- registry.py            # Model registry (currently empty)

feature_engineering.py
----------------------

:Purpose: Extract features from network state for ML models
:Key Function: ``extract_ml_features()`` (alias: ``get_ml_obs()``)
:Used By: SDNController in legacy path

.. code-block:: python

   from fusion.modules.ml import extract_ml_features

   # Extract features for ML model input
   features_df = extract_ml_features(
       request_dict=request,
       engine_properties=engine_props,
       sdn_properties=sdn_props
   )

   # Use with your trained model
   prediction = model.predict(features_df)

model_io.py
-----------

:Purpose: Save and load trained models
:Key Functions: ``save_model()``, ``load_model()``
:Used By: SimulationEngine in legacy path

.. code-block:: python

   from fusion.modules.ml import save_model, load_model

   # Save trained model
   save_model(sim_dict, model, "random_forest", "1000")

   # Load model (SimulationEngine does this automatically when deploy_model=True)
   model = load_model(engine_properties=engine_props)

preprocessing.py
----------------

:Purpose: Data transformation and preparation
:Key Functions: ``process_training_data()``, ``balance_training_data()``, ``normalize_features()``

.. code-block:: python

   from fusion.modules.ml import (
       process_training_data,
       balance_training_data,
       normalize_features
   )

   # Process raw simulation data
   processed = process_training_data(sim_dict, raw_data, erlang)

   # Balance classes (for imbalanced datasets)
   balanced = balance_training_data(processed)

   # Normalize features
   normalized = normalize_features(balanced)

evaluation.py
-------------

:Purpose: Calculate evaluation metrics
:Key Functions: ``evaluate_classifier()``, ``evaluate_regressor()``, ``cross_validate_model()``

.. code-block:: python

   from fusion.modules.ml import evaluate_classifier, compare_models

   # Evaluate a classifier
   metrics = evaluate_classifier(model, test_features, test_labels)
   # Returns: accuracy, precision, recall, f1, confusion_matrix

   # Compare multiple models
   comparison = compare_models([model1, model2], test_data, labels)

visualization.py
----------------

:Purpose: ML-specific plotting
:Key Functions: ``plot_confusion_matrix()``, ``plot_feature_importance()``, ``plot_2d_clusters()``

.. code-block:: python

   from fusion.modules.ml import plot_confusion_matrix, plot_feature_importance

   # Plot confusion matrix
   plot_confusion_matrix(y_true, y_pred, class_names)

   # Plot feature importance
   plot_feature_importance(model, feature_names)

Related: fusion/core/ml_metrics.py
----------------------------------

There's also ``fusion/core/ml_metrics.py`` which collects training data during
simulation. This is related but separate:

- ``MLMetricsCollector`` collects data during simulation runs
- The collected data can then be used to train models
- Those trained models are loaded/used via this ML module

Development Guide
=================

Using the ML Module (Legacy Path)
---------------------------------

1. **Collect training data** during simulations using ``MLMetricsCollector``
2. **Process the data** using ``preprocessing.py`` utilities
3. **Train your model** (sklearn, tensorflow, etc. - not provided)
4. **Save the model** using ``save_model()``
5. **Run simulations** with ``deploy_model=True`` - model is loaded automatically

Testing
=======

.. code-block:: bash

   # Run ML module tests
   pytest fusion/modules/tests/ml/ -v

   # Run with coverage
   pytest fusion/modules/tests/ml/ -v --cov=fusion.modules.ml

Related Documentation
=====================

- :ref:`modules-directory` - Overview of all modules including RL
- :ref:`core-module` - SimulationEngine and SDNController integration
- ``fusion/modules/rl/`` - RL module (used by orchestrator)
- ``fusion/core/ml_metrics.py`` - Training data collection