Reporting Module

Overview

At a Glance

Purpose:     Format, aggregate, and export simulation results for human consumption and ML training
Location:    fusion/reporting/
Key Files:   simulation_reporter.py, aggregation.py, csv_export.py, statistics.py, dataset_logger.py
Depends On:  fusion.utils.logging_config, numpy
Used By:     fusion.core.simulation, offline RL training pipelines

The reporting module is the presentation layer for FUSION simulation results. It handles how results are displayed to users, exported for analysis, and logged for ML training. This module separates presentation concerns from data collection (fusion.core.metrics) and network analysis (fusion.analysis).

When you work here:

  • Adding new console output formats for simulation progress

  • Creating new export formats (e.g., Parquet, HDF5)

  • Adding new statistical aggregation methods for multi-seed experiments

  • Extending offline RL dataset logging with new fields

  • Adding grooming-specific reporting features

Module Differentiation

Understanding how reporting differs from related modules is crucial:

Module Responsibilities

Module              | Primary Purpose       | What It Does                                                                                | When To Use
--------------------|-----------------------|---------------------------------------------------------------------------------------------|---------------------------------------------------
fusion.io           | Data Input/Output     | Loads network topologies, generates physical layer parameters, exports raw simulation data   | Loading networks, saving raw JSON/CSV data
fusion.analysis     | Network Analysis      | Analyzes network state (utilization, congestion, bottlenecks) during/after simulation        | Computing network metrics, identifying problems
fusion.reporting    | Presentation Layer    | Formats results for humans, aggregates multi-seed statistics, logs ML training data          | Console output, statistical summaries, RL datasets
fusion.core.metrics | Statistics Collection | Collects raw statistics during simulation (blocking, hops, SNR, etc.)                        | Tracking every metric during simulation

Data Flow Between Modules:

Simulation Engine
      |
      v
+---------------------+       +--------------------+       +--------------------+
| fusion.core.metrics | ----> | fusion.analysis    | ----> | fusion.reporting   |
| (collect raw stats) |       | (analyze network)  |       | (format & export)  |
+---------------------+       +--------------------+       +--------------------+
      |                                                           |
      v                                                           v
+---------------------+                                  +--------------------+
| fusion.io.exporter  |                                  | Console Output     |
| (raw data files)    |                                  | CSV Summaries      |
+---------------------+                                  | RL Datasets        |
                                                         +--------------------+

Key Distinction from io module:

  • fusion.io: Low-level file operations (read/write JSON, CSV) with no semantic understanding

  • fusion.reporting: High-level presentation with semantic understanding (aggregation, formatting, confidence intervals)

Key Distinction from analysis module:

  • fusion.analysis: Computes derived metrics FROM network state (utilization, congestion)

  • fusion.reporting: Presents statistics TO users (formatting, export, display)

statistics.py vs metrics.py

This is a common source of confusion. Here is how they differ:

Statistics vs Metrics

Aspect    | fusion.core.metrics.py (SimStats)                                          | fusion.reporting.statistics.py (GroomingStatistics)
----------|-----------------------------------------------------------------------------|------------------------------------------------------------------------------------------
Purpose   | Core statistics collection engine                                           | Grooming-specific statistics only
Scope     | ALL simulation metrics (blocking, hops, SNR, transponders, resource usage)  | ONLY traffic grooming metrics (grooming rates, lightpath utilization, bandwidth savings)
When Used | Every simulation, always enabled                                            | Only when grooming is enabled (is_grooming_enabled=True)
Size      | 2,000+ lines (comprehensive)                                                | 275 lines (focused)
Key Class | SimStats                                                                    | GroomingStatistics, SimulationStatistics
Location  | fusion/core/metrics.py                                                      | fusion/reporting/statistics.py

Why the separation?

  1. SimStats is a large, monolithic class that handles all core simulation statistics

  2. GroomingStatistics is a specialized class for traffic grooming experiments

  3. Keeping grooming stats in reporting maintains separation of concerns

  4. Future plan: Split SimStats into focused modules (see Metrics Guide)

Usage Example:

# Core metrics - always used
from fusion.core.metrics import SimStats

stats = SimStats(engine_props, sim_info)
stats.iter_update(request_data, sdn_data, network_spectrum)

# Grooming statistics - only when grooming enabled
from fusion.reporting import GroomingStatistics

if engine_props.get("is_grooming_enabled"):
    grooming_stats = GroomingStatistics()
    grooming_stats.update_grooming_outcome(was_groomed, partial, bandwidth, new_lps)

Key Concepts

Multi-Seed Aggregation

Running the same experiment with different random seeds and combining results with statistical analysis (mean, standard deviation, 95% confidence intervals).

Confidence Interval (CI95)

The range expected to contain the true mean in 95% of repeated experiments. The half-width is 1.96 * std / sqrt(n), where n is the number of seed runs, so the interval is mean +/- 1.96 * std / sqrt(n). For example, three seeds with mean 0.10 and sample std 0.01 give 0.10 +/- 0.011.

Comparison Table

Side-by-side comparison of baseline vs RL policy results, showing improvement percentages with statistical significance.

Offline RL Dataset

JSONL files containing (state, action, reward, next_state, action_mask) tuples for training offline RL algorithms (BC, IQL, CQL).

Grooming Statistics

Metrics specific to traffic grooming: grooming rate, lightpath utilization, bandwidth savings, transponder blocking.

Architecture

Module Structure

fusion/reporting/
|-- __init__.py              # Public API exports
|-- simulation_reporter.py   # Console output formatting
|-- aggregation.py           # Multi-seed statistical aggregation
|-- csv_export.py            # CSV export utilities
|-- statistics.py            # Grooming-specific statistics
|-- dataset_logger.py        # Offline RL dataset logging (JSONL)
|-- README.md                # Module documentation
`-- tests/                   # Unit tests
    |-- test_simulation_reporter.py
    |-- test_aggregation.py
    |-- test_csv_export.py
    |-- test_statistics.py
    `-- test_dataset_logger.py

Data Flow

Console Reporting Flow:

SimStats (core) --> SimulationReporter --> Console/Log
                    |
                    +-> report_iteration_stats()
                    +-> report_simulation_complete()
                    +-> report_blocking_statistics()

Multi-Seed Aggregation Flow:

Seed 1 Results --+
Seed 2 Results --+--> aggregate_seed_results() --> Mean, Std, CI95
Seed 3 Results --+                                      |
                                                        v
                                           export_aggregated_results() --> CSV

Offline RL Dataset Flow:

Simulation --> DatasetLogger.log_transition() --> JSONL File
                                                       |
                                                       v
                                                 load_dataset() --> RL Training

Components

simulation_reporter.py

Purpose:

Format and display simulation progress and results

Key Classes:

SimulationReporter

Handles all console output and logging for simulation progress. Integrates with the Python logging system for proper message delivery.

from fusion.reporting import SimulationReporter

reporter = SimulationReporter(verbose=True)

# Report iteration progress
reporter.report_iteration_stats(
    iteration=5,
    max_iterations=100,
    erlang=50.0,
    blocking_list=[0.01, 0.02, 0.015],
    print_flag=True
)

# Report final results
reporter.report_simulation_complete(
    erlang=50.0,
    iterations_completed=100,
    confidence_interval=95.0
)

Key Methods:

  • report_iteration_stats() - Progress for each iteration

  • report_simulation_start() - Log startup information

  • report_simulation_complete() - Final results with CI

  • report_blocking_statistics() - Detailed blocking breakdown

  • create_summary_report() - Generate formatted summary string

aggregation.py

Purpose:

Aggregate results across multiple random seeds with statistical analysis

Key Functions:

aggregate_seed_results(), create_comparison_table(), format_comparison_for_display()

Computes mean, standard deviation, and 95% confidence intervals across multiple seed runs. Essential for statistically valid comparisons.

from fusion.reporting import (
    aggregate_seed_results,
    create_comparison_table,
    format_comparison_for_display
)

# Results from multiple seeds
results = [
    {"bp_overall": 0.10, "hops_mean": 3.2, "seed": 42},
    {"bp_overall": 0.11, "hops_mean": 3.1, "seed": 43},
    {"bp_overall": 0.09, "hops_mean": 3.3, "seed": 44},
]

# Aggregate with CI95
aggregated = aggregate_seed_results(results, metric_keys=["bp_overall", "hops_mean"])
# Returns: {
#     "bp_overall": {"mean": 0.10, "std": 0.01, "ci95_lower": 0.089, "ci95_upper": 0.111, "n": 3},
#     "hops_mean": {"mean": 3.2, "std": 0.1, ...}
# }

# Compare baseline vs RL
comparison = create_comparison_table(baseline_results, rl_results, metrics=["bp_overall"])
print(format_comparison_for_display(comparison))
# Output:
# Metric         | Baseline           | RL                 | Improvement
# -------------------------------------------------------------------------------
# bp_overall     | 0.1050 +/- 0.0071  | 0.0850 +/- 0.0071  | +19.05%

csv_export.py

Purpose:

Export simulation results to CSV format for analysis tools

Key Functions:

export_results_to_csv(), export_aggregated_results(), export_comparison_table(), append_result_to_csv()

Provides CSV export utilities with smart column ordering and support for incremental logging during long experiments.

from fusion.reporting import (
    export_results_to_csv,
    export_aggregated_results,
    append_result_to_csv
)

# Export multiple results
export_results_to_csv(results, "output/all_results.csv")

# Export aggregated statistics
export_aggregated_results(
    aggregated,
    "output/summary.csv",
    metadata={"topology": "NSFNet", "policy": "baseline"}
)

# Append single result (for incremental logging)
append_result_to_csv(result, "output/running_results.csv")

statistics.py

Purpose:

Grooming-specific statistics collection and reporting

Key Classes:

GroomingStatistics, SimulationStatistics

Key Functions:

generate_grooming_report(), export_grooming_stats_csv()

Tracks metrics specific to traffic grooming experiments.

from fusion.reporting import (
    GroomingStatistics,
    generate_grooming_report,
    export_grooming_stats_csv
)

# Create grooming statistics tracker
grooming_stats = GroomingStatistics()

# Update on each request
grooming_stats.update_grooming_outcome(
    was_groomed=True,
    was_partially_groomed=False,
    bandwidth=100.0,
    new_lightpaths=0
)

# Update on lightpath release
grooming_stats.update_lightpath_release(
    _lightpath_id=1,
    utilization=0.75,
    _lifetime=120.0
)

# Generate report
report = generate_grooming_report(grooming_stats)
print(report)

# Export to CSV
export_grooming_stats_csv(grooming_stats, "output/grooming_stats.csv")

Metrics Tracked:

  • Grooming outcomes (fully groomed, partially groomed, not groomed)

  • Lightpath lifecycle (created, released, active, utilization)

  • Bandwidth efficiency (groomed vs new lightpath bandwidth)

  • Transponder usage (blocking counts, per-node usage)

dataset_logger.py

Purpose:

Log simulation transitions for offline RL training

Key Classes:

DatasetLogger

Key Functions:

load_dataset(), filter_by_window()

Logs (state, action, reward, next_state, action_mask) tuples in JSONL format for training offline RL algorithms (Behavior Cloning, IQL, CQL).

from fusion.reporting import DatasetLogger

# Use as context manager
with DatasetLogger("datasets/training.jsonl", engine_props) as logger:
    for request in requests:
        # ... process request ...
        logger.log_transition(
            state=state_dict,
            action=action_idx,
            reward=reward,
            next_state=next_state_dict,
            action_mask=mask_list,
            meta={"request_id": request.id}
        )

# Load for training
from fusion.reporting.dataset_logger import load_dataset

for transition in load_dataset("datasets/training.jsonl"):
    state = transition["state"]
    action = transition["action"]
    # ... use for training ...

JSONL Format:

{"t": 456, "seed": 42, "state": {"src": 0, "dst": 13}, "action": 0, "reward": 1.0, "next_state": null, "action_mask": [true, false], "meta": {"request_id": 123}}

Dependencies

This Module Depends On

  • fusion.utils.logging_config - Consistent logging across the module

  • External: numpy - Statistical calculations (mean, std, CI95)

  • Standard library: json, csv, pathlib - File operations

  • Standard library: statistics - Basic statistical helpers

Modules That Depend On This

  • fusion.core.simulation - Uses SimulationReporter for console output

  • fusion.modules.rl - Uses DatasetLogger for offline RL data collection

  • Analysis scripts - Use aggregation and export functions

Development Guide

Getting Started

  1. Read the Key Concepts section above

  2. Understand the Module Differentiation to know where this module fits

  3. Examine simulation_reporter.py for console output patterns

  4. Run tests to see example inputs and expected outputs

Common Tasks

Adding a new export format

  1. Create a new export function in csv_export.py (or create a new file for complex formats)

  2. Follow the existing pattern: accept a results dict and an output path (see the sketch after this list)

  3. Create parent directories with Path.mkdir(parents=True, exist_ok=True)

  4. Add tests in tests/test_csv_export.py
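
A minimal sketch of a hypothetical JSON exporter following these steps (export_results_to_json and its output layout are illustrative, not part of the current API):

import json
from pathlib import Path

def export_results_to_json(results: list[dict], output_path: str) -> None:
    """Hypothetical exporter: write a list of result dicts as indented JSON."""
    path = Path(output_path)
    # Step 3: create parent directories, matching the CSV exporters.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)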

Adding new console output

  1. Add a new method to SimulationReporter in simulation_reporter.py (see the sketch after this list)

  2. Use the logger (self.logger) for output, not print statements

  3. Follow existing formatting patterns for consistency

  4. Add tests in tests/test_simulation_reporter.py
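
A hedged sketch of such a method (the method name and fields are illustrative; match the real formatting patterns in simulation_reporter.py):

# Illustrative method on SimulationReporter, not part of the current API
def report_grooming_summary(self, grooming_rate: float, bandwidth_saved_gbps: float) -> None:
    """Log a one-line grooming summary via self.logger, not print()."""
    self.logger.info(
        "Grooming summary: rate=%.2f%%, bandwidth saved=%.1f Gbps",
        grooming_rate * 100.0,
        bandwidth_saved_gbps,
    )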

Adding new grooming metrics

  1. Add new fields to GroomingStatistics.__init__() in statistics.py (see the sketch after this list)

  2. Add update logic in the appropriate method (update_grooming_outcome or new method)

  3. Include in to_dict() for serialization

  4. Update generate_grooming_report() to display new metrics

  5. Add tests in tests/test_statistics.py
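
An illustrative walk-through of steps 1-3 for a hypothetical "regroomed requests" counter (all names below are made up for the example; existing fields are omitted):

# statistics.py -- sketch of the additions
class GroomingStatistics:
    def __init__(self) -> None:
        # Step 1: new field
        self.regroomed_requests = 0

    def update_regrooming(self) -> None:
        """Step 2: update logic for the new metric."""
        self.regroomed_requests += 1

    def to_dict(self) -> dict:
        # Step 3: include the new field for serialization (other fields omitted)
        return {"regroomed_requests": self.regroomed_requests}

Step 4 would then add a line for the new field in generate_grooming_report().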

Extending offline RL dataset

  1. Add new fields to the transition dict in DatasetLogger.log_transition() (see the sketch after this list)

  2. Update load_dataset() if special handling needed

  3. Document new fields in the JSONL format section

  4. Add tests in tests/test_dataset_logger.py
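
A sketch of step 1, adding a hypothetical episode_id field inside DatasetLogger.log_transition() (the surrounding method body is elided; self.episode_id and self._file are assumptions for the example, not the actual attribute names):

# Inside DatasetLogger.log_transition() -- illustrative only
entry = {
    "state": state,
    "action": action,
    "reward": reward,
    "next_state": next_state,
    "action_mask": action_mask,
    "meta": meta,
    "episode_id": self.episode_id,  # new field -- document it in the JSONL format section
}
self._file.write(json.dumps(entry) + "\n")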

Code Patterns

Statistical Aggregation Pattern

import numpy as np

def aggregate_something(results: list[dict]) -> dict:
    """Aggregate results with statistical analysis (needs n >= 2 for sample std)."""
    values = [r["metric"] for r in results]
    n = len(values)
    mean = float(np.mean(values))
    std = float(np.std(values, ddof=1))  # sample standard deviation
    half_width = 1.96 * std / np.sqrt(n)  # CI95 half-width

    return {
        "mean": mean,
        "std": std,
        "ci95_lower": mean - half_width,
        "ci95_upper": mean + half_width,
        "n": n,
    }

JSONL Logging Pattern

import json

def log_entry(entry: dict, filepath: str) -> None:
    """Append entry to JSONL file."""
    with open(filepath, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
        f.flush()  # Ensure written even on crash
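
A matching read-side pattern (a generic sketch; the module's actual loader is load_dataset() in dataset_logger.py):

import json
from typing import Iterator

def read_jsonl(filepath: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL file."""
    with open(filepath, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)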

Configuration

Enable offline RL dataset logging in configuration:

[dataset_logging]
log_offline_dataset = true
dataset_output_path = datasets/offline_data.jsonl
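
A hedged sketch of wiring this configuration into the logger (assumes the parsed options land in engine_props under the same key names as the INI file; run_simulation is a hypothetical entry point):

from fusion.reporting import DatasetLogger

# Only construct the logger when dataset logging is enabled in config.
if engine_props.get("log_offline_dataset"):
    output_path = engine_props.get("dataset_output_path", "datasets/offline_data.jsonl")
    with DatasetLogger(output_path, engine_props) as logger:
        run_simulation(logger)  # hypothetical loop that calls log_transition()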

Testing

Test Location:

fusion/reporting/tests/

Run Tests:

pytest fusion/reporting/tests/ -v

Coverage Target:

80%+

Test files:

  • test_simulation_reporter.py - Console output formatting

  • test_aggregation.py - Statistical aggregation and comparison tables

  • test_csv_export.py - CSV export functionality

  • test_statistics.py - Grooming statistics tracking

  • test_dataset_logger.py - JSONL logging and loading

Running tests:

# Run all reporting tests
pytest fusion/reporting/tests/ -v

# Run with coverage
pytest --cov=fusion.reporting fusion/reporting/tests/

# Run specific test file
pytest fusion/reporting/tests/test_aggregation.py -v