HomeChangelog

Changelog

Release history for the frx CLI and analysis pipeline. Newest first.

New7
  • Added kernel duration parsing to kernel_inspector.py with kernel_duration_us plus NCU metric aliases for duration and gpu__time_duration_sum.

  • Added kernel_attribution.py with compute_kernel_attribution(summaries, arch_profile), producing per-kernel roofline region, runtime share, MFU percentage, opportunity label, and opportunity_score.

  • NCU results now include result["kernel_attribution"]["top_opportunities"], sorted by opportunity_score so large high-impact kernels outrank tiny severe kernels.

  • Added Framework Abstraction Tax V1 via compute_framework_abstraction_tax(run_summary, bottlenecks). It estimates GPU-idle fraction not explained by input, copy, or sync stalls, scaled by launch-stream fragmentation.

  • Framework Abstraction Tax is wired through summarize_step_scope for both run and steady_state summaries, exported from __init__.py, and rendered in CLI analysis reports.

  • Added regression protection for framework tax wiring through summarize_step_scope and for clean bad --arch-profile errors in the profile CLI tests.

  • Added 28 kernel attribution tests plus 7 Framework Abstraction Tax tests; status is 715 passed and 3 skipped, excluding 13 pre-existing nvcc-dependent test_real_cuda_kernels failures on this machine.

Improvements4
  • Opportunity scoring is time-weighted when Duration is available: opportunity_score = runtime_share x severity x mfu_gap. has_runtime_share tells consumers whether scores are runtime-weighted or severity-only.

  • Kernel attribution classifies memory-bound kernels with severity 1.0, low-MFU compute kernels with severity 0.85, and well-utilized kernels near zero opportunity.

  • Framework contributors are gated on the final score, not raw idle, so low-score input-bound workloads do not show misleading fragmentation drivers. Inferred opportunities such as graph capture and fusion are marked inferred=True.

  • Documented the conservative V1 scope: Framework Abstraction Tax is infer-only for graph capture and compile/fusion opportunities, with no speedup multiplier and uncalibrated severity bands by design.

Fixes1
  • Clean CLI error handling now catches FileNotFoundError and ValueError in main(), prints frx: <message> to stderr, and exits 1 without a traceback. This covers --arch-profile and existing --config bad-path cases.