Added kernel duration parsing to kernel_inspector.py with kernel_duration_us plus NCU metric aliases for duration and gpu__time_duration_sum.
Added kernel_attribution.py with compute_kernel_attribution(summaries, arch_profile), producing per-kernel roofline region, runtime share, MFU percentage, opportunity label, and opportunity_score.
NCU results now include result["kernel_attribution"]["top_opportunities"], sorted by opportunity_score so large high-impact kernels outrank tiny severe kernels.
Added Framework Abstraction Tax V1 via compute_framework_abstraction_tax(run_summary, bottlenecks). It estimates GPU-idle fraction not explained by input, copy, or sync stalls, scaled by launch-stream fragmentation.
Framework Abstraction Tax is wired through summarize_step_scope for both run and steady_state summaries, exported from __init__.py, and rendered in CLI analysis reports.
Added regression protection for framework tax wiring through summarize_step_scope and for clean bad --arch-profile errors in the profile CLI tests.
Added 28 kernel attribution tests plus 7 Framework Abstraction Tax tests; status is 715 passed and 3 skipped, excluding 13 pre-existing nvcc-dependent test_real_cuda_kernels failures on this machine.
Opportunity scoring is time-weighted when Duration is available: opportunity_score = runtime_share x severity x mfu_gap. has_runtime_share tells consumers whether scores are runtime-weighted or severity-only.
Kernel attribution classifies memory-bound kernels with severity 1.0, low-MFU compute kernels with severity 0.85, and well-utilized kernels near zero opportunity.
Framework contributors are gated on the final score, not raw idle, so low-score input-bound workloads do not show misleading fragmentation drivers. Inferred opportunities such as graph capture and fusion are marked inferred=True.
Documented the conservative V1 scope: Framework Abstraction Tax is infer-only for graph capture and compile/fusion opportunities, with no speedup multiplier and uncalibrated severity bands by design.
Clean CLI error handling now catches FileNotFoundError and ValueError in main(), prints frx: <message> to stderr, and exits 1 without a traceback. This covers --arch-profile and existing --config bad-path cases.