A CPU power regression in the Linux 6.15 kernel has been fixed in Linux 6.16 Git. This post covers the connection to Intel's Sierra Forest, the Rust-based power management additions, and what the fix means for enterprise server performance.
Critical CPU Power Regression in Linux 6.15 Kernel
A significant CPU power consumption regression was discovered in the stable Linux 6.15 kernel release, affecting systems booted with the "nosmt" kernel parameter. The issue is now patched in Linux 6.16 Git, and the fix will be backported to 6.15 point releases.
This regression caused elevated idle power draw, particularly problematic for:
Data centers prioritizing energy efficiency
Enterprise servers booted with the "nosmt" kernel parameter
Systems that rely on suspend-to-idle (s2idle)
Intel engineer Rafael Wysocki, the power management subsystem maintainer, identified the flaw and reverted the problematic commit.
Technical Breakdown: Root Cause and Fix
What Went Wrong?
The regression stemmed from commit 96040f7273e2 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()"), which kept the processor out of deep package C-states (e.g., PC10) on systems where:
✔ The "nosmt" kernel parameter is set
✔ SMT siblings are therefore taken offline early, before cpuidle is initialized
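To check whether a given machine matches these conditions, a minimal Python sketch along the following lines may help. It assumes a Linux system that exposes /proc/cmdline and the standard SMT control files under /sys/devices/system/cpu; the helper names (has_nosmt, read_text, smt_report) are made up for this illustration.

```python
from pathlib import Path
from typing import Optional


def has_nosmt(cmdline: str) -> bool:
    """Return True if the kernel command line contains 'nosmt' or 'nosmt=...'."""
    return any(tok == "nosmt" or tok.startswith("nosmt=") for tok in cmdline.split())


def read_text(path: str) -> Optional[str]:
    """Read a small procfs/sysfs file; return None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def smt_report() -> dict:
    """Collect the boot parameter and the SMT state reported by the kernel."""
    cmdline = read_text("/proc/cmdline") or ""
    return {
        "nosmt_on_cmdline": has_nosmt(cmdline),
        # Typical values: on, off, forceoff, notsupported, notimplemented
        "smt_control": read_text("/sys/devices/system/cpu/smt/control"),
        # CPU list such as "1,3,5,7"; empty when every CPU is online
        "offline_cpus": read_text("/sys/devices/system/cpu/offline"),
    }
```

Calling smt_report() and printing the result shows whether "nosmt" was passed at boot and which CPUs are currently offline. It does not tell you whether the running kernel contains the problematic commit.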
Result:
Offline siblings stuck in C1 (entered via HLT) instead of a deeper idle state
40-60% higher idle power consumption (estimated)
Lower suspend-to-idle efficiency
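One rough way to quantify idle power draw on your own hardware is to sample the package energy counter over a quiet interval. The sketch below assumes the Intel RAPL powercap interface is available at /sys/class/powercap/intel-rapl:0 (assumed here to be package 0) and readable by the current user, which often requires root. The function names are illustrative only.

```python
import time
from pathlib import Path

# Assumption: domain 0 of the RAPL powercap tree is CPU package 0.
RAPL_PKG = Path("/sys/class/powercap/intel-rapl:0")


def average_watts(start_uj: int, end_uj: int, seconds: float, max_range_uj: int) -> float:
    """Convert two energy counter readings (microjoules) into average watts."""
    delta = end_uj - start_uj
    if delta < 0:
        # The counter wrapped around once during the interval.
        delta += max_range_uj
    return delta / 1e6 / seconds


def measure_idle_power(seconds: float = 10.0, domain: Path = RAPL_PKG) -> float:
    """Sample the energy counter before and after sleeping for `seconds`."""
    max_range = int((domain / "max_energy_range_uj").read_text())
    start = int((domain / "energy_uj").read_text())
    t0 = time.monotonic()
    time.sleep(seconds)
    end = int((domain / "energy_uj").read_text())
    elapsed = time.monotonic() - t0
    return average_watts(start, end, elapsed, max_range)
```

Running measure_idle_power() on an otherwise idle system, once on the affected kernel and once on a fixed one, gives a like-for-like comparison. The figure covers the CPU package only, not the whole server.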
The Solution
Wysocki's revert restored mwait_play_dead_cpuid_hint(), so offline SMT siblings can again enter a deep idle state via MWAIT instead of being parked in HLT. The fix prioritizes:
Stability for 6.15.y backports
Power efficiency for Intel Xeon (Sierra Forest) and others
Future-proofing with Rust-based power management code
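Since the fix reaches 6.15 through point releases, it can be handy to flag machines that are on the 6.15 series at all. The sketch below only parses the kernel release string. It does not know which 6.15.y release carries the backport, because that is not stated here, so treat a match as "check the changelog" and not as "affected". The function names are hypothetical.

```python
import platform
import re
from typing import Optional, Tuple


def kernel_series(release: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from a string such as '6.15.3-arch1-1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_615_series(release: str) -> bool:
    """True when the release string belongs to the 6.15.y stable series."""
    version = kernel_series(release)
    return version is not None and version[:2] == (6, 15)
```

For the running kernel, call is_615_series(platform.release()).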
Broader Implications for Linux Performance
1. Impact on Intel Xeon "Sierra Forest"
The original patch targeted Xeon 6 C-state bugs, highlighting:
Trade-offs between optimization and stability
Hardware-specific kernel tuning challenges
2. Rust in Power Management
The same pull request introduced Rust abstractions for:
CPUFreq (dynamic clock scaling)
OPP (Operating Performance Points)
CLK (clock control)
Cpumasks (CPU core management)
Why does it matter? Rust's memory safety could reduce memory-related bugs in future power management code.
Key Takeaways for SysAdmins & Developers
✅ Monitor 6.15.y updates for the power fix
✅ Benchmark suspend-to-idle post-patch
✅ Evaluate Rust-based drivers for future deployments
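Before benchmarking suspend-to-idle, it is worth confirming that s2idle is the suspend mode the system actually uses. The kernel lists the available modes in /sys/power/mem_sleep and marks the active one with square brackets. The following is a small parsing sketch, with a function name chosen for this example.

```python
from typing import List, Optional, Tuple


def parse_mem_sleep(text: str) -> Tuple[Optional[str], List[str]]:
    """Parse /sys/power/mem_sleep content such as '[s2idle] deep'.

    Returns (active_mode, all_modes); active_mode is None if nothing is bracketed.
    """
    active = None
    modes = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return active, modes
```

Reading the file with open("/sys/power/mem_sleep").read() and passing the result to parse_mem_sleep() should return "s2idle" as the active mode on systems configured for suspend-to-idle.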
FAQ: Linux 6.15 Power Regression
Q: How severe was the regression?
A: Wysocki called it "nasty". The elevated idle power draw made it critical for data centers.
Q: Will 6.16 Git prevent similar issues?
A: Not by itself. The revert fixes this specific regression, while the Rust integration aims to improve long-term reliability.
Q: Which systems were most affected?
A: Servers booted with the "nosmt" parameter, particularly Intel Xeon systems.
