Critical SUSE Linux cluster-glue update fixes EC2 STONITH reliability (bsc#1247543). Patch instructions for SLE HA, HPC, SAP & openSUSE Leap 15.5/15.6. Boost AWS high-availability cluster stability. Install now for vital cloud infrastructure security.
Why This SUSE cluster-glue Patch is Essential for Cloud Resilience
System administrators managing mission-critical SUSE Linux Enterprise High Availability (HA), High-Performance Computing (HPC), SAP environments, or openSUSE Leap deployments face constant pressure to ensure cluster stability.
The newly released update (SUSE-RU-2025:02774-1, rated Important) directly addresses a critical flaw in the AWS EC2 STONITH (Shoot-The-Other-Node-In-The-Head) fencing agent within the cluster-glue package.
This bug (bsc#1247543) could compromise failover reliability in cloud environments, potentially leading to service outages or data corruption during node failures. Could your AWS-based HA cluster withstand a sudden instance failure without this fix?
Detailed Technical Resolution: EC2 STONITH Reliability Enhanced
This patch delivers a vital improvement for clusters relying on AWS infrastructure:
Robust Instance ID Retrieval: The stonith/external/ec2 fencing agent now correctly sources the EC2 instance ID from the local instance metadata file (/run/cloud-init/instance-data.json), ensuring accurate node identification – the cornerstone of reliable fencing.
Intelligent Retry Mechanism: A new ec2_retry function significantly improves fault tolerance. It systematically queries both the AWS Instance Metadata Service (IMDSv1/v2) and the AWS Command Line Interface (AWS CLI) if initial attempts fail. This layered approach mitigates transient network glitches or IMDS unavailability, drastically increasing the probability of successful fencing actions during critical events.
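The retry-then-fall-back idea described above can be sketched in a few lines of shell. This is an illustration only, not the ec2_retry code shipped in cluster-glue: the function names (retry_cmd, first_success, id_from_file, id_from_imds), the jq dependency, and the v1.instance_id key are assumptions made for the example, and the AWS CLI fallback is left out.

```shell
#!/bin/sh
# Hypothetical sketch of retry-then-fall-back; NOT the ec2_retry code from cluster-glue.

# retry_cmd MAX_TRIES DELAY CMD [ARGS...]
# Re-runs CMD until it exits 0 with non-empty output, at most MAX_TRIES times.
retry_cmd() {
    local max_tries="$1" delay="$2" try=1 out
    shift 2
    while [ "$try" -le "$max_tries" ]; do
        if out=$("$@" 2>/dev/null) && [ -n "$out" ]; then
            printf '%s\n' "$out"
            return 0
        fi
        try=$((try + 1))
        [ "$try" -le "$max_tries" ] && sleep "$delay"
    done
    return 1
}

# first_success MAX_TRIES DELAY SOURCE...
# Tries each SOURCE in order (with retries) and prints the first result obtained.
first_success() {
    local max_tries="$1" delay="$2" src
    shift 2
    for src in "$@"; do
        retry_cmd "$max_tries" "$delay" "$src" && return 0
    done
    return 1
}

# Source 1: local cloud-init metadata file (assumes jq and the v1.instance_id key).
id_from_file() {
    jq -r '.v1.instance_id // empty' \
        "${INSTANCE_DATA:-/run/cloud-init/instance-data.json}"
}

# Source 2: IMDSv2 (request a session token, then read the instance ID).
id_from_imds() {
    local token
    token=$(curl -sf -X PUT \
        -H 'X-aws-ec2-metadata-token-ttl-seconds: 60' \
        http://169.254.169.254/latest/api/token) || return 1
    curl -sf -H "X-aws-ec2-metadata-token: $token" \
        http://169.254.169.254/latest/meta-data/instance-id
}

# Example call on an EC2 instance:
#   first_success 3 1 id_from_file id_from_imds
```

Ordering the sources from cheapest (a local file) to most network-dependent keeps the common case fast, while the retries absorb short-lived metadata service hiccups.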
Why does this matter? Unreliable fencing can result in "split-brain" scenarios where multiple nodes believe they own cluster resources, leading to catastrophic data corruption. This patch directly strengthens the safety net for your cloud-based SUSE clusters.
Affected Products & Systems: Is Your Environment Vulnerable?
This update applies to the following SUSE distributions and extensions:
SUSE Linux Enterprise Server (SLES): 15 SP5, 15 SP6, 15 SP7
SUSE Linux Enterprise High Availability Extension: 15 SP5, 15 SP6, 15 SP7
SUSE Linux Enterprise Server for SAP Applications: 15 SP5, 15 SP6, 15 SP7
SUSE Linux Enterprise High Performance Computing: 15 SP5
openSUSE Leap: 15.5, 15.6
Supported Architectures: x86_64, aarch64 (ARM64), ppc64le (Power), s390x (IBM Z), i586 (Leap 15.5 only).
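To check quickly whether a host falls into the list above, something like the following sketch may help. It reads /etc/os-release and compares the ID and VERSION_ID fields against the affected releases. The ID strings used here (sles, sles_sap, sle_hpc, opensuse-leap) are assumptions about how each product identifies itself, and the High Availability Extension is an add-on that does not appear in os-release, so treat the result as a first filter only.

```shell
#!/bin/sh
# Hypothetical helper: the ID values below are assumptions, verify them on your systems.

# is_affected [OS_RELEASE_FILE]
# Exit 0 if the release matches the affected list, 1 if not, 2 if the file cannot be read.
is_affected() {
    local file="${1:-/etc/os-release}"
    (
        # Source the file in a subshell so its variables do not leak into the caller.
        . "$file" || exit 2
        case "$ID:$VERSION_ID" in
            sles:15.[567] | sles_sap:15.[567]) exit 0 ;;
            sle_hpc:15.5) exit 0 ;;
            opensuse-leap:15.[56]) exit 0 ;;
            *) exit 1 ;;
        esac
    )
}

# Example:
#   is_affected && rpm -q cluster-glue
```

Combining the release check with rpm -q cluster-glue shows whether the package is installed at all, which matters more than the release alone.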
Step-by-Step Patch Installation Guide
Apply this critical update immediately using your preferred SUSE management method:
Recommended Methods:
YaST Online Update: Use the graphical interface for seamless patching.
zypper patch: The standard command-line method for applying all relevant security and stability updates.
Product-Specific Commands:
SLE HA 15 SP5:
zypper in -t patch SUSE-SLE-Product-HA-15-SP5-2025-2774=1
SLE HA 15 SP6:
zypper in -t patch SUSE-SLE-Product-HA-15-SP6-2025-2774=1
SLE HA 15 SP7:
zypper in -t patch SUSE-SLE-Product-HA-15-SP7-2025-2774=1
openSUSE Leap 15.5:
zypper in -t patch SUSE-2025-2774=1
openSUSE Leap 15.6:
zypper in -t patch openSUSE-SLE-15.6-2025-2774=1
Always verify successful installation using zypper patches or rpm -q cluster-glue to confirm the updated package version (1.0.12+v1.git.1650454062.1fbde71c-150500.4.3.1) is present. Schedule cluster maintenance windows appropriately.
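The version check can be scripted. The sketch below compares the installed version-release string against the one named in this update using sort -V. Note the assumptions: check_version is a hypothetical helper, and sort -V only approximates RPM's own version comparison, which is good enough here because the strings differ only in the numeric release suffix.

```shell
#!/bin/sh
# Hypothetical helper; sort -V approximates, but is not identical to, RPM version ordering.

expected='1.0.12+v1.git.1650454062.1fbde71c-150500.4.3.1'

# check_version INSTALLED EXPECTED
# Succeeds if INSTALLED sorts at or after EXPECTED.
check_version() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example on a patched node:
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' cluster-glue) \
#       && check_version "$installed" "$expected" \
#       && echo "cluster-glue is at or above the fixed version"
```

Running the same check on every cluster node before the maintenance window closes helps catch a node that was skipped.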
Updated Package List (Version 1.0.12+v1.git.1650454062.1fbde71c-150500.4.3.1)
The following packages are updated across all affected products and architectures (aarch64, ppc64le, s390x, x86_64, i586 for Leap 15.5):
cluster-glue
cluster-glue-devel
cluster-glue-libs
cluster-glue-debuginfo
cluster-glue-debugsource
cluster-glue-devel-debuginfo
cluster-glue-libs-debuginfo
(Note: Package lists are consistent across SLE HA 15 SP5/SP6/SP7 and openSUSE Leap 15.5/15.6 for this update).
Deep Dive: The Critical Role of STONITH in SUSE HA Clusters
STONITH agents are non-negotiable components in SUSE Pacemaker high-availability clusters. They guarantee a faulty node is completely powered off or isolated before resources are moved elsewhere.
The EC2 STONITH agent specifically controls AWS instances via API commands. Failure in this mechanism – such as the instance ID misidentification or inability to issue power-off commands addressed by bsc#1247543 – directly threatens the integrity and availability of clustered services like SAP HANA, SAP NetWeaver, or critical databases.
This update exemplifies SUSE's commitment to providing enterprise-grade resilience for cloud-native workloads. For a comprehensive guide on configuring STONITH, refer to the SUSE HA documentation.
Frequently Asked Questions (FAQ) About This Update
Q: How urgent is this cluster-glue update?
A: Rated Important. If you use AWS EC2 STONITH fencing in any affected SUSE HA cluster, applying this patch promptly is highly recommended to prevent potential cluster failures during node outages.
Q: Does this affect clusters not using AWS or EC2 STONITH?
A: The core fix is specific to the EC2 STONITH agent. However, cluster-glue is a foundational component of SUSE HA stacks. Keeping all components patched ensures overall stability and security.
Q: What are the risks of not applying this patch?
A: The primary risk is unreliable fencing during an AWS instance failure. This could lead to a split-brain scenario, causing application crashes, data corruption, and extended downtime.
Q: Where can I find the original bug report?
A: The bug tracked by bsc#1247543 is detailed on the SUSE Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1247543.
Q: Are there dependencies or special steps after installing?
A: Generally, no special steps are needed beyond restarting the cluster (crm cluster restart) or the relevant resources if they use the EC2 STONITH device. Always test fencing after the update.
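One way to approach the post-update fencing test is a small checklist script that defaults to a dry run, so the commands can be reviewed before anything is fenced. The run and post_update_check helpers and the peer node name are hypothetical. The crm and stonith_admin subcommands used are standard crmsh and Pacemaker tools, but confirm them against the versions installed on your nodes, and only run the real fence in a maintenance window.

```shell
#!/bin/sh
# Hypothetical post-update checklist; prints commands unless DRY_RUN=0 is set.

DRY_RUN="${DRY_RUN:-1}"

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf 'DRY-RUN: %s\n' "$*"
    else
        "$@"
    fi
}

# post_update_check PEER_NODE
post_update_check() {
    local peer="$1"
    run crm configure show               # confirm the EC2 STONITH resource is still defined
    run stonith_admin --list-registered  # list fencing devices known to the fencer
    run crm node fence "$peer"           # fence the peer node (disruptive)
}

# Example:
#   post_update_check node2             # review the plan
#   DRY_RUN=0 post_update_check node2   # execute during a maintenance window
```

Defaulting to a dry run is a deliberate choice: fencing a healthy node is disruptive, so the destructive step should require an explicit opt-in.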
Conclusion & Critical Action: Secure Your Cloud Clusters Today
This SUSE update (SUSE-RU-2025:02774-1) delivers a crucial fix that enhances the reliability and resilience of AWS-based high-availability clusters managed by SUSE Linux Enterprise High Availability Extension or openSUSE Leap.
By resolving the EC2 instance ID retrieval and implementing robust retry logic within the STONITH agent, SUSE significantly mitigates a key risk point for cloud deployments.
Proactively patching your cluster-glue package is a fundamental step in safeguarding mission-critical SAP, HPC, and enterprise application availability.
Next Steps:
Identify all affected systems in your inventory.
Schedule maintenance windows.
Apply the update using zypper patch or the specific commands provided.
Verify successful installation and test EC2 STONITH functionality.
Don't leave your cloud cluster's integrity to chance – patch now. Explore SUSE's HA solutions for maximum uptime.
