Aviz ONES Enhances Spectrum-X Network Visibility and Performance
Aviz ONES gives enterprises comprehensive visibility into NVIDIA Spectrum-X and Cumulus Linux environments, with real-time Graphics Processing Unit (GPU) and Remote Direct Memory Access (RDMA) insights, agentless telemetry, and proactive alerts. This update is relevant for IT decision-makers managing Artificial Intelligence (AI) workloads.
Network Visibility
In AI-centric data centers, thorough network observability is vital. Spectrum-X platforms need visibility from switches to GPUs, which helps reduce latency, detect issues early, and optimize performance for AI/ML tasks.
Advantages of Aviz ONES for Spectrum-X
Aviz ONES integrates network monitoring across NVIDIA Spectrum-X and Cumulus Linux, ensuring scalable, agentless visibility across AI environments.
Key features include:
- Agentless telemetry from Spectrum-X switches through NVIDIA NVUE.
- Real-time insights into hardware, traffic stats, and protocol conditions.
- Visibility into AI/ML topologies and RDMA over Converged Ethernet (RoCE) traffic, essential for GPU communication.
- Integration with multi-vendor fabrics, including those running Cumulus Linux.
Support for Cumulus Linux
ONES offers extensive agentless telemetry streaming compatible with current Cumulus Linux versions.
Utilizing NVIDIA NVUE and NGINX, Aviz ONES provides:
- Real-time monitoring with minimal performance impact.
- Insights into device metrics and protocol statuses.
- Consistent integration with Spectrum-X switches across different software versions.
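As a rough illustration of what agentless collection can look like, the sketch below builds an HTTPS request against a switch's NVUE REST endpoint and reduces an interface payload to a name-to-state map. It is not Aviz ONES code. The port (8765) and the `/nvue_v1` root are assumptions based on common NVUE defaults, and the helper names, the host `leaf01`, and the payload layout are hypothetical.

```python
import base64
import urllib.request

NVUE_PORT = 8765             # assumption: default NVUE REST API port
NVUE_BASE_PATH = "/nvue_v1"  # assumption: default NVUE REST API root


def build_nvue_request(host, resource, username, password):
    """Build (but do not send) an HTTPS GET for one NVUE resource.

    No agent runs on the switch: the collector only needs network
    reachability and credentials for the REST endpoint.
    """
    url = f"https://{host}:{NVUE_PORT}{NVUE_BASE_PATH}/{resource.strip('/')}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    return req


def summarize_interfaces(payload):
    """Map interface name -> link state.

    The payload layout used here is hypothetical and only illustrates
    the shape of the reduction; a real response may differ.
    """
    return {
        name: data.get("link", {}).get("state", "unknown")
        for name, data in payload.items()
    }


if __name__ == "__main__":
    req = build_nvue_request("leaf01", "interface", "user", "password")
    print(req.get_method(), req.full_url)
    sample = {"swp1": {"link": {"state": "up"}}, "swp2": {}}
    print(summarize_interfaces(sample))
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose, since certificate handling and polling intervals depend on the deployment.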
Benefits for Network Teams
Aviz ONES helps network teams optimize AI workloads and resolve issues faster through unified telemetry and automation.
Teams can:
- Enhance AI training with congestion control and fabric balancing.
- Maintain tenant isolation with precise telemetry.
- Accelerate troubleshooting by correlating various metrics.
- Increase visibility into network and storage throughput.
Enhanced Network Reliability with Rule Engine
ONES includes a rule engine that automates alerts and detects anomalies for effective network management.
This component monitors Central Processing Unit (CPU) and memory usage, configuration integrity, and device status, facilitating prompt issue resolution.
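To make the idea of a rule engine concrete, here is a minimal sketch of threshold rules over device samples plus a configuration-drift check. It does not describe how ONES implements its rule engine. The `Rule` class, the metric names (`cpu_pct`, `mem_pct`), the thresholds, and the hash-based drift check are all hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    name: str         # hypothetical rule identifier
    metric: str       # key to look up in a device sample
    threshold: float  # alert when the metric exceeds this value


def evaluate(rules: List[Rule], sample: Dict[str, float], device: str) -> List[str]:
    """Return one alert string per rule whose threshold is exceeded.

    Metrics missing from the sample are skipped rather than treated
    as violations.
    """
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value > rule.threshold:
            alerts.append(
                f"{device}: {rule.name} ({rule.metric}={value} > {rule.threshold})"
            )
    return alerts


def config_drifted(running_config: str, baseline_sha256: str) -> bool:
    """Flag a configuration whose SHA-256 digest differs from the baseline."""
    digest = hashlib.sha256(running_config.encode()).hexdigest()
    return digest != baseline_sha256


if __name__ == "__main__":
    rules = [Rule("high-cpu", "cpu_pct", 90.0), Rule("high-memory", "mem_pct", 85.0)]
    for alert in evaluate(rules, {"cpu_pct": 93.0, "mem_pct": 41.0}, "leaf01"):
        print(alert)
```

Keeping rules as plain data rather than code makes it straightforward to add or tune thresholds without touching the evaluation loop.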
Advanced Capabilities for Monitoring
Aviz ONES also provides advanced management features for AI networks.
It includes:
- Real-time monitoring for essential protocols.
- Configuration management with various workflows.
- GPU health metric monitoring.
- Traffic analytics with detailed metrics.
- Orchestration across multi-vendor, multi-fabric environments.
Conclusion
Aviz ONES delivers unified visibility, automation, and intelligent monitoring for Spectrum-X operations. Organizations running AI and Machine Learning (ML) applications benefit from real-time visibility, faster troubleshooting, and enhanced reliability across complex network environments.