DDN launches Inferno, improving AI inference performance
DDN has introduced Inferno, an inference acceleration appliance designed to deliver artificial intelligence (AI) response times under one millisecond while reducing compute costs. Announced at NVIDIA's GTC 2025, Inferno targets inference latency and cost by improving graphics processing unit (GPU) utilization and supporting multimodal AI data pipelines. The appliance combines DDN's high-performance architecture with NVIDIA Spectrum-X AI-optimized networking, which DDN says lets enterprises streamline their AI workflows and scale infrastructure efficiently.

In early testing, Inferno is reported to deliver 10x lower latency than existing solutions, a gain aimed at applications that depend on near-instant decision-making. DDN also claims a 12x improvement in cost-efficiency over AWS S3-based inference stacks, a notable figure for businesses trying to control operational expenses. The technology is engineered to push GPU utilization to 99% by eliminating the data bottlenecks that leave GPUs idle.

The appliance supports a range of AI workloads, including language models and real-time analytics, and can be deployed on-premises, in the cloud, or in hybrid environments.

DDN's leadership says Inferno is geared toward simplifying data integration and accelerating inference processing. Omar Orqueda, SVP of Infinia Engineering at DDN, emphasized the importance of removing barriers between data and intelligence, noting that Inferno provides advanced inference acceleration while also lowering costs for enterprises. The launch may influence how organizations adopt AI by offering a more integrated and efficient framework for processing AI workloads.