Baron Fung Reports on How GTC 2026 Signals the Next Phase of AI Infrastructure

Drawing on NVIDIA GTC 2026, the note argues that Artificial Intelligence (AI) infrastructure is moving from accelerated computing toward heterogeneous, workload-specific architectures, with optimization becoming the central design constraint for data center systems.

It links this shift to the evolution of hyperscaler workloads from retrieval to Generative AI (GenAI) and on toward reasoning-driven designs, and it cites rising data center Capital Expenditure (CAPEX) projections as a source of pressure on system economics.

Market Overview

The analyst frames the industry as progressing beyond early accelerated-computing deployments toward heterogeneous infrastructure optimized for different workloads.

The note also ties demand to data center capital spending, citing Dell’Oro Group projections that global data center CAPEX will exceed $1.7T by 2030.

Key Findings

The report states that hyperscaler workloads have evolved from retrieval-based systems to GenAI and, increasingly, to reasoning-driven architectures.

It adds that internal workloads such as search are being re-architected around AI models, and that this continues to support demand for accelerated computing.

LPUs and Workload Specialization

The note highlights Language Processing Units (LPUs), referencing NVIDIA’s partnership with Groq, and characterizes their Static Random-Access Memory (SRAM)-based design as optimized for low latency and performance per watt.

It says this design targets a lower cost per token for inference and reasoning workloads, and that it allows service-tier optimization through separate throughput-oriented and latency-oriented configurations.
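As a rough illustration of the cost-per-token framing above, the sketch below divides an hourly system cost by the tokens served in that hour for two service tiers. Every figure here is hypothetical and does not come from the note: the tier names, the hourly cost, and the throughput values are all placeholders. The sketch also assumes that a latency-oriented configuration sustains lower aggregate throughput on the same hardware.

```python
from dataclasses import dataclass


@dataclass
class ServiceTier:
    name: str
    hourly_cost_usd: float    # hypothetical amortized hardware and power cost per hour
    tokens_per_second: float  # hypothetical sustained aggregate throughput

    def cost_per_million_tokens(self) -> float:
        """Hourly cost divided by tokens served per hour, scaled to one million tokens."""
        tokens_per_hour = self.tokens_per_second * 3600
        return self.hourly_cost_usd / tokens_per_hour * 1_000_000


# Placeholder tiers: same hardware cost, different throughput (an assumption).
tiers = [
    ServiceTier("throughput-oriented", hourly_cost_usd=100.0, tokens_per_second=50_000),
    ServiceTier("latency-oriented", hourly_cost_usd=100.0, tokens_per_second=20_000),
]

for tier in tiers:
    print(f"{tier.name}: ${tier.cost_per_million_tokens():.3f} per million tokens")
```

Under these placeholder inputs the latency-oriented tier costs more per token, which is the kind of trade-off that service-tier pricing would have to absorb.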

Deployment Signals and Open Questions

The analyst reports that early deployments can configure LPUs at meaningful density, citing an example where a single Groq LPU rack can integrate hundreds of processors.

The note also states that LPUs may be deployed alongside Graphics Processing Unit (GPU) clusters, and that it remains unclear whether they will complement GPUs or displace them for portions of certain workloads as operators optimize overall system efficiency.

GPU Density and System Scaling

The note says NVIDIA continues to push GPU density and integration, citing Vera Rubin Ultra as an example with multi-die architectures and terabyte-scale High Bandwidth Memory (HBM) capacity per package.

It also references future platforms such as Feynman and states that scaling raises constraints around power, cooling, and system balance, increasing the role for complementary architectures and specialized components.

Interconnect Strategy and Fabric Evolution

The report describes NVIDIA’s interconnect strategy as balancing InfiniBand and Ethernet scale-out connectivity with NVLink as a scale-up backbone.

It states that as scale-up grows, NVLink may need to extend into optical domains, and that scale-up and scale-out architectures will need to evolve in parallel to support resilience, workload distribution, and cluster utilization.

Networking Constraints and System Integration

The analyst identifies connectivity as a constraint in next-generation AI infrastructure, stating that systems now rely largely on 200 Gbps Serializer/Deserializer (SerDes) while the industry looks toward 400 Gbps SerDes.

It says the transition to 400 Gbps faces signal integrity, power consumption, and packaging complexity challenges, and that vertically integrated control over InfiniBand technology can support changes when standards lag system requirements.
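To make the lane-rate point concrete, the sketch below counts how many SerDes lanes a port needs at 200 Gbps versus 400 Gbps per lane. The port speeds are illustrative values, not figures from the note, and the calculation ignores encoding and forward-error-correction overhead.

```python
import math


def lanes_needed(port_gbps: int, lane_gbps: int) -> int:
    """Number of SerDes lanes required to carry a port's nominal bandwidth."""
    return math.ceil(port_gbps / lane_gbps)


# Illustrative port speeds in Gbps (assumed, not taken from the note).
for port in (800, 1600, 3200):
    at_200 = lanes_needed(port, 200)
    at_400 = lanes_needed(port, 400)
    print(f"{port} Gbps port: {at_200} lanes at 200G, {at_400} lanes at 400G")
```

Doubling the lane rate halves the lane count for a given port, and the signal integrity, power, and packaging challenges listed above are the cost of that saving.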

DPUs, Smart NICs, and Data Movement Roles

The note says smart Network Interface Cards (NICs) and Data Processing Units (DPUs) are becoming central to system architecture, and it cites a market projection that products related to Ethernet controllers and adapters will grow at a 30% Compound Annual Growth Rate (CAGR) over the next five years.

It states that DPUs such as NVIDIA’s BlueField platform expand beyond cluster interconnects to manage data movement between compute, storage, and Central Processing Unit (CPU) domains while offloading networking and orchestration tasks.
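For a sense of scale, the 30% CAGR cited above can be compounded over five years. This is a minimal sketch: the base value of 1.0 is a normalized placeholder, since this brief does not give a starting market size.

```python
def compound_growth(base: float, cagr: float, years: int) -> float:
    """Value after `years` of growth at a constant annual rate `cagr`."""
    return base * (1.0 + cagr) ** years


# Normalized base of 1.0 (placeholder): the result is a growth multiple.
multiple = compound_growth(1.0, 0.30, 5)
print(f"{multiple:.2f}x after five years")  # prints "3.71x after five years"
```

A market growing at that rate would be roughly 3.7 times its starting size at the end of the five-year period.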

Full-Stack Platform Expansion

The analyst says NVIDIA is extending beyond GPUs into other components of the data center stack, citing the dense Vera CPU platform for orchestrating agentic AI workloads and STX for key-value (KV) cache-based context memory.

It links these items to co-design across compute, networking, and storage, positioning the approach as unified system-level architecture rather than isolated component optimization.
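The brief does not describe how STX works. As general background on why context memory becomes a sizing problem, the sketch below applies the standard estimate for a transformer KV cache: keys and values are stored for every layer, KV head, and token. The model shape is hypothetical and chosen only for illustration.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of key and value tensors held for `tokens` of context in one sequence."""
    # Factor of 2 covers keys plus values; bytes_per_value=2 assumes 16-bit storage.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * tokens


# Hypothetical model shape (not from the note): 80 layers, 8 KV heads, head size 128.
size = kv_cache_bytes(tokens=131_072, layers=80, kv_heads=8, head_dim=128)
print(f"{size / 2**30:.1f} GiB per 128K-token sequence")  # prints "40.0 GiB per 128K-token sequence"
```

At tens of gibibytes per long-context sequence under these assumptions, holding context for many concurrent sessions quickly competes with HBM capacity.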

Software and Economic Considerations

The report says sustaining returns on investment depends on hardware performance and utilization over time as systems become more complex and capital intensive.

It characterizes CUDA as providing continuity across generations, enabling incremental performance improvements and mixed-generation deployments that improve Total Cost of Ownership (TCO).

This Analyst Signals brief is a neutral, fact-based summary of the original research note. It reflects the report’s overall message that AI infrastructure is moving from scale toward optimization through heterogeneous, co-designed architectures shaped by evolving workloads and high data center CAPEX expectations.