TeraNova

Infrastructure, companies, and the societal impact shaping the next era of technology.

Plain-English reporting on AI, semiconductors, automation, robotics, compute, energy, and the future of work.

The New AI Compute Stack: What It Enables, and Where It Breaks

AI hardware is moving fast, but the real story is not just faster chips. It is the shifting balance between training, inference, memory bandwidth, networking, power, and the data center floor itself. The result is a compute stack that unlocks new model capabilities while exposing new bottlenecks at every layer.

AI hardware is evolving so quickly that even the language used to describe it is starting to lag behind the market. Not long ago, the conversation was mostly about GPUs as the default engine for training large models. Today, the real picture is broader and more complicated: custom accelerators, higher-bandwidth memory, denser interconnects, advanced packaging, liquid cooling, and power delivery all shape what AI systems can do.

The important shift is this: AI infrastructure is no longer a single product category. It is a stack. And each layer of that stack can accelerate performance—or become the bottleneck that caps it. That is why the most useful way to think about AI hardware is not as a race for the biggest chip, but as a set of tradeoffs between flexibility, efficiency, scale, and deployability.

Training and inference are now different hardware problems

One reason AI hardware is changing so fast is that the market has split into two distinct workloads. Training is the expensive, power-hungry process of building a model. Inference is the act of running that model in production, whether for chat, search, coding, voice, or enterprise automation.

Training still rewards large, programmable GPU clusters because the workload is dominated by parallel math, and the software stack around CUDA and similar frameworks is mature. But inference has different economics. Once a model is trained, operators care less about raw peak throughput and more about cost per token, latency, batching efficiency, and power use. That opens the door to a wider mix of hardware, including custom ASICs, lower-power GPUs, CPU-heavy serving architectures, and edge devices.
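To make those economics concrete, here is a minimal back-of-envelope sketch in Python. The hourly node price, token throughput, and utilization figure are invented placeholders rather than quotes from any vendor; the point is simply that serving cost tracks sustained throughput, not peak specs.

# Rough sketch with placeholder numbers: cost per million generated
# tokens as a function of node price and sustained throughput.

def cost_per_million_tokens(node_cost_per_hour: float,
                            tokens_per_second: float,
                            utilization: float = 0.6) -> float:
    """Estimate serving cost (in dollars) per one million tokens."""
    effective_tps = tokens_per_second * utilization
    tokens_per_hour = effective_tps * 3600
    return node_cost_per_hour / tokens_per_hour * 1_000_000

# Illustrative example: a node billed at $40/hour sustaining
# 5,000 tokens/second at 60% average utilization.
print(f"${cost_per_million_tokens(40.0, 5000):.2f} per million tokens")

Halving the hardware price or doubling sustained throughput moves that number by the same amount, which is why inference buyers weigh utilization and batching as heavily as list price.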

This split matters because it means the “best” chip is increasingly workload-specific. A platform optimized for frontier-model training may be wasteful for high-volume inference. A lean inference accelerator may be too constrained for rapid experimentation or model development. Vendors that can cover both ends of the market have an advantage, but even they must make tradeoffs in memory capacity, programmability, and software support.

Memory bandwidth is becoming as important as compute

The old chip-buying instinct was simple: more FLOPS meant more performance. AI has made that formula incomplete. Large models move enormous amounts of data in and out of memory, which means bandwidth often matters as much as raw arithmetic horsepower.

This is why high-bandwidth memory, or HBM, has become one of the most strategically important parts of the AI hardware stack. HBM lets accelerators access data much faster than traditional memory setups, which improves utilization and reduces idle time. In practice, a chip with massive compute but insufficient memory bandwidth can underperform a more balanced part that keeps its math units fed.
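A rough way to see that balance is a back-of-envelope roofline check: divide a chip's peak compute by its memory bandwidth to find the arithmetic intensity below which the memory system, not the math units, sets the speed limit. The numbers in the sketch below are illustrative, not the specs of any particular accelerator.

# Sketch with illustrative numbers: is a workload compute-bound or
# bandwidth-bound on a given accelerator?

def bound_by(flops_per_byte: float,
             peak_tflops: float,
             peak_bw_tbps: float) -> str:
    """Classify a kernel given its arithmetic intensity (FLOPs per byte
    of data moved) and the chip's peak compute and memory bandwidth."""
    # Ridge point: the intensity at which the compute ceiling and the
    # memory bandwidth ceiling intersect.
    ridge = peak_tflops / peak_bw_tbps
    return "compute-bound" if flops_per_byte >= ridge else "bandwidth-bound"

# Hypothetical accelerator: 1,000 TFLOPS of low-precision math and
# 3 TB/s of HBM bandwidth -> ridge point around 333 FLOPs per byte.
# A decode-heavy inference kernel near 50 FLOPs per byte sits well
# below that, so adding FLOPS alone would not make it faster.
print(bound_by(flops_per_byte=50, peak_tflops=1000, peak_bw_tbps=3))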

The constraint is not only technical but industrial. HBM supply is limited, packaging is complex, and yields can affect availability across entire product lines. As a result, AI hardware competition is increasingly shaped by memory access, not just transistor counts. For buyers, that means capacity planning is now tied to memory configuration and packaging strategy, not only to silicon benchmarks.

Networking now sets the ceiling for cluster performance

As AI clusters scale from a few servers to thousands of accelerators, networking becomes a first-order concern. Inside a rack or across a data hall, chips must coordinate constantly. If interconnects are too slow, the cluster spends too much time waiting on synchronization rather than doing useful work.

That is why vendors are investing heavily in faster links, tighter switching fabrics, and more efficient collective communication. Technologies such as NVLink, InfiniBand, and advanced Ethernet-based designs exist to reduce the penalty of distributing a workload across many devices. The practical result is that the value of a compute node depends not only on the chip inside it, but on how well it communicates with the rest of the system.
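A crude estimate shows why those links matter. The sketch below approximates the time for one ring all-reduce of a model's gradients; the parameter count, device count, and per-link bandwidth are assumptions chosen for illustration, and real systems overlap this traffic with computation.

# Sketch with assumed figures: time for one ring all-reduce of
# gradients across a data-parallel cluster (latency terms ignored).

def ring_allreduce_seconds(payload_bytes: float,
                           num_devices: int,
                           link_gbit_per_s: float) -> float:
    """Approximate one ring all-reduce: each device sends and receives
    roughly 2 * (N - 1) / N of the payload over its link."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8  # Gbit/s -> bytes/s
    traffic = 2 * (num_devices - 1) / num_devices * payload_bytes
    return traffic / bytes_per_second

# Illustrative example: 140 GB of 16-bit gradients (roughly a
# 70B-parameter model), 1,024 devices, 400 Gbit/s per device link.
print(f"{ring_allreduce_seconds(140e9, 1024, 400):.1f} seconds per sync")

If the compute for a training step takes a similar amount of time, the cluster spends a large share of every step waiting on the network, which is exactly the penalty faster fabrics and smarter collective algorithms are meant to shrink.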

This introduces a major deployment tradeoff. Highly integrated, proprietary networking can deliver excellent performance for a single vendor’s full-stack system, but it may lock buyers into a narrower ecosystem. More open Ethernet-based architectures can be easier to source and operate at scale, but they may sacrifice some performance efficiency. In other words, cluster design is now a software and networking decision as much as a chip decision.

Packaging and cooling have moved from backroom details to strategic constraints

As chips get larger and more power-dense, traditional packaging approaches stop being enough. Advanced packaging techniques—such as chiplets, 2.5D integration, and stacked memory—allow designers to combine multiple dies and memory elements into one high-performance module. This helps overcome limits in monolithic chip design and improves time-to-market for complex products.

But these gains come with real manufacturing complexity. More sophisticated packaging can raise costs, introduce supply chain bottlenecks, and increase failure sensitivity. In AI, where the customer often wants massive scale quickly, packaging can become the difference between theoretical performance and actual shipments.

Cooling is the other side of the same problem. A modern AI server can draw so much power that air cooling alone becomes inefficient or impractical. Liquid cooling, rear-door heat exchangers, and redesigned power distribution are increasingly necessary to keep dense racks within operating limits. This is not a cosmetic upgrade. It determines whether a data center can support the next generation of accelerators at all.
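The arithmetic behind that shift is simple. The per-accelerator wattage, host overhead, and rack layout below are placeholder figures, but they show how quickly a dense AI rack overshoots the 10 to 15 kilowatts that many air-cooled halls were designed around.

# Sketch with placeholder figures: total power draw of one AI rack.

def rack_power_kw(servers_per_rack: int,
                  accelerators_per_server: int,
                  watts_per_accelerator: float,
                  host_overhead_watts: float) -> float:
    """Total rack draw in kilowatts, including CPUs, fans, and NICs."""
    per_server = (accelerators_per_server * watts_per_accelerator
                  + host_overhead_watts)
    return servers_per_rack * per_server / 1000

# Illustrative example: 8 servers, each with 8 accelerators at ~700 W
# plus ~2 kW of host overhead -> roughly 61 kW in a single rack.
print(f"{rack_power_kw(8, 8, 700, 2000):.0f} kW per rack")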

That means AI hardware is colliding with physical infrastructure. The best chip in the world is of limited value if a site cannot deliver enough power, reject enough heat, or support the required rack density. For operators, the bottleneck is often no longer semiconductors alone, but the entire building around them.

GPUs still dominate, but the center of gravity is shifting

GPUs remain the most important platform in AI because they combine flexibility, scale, and an established software ecosystem. They are the safest choice for organizations that want to train models, fine-tune them, and adapt quickly as workloads change.

But the market is clearly diversifying. Cloud providers, hyperscalers, and large model developers increasingly want more control over cost and performance. That is driving custom silicon programs, including domain-specific accelerators optimized for training or inference. These chips can be more efficient than general-purpose GPUs, but they also come with higher development risk and narrower applicability.

The tradeoff is straightforward. GPUs offer broad compatibility and fast iteration. ASICs can offer better economics at scale, but only when the workload is stable enough to justify the design effort. For a large platform company running billions of inferences a day, a custom chip can be a strategic advantage. For a startup or enterprise team still changing model architectures every quarter, the GPU remains the better insurance policy.
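One way to frame that decision is a breakeven calculation: how many tokens have to be served before the savings from a custom chip cover the cost of designing and deploying it. Every figure in the sketch below is invented for illustration; real programs also carry schedule and software risk that a spreadsheet will not capture.

# Sketch with invented figures: breakeven volume for a custom
# inference ASIC versus staying on general-purpose GPUs.

def breakeven_million_tokens(program_cost: float,
                             gpu_cost_per_m_tokens: float,
                             asic_cost_per_m_tokens: float) -> float:
    """Millions of tokens to serve before the custom-silicon program
    pays for itself; infinity if it never does."""
    savings_per_m = gpu_cost_per_m_tokens - asic_cost_per_m_tokens
    if savings_per_m <= 0:
        return float("inf")
    return program_cost / savings_per_m

# Illustrative example: a $300M program that cuts serving cost from
# $3.00 to $1.00 per million tokens breaks even after 150 million
# "million-token" units, i.e. 150 trillion tokens served.
print(f"{breakeven_million_tokens(300e6, 3.00, 1.00):,.0f}M tokens")

Under those made-up numbers, a platform serving a trillion tokens a day recovers the cost in about five months; at a tenth of that volume it takes roughly four years, which is why the custom-chip math only works for the largest operators with stable workloads.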

What buyers should watch next

For buyers, operators, and investors, the key question is no longer simply which chip is fastest. It is which part of the stack is constraining performance, cost, or deployment speed.

Three issues deserve close attention. First, memory and packaging will continue to influence real-world throughput more than headline compute numbers suggest. Second, networking will increasingly define the scalability of training clusters and distributed inference systems. Third, power and cooling will decide where AI can be deployed, especially as rack densities climb.

These constraints create a useful filter. If a system promises exceptional performance but requires exotic infrastructure, it may be appropriate only for hyperscale buyers with deep capital and dedicated facilities. If a system is easier to deploy but less efficient, it may win in smaller environments where operational simplicity matters more than peak throughput.

That is the deeper story of AI hardware right now. The market is not merely producing faster chips. It is reorganizing compute around the physical limits of memory, networking, power, and thermals. The winners will be the architectures that fit the workload and the building—not just the benchmark.

Image: Gain induit CPU- GPU- TRI2.JPG | screenshot of the uploader's own statistics from http://boincstats.com/stats/boinc_user_graph.php?pr=bo&id=1210 | License: CC BY-SA 4.0 | Source: Wikimedia | https://commons.wikimedia.org/wiki/File:Gain_induit_CPU-_GPU-_TRI2.JPG
