NVIDIA Slashes AI Costs by 35x with GB300 NVL72 Blackwell Ultra Platform
NVIDIA unveils GB300 NVL72 Blackwell Ultra platform delivering 35x lower AI costs and 50x higher throughput per megawatt for agentic AI and coding workloads.
NVIDIA claims that its GB300 NVL72 platform, built on the Blackwell Ultra architecture, can lower the total cost of ownership (TCO) of agentic AI and coding-assistant workloads by 35 times compared with previous generations. The announcement comes as demand for autonomous AI agents and coding copilots accelerates across industries.
The GB300 platform delivers a 35x lower cost per token on the low-latency inference workloads that interactive AI agents typically require. It also offers 50x higher throughput per megawatt than NVIDIA's Hopper platform, a critical improvement given that energy efficiency has become a hard constraint on AI data centres. In long-context scenarios with 128,000-token inputs, the GB300 is 1.5x cheaper per input token than the preceding GB200 system.
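To make the throughput-per-megawatt claim concrete, here is a minimal back-of-the-envelope sketch of how it translates into energy cost per token. The electricity price and the Hopper baseline throughput are illustrative assumptions, not NVIDIA-disclosed figures; only the 50x multiplier comes from the article.

```python
# Hypothetical token-economics sketch: how throughput per megawatt
# maps to energy cost per million tokens. All inputs are assumptions
# for illustration; none are NVIDIA-disclosed figures.

ENERGY_PRICE_PER_MWH = 80.0  # assumed electricity price, USD per MWh

def energy_cost_per_million_tokens(tokens_per_sec_per_mw: float) -> float:
    """Energy cost (USD) to generate one million tokens at a given
    throughput per megawatt of facility power."""
    tokens_per_mwh = tokens_per_sec_per_mw * 3600.0  # 1 MWh = 1 MW sustained for 3600 s
    return ENERGY_PRICE_PER_MWH / tokens_per_mwh * 1_000_000

# Assumed Hopper-class baseline throughput (illustrative only).
hopper_tps_per_mw = 100_000.0
# Apply the article's 50x throughput-per-megawatt claim for GB300.
gb300_tps_per_mw = 50 * hopper_tps_per_mw

for name, tps in [("Hopper (assumed)", hopper_tps_per_mw),
                  ("GB300 (50x claim)", gb300_tps_per_mw)]:
    print(f"{name:>18}: ${energy_cost_per_million_tokens(tps):.4f} per 1M tokens")
```

Energy is only one line item in TCO; hardware depreciation, networking and software efficiency also feed into the headline 35x figure.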
Early adopters report improvements of as much as 10x in token economics. Major cloud providers, including Microsoft Azure, Oracle Cloud Infrastructure and CoreWeave, are deploying the system, as are inference companies such as Together AI, Fireworks AI, Baseten and DeepInfra.
NVIDIA extracts further efficiency with Dynamo, its open-source inference framework. Looking ahead, the next-generation Vera Rubin server is expected to deliver a further 10x gain in throughput per megawatt.
NVIDIA Targets the Agentic AI Surge With 35x Lower Costs
The rapid rise of agentic AI systems has transformed computing requirements. Over the past year, agentic coding queries have grown from 11 percent to almost 50 percent of software-related AI requests, signalling a structural shift in AI workloads. NVIDIA's GB300 NVL72 platform is a direct response to this change.
The system cuts the cost of low-latency inference by 35 times, making it affordable to run interactive AI agents that must respond within seconds. These workloads demand not only raw speed but sustained computational throughput, particularly for coding assistants and autonomous systems handling complex queries.
The GB300 also delivers 50x greater throughput per megawatt than the Hopper generation, redefining AI data centre efficiency: AI factories can produce significantly more tokens from the same energy input, reducing both operational cost and environmental impact. With early adopters already reporting 10x improvements in token economics, NVIDIA is positioning Blackwell Ultra as the platform on which next-generation AI agents can scale efficiently and sustainably.
Cloud Giants and AI Specialists Deploy GB300 at Scale
Large cloud providers are rapidly bringing the GB300 NVL72 platform into production. Microsoft Azure, Oracle Cloud Infrastructure (OCI) and CoreWeave are deploying the systems to support low-latency interactive AI and long-context workloads, a sign of enterprise confidence in the platform's economic and performance benefits.
Inference specialists such as Together AI, Fireworks AI, Baseten and DeepInfra are also using GB300 systems to optimise large-scale AI inference. These deployments reflect a wider industry push toward efficient token generation and scalable AI infrastructure.
Alongside the hardware, NVIDIA's open-source inference framework Dynamo improves serving throughput across the Blackwell stack, driving operating costs down further. By pairing hardware innovation with software optimisation, NVIDIA reinforces its ecosystem strategy. The synchronised rollout among hyperscalers and AI-native companies signals an industry-wide shift toward more energy-efficient and cost-effective AI deployment.
Vera Rubin Signals the Next Platform Leap
Although Blackwell Ultra is a major leap forward, NVIDIA is already pointing to the next architectural step. The next-generation Vera Rubin system, projected for roughly 2026 to 2027, is expected to achieve 10x the throughput per megawatt of Blackwell.
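Taken at face value, the roadmap multipliers compound. The sketch below treats the article's 50x (Hopper to Blackwell Ultra) and 10x (Blackwell to Rubin) throughput-per-megawatt claims as independent generational factors; that clean multiplication is an assumption, not something NVIDIA has stated.

```python
# Compound the article's stated generational throughput-per-megawatt
# multipliers. Purely illustrative: it assumes the 10x Rubin figure is
# measured against Blackwell, and that the factors multiply cleanly.

generations = [
    ("Hopper", 1.0),            # baseline
    ("Blackwell Ultra", 50.0),  # 50x vs Hopper (article's claim)
    ("Vera Rubin", 10.0),       # 10x vs Blackwell (article's projection)
]

cumulative = 1.0
for name, factor in generations:
    cumulative *= factor
    print(f"{name:>15}: {cumulative:,.0f}x Hopper throughput per megawatt")
```

If both figures hold, a Rubin-era facility would generate on the order of 500x the tokens per megawatt of a Hopper-era one, though vendor projections rarely compound so cleanly in practice.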
The roadmap fits NVIDIA's long-term strategy of remaining the leader in AI infrastructure. As AI models grow larger and agentic systems become more autonomous, computational intensity will rise sharply, and gains in throughput per megawatt directly address the energy constraints and cost-scaling pressures facing AI data centres.
Even the current GB300 already improves on the GB200, with a 1.5x advantage in long-context use with 128,000-token inputs, and Vera Rubin is set to extend that trajectory. As industrial AI adoption grows, NVIDIA's iterative performance gains may redefine token economics, cloud architecture and enterprise AI strategy over the coming decade.