New AI Training Options: Google TPU v5p Launches to Compete with Nvidia GPUs




At yesterday's Google Cloud Next 2024 conference, Google made a string of major AI model and product announcements: Gemini 1.5 Pro opened to the public with audio processing capabilities; the CodeGemma model was introduced; Axion, Google's first in-house Arm CPU, was unveiled to challenge Microsoft and Amazon; Google TPU v5p became generally available; and the A3 Mega VM with Nvidia GPUs was launched. This time, Google is betting on the sheer number of launches to win.

The Google Cloud Next 2024 conference was inspiring, with Google making one major announcement after another:

1. Imagen 2.0 was upgraded with video generation, intensifying the competition among AI video models.
2. Gemini 1.5 Pro, previously overshadowed by OpenAI's Sora, is now officially open to the public.
3. Google's first Arm CPU was released, squarely targeting Microsoft, Amazon, Nvidia, and Intel.

Additionally, Google's AI supercomputing platform received significant upgrades: its most powerful TPU yet, TPU v5p, became generally available, alongside storage and software upgrades and a more flexible consumption model, all of which further strengthen Google Cloud's competitiveness in AI.

Google, which keeps releasing new products in rapid succession, clearly has no intention of backing down in this AI battle.

What is the Arm CPU: Axion?

Google claims that Axion delivers up to 50% better performance and up to 60% better energy efficiency than comparable current-generation x86-based (such as Intel) instances, and up to 30% better performance than the fastest general-purpose Arm-based instances currently available in the cloud. With this new weapon, Google is officially challenging Microsoft and Amazon in the AI arms race!

Axion is Google's answer to Amazon AWS and Microsoft Azure, both of which already design their own Arm-based server processors (AWS Graviton and Azure Cobalt). Axion will help Google improve performance on general-purpose workloads such as open-source databases, web and application servers, in-memory caches, data analytics engines, media processing, and AI training.

With Axion, Google has taken another step forward in building its own computing resources. Axion will be available to Google Cloud customers later this year.

Importance of Arm CPUs for AI Training Acceleration

In the AI training race, CPUs like Axion matter because they supply much of the general-purpose computing power that surrounds model training. Training complex AI models involves handling large datasets, and CPUs handle much of the work of loading and preparing that data so it can be processed faster. Arguably, the biggest benefit of this move is cost savings. The cost of purchasing AI GPU chips is well known to be staggering: Nvidia's Blackwell chip is expected to sell for between $30,000 and $40,000. Axion is already powering YouTube Ads and Google Earth Engine, and it will soon be available in Google Compute Engine, Google Kubernetes Engine, Dataproc, Dataflow, Cloud Batch, and other cloud services.

Furthermore, customers already running workloads on Arm-based CPUs can migrate to Axion easily, without re-architecting or rewriting their applications.

Collaboration and Competition between Nvidia GPU and Google

At the Google Cloud Next 2024 conference, Google announced a major upgrade to its in-house AI supercomputing platform! Topping the list is Google Cloud's tensor processing unit, TPU v5p, a custom chip that is now generally available to cloud customers.

Google TPUs have long served as an alternative to Nvidia GPUs for accelerating AI workloads. As Google's next-generation accelerator, TPU v5p is designed specifically for training some of the largest and most demanding generative AI models. Each TPU v5p pod contains 8,960 chips, more than twice the 4,096 chips in a TPU v4 pod.

Furthermore, Google Cloud is collaborating with Nvidia to accelerate AI development, launching the new A3 Mega virtual machine equipped with Nvidia H100 GPUs, each of which packs 80 billion transistors on a single chip. Google Cloud will also bring Nvidia's latest flagship GPU platform, Blackwell, into its products to strengthen support for high-performance computing and AI workloads, in the form of virtual machines powered by the B200 and GB200. The B200 is designed for "the most demanding AI, data analytics, and HPC workloads," while the liquid-cooled GB200 will provide computing power for real-time LLM inference on trillion-parameter models and for large-scale AI training.

Trillion-parameter models are still relatively rare (examples include models from SambaNova and Google's Switch Transformer), but Nvidia and Cerebras are both gearing up with hardware for trillion-parameter models. Clearly, they expect the scale of AI models to grow rapidly.

Rent TPU v5p or Nvidia GPU Servers?

Buying high-end Nvidia AI GPUs remains expensive: an Nvidia Blackwell chip or H100 costs tens of thousands of dollars. Renting TPU v5p is also expensive for most people, at $4.20 per chip-hour, which works out to $4.20 × 24 hours × 30 days = $3,024 per chip per month. Even though TPU v5p performs excellently, that is still a significant expense for small and medium-sized enterprises and for individuals training AI models.
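The monthly figure above is simple arithmetic, and the same arithmetic makes it easy to try other usage patterns. The sketch below is a rough estimator, assuming a flat on-demand rate of $4.20 per chip-hour and a 30-day month as in the text; it ignores discounts, storage, networking, and taxes, and the function names are hypothetical, not part of any Google Cloud API. The break-even helper is only an illustration: a purchased GPU also needs a server, power, and hosting, and a Blackwell chip and a TPU v5p chip are not performance-equivalent.

```python
"""Rough cost estimates for renting accelerators by the chip-hour.

Assumptions (not official Google Cloud billing logic): a flat on-demand
price per chip-hour and a 30-day month. Discounts, storage, networking
and taxes are ignored.
"""

TPU_V5P_PRICE_PER_CHIP_HOUR = 4.20  # USD, the on-demand rate quoted in the text
HOURS_PER_MONTH = 24 * 30           # always-on usage over a 30-day month


def rental_cost(price_per_chip_hour: float, chips: int = 1, hours: float = 1.0) -> float:
    """Cost of renting `chips` accelerators for `hours` hours each."""
    if chips < 0 or hours < 0:
        raise ValueError("chips and hours must be non-negative")
    return price_per_chip_hour * chips * hours


def months_to_match_purchase(purchase_price: float, monthly_rent: float) -> float:
    """Number of months of rent that add up to a one-time purchase price.

    Illustrative only: it ignores the server, power and hosting a purchased
    chip also needs, and treats the rented and purchased chips as comparable.
    """
    if monthly_rent <= 0:
        raise ValueError("monthly_rent must be positive")
    return purchase_price / monthly_rent


if __name__ == "__main__":
    price = TPU_V5P_PRICE_PER_CHIP_HOUR

    # One chip running around the clock, as in the text.
    always_on = rental_cost(price, hours=HOURS_PER_MONTH)
    print(f"1 chip, 24 h x 30 days:    ${always_on:,.2f} per month")

    # Hypothetical usage: a small team that only trains during working hours.
    part_time = rental_cost(price, hours=8 * 22)
    print(f"1 chip, 8 h x 22 workdays: ${part_time:,.2f} per month")

    # One hour on a full 8,960-chip pod, assuming the per-chip rate scales linearly.
    full_pod_hour = rental_cost(price, chips=8960)
    print(f"Full v5p pod, 1 hour:      ${full_pod_hour:,.2f}")

    # Months of always-on single-chip rent that equal a $30,000 to $40,000 purchase.
    for purchase in (30_000, 40_000):
        months = months_to_match_purchase(purchase, always_on)
        print(f"${purchase:,} purchase ~ {months:.1f} months of rent")
```

Running the script prints the always-on figure from the text ($3,024.00) alongside the part-time and full-pod alternatives; plugging in another provider's hourly rate is a one-argument change.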

For training smaller AI models, or for other compute-intensive tasks such as running multiple Android emulator instances, video rendering, and live streaming, renting GPU servers remains an excellent choice. GPU-Mart offers 31 GPU server plans, ranging from Nvidia GeForce to Nvidia Tesla GPU servers, backed by 18 years of experience and 24/7 professional technical support.

GPU servers for running multiple Android emulator instances, video rendering, and live streaming: Nvidia GTX 1660, GeForce RTX 2060, and others.

GPU servers for AI training: Nvidia RTX 4060, Nvidia RTX 4090, and Nvidia A40.

The Google Cloud Next 2024 conference showcased Google's relentless pursuit of innovation in AI and computing technologies. With the unveiling of Gemini 1.5 Pro, Axion, and TPU v5p, along with its collaboration with Nvidia on GPUs, Google has demonstrated its commitment to driving the industry forward.

These developments underscore the pivotal role of both CPUs and accelerators in the AI arms race, and the importance of optimizing computing resources for efficient AI training and deployment. As Google continues to push the boundaries of what is technically possible, the future of AI development looks promising for businesses and individuals alike.
