UPDATED 15:04 EDT / MAY 25 2021

EMERGING TECH

Arm debuts its fastest CPUs to date for mobile and ‘internet of things’ devices

Arm Holdings Ltd., the chip designer whose semiconductor blueprints power most smartphones along with countless other devices, today introduced a new generation of central processing units and related components under the Total Compute brand.

Total Compute silicon is designed for use in so-called client devices. The term encompasses everything from smart home appliances to phones and laptops, though Arm is primarily emphasizing the latter two categories with the new products. Highlights of the announcement include a high-end CPU with features for predicting future computation results, a faster chip cache module and a series of improved graphics processing units.

The Cortex-X2 is Arm’s new flagship CPU core design for phones and laptops. In the mobile market, Arm expects that Android phones based on the Cortex-X2 will offer up to 30% more performance than current handsets. When implemented in a laptop, Arm says the chip can provide a more than 40% speed improvement over 2020 laptops.

The performance boost is the result of several improvements. To start, Arm has enhanced the accuracy of the chip’s branch prediction mechanism, a component inside CPUs that guesses which way a conditional branch in a program will go before the outcome is known, so the processor can start work on the likely next instructions in advance. The increased accuracy translates into fewer incorrect guesses and, consequently, less wasted work and higher performance.
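To make the idea concrete, here is a minimal sketch of a classic textbook scheme, the two-bit saturating counter. It is not Arm's design, which the announcement does not detail; the function name and the sample branch pattern are assumptions chosen for illustration.

```python
def simulate_two_bit_predictor(outcomes, initial_state=2):
    """Return the fraction of branch outcomes predicted correctly.

    outcomes: iterable of booleans (True = branch taken).
    The counter holds a state from 0 to 3:
    states 0-1 predict "not taken", states 2-3 predict "taken".
    """
    state = initial_state
    correct = 0
    total = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Nudge the counter toward the real outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
        total += 1
    return correct / total if total else 0.0


# Hypothetical workload: a loop branch taken 9 times, then not taken once,
# repeated 100 times.
loop_pattern = ([True] * 9 + [False]) * 100
accuracy = simulate_two_bit_predictor(loop_pattern)
print(f"Prediction accuracy: {accuracy:.0%}")
```

In this toy model the only misses are the loop exits. Real predictors track far more history per branch, which is where accuracy gains of the kind Arm describes would come from.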

There’s also a new instruction pipelining component, which splits up processing tasks into multiple smaller operations. This makes it possible to distribute operations more evenly across a processor to boost utilization. Arm says that it has reduced the number of clock cycles necessary for the task from 11 to 10, which adds up quickly across the billions of operations a CPU performs per second.
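A rough back-of-the-envelope estimate shows how a one-cycle reduction can add up. This is a simplified textbook model, not Arm's own figures: the function name, the instruction count, the misprediction count and the assumption that every mispredicted branch costs a pipeline refill of depth minus one cycles are all illustrative.

```python
def estimate_cycles(instructions, pipeline_depth, mispredictions):
    """Rough cycle count for an idealized pipeline.

    Assumptions (illustrative only):
    - one instruction completes per cycle once the pipeline is full
    - each mispredicted branch costs (pipeline_depth - 1) refill cycles
    """
    if instructions <= 0:
        return 0
    fill = pipeline_depth              # cycles until the first instruction completes
    steady_state = instructions - 1    # one completion per cycle afterwards
    flush_penalty = mispredictions * (pipeline_depth - 1)
    return fill + steady_state + flush_penalty


# Hypothetical workload: one million instructions, 10,000 mispredicted branches.
INSTRUCTIONS = 1_000_000
MISPREDICTIONS = 10_000

cycles_11 = estimate_cycles(INSTRUCTIONS, 11, MISPREDICTIONS)
cycles_10 = estimate_cycles(INSTRUCTIONS, 10, MISPREDICTIONS)
cycles_saved = cycles_11 - cycles_10
print(cycles_11, cycles_10, cycles_saved)
```

Under these assumptions most of the saving comes from cheaper recovery after wrong guesses, which is why a shorter pipeline and a more accurate branch predictor reinforce each other.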

Yet another contributor to the Cortex-X2’s speed is the fact that it uses Armv9, the latest iteration of the foundational architecture underpinning Arm’s chip designs.

Armv9 is also the basis of the Cortex-A710, another new CPU core introduced today. The Cortex-A710 includes the same branch prediction optimizations as the Cortex-X2. Both cores target mobile devices but serve different use cases: The Cortex-X2 is designed to maximize a device’s peak speed, while the Cortex-A710 places the emphasis on optimizing sustained performance.

The Cortex-A710, with its focus on sustained performance, is 10% faster than its predecessor and 30% more efficient. Arm has also doubled the speed at which it can run machine learning apps.

“The Cortex-X series is designed to maximize performance on single-threaded and ‘bursty’ workloads. The pipeline in the microarchitecture is structured and provisioned to push IPC [instructions per clock cycle] performance improvements,” explained Aditya Bedi, Arm’s director of product management. “The Cortex-A700 series is prioritized for sustained multiprocessor workloads, with the best balance of efficiency and performance.”

The chip inside an Android phone often has multiple cores: so-called “big” cores for running the most demanding apps and slower “little” cores that consume less power. The Cortex-A710 is designed to function as the big core in phone chips. It’s accompanied by a new little core design, the Cortex-A510, which Arm said today will be 35% faster than its predecessor. The company is also touting a threefold acceleration in machine learning speeds.

The Cortex-A510 is particularly notable because it marks the first refresh of Arm’s smartphone little-core design in four years. According to the company, the core introduces several innovations, including the ability to share the L2 cache with another processor core inside a device.

The L2 cache is one of the memory banks in which a chip stores the data it processes. Sharing a memory bank between two cores means they can share a single copy of data instead of storing two separate copies in separate caches, which increases efficiency.
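The saving can be illustrated with a small sketch that counts how many cached copies two cores need when they touch overlapping data. The function name and the addresses are hypothetical, and the model ignores cache capacity, eviction and coherence traffic.

```python
def cached_copies(core_accesses, shared):
    """Count cache lines held across cores.

    core_accesses: list with one collection of addresses per core.
    shared: True if the cores share one cache, False if each has its own.
    """
    if shared:
        # One cache: each distinct address is stored once.
        return len(set().union(*core_accesses))
    # Private caches: each core keeps its own copy of what it touches.
    return sum(len(set(accesses)) for accesses in core_accesses)


# Hypothetical access patterns: the two cores overlap on 0x140 and 0x180.
core0 = [0x100, 0x140, 0x180]
core1 = [0x140, 0x180, 0x1C0]

private_total = cached_copies([core0, core1], shared=False)
shared_total = cached_copies([core0, core1], shared=True)
print(private_total, shared_total)
```

In this example the shared arrangement holds four lines instead of six, which leaves room for other data in the same amount of cache.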

“These performance improvements are important, as it gives Cortex-A510 a bigger operating range and raises the ‘performance floor’ to meet the growing performance demands across multiple consumer device markets,” Bedi detailed. “This means workloads can run longer on the ‘little’ CPUs before switching to the ‘big’ CPUs.”
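A toy scheduler shows what a higher performance floor means in practice. Everything here is an assumption made for illustration: the `pick_core` function, the capacity units and the workload demands are invented, and only the 35% uplift comes from Arm's claim. Real schedulers weigh many more factors.

```python
def pick_core(demand, little_capacity):
    """Place a workload on the little core if it fits, otherwise on the big core."""
    return "little" if demand <= little_capacity else "big"


# Arbitrary units. The new little core is assumed to be 35% more capable.
OLD_LITTLE_CAPACITY = 100
NEW_LITTLE_CAPACITY = 135

# Hypothetical workload demands in the same arbitrary units.
workloads = {
    "audio playback": 40,
    "messaging": 90,
    "web scrolling": 120,
    "3D game": 400,
}

old_placement = {name: pick_core(d, OLD_LITTLE_CAPACITY) for name, d in workloads.items()}
new_placement = {name: pick_core(d, NEW_LITTLE_CAPACITY) for name, d in workloads.items()}

for name in workloads:
    print(f"{name}: {old_placement[name]} -> {new_placement[name]}")
```

In this sketch the "web scrolling" workload moves from the big core to the little core once the little core's capacity rises, while the heaviest workload still needs the big core.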

The L2 cache is one of several caches in a CPU. There’s also the L3 cache, an auxiliary memory bank that supports the L2 unit. Alongside the next-generation CPU core designs, Arm today introduced an L3 unit dubbed the DSU-110 with an increased maximum capacity of 16 megabytes and five times more memory bandwidth.

The British chip designer also shared details about several new additions to its mobile GPU line. The fastest, the Arm Mali-G710, promises a 20% performance improvement for high-end smartphones and Chromebooks. The lower-end Arm Mali-G610, Mali-G510 and Mali-G310 target less powerful devices ranging from wearables to midrange phones.

Photo: Arm
