Last week the gang at AnandTech posted a story revealing systematic cheating by Huawei in smartphone benchmarks. In its story, AT focused on 3DMark and GFXBench, looking at how the China-based silicon and phone provider was artificially inflating benchmark scores to gain an advantage in its battles with other smartphone makers and SoC vendors like Qualcomm.
As a result of that testing, UL Benchmarks (which acquired Futuremark) delisted several Huawei smartphones from 3DMark, pulling the artificial scores from its leaderboards. This calls existing device reviews into question while also casting a cloud over the recently announced (and impressive-sounding) Kirin 980 SoC, meant to battle the Snapdragon 845 and Qualcomm's next-generation product. The Kirin 980 will be the first shipping processor to integrate high-performance Arm Cortex-A76 cores, which makes the need to cheat on performance claims all the more questionable.
Just a day after this story broke, UL and Huawei released a joint statement that is, quite honestly, laughable.
"In the discussion, Huawei explained that its smartphones use an artificial intelligent resource scheduling mechanism. Because different scenarios have different resource needs, the latest Huawei handsets leverage innovative technologies such as artificial intelligence to optimize resource allocation in a way so that the hardware can demonstrate its capabilities to the fullest extent, while fulfilling user demands across all scenarios."
To assert that any kind of AI processing on Huawei devices is responsible for the performance differences AnandTech measured is at best naïve and at worst outright lying. This criticism is aimed at both Huawei and UL Benchmarks; I would have expected a company with UL's experience in performance evaluation not to fall for this kind of messaging.
After the AT story was posted, I started talking with the team that builds Geekbench, one of the most widely used and respected processor benchmarks for mobile devices and PCs. It provides valuable comparative performance data and leaderboards. As it turns out, Huawei devices exhibit the same cheating behavior in this benchmark.
Below I have compiled Geekbench results that developer John Poole ran on a Huawei P20 Pro powered by the Kirin 970 SoC. (Private app results, public app results.) To be clear: the public version is the application package as downloaded from the Google Play Store, while the private version is a custom build he created to test for this behavior. It runs identical workloads; the only changes are a renamed package and basic string replacements within the application.
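At its core, the method described above runs identical workloads under two app identities and checks whether the scores diverge by more than normal run-to-run variation. Here is a minimal Python sketch of that comparison, assuming three runs per build and a hypothetical 2% noise threshold. The function name `score_gap` and all of the scores are illustrative; they are not Geekbench's actual tooling or the P20 Pro's actual results.

```python
from statistics import median


def score_gap(public_runs, private_runs, noise_pct):
    """Return (gap_pct, flagged) for one sub-test.

    gap_pct: percent by which the private (renamed) build's median score falls
    below the public build's median, using the public median as the base.
    flagged: True when that gap exceeds the assumed run-to-run noise threshold.
    """
    pub = median(public_runs)
    priv = median(private_runs)
    gap_pct = (pub - priv) / pub * 100
    return gap_pct, gap_pct > noise_pct


# Hypothetical scores from three runs of each build (not real Geekbench data).
public = [1000, 1010, 990]
private = [935, 940, 930]
gap, flagged = score_gap(public, private, noise_pct=2.0)
print(f"gap = {gap:.1f}%, flagged = {flagged}")
```

Taking the median of several runs keeps a single thermally throttled or unusually fast run from skewing the comparison; a gap that persists across medians is hard to blame on noise.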
Clearly the Huawei P20 Pro is boosting performance on the public version of Geekbench and not on the private version, despite identical workloads in both. In the single-threaded tests, the private build's total score is 6.5% lower, with the largest gap in the memory sub-score, where the true result is 14.3% lower than the inflated public result. Integer performance drops by 3.7% and floating-point performance falls by 5.6%.
The multi-threaded differences are much more substantial. Floating-point performance drops by 26% in the private version of Geekbench, a hit that would almost certainly affect the P20 Pro's placement in leaderboards and in reviews of flagship Android smartphones.
Overall, the Huawei P20 Pro scores 6.5% lower in single-threaded testing and 16.7% lower in multi-threaded testing once the artificial score inflation in Huawei's customized OS is removed. Despite Huawei's claims that an AI system recognizes specific user scenarios and improves performance, this is another data point showing that Huawei hoped to pull one over on the media and consumers with invalid performance comparisons.
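The percentages in these results use the public (inflated) score as the base. Flipping the base changes the number: a private score 26% below the public one means the public score is roughly 35% above the private one. A short Python sketch of both calculations, using hypothetical scores of 1000 and 740 chosen to reproduce a 26% drop rather than the actual Geekbench results:

```python
def pct_lower(public, private):
    """Percent by which the private score falls below the public score (public as base)."""
    return (public - private) / public * 100


def pct_higher(public, private):
    """Percent by which the public score exceeds the private score (private as base)."""
    return (public - private) / private * 100


# Hypothetical multi-threaded floating-point scores chosen to match a 26% drop.
public_fp, private_fp = 1000, 740
print(f"{pct_lower(public_fp, private_fp):.1f}% lower")    # 26.0% lower
print(f"{pct_higher(public_fp, private_fp):.1f}% higher")  # 35.1% higher
```

Either framing is valid, but the "higher" figure is the better measure of how much the inflated score overstates real-world performance.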
Some have asked me why this issue matters: if the hardware is clearly capable of this performance, why shouldn't Huawei and HiSilicon be able to present it that way? Because the higher results that 3DMark, GFXBench, and now Geekbench show are not indicative of the performance consumers get from their devices in real applications. The entire goal of benchmarks and reviews is to convey the experience a buyer would get from a smartphone, or anything else for that matter.
If Huawei wanted one of its devices to offer this level of performance in games and other applications, it could do so, but at the expense of other traits. Skin temperature, battery life, and device lifespan could all suffer, and that would definitely affect the reviews and reception of a smartphone. Hence the cheating: an attempt to have the best of both worlds.
The sad part about all of this is that Huawei's flagship smartphones have been exceptional in nearly every way. Design, screen quality, camera integration, features: the Mate and P-series devices have been excellent representations of what an Android device can be. Unfortunately, for enthusiasts who track the market, this episode will stick with the company and cloud some of those positives.
Today's data shows that the story of Huawei and benchmarks goes beyond 3DMark and GFXBench. We will be watching closely to see how Huawei responds and whether any updates for existing devices are distributed. And as the release of Kirin 980 devices nears, you can be sure they will face more scrutiny in testing and evaluation than ever.