PC REPORT
Friday, November 17, 1997
Netizens: Feel free to distribute far and wide. All I ask is you keep this entire
document intact, including all URLs and copyrights. Thanks.
Hear the PC Report via streaming Java audio at
http://24.3.97.12/PCR.htm#PC%20REPORT%20PLAYBACK
From InformationWeek Online
Red-Handed, Red-Faced, Red Alert
By Rich Levin
Developer Quote Of The Week: "What we do is, given a benchmark, we try to do
as well as we can on it, and make sure that our system is the fastest benchmark --
I mean, fastest system -- in the world." -- Brian Croll, Sun Microsystems' director
of marketing for Solaris
Two weeks ago, Sun Microsystems got caught with its hand in the benchmarking cookie
jar. Or did it?
Depending on your point of view, Sun either grossly misrepresented the performance
of its Solaris Java just-in-time compiler by fooling Pendragon Software's CaffeineMark
performance test, or Sun proved the CaffeineMark is not an acceptable measure of
Java compiler performance.
For those who may have missed it, here's the background: In a Nov. 4 press release,
Ivan Phillips, president of Pendragon Software, in Libertyville, Ill., a developer
of software for personal digital assistants, accused Sun of engineering its new Java
compiler to trick the CaffeineMark into reporting higher performance results.
When Sun's compiler detected a block of 600 bytecodes unique to the CaffeineMark
(a technique known as pattern matching), the compiler bypassed data processing, and
instead returned a value expected by the benchmark.
This fooled the test into reporting performance results 300 times faster than the
compiler would deliver in real-world use. Third-party developers subsequently validated
Phillips' assertion. Interestingly, when Pendragon's engineers altered the test so
that Sun's compiler no longer recognized it, the compiler's special-case branch was
never taken, and its performance plummeted. Java compilers under Windows 95, Windows NT, and the
Mac OS delivered uniform results under both the original and altered tests.
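For readers who want to see the trick in miniature, here is a rough sketch in Python. It is not Sun's code, and nothing in it comes from Pendragon or the CaffeineMark. The kernel, the names (cheating_compile, honest_compile, CANNED, WORKLOAD) and the use of a SHA-256 hash of the source text are all assumptions made for illustration; the real compiler reportedly matched a block of bytecodes, not source text. The idea is the same, though: recognize the benchmark, skip the work, hand back the expected answer.

```python
import hashlib
import time

# Hypothetical benchmark kernel, kept as source text so it can be fingerprinted.
BENCHMARK_SOURCE = """
def kernel(n):
    total = 0
    for i in range(n):
        total = (total * 31 + i) % 1000003
    return total
"""

# Same arithmetic with one variable renamed: equivalent work, different bytes.
ALTERED_SOURCE = BENCHMARK_SOURCE.replace("total", "acc")

WORKLOAD = 200_000  # loop count the benchmark is assumed to always request


def fingerprint(source):
    """Identify a block of code by hashing its exact text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def honest_compile(source):
    """Turn the source into a callable that really runs the loop."""
    namespace = {}
    exec(source, namespace)
    kernel = namespace["kernel"]
    kernel.shortcut = False
    return kernel


# The answer the benchmark expects, computed once ahead of time and baked in.
CANNED = {fingerprint(BENCHMARK_SOURCE): honest_compile(BENCHMARK_SOURCE)(WORKLOAD)}


def cheating_compile(source):
    """Return a canned-answer stub if the code looks like the benchmark."""
    real = honest_compile(source)
    key = fingerprint(source)
    if key not in CANNED:
        return real  # unrecognized code takes the normal path

    def stub(n):
        # Skip the loop only for the exact workload the benchmark uses.
        return CANNED[key] if n == WORKLOAD else real(n)

    stub.shortcut = True
    return stub


def timed(fn, n):
    """Run fn(n) and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(n)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for label, source in (("original", BENCHMARK_SOURCE), ("altered", ALTERED_SOURCE)):
        fn = cheating_compile(source)
        result, seconds = timed(fn, WORKLOAD)
        print(f"{label}: shortcut={fn.shortcut} result={result} time={seconds:.6f}s")
```

Both runs return the same result, but only the original source takes the shortcut. Rename a single variable and the fingerprint no longer matches, so the loop actually runs. Timings will vary from machine to machine.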
Sun officials initially admitted no wrongdoing, and were quick to point out that
optimizing software to improve benchmark scores is an accepted practice among computer
technology vendors. "People are optimizing against the benchmark," says
Brian Croll, Sun's director of marketing for Solaris.
Further, Croll maintained that the aberrant results indicate a fundamental flaw in
Pendragon's benchmark suite, and do not represent any impropriety by Sun. "I
don't know how valid the [CaffeineMark] is," Croll said. Then last week, during
a day-long media briefing at Sun's Mountain View, Calif., headquarters, Sun officials
updated their explanation of events. SunSoft president Janpieter Scheerder said the
company was not trying "to do anything malicious"; rather, Sun engineers
simply "optimized too much."
A Sun spokesperson at the event blamed the incident on human error, and said an engineering
prototype somehow found its way through Sun's rigorous (you would think) development
and quality assurance processes, and onto the Web, with documentation and an overblown
press release in tow.
What if Pendragon officials had not discovered Sun's alleged trickery? What if Sun
engineers had tweaked their compiler to improve its score only 10-fold, instead of the
eye-popping 300-fold increase that tipped off Pendragon officials?
Sun's PR machine had already posted a press release, in which it touted its "new
Web-enhanced Solaris operating environment" as delivering "the world's
fastest Java technology performance." The release also claimed Solaris' compiler
was 50% faster than the best Windows NT score, and cited the CaffeineMark as proof.
If Pendragon officials had not discovered the ruse, Sun's formidable sales and marketing
machine would now be steam-rolling press and IT decision-makers alike, trumpeting
Solaris' performance advantage over Microsoft's Windows NT, and waving its illicitly
obtained CaffeineMark results as evidence.
"Any benchmark, no matter what its original purpose, is subject to use as 'benchmarketing,'"
says Larry Gray, board member of the Standard Performance Evaluation Corp. (SPEC),
in Manassas, Va., a consortium that administers many well-known benchmarks. "I'd
guess maybe 20% to 30% of the tricks vendors pull to make the benchmarks run faster
are never seen by real-world applications." (Gray is also a product manager
with Hewlett-Packard's server marketing group in Cupertino, Calif.)
Sun not only spoofed the CaffeineMark; it attempted to spoof developers and IT organizations
around the world. Worse, in its effort to discredit Pendragon's CaffeineMark algorithms,
the company demonstrated it was, initially at least, willing to cast a shadow on
virtually all CaffeineMark results published to date.
It's common practice for technology vendors to optimize their hardware or software
to perform better under benchmark testing. In fact, benchmark suite vendors and consortiums
encourage the practice, believing legitimate optimizations lead to improved real-world
performance for IT users. "If the benchmarks are any good, and a vendor optimizes
their product to improve their scores, they've also found a way to make their product
run better," explains SPEC's Gray.
But there are times, and this is one of them, when an optimization is more correctly
defined as cheating. "If the thing recognizes cases that look like the particular
benchmark, the word 'cheating' is perfectly justified," says Thomas Plum, president
of Plum Hall Inc., in Kamuela, Hawaii, a developer of C/C++ testing and conformance
tools, and the chairman of the International Organization for Standardization (ISO) WG21 C++ standards
committee. "Most [experts] consider [pattern matching] as going too far."
Such cheats have no purpose in the never-ending benchmarking and optimization cycle,
Plum says, and serve only to imply a particular technology will deliver performance
it cannot conceivably provide in real-world systems.
If the evidence overwhelmingly indicates Sun cheated this time, it's safe to assume
it may do it again. It's also safe to assume that, next time, the company might not
get caught. For their part, company officials continue to deny any intent to swindle
the public, but the events of the last two weeks should, at a minimum, concern every
consumer of benchmarking data.
Developers, be forewarned: Just as a virus can slip through the best antivirus defenses,
the best benchmarks can be, and are being, spoofed with alarming regularity. Because most
popular benchmarks are managed by industry consortiums or private corporations, companies
with their hands in the cookie jar are rarely flogged in public, and the practices
remain benchmarking's dirty little secret.
Rich Levin, KYW Newsradio.
Rich Levin's PC Report
As heard on KYW Newsradio 1060 AM Philadelphia
Listen 2, 3, 4 times a day
A CBS Radio Station
Copyright (C) 1997 Rich Levin, all rights reserved.
Subscribe at http://24.3.97.12
The content in this channel: http://www.user-groups.com/COMPUTER TALK/ is copyright: Computer Talk Radio, Rich Levin, 1997. All Rights Reserved. This service provided by The User Group Network and The Design & Publishing Center to serve the user group community. Copyright 1996 - 1997 User Group Network