PC REPORT

Friday, November 17, 1997

Netizens: Feel free to distribute far and wide. All I ask is you keep this entire document intact, including all URLs and copyrights. Thanks.

Hear the PC Report via streaming Java audio at
http://24.3.97.12/PCR.htm#PC%20REPORT%20PLAYBACK

From InformationWeek Online

Red-Handed, Red-Faced, Red Alert
By Rich Levin

Developer Quote Of The Week: "What we do is, given a benchmark, we try to do as well as we can on it, and make sure that our system is the fastest benchmark -- I mean, fastest system -- in the world." -- Brian Croll, Sun Microsystems' director of marketing for Solaris

Two weeks ago, Sun Microsystems got caught with its hand in the benchmarking cookie jar. Or did it?

Depending on your point of view, Sun either grossly misrepresented the performance of its Solaris Java just-in-time compiler by fooling Pendragon Software's CaffeineMark performance test, or Sun proved the CaffeineMark is not an acceptable measure of Java compiler performance.

For those who may have missed it, here's the background: In a Nov. 4 press release, Ivan Phillips, president of Pendragon Software, in Libertyville, Ill., a developer of software for personal digital assistants, accused Sun of engineering its new Java compiler to trick the CaffeineMark into reporting higher performance results.

When Sun's compiler detected a block of 600 bytecodes unique to the CaffeineMark (a technique known as pattern matching), it skipped the actual computation and simply returned the value the benchmark expected.

This fooled the test into reporting results 300 times faster than the compiler would deliver in real-world use. Third-party developers subsequently validated Phillips' assertion. Interestingly, when Pendragon's engineers altered the test so that it looked different to Sun's compiler, the pattern match failed, the special-case branch was never taken, and performance plummeted. Java compilers under Windows 95, Windows NT, and the Mac OS delivered uniform results under both the original and altered tests.
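To make the mechanism concrete, here is a minimal sketch in Python of how a pattern-matching shortcut can fool a benchmark, and how an altered copy of the test exposes it. It is an illustration only, not Sun's compiler or Pendragon's test: the fingerprint bytes, the toy opcodes, and every function name (honest_run, rigged_run, slowdown_when_altered) are hypothetical stand-ins invented for this example.

```python
import time

# Hypothetical stand-in for the block of bytecodes unique to the benchmark.
BENCHMARK_FINGERPRINT = bytes(range(1, 17))
NOP = 0x00  # toy opcode that does nothing (assumption for this sketch)


def honest_run(program: bytes) -> int:
    """Toy 'real' execution: actually does the work, ignoring NOPs."""
    total = 0
    for _ in range(2000):
        for op in program:
            if op == NOP:
                continue
            total = (total * 31 + op) % 1_000_003
    return total


# The original benchmark and an altered copy that does the same work but
# has a NOP inserted in the middle of the recognizable block.
ORIGINAL = b"\x20\x21" + BENCHMARK_FINGERPRINT + b"\x22"
_cut = 2 + len(BENCHMARK_FINGERPRINT) // 2
ALTERED = ORIGINAL[:_cut] + bytes([NOP]) + ORIGINAL[_cut:]

# The answer the benchmark expects, computed once the honest way.
EXPECTED = honest_run(ORIGINAL)


def rigged_run(program: bytes) -> int:
    """Toy 'compiler' that special-cases the benchmark's byte pattern."""
    if BENCHMARK_FINGERPRINT in program:
        return EXPECTED  # skip the work and hand back the expected value
    return honest_run(program)


def slowdown_when_altered(run) -> float:
    """Ratio of altered-test time to original-test time for one runner."""
    start = time.perf_counter()
    run(ORIGINAL)
    t_original = time.perf_counter() - start
    start = time.perf_counter()
    run(ALTERED)
    t_altered = time.perf_counter() - start
    return t_altered / max(t_original, 1e-9)


if __name__ == "__main__":
    print("honest runner ratio:", round(slowdown_when_altered(honest_run), 2))
    print("rigged runner ratio:", round(slowdown_when_altered(rigged_run), 2))
```

An honest runner should take about the same time on both versions of the test, which matches what was reported for the Windows and Mac OS compilers. A runner that recognizes the benchmark returns the right answer on both, but slows down sharply the moment the recognizable block is disturbed.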

Sun officials initially admitted no wrongdoing, and were quick to point out that optimizing software to improve benchmark scores is an accepted practice among computer technology vendors. "People are optimizing against the benchmark," says Brian Croll, Sun's director of marketing for Solaris.

Further, Croll maintained that the aberrant results indicate a fundamental flaw in Pendragon's benchmark suite, and do not represent any impropriety by Sun. "I don't know how valid the [CaffeineMark] is," Croll said.

Then last week, during a day-long media briefing at Sun's Mountain View, Calif., headquarters, Sun officials updated their explanation of events. SunSoft president Janpieter Scheerder said the company was not trying "to do anything malicious"; rather, Sun engineers simply "optimized too much."

A Sun spokesperson at the event blamed the incident on human error, and said an engineering prototype somehow found its way through Sun's rigorous (you would think) development and quality assurance processes and onto the Web, with documentation and an overblown press release in tow.

What if Pendragon officials had not discovered Sun's alleged trickery? What if Sun engineers tweaked their compiler to only improve its score 10-fold, instead of the eye-popping 300-fold increase that flagged Pendragon officials?

Sun's PR machine had already posted a press release, in which they touted their "new Web-enhanced Solaris operating environment" as delivering "the world's fastest Java technology performance." The release also claimed Solaris' compiler was 50% faster than the best Windows NT score, and cited the CaffeineMark as proof.

If Pendragon officials had not discovered the ruse, Sun's formidable sales and marketing machine would now be steam-rolling press and IT decision-makers alike, trumpeting Solaris' performance advantage over Microsoft's Windows NT, waving Sun's illicitly obtained CaffeineMark results as evidence in hand.

"Any benchmark, no matter what its original purpose, is subject to use as 'benchmarketing,'" says Larry Gray, board member of the Standard Performance Evaluation Corp. (SPEC), in Manassas, Va., a consortium that administers many well-known benchmarks. "I'd guess maybe 20% to 30% of the tricks vendors pull to make the benchmarks run faster are never seen by real-world applications." (Gray is also a product manager with Hewlett-Packard's server marketing group in Cupertino, Calif.)

Sun not only spoofed the CaffeineMark; it attempted to spoof developers and IT organizations around the world. Worse, in its effort to discredit Pendragon's CaffeineMark algorithms, the company demonstrated it was, initially at least, willing to cast a shadow on virtually all CaffeineMark results published to date.

It's common practice for technology vendors to optimize their hardware or software to perform better under benchmark testing. In fact, benchmark suite vendors and consortiums encourage the practice, believing legitimate optimizations lead to improved real-world performance for IT users. "If the benchmarks are any good, and a vendor optimizes their product to improve their scores, they've also found a way to make their product run better," explains SPEC's Gray.

But there are times -- and this is one of them -- when an optimization is more correctly defined as cheating. "If the thing recognizes cases that look like the particular benchmark, the word 'cheating' is perfectly justified," says Thomas Plum, president of Plum Hall Inc., in Kamuela, Hawaii, a developer of C/C++ testing and conformance tools, and the chairman of the International Standards Organization WG21 C++ standards committee. "Most [experts] consider [pattern matching] as going too far."

Such cheats have no purpose in the never-ending benchmarking and optimization cycle, Plum says, and serve only to imply a particular technology will deliver performance it cannot conceivably provide in real-world systems.

If the evidence overwhelmingly indicates Sun cheated this time, it's safe to assume it may do it again. It's also safe to assume that, next time, the company might not get caught. For their part, company officials continue to deny any intent to swindle the public, but the events of the last two weeks should, at a minimum, concern every consumer of benchmarking data.

Developers, be forewarned: Just as a virus can slip through the best antivirus defenses, the best benchmarks can be, and are being, spoofed with alarming regularity. Because most popular benchmarks are managed by industry consortiums or private corporations, companies with their hands in the cookie jar are rarely flogged in public, and the practices remain benchmarking's dirty little secret.


Rich Levin, KYW Newsradio.

Rich Levin's PC Report
As heard on KYW Newsradio 1060 AM Philadelphia
Listen 2, 3, 4 times a day
A CBS Radio Station

Copyright (C) 1997 Rich Levin, all rights reserved.
Subscribe at http://24.3.97.12



Rich Levin, Senior Editor
InformationWeek magazine -- CMP Media, Manhasset, NY

Reporter - KYW Newsradio 1060 AM (CBS) -- Philadelphia, PA

Host - CBS Radio's Computer Talk -- WPHT Philadelphia 1210 AM

215-887-8281 (24x7 voice)
215-887-5485 (24x7 fax and voice mail)

Web site: http://cc285929-a.wlgrv1.pa.home.com
FTP site: ftp://cc285929-a.wlgrv1.pa.home.com




Support our sponsors: The Design & Publishing Center and UGNetwork. The content in this channel: http://www.user-groups.com/COMPUTER TALK/ is copyright: Computer Talk Radio, Rich Levin, 1997. All Rights Reserved. This service is provided by The User Group Network and The Design & Publishing Center to serve the user group community. For information about the UGNetwork, to get involved, or to have your own group's home page located at user-groups.com, please contact us. Send an e-mail message to: UGNetwork@user-groups.com. Copyright 1996 - 1997 User Group Network.


