Wednesday, 23 November 2011

AMD Bulldozer fiasco continues - EPIC FAIL in servers too!

The desktop Bulldozer benchmarks were a horror show for AMD. The newest and greatest architecture often failed to beat its predecessor, let alone the Intel competition. There were no such disasters in server workloads. Much as expected, thread-heavy server workloads fare a lot better, with Interlagos matching or beating Magny-Cours almost across the board (though AnandTech did find a couple of exceptions).

However, the results fall far short of a resounding success for AMD. They are broadly split between "tied with Opteron 6100" and "up to 33 percent faster than Opteron 6100." For a processor with 33 percent more cores, running highly scalable multithreaded workloads, that's a poor show. Best case, AMD has stood still in terms of per-thread performance. Worst case, the Bulldozer architecture is so much slower than AMD's old design that the new design needs four more threads just to match the old one. AMD compromised single-threaded performance in order to allow Bulldozer to run more threads concurrently, and that trade-off simply hasn't been worth it.
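The arithmetic behind that best case and worst case can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not a benchmark: the core counts of 12 and 16 follow from the "four more threads" and "33 percent more cores" figures above, while the function name `per_core_ratio` and the assumption that throughput scales linearly with core count are ours.

```python
def per_core_ratio(old_cores, new_cores, speedup):
    """Per-core throughput of the new chip relative to the old one.

    speedup is new total throughput divided by old total throughput
    (1.0 means the two chips tie). Assumes, hypothetically, that the
    workload scales linearly with core count on both chips.
    """
    return speedup * old_cores / new_cores


OLD_CORES = 12  # Opteron 6100 (Magny-Cours)
NEW_CORES = 16  # Opteron 6200 (Interlagos)

# Fractional increase in core count: about one third.
core_increase = NEW_CORES / OLD_CORES - 1

# Best case: the 6200 is a full third faster overall,
# so each core does roughly the same work as before.
best_case = per_core_ratio(OLD_CORES, NEW_CORES, NEW_CORES / OLD_CORES)

# Worst case: the 6200 merely ties the 6100,
# so each core delivers only three quarters of the old throughput.
worst_case = per_core_ratio(OLD_CORES, NEW_CORES, 1.0)

print(f"core increase:        {core_increase:.0%}")
print(f"best-case per core:   {best_case:.2f}x")
print(f"worst-case per core:  {worst_case:.2f}x")
```

Under those assumptions, a tie in total throughput means every Interlagos core is doing about a quarter less work than a Magny-Cours core did, and even the best result only gets per-core performance back to where it started.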

For workloads such as SAP where the performance has scaled, Opteron 6200 still represents a reasonable upgrade for existing 6100 customers, but it leaves us wondering what might have happened if AMD had simply extended its old architecture. Another four cores in a Magny-Cours processor would show close to the same 33 percent gain, and would do so without compromising single-threaded performance.

The situation up against Intel is even more dire. In some of AnandTech's benchmarks, the 6200 failed to beat Intel's Xeon processors, in spite of Intel's core and thread deficit. In others, the 6200 pulled ahead, with a lead topping out at about 30 percent.

The Xeons used for comparison are Westmere EP-based units; in one form or another, they've been on the market for about 18 months now. They will soon be replaced by the Sandy Bridge-EP Xeon E5 2600 series. The Sandy Bridge cores in these processors are faster than those in Westmere EP, and the processors will have two cores and four threads more than Westmere EP. With 33 percent more cores, these processors will show an even bigger performance increase in massively multithreaded workloads.

Not only will Intel be able to extend its lead in the areas that it already wins; it should be able to leapfrog AMD in those tests where it currently trails. Again, one can't help but feel that a hypothetical 16-core Magny-Cours would have been a better option.

After the poor desktop performance, the possibility still existed that the Bulldozer architecture would start to make sense once we could see the server performance. Now that the benchmarks have arrived, AMD's perseverance with Bulldozer is bordering on the incomprehensible. There's just no upside to the decisions AMD has made. All of which raises a question: why did AMD go this route? The company must have known about the weak single-threaded performance, and the detrimental effect this would have in real-world applications, long before the product actually shipped, so why stick with it? Perhaps AMD's anticipation of high clock speeds caused the company to stick with the design, and there's still a possibility that it might one day attain those clock speeds. But we've seen AMD's arch-competitor, Intel, make a similar gamble with the Pentium 4, and for Intel, it never really paid off.

AMD is boasting that Opteron 6200 is the "first and only" 16-core x86 processor on the market. Not only is this not really true (equating threads and cores is playing fast and loose with the truth), it just doesn't matter. In its effort to add all those "cores," AMD has severely compromised performance. The company faces an uphill struggle just to compete with its own old chips, let alone with Intel.