AMD came to HardOCP after the lackluster Bulldozer desktop launch and wanted to reach out to the hardware enthusiast community by answering questions that many of us had. We posted a HardForum thread and allowed [H] readers to ask their questions. AMD sorted through the submissions, and AMD staffers answered 10 of them.
10 Enthusiast Questions about Desktop Bulldozer
1. Why was the L1 cache size reduced going from Phenom II to Bulldozer?
Mike Butler, Senior Fellow Design Engineer, AMD - Over the years, AMD CPU architects have analyzed and simulated different client and server workload data footprints to efficiently size all levels of cache and have worked to optimize performance. For "Bulldozer" AMD implemented a write-thru L1D cache and focused on improving pre-fetch algorithms and increasing L1D cache bandwidth on popular small block transfers. For those less frequent, larger block transfers we rely upon the efficiencies built into our large 16-way 2MB L2.
Based on the average workloads today, we see this new design – despite the smaller L1 cache – as a more efficient way to process data.
2. Why are the integer operation benchmarks so low compared to even previous AMD 4 cores?
Mike Butler, Senior Fellow Design Engineer, AMD - "Bulldozer" is a new microarchitecture that differs in several ways from previous generations. The "Bulldozer" architecture uses both dedicated and shared resources, allowing for a more efficient design, improved instructions per watt and maintaining IPC over our widest operating range – from top boost frequencies in unlocked desktops to throughput server workloads at lower voltage, a range unmatched in prior AMD architecture generations.
As such, some individual benchmarks will show different performance levels (some higher, some lower) than prior core designs. The net performance delivered on multi-threaded applications represents a significant advancement in throughput, despite some older benchmarks not benefitting from the unique features and tradeoffs that went into the Bulldozer microarchitecture.
It is also important to note that the "Bulldozer" architecture is configured and optimized for server throughput. The two integer execution cores present in Bulldozer are designed to deliver area- and power-efficient multi-threaded throughput.
3. It seems that the idea of modules and cores sharing parts is brilliant, but the idea of increasing frequency while lowering IPC seems like a step backwards. Why was this decided on?
Mike Butler, Senior Fellow Design Engineer, AMD - Clearly, IPC is an important factor in processor performance, and IPC has decreased slightly in this first instantiation of "Bulldozer." That said, there are multiple performance factors – and trade-offs – that went into the design of the forward-looking "Bulldozer" architecture.
The new CPU core delivers higher frequency while maintaining IPC, improved multi-thread (parallel computing) performance, instructions per watt, advanced boost functionality, new x86 instruction sets, and over-clock capabilities never seen in previous microarchitectures. We believe these enhancements will show positive lift for end-users as new operating systems and software applications take advantage of the new features inside "Bulldozer." And, looking forward, as process technology matures over time, the core is well structured for potential increased frequencies in the future.
4. Based on various reviews and benchmarks, the price vs performance of BD seems subpar relative to current Intel SB offerings. Can you explain your competitive positioning relative to Intel?
Adam Kozak, Product Marketing Manager, AMD - AMD designed this part around applications and environments that we believe our customers use – and which we expect them to use going forward. The architecture focuses on high-frequency and resource sharing to achieve optimal throughput and speed in next generation applications and high-resolution gaming.
This is a forward-looking, innovative approach to CPU design. And while that is difficult to measure with older, single-threaded benchmarks, we believe the AMD FX CPUs offer a great experience for how our performance customers use their PCs today.
5. Is anything being done from AMD's side to a.) promote the development of more true multithreaded code and b.) give guidance to developers on how to approach multithreaded code to give as good results as possible?
Gabe Gravning, Senior Product Marketing Manager, AMD - We are working with developers and ISVs to encourage the development of multi-threaded applications and code. This is a major focus for AMD and others in the industry. And this was the driving force for us to host our own developer summit in 2011 so we could have a conversation with developers about parallelism and heterogeneous computing. So yes, we are promoting new tools that make it easier for developers to take full advantage of an increasing number of cores on the CPU, GPU or both.
6. It has been stated that Bulldozer will see improvements in performance with the Windows 8 scheduler. Would you elaborate?
Gabe Gravning, Senior Product Marketing Manager, AMD - We worked with Microsoft to improve the way threads are scheduled with the "Bulldozer" architecture in Windows 8®. In Windows 7, workloads are simply executed sequentially across the cores. The Windows 8 scheduler is optimized for the "Bulldozer" architecture and will distribute the workload across each core pair first and then each core resulting in better threaded performance.
For example, in testing by AMD with the AMD FX-8150, we are seeing up to 10% uplift on a number of games with the Windows 8 Developer Preview compared to Windows® 7. Of course, results do vary.
We are also working with Microsoft on a scheduler update for Windows 7 that will be available soon.
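The two thread-placement orders described in this answer can be sketched in a few lines of Python. This is a simplified illustration, not the actual Windows scheduler: it assumes a chip with four two-core modules (the FX-8150 layout), reads "sequentially across the cores" as filling cores in index order, and uses hypothetical function names throughout.

```python
# Simplified model of thread placement on a module-based CPU.
# Assumption: 4 modules x 2 cores, cores identified as (module, core) pairs.
# All names here are hypothetical and exist only for illustration.

def fill_cores_in_order(modules, cores_per_module=2):
    """Place threads core by core: both cores of module 0, then module 1, ..."""
    return [(m, c) for m in range(modules) for c in range(cores_per_module)]


def spread_across_modules_first(modules, cores_per_module=2):
    """Place one thread on every module before doubling up on any module."""
    return [(m, c) for c in range(cores_per_module) for m in range(modules)]


def modules_used(order, n_threads):
    """Count how many distinct modules the first n_threads placements touch."""
    return len({module for module, _core in order[:n_threads]})


if __name__ == "__main__":
    threads = 4
    print(modules_used(fill_cores_in_order(4), threads))          # 2
    print(modules_used(spread_across_modules_first(4), threads))  # 4
```

With four threads, the in-order placement packs them onto two modules, where each pair of threads contends for that module's shared resources, while the module-first placement gives every thread a module to itself.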
7. Why would I buy a $275 Bulldozer cpu when the $170 1090t seems to equal its performance or actually do better at every benchmark and game we've seen?
Adam Kozak, Product Marketing Manager, AMD - We understand our customers make purchase decisions based on how they use their PCs, and in many cases our AMD Phenom™ II processors are a great (purchase).
For those ready for a more modern architecture, who want a desktop for high resolution gaming and to tackle time intensive tasks with newer multi-threaded applications, the AMD FX processor is a great upgrade.
8. What specific or general computing roles do you see BD excelling in? For instance virtualization, Windows 8, solitaire, etc?
Adam Kozak, Product Marketing Manager, AMD - We’re seeing great results at stock frequencies with HD content creation, file processing, image processing, and high resolution gaming environments. Most of these applications are multi-core aware, and some have even begun to use new instructions to further enhance performance on AMD FX processor systems.
9. Why did you make such design decisions as increasing the length of the pipeline in order to achieve higher clocks as opposed to going for efficiency? Were architecture choices that resulted in better IPC offsetting the gains from sharing parts within the CPU?
Mike Butler, Senior Fellow Design Engineer, AMD - The latest architectural advancements from both AMD and our competitors have incorporated advancements from deeper pipelines. The pipeline within our latest "Bulldozer" microarchitecture is approximately 25 percent deeper than that of the previous generation architectures. That deeper pipeline is a key technology advancement, providing record breaking frequencies and performance improvements.
Additionally, the "Bulldozer" design inherently runs at a higher frequency for a given voltage than an alternative design would, and is thus a more power-efficient way of delivering performance – and we expect that performance will scale over time and as process maturity gains are realized.
For example, on parallel DirectX 11 gaming titles like Civilization V and Metro 2033, the AMD FX-8150 outperforms the Core i7-2600 (both with an AMD Radeon™ HD 6970 graphics card) by up to 18 percent and 8 percent, respectively.
Based on design decisions like this, the AMD FX-8150 maximum Turbo frequency is 4.2GHz, a 15 percent increase over the AMD Phenom™ II X6 1100T (3.7GHz).
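The trade-off raised in this question, clock speed gained versus IPC lost, can be made concrete with a first-order model in which single-thread performance scales with IPC multiplied by frequency. The sketch below is only that rough model, not AMD's methodology. It ignores memory behavior, boost residency, and workload differences. The function names are hypothetical, and the only inputs taken from the text are the two quoted peak frequencies.

```python
# First-order model: single-thread performance ~ IPC x clock frequency.
# Assumption: everything else (memory, boost behavior, workload) is held equal.
# Function names are hypothetical and exist only for illustration.

def relative_performance(ipc_ratio, freq_ratio):
    """Performance of the new design relative to the old one (1.0 = equal)."""
    return ipc_ratio * freq_ratio


def break_even_ipc_ratio(new_freq_ghz, old_freq_ghz):
    """IPC ratio (new/old) at which the higher clock exactly offsets lost IPC."""
    return old_freq_ghz / new_freq_ghz


if __name__ == "__main__":
    fx_8150_turbo_ghz = 4.2     # quoted in the answer above
    phenom_1100t_ghz = 3.7      # quoted in the answer above

    freq_ratio = fx_8150_turbo_ghz / phenom_1100t_ghz
    threshold = break_even_ipc_ratio(fx_8150_turbo_ghz, phenom_1100t_ghz)

    print(f"frequency ratio:      {freq_ratio:.3f}")
    print(f"break-even IPC ratio: {threshold:.3f}")
```

Under this model, the higher-clocked design comes out ahead at peak frequency only if its IPC stays above the printed break-even fraction of the older core's IPC.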
10. We have seen in some benchmarks that this cpu can be a multithreaded beast. I understand that Battlefield 3 and future games will use multithreaded DirectX11 drivers. Do you believe that Zambezi will have any performance advantages over the competition in this area given the extreme multithreaded nature of the chip? And are you working with DICE and/or other developers on similar optimizations for future game and multimedia titles?
Gabe Gravning, Senior Product Marketing Manager, AMD - The question is not "will this transition to multithreaded applications happen" but rather "how soon?" AMD drove this same type of inflection point in the industry with 64-bit computing and APU platforms.
It’s clear that application multithreading, including the multithreading advancements in DirectX 11, is the future, and the "Bulldozer" architecture is the bridge to that future. We see a great opportunity for AMD to lead the industry to a multi-threaded future, just as we have in the past. That will most definitely require tight collaboration with the software developer community.