Intel defends architectural advances
Published: 10 Mar 2006 16:30 GMT
...and show techniques and everything. We're going to dive into a lot of things, all the way down to the microcode and the way you do micro-fusion and micro-ops fusion.
Are you making that comparison based on what you understand the competition is working on, or just based on the architecture you developed in Merom?
Let's put it like this: You're trying to assess where the competition will be. If you say that you are going to have an advantage, it's based on an assessment. You might be more accurate or you might be less accurate, but it's the risk we are taking. We believe... we'll be able to open a major gap with the new architecture.
How does the four-wide machine's performance compare with the integrated memory controller?
Too many people ask me about the memory controller, and they don't ask me about microfusion or macrofusion and all these kinds of things.
What is memory access? Two things: When you address external memory, you need memory bandwidth and you need memory latency.
Memory bandwidth means how much data you can bring in on each clock [cycle]. Memory latency is about what happens when I go and try to fetch something from the level one or level two cache, I get a miss, and the data doesn't exist in the CPU: now I need to go to the external memory. And if I don't have enough parallelism, the CPU is idle. It goes to sleep until the data is fetched from external memory.
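The link between latency and parallelism described above is usually expressed with Little's law: to sustain a given bandwidth, the amount of data in flight must equal bandwidth times latency. The Python sketch below applies that relation. The 64-byte line size and the example figures are assumptions chosen for illustration, not numbers for any Intel or competing part.

```python
import math

# Assumed cache-line size in bytes (illustrative, not a quoted spec).
LINE_SIZE_BYTES = 64


def bytes_in_flight(bandwidth_gb_per_s: float, latency_ns: float) -> float:
    """Little's law: data that must be outstanding to sustain a bandwidth.

    1 GB/s (10**9 bytes per second) equals 1 byte per nanosecond, so the
    product of GB/s and ns is a byte count.
    """
    return bandwidth_gb_per_s * latency_ns


def misses_in_flight(bandwidth_gb_per_s: float, latency_ns: float,
                     line_size: int = LINE_SIZE_BYTES) -> int:
    """Number of cache-line misses the CPU must keep outstanding at once."""
    return math.ceil(bytes_in_flight(bandwidth_gb_per_s, latency_ns) / line_size)


def achieved_bandwidth(outstanding_misses: int, latency_ns: float,
                       line_size: int = LINE_SIZE_BYTES) -> float:
    """Bandwidth (GB/s) actually delivered when only N misses overlap."""
    return outstanding_misses * line_size / latency_ns


if __name__ == "__main__":
    # Hypothetical memory system: 8 GB/s peak, 80 ns latency.
    print(misses_in_flight(8, 80))    # → 10 lines must be in flight
    print(achieved_bandwidth(1, 80))  # → 0.8 GB/s with one miss at a time
```

With only one miss outstanding, this model delivers a tenth of the peak bandwidth, which is the idle-CPU situation described in the answer.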
Now, [an integrated] memory controller gives you one big advantage: when you want to access memory, you can go directly to the external memory and fetch it. If you've got a north bridge, you need to go to the north bridge, then from the north bridge to the memory, and then from the memory back to the north bridge to fetch the data. That path has much longer latency.
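The extra hops through a north bridge can be modeled as additional legs that a request crosses on the way out and again on the way back. The sketch below assumes symmetric legs and a fixed DRAM access time, and ignores queuing and the north bridge's own processing delay. All nanosecond values are hypothetical placeholders, not measurements.

```python
def round_trip_latency_ns(legs_ns: list[float], dram_access_ns: float) -> float:
    """Request crosses each leg outbound, DRAM services it, reply crosses back."""
    return 2 * sum(legs_ns) + dram_access_ns


# Hypothetical per-leg costs in ns (illustrative assumptions only).
DRAM_ACCESS_NS = 50
CPU_TO_DRAM_NS = 10           # integrated controller: a single leg
CPU_TO_NORTH_BRIDGE_NS = 10   # north bridge design: two legs
NORTH_BRIDGE_TO_DRAM_NS = 10

integrated = round_trip_latency_ns([CPU_TO_DRAM_NS], DRAM_ACCESS_NS)
via_north_bridge = round_trip_latency_ns(
    [CPU_TO_NORTH_BRIDGE_NS, NORTH_BRIDGE_TO_DRAM_NS], DRAM_ACCESS_NS)

print(integrated, via_north_bridge)  # → 70 90
```

Each added leg is paid twice, once for the request and once for the reply, which is why removing the north bridge shortens latency even when the DRAM itself is unchanged.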
What I need to do is make sure that most of the time, when you need data, the data resides inside the cache. If the CPU needs data and that data resides in the cache, the bandwidth to access the cache... is much better than any memory controller, because with a memory controller you still need to leave the chip and go externally.
So, if I give you a big cache, and I add a great prefetch mechanism that prefetches the data from memory well in advance, so that it's already inside the cache before you use it, this will probably give us a better solution than any memory controller.
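The trade-off in this answer can be framed with the textbook average memory access time formula, AMAT = hit time + miss rate × miss penalty, with prefetching modeled as hiding a fraction of the misses. The sketch compares two hypothetical designs: one with a shorter miss penalty (as with an integrated controller) and one with a larger cache plus prefetching but a longer path to memory. Every number is an illustrative assumption, and the model ignores bandwidth limits and overlapping misses.

```python
from dataclasses import dataclass


@dataclass
class MemorySystem:
    """Hypothetical design parameters (illustrative only)."""
    hit_time_ns: float        # time to hit in the cache
    miss_rate: float          # fraction of accesses that miss without prefetching
    prefetch_coverage: float  # fraction of those misses hidden by prefetching
    miss_penalty_ns: float    # extra time to fetch a line from external memory

    def effective_miss_rate(self) -> float:
        """Misses left over after prefetching has hidden its share."""
        return self.miss_rate * (1.0 - self.prefetch_coverage)

    def amat_ns(self) -> float:
        """Average memory access time = hit time + miss rate * miss penalty."""
        return self.hit_time_ns + self.effective_miss_rate() * self.miss_penalty_ns


# Design A: smaller cache, no prefetching, short path to memory.
integrated_controller = MemorySystem(hit_time_ns=3, miss_rate=0.05,
                                     prefetch_coverage=0.0, miss_penalty_ns=70)
# Design B: bigger cache (lower miss rate) plus prefetching, longer path to memory.
big_cache_prefetch = MemorySystem(hit_time_ns=3, miss_rate=0.04,
                                  prefetch_coverage=0.5, miss_penalty_ns=90)

print(integrated_controller.amat_ns())
print(big_cache_prefetch.amat_ns())
```

With these assumed numbers, design B averages about 4.8 ns per access against 6.5 ns for design A. With a different miss rate or prefetch coverage the result can flip, which is why the argument rests on keeping most of the data in the cache.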
There is an advantage [to an integrated memory controller], and I'm not trying to diminish it, but there's more than one way to skin a calf. I believe that the architecture balance we'll deliver in both Yonah and Merom will probably give us better performance overall, and you don't need to look at the memory bandwidth. Yes, maybe my memory bandwidth will be a little bit less, but from the CPU's point of view, it'll be able to keep most of the data in the cache.
There are critics out there who claim that the Pentium M and Yonah are just really a Pentium III with a few tweaks — a Pentium III with an Israeli accent, let's say. Does it derive a lot from the Pentium III?
Part of it is based on previous architecture and part of it looks forward. It resembles [the Pentium III] architecture to some extent, but [there are] a lot of different features inside.
Can you get a performance improvement based on the same architecture? Are you taking things from the previous architecture? Yes. Did Yonah not take from Dothan? Did Merom take from Yonah? Much less. But to call it Pentium III architecture, I believe, does an injustice to the hundreds of people who delivered Banias.