At the most basic level the Itanium design is similar to RISC: the core logic consists of a small set of instructions designed to run very fast. Like most modern CPUs, the Itanium includes several execution units that run in parallel for extra speed, a design known as a superscalar processor. Where the Itanium breaks with current RISC design philosophy is in how it feeds instructions into those units.
In a traditional design, a complex decoder system examines each instruction as it flows through the pipeline and determines which ones can be fed off to operate in parallel across the execution units. For instance, a series of instructions such as A = B + C and D = F + G will not affect each other, and so they can be fed into two units to be run at the same time.
Predicting which code can and cannot be split up this way is in fact a very complex task. In many cases the inputs to one line depend on the output from another, but only if some other condition is true. For instance, consider a slight modification of the example above: A = B + C; IF A==5 THEN D = F + G. In this case the two calculations remain independent of each other, but the second command requires the result of the first in order to know whether it should be run at all.
In these cases the circuitry on the CPU typically "guesses" what the condition will be. In something like 90% of all cases a branch will be taken, suggesting that in our example the second half of the command can safely be fed into another unit. However, getting the guess wrong causes a significant performance hit when the result has to be thrown out and the CPU waits for the "right" command to be calculated. Much of the performance improvement in modern CPUs comes from better prediction logic, but lately those improvements have begun to slow.
Itanium instead relies on the compiler for this task. Even before the program is fed into the CPU, the compiler examines the code and makes the same sorts of decisions that would otherwise happen at "run time" on the chip itself. Once it has decided what paths to take, it gathers up the instructions it knows can be run in parallel, bundles them into one larger instruction, and then stores it in that form in the program—hence the name VLIW or "very long instruction word".
Moving this task from the CPU to the compiler has several advantages. First, the compiler can spend considerably more time examining the code, a benefit the chip itself does not have because it must finish as quickly as possible. Thus the compiler can be considerably more accurate than the same logic implemented in the chip's circuitry. Second, the prediction circuitry is quite complex, and this scheme reduces that complexity enormously: the processor no longer has to examine anything, it simply breaks each bundle apart again and feeds the pieces off to the execution units.
The downside in this case is that a running program's behaviour is not always obvious in the code used to generate it. That means that it is possible for the compiler to "get it wrong", perhaps (in theory) even more often than the same logic placed on the CPU. Thus the design relies heavily on the performance of the compilers, the trade-off being to decrease microprocessor hardware complexity by increasing compiler software complexity.
Design of the Itanium series started in 1994, based on pioneering research by Hewlett-Packard into VLIW designs. The original HP design was "clean", but that is to be expected from a design that was never intended for production use. After Intel became involved, the cleanliness of the original design was marred by the addition of several new capabilities needed for "real work" use, notably the ability to run IA-32 instructions, and HP added its own features to ease migration from the HP-PA.
The project to produce a production-quality Itanium is still ongoing. Originally planned for release in 1997, the schedule has slipped several times. The first version, code-named Merced, shipped in 2001, but its performance was disappointing. In IA-64 mode it performed only slightly better than an equivalently clocked x86 design, and when running x86 code its performance was dismal, roughly one eighth that of a similarly clocked x86 processor. Soon even Intel suggested it wasn't a "real" release.
The main (though by no means only) problem with the first Itanium was that the latency of its third-level cache was staggeringly high, making it virtually useless. The only real option was an on-die solution, and Intel took it, while lowering all of the cache latencies to the lowest of any modern design (apart from IBM's POWER4). Intel also changed the Itanium's 64-bit 266 MHz bus into a 128-bit 400 MHz bus, tripling memory bandwidth.
The second-generation Itanium chips were launched in July 2002. In IA-64 mode, integer performance was the best of any design at the time of launch, while floating-point performance was second only to the POWER4. Unfortunately, x86 performance did not improve much; the chip ran x86 code about as well as a Pentium II.
With backing from both Intel and HP, a number of other CPU lines have been end-of-lifed. The Compaq/DEC Alpha, the HP PA-RISC family, and the SGI MIPS UNIX lines will eventually be retired in favor of Itanium hardware. With the exception of SGI's IRIX, the operating systems running on these machines will remain similar. However, the ever-slipping release schedule has forced several of these retiring CPU lines to be given new revisions in the meantime.
Software support for the Itanium is a work in progress: Linux is a shipping platform, and work on NetBSD will begin when Itanium-based hardware ships. Proprietary operating systems being ported include Microsoft Windows, HP-UX, Tru64, OpenVMS[?], and AIX. It remains to be seen how they will overcome the limitations of microarchitecture-specific scheduling.
In 2002, the Itanium is the second most expensive computing project in history, behind only the IBM 360 (which, it's important to note, was a huge success). Nevertheless there are serious doubts about the future of the product, centering mainly on two problems.
The first is that the benefits in simplicity, one of the main goals of the VLIW design, are not at all evident in the Itanium. The second-generation Itanium has a massive 221 million transistors drawing an equally massive 130 watts of power. For the same sort of budget, the IBM POWER4 delivers four whole 64-bit CPUs on a single processor module.
Another, perhaps more serious, issue is that while dynamic scheduling in hardware has been done many times, designing an Itanium-friendly compiler is a new art. It is not at all clear whether the compilers will be able to live up to the original goals, particularly given all of the complexity the Itanium gained along the way.
Critics of the Itanium processor have labeled it the "Itanic". Intel will be in a difficult position if the Itanium is a disappointment, as the need for a 64-bit architecture in commodity servers is now pressing, and the need for one in personal computers is only a few years away.
A real architectural threat to Intel now exists in the form of AMD's x86-64 architecture. With x86-64, AMD follows Intel's own earlier pattern of extending a single architecture, first from the 8-bit 8080 to the 16-bit 8086, then from 16 bits to the 32-bit 80386 and beyond, without ever removing backward compatibility. The x86-64 architecture extends the 32-bit x86 architecture by adding 64-bit registers, with full 32-bit and 16-bit compatibility modes for earlier software. Pre-release versions of both Linux and the Microsoft Windows operating systems are available for x86-64, together with early test silicon. Production systems are expected in 2003.
The failure of Itanium would also have a substantial impact on manufacturers such as HP that have announced they will abandon their proprietary CPU architectures (the PA-RISC, in HP's case) in favor of the Itanium.