Since the early 1980s, CPUs have gained additional computing units (ALUs) and can process several instructions in parallel. However, only some instructions can actually run in parallel, and it is up to the CPU to decide which ones. Guessing wrong can slow the processor considerably.
For instance, the CPU might be able to perform two multiplications at the same time. However, the second multiplication may depend on the result of the first. If so, the second of the two units "stalls" while it waits for the first one to finish.
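The dependency check described here can be illustrated with a minimal sketch. The `Instr` type, the register names and the `depends_on` helper are all hypothetical, chosen only for illustration; real hardware also tracks other kinds of hazards that this sketch ignores.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """A toy instruction: one destination register, some source registers."""
    dest: str
    srcs: Tuple[str, ...]


def depends_on(second: Instr, first: Instr) -> bool:
    """True if `second` reads the register that `first` writes."""
    return first.dest in second.srcs


a = Instr("r3", ("r1", "r2"))  # r3 = r1 * r2
b = Instr("r5", ("r3", "r4"))  # r5 = r3 * r4, needs the result of a
c = Instr("r8", ("r6", "r7"))  # r8 = r6 * r7, independent of a

print(depends_on(b, a))  # True: b must wait for a, so its unit stalls
print(depends_on(c, a))  # False: a and c can run side by side
```

In this toy model, `a` and `c` could be issued to the two multipliers together, while `b` has to wait for `a`.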
A similar problem occurs when the result of such an instruction is used as input for a branch. Most CPUs "guess" which branch will be taken even before the calculation is complete, so that they can load the instructions for that branch, or (in some architectures) even start to execute them speculatively. If the CPU guesses wrong, all of these instructions and their context need to be "flushed" and the correct ones loaded, which is time-consuming.
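The text does not say how the guess is made. As one illustration only, the sketch below simulates a two-bit saturating counter, a common textbook prediction scheme, and counts how often it guesses wrong. The function name `count_mispredictions` and the starting state are assumptions made for this example, not a description of any particular CPU.

```python
def count_mispredictions(outcomes, state=2):
    """Count wrong guesses of a two-bit saturating counter.

    outcomes: sequence of booleans, True meaning the branch was taken.
    state: counter value 0..3; values 2 and 3 predict "taken".
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1  # a real CPU would flush and reload at this point
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# A loop branch that is taken nine times and then falls through once.
print(count_mispredictions([True] * 9 + [False]))  # 1

# A branch that alternates on every execution is much harder to guess.
print(count_mispredictions([True, False] * 5))  # 5
```

Each miss in this model corresponds to one flush of the speculatively loaded instructions.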
This has led to increasingly complex decoders that attempt to guess right, and the simplicity of the original RISC designs has been eroded.
In the 1990s, Hewlett-Packard researched this problem as a side effect of ongoing work on their PA-RISC processor family. They found that the CPU could be greatly simplified by removing the complex decoding logic from the CPU and placing it into the compiler. Today's compilers are much more complex than those from the 1980s, so this added complexity in the compiler is considered to be a small cost.
VLIW CPUs are actually RISC-based, typically with four main units. After compiling the program normally, the VLIW compiler re-orders the code into paths that simply don't have any dependencies. These are then sliced into four (one for each unit of the CPU) and packaged together into one larger instruction with additional information regarding which of the instructions should run on which unit. The result is a single much larger op-code (thus the term "very long").
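The packaging step can be sketched as a simple greedy scheduler. This is a simplified illustration, not the algorithm of any real VLIW compiler: it assumes every register is written only once, so only "reads a result produced earlier" dependencies matter, and it assumes any unit can run any instruction. The name `pack_bundles` and the tuple format are hypothetical.

```python
WIDTH = 4  # slots per long instruction, one for each of the four units


def pack_bundles(instrs, width=WIDTH):
    """Pack instructions into bundles of at most `width` independent ops.

    instrs: list of (dest, srcs) tuples in program order.
    Assumes each dest register is written exactly once.
    Returns a list of bundles, each a list of instruction indices.
    """
    bundles = []
    produced_in = {}  # register name -> index of the bundle that writes it
    for index, (dest, srcs) in enumerate(instrs):
        # The instruction must come after every bundle producing its inputs.
        slot = 0
        for src in srcs:
            if src in produced_in:
                slot = max(slot, produced_in[src] + 1)
        # Skip bundles that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        if slot == len(bundles):
            bundles.append([])
        bundles[slot].append(index)
        produced_in[dest] = slot
    return bundles


program = [
    ("r3", ("r1", "r2")),   # 0
    ("r6", ("r4", "r5")),   # 1
    ("r7", ("r3", "r6")),   # 2: needs the results of 0 and 1
    ("r9", ("r8",)),        # 3
    ("r11", ("r10",)),      # 4
    ("r12", ("r7", "r9")),  # 5: needs the results of 2 and 3
]

print(pack_bundles(program))  # [[0, 1, 3, 4], [2], [5]]
```

The second and third bundles here use only one of their four slots. A real compiler would have to fill the unused slots with no-ops or find more independent work to place there.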
The Itanium IA-64 processor manufactured by Intel is an example of a VLIW CPU.
One problem with VLIW processors is that they do not scale well to different price points. Both CISC and RISC machines can be implemented in many ways to save varying amounts of money (indeed most CISC processors are now implemented as RISC processors with a hardware instruction-set-translation front end). A VLIW machine has fewer options.
To cope with this problem, Transmeta added a binary-to-binary runtime compiler to the CPU. Basically, this compiler reads the software (in x86 op codes), and compiles it into the CPU's internal machine code. Thus, the Transmeta chip is internally a VLIW processor, but externally appears to be a CISC processor.