VLIW
Commercial applications of the VLIW concept in the U.S. were less successful: Multiflow Computer went out of business in 1990, and Intel's EPIC/Itanium venture, begun in the late 1990s and continuing today, has proved far from successful. The reason for VLIW's failure in general-purpose computing is the immaturity of compilers, cross-compilers, and automatic code-optimization techniques. Intel is still heavily involved in honing EPIC compilers for Itanium (with Babayan's current team and Intel's Israeli office deeply involved). Yet the state of the art is such that current VLIW/EPIC compilers are not yet good enough for general-purpose workloads, and therefore the theoretically possible performance gains are almost never achieved (VLIW processors can execute as many as 32 instructions in parallel, if a compiler can find and schedule that many). A more recent attempt by Transmeta was also unsuccessful, and for the same reason, although its new Efficeon CPU looks more promising than the flopped Crusoe. Still, with the Itanium disappointment tarnishing commercial VLIW prospects, perhaps permanently, we are unlikely to see more general-purpose VLIW computers; instead we are likely to see them in niche markets, employed for a very limited set of special-purpose tasks.
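The scheduling problem that makes or breaks a VLIW compiler can be sketched in a few lines: the compiler must pack mutually independent instructions into fixed-width bundles, and any dependency chain forces serialization. The toy scheduler below (the instruction names and the four-slot width are hypothetical, chosen purely for illustration) shows why long dependency chains leave most issue slots empty, no matter how wide the machine is.

```python
WIDTH = 4  # issue slots per long instruction word (hypothetical)

# instruction -> set of instructions it depends on (must be acyclic)
deps = {
    "load_a": set(),
    "load_b": set(),
    "load_c": set(),
    "add1":   {"load_a", "load_b"},
    "mul1":   {"add1", "load_c"},
    "store":  {"mul1"},
}

def schedule(deps, width):
    """Greedily pack ready instructions into bundles of `width` slots."""
    placed = set()
    bundles = []
    remaining = set(deps)
    while remaining:
        # An instruction is ready once all of its dependencies are placed.
        ready = [i for i in sorted(remaining) if deps[i] <= placed]
        bundle = ready[:width]  # fill at most `width` issue slots
        bundles.append(bundle)
        placed |= set(bundle)
        remaining -= set(bundle)
    return bundles

bundles = schedule(deps, WIDTH)
# The three loads issue together, but the add -> mul -> store chain
# serializes the rest: four cycles, most slots empty.
```

Real VLIW compilers also model latencies, register pressure, and functional-unit types, and use loop unrolling and software pipelining to expose more independent work; the hard part, as the paragraph above notes, is that general-purpose code often simply does not contain 32 independent instructions per cycle.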
Quite another alternative to VLIW, one that is already sprouting profusely, is the multi-core CPU. Both Intel and AMD have been shipping dual-core chips for some time now, with quad-core chips promised in 2007. Sun is already shipping the 8-core UltraSPARC T1, while Rapport Inc. and IBM have announced the development of Kilocore technology, which allows combining as many as 1,024 8-bit processors with a PowerPC core on a single low-cost chip. Extrapolating current trends, we are likely to see a further profusion of multi-core CPUs from all leading manufacturers, especially for server markets. Chances are that as the number of on-chip cores grows, the cores themselves will become simpler and less deeply pipelined (much as the UltraSPARC T1 already is). We are also likely to see some dedicated co-processor-like cores suited to SIMD/multimedia instructions, while other cores might be stripped of that capacity in favor of improved energy efficiency and a greater overall number of cores.
Perhaps the most noteworthy point is that we are unlikely to see dramatic single-threaded performance improvements unless a way is found to raise clock frequencies without a marked increase in power consumption (for example, a new manufacturing technology in the vein of IBM's recent report of experimental SiGe chips running at 350 GHz at room temperature and at 500 GHz when chilled with liquid helium).
And the truth is that there is no compelling need for further raw CPU speed increases, for the following key reasons:
- Computers are already much more powerful than most common tasks require.
- Code efficiency is at an all-time low and hides at least an order of magnitude of potential performance gain, if we simply optimize the code.
- Memory and I/O bottlenecks are the most common causes of slowdown.
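The second point above is easy to demonstrate: the same task written naively and written with an appropriate data structure can differ by an order of magnitude or more, with no change in hardware. The sketch below uses a hypothetical membership-counting task (not from the original text) to contrast an O(n·m) list scan with an O(n+m) hash-set lookup.

```python
haystack = list(range(100_000))
needles = list(range(0, 100_000, 7))

# Naive: each `in` test scans the whole list -- O(n) per lookup,
# O(n * m) overall for m needles.
def count_hits_list(haystack, needles):
    return sum(1 for n in needles if n in haystack)

# Optimized: one pass to build a hash set, then O(1) average lookups --
# O(n + m) overall.
def count_hits_set(haystack, needles):
    seen = set(haystack)
    return sum(1 for n in needles if n in seen)

# Both give the same answer; only the cost differs.
assert count_hits_list(haystack[:1000], needles[:50]) == \
       count_hits_set(haystack[:1000], needles[:50])
```

On inputs of this size the set version typically runs orders of magnitude faster on commodity hardware; no faster CPU is required, only better code.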
What is amazing is that for a long time we have been using only a handful of CPU models under the aegis of general purpose computing. Furthermore we thought that a better CPU makes a better computer, which is no longer so. What seems to be more important now is overall system design rather than just CPU design, and we are likely to see more system and CPU specialization (and models) targeting different application areas.