In general, there are two basic approaches to building multi-core processors. Symmetric multiprocessing chips, such as IBM's Power4 and presumably chips with core hopping, essentially squeeze two equal processors into a single piece of silicon, so that the chip provides the same computing power as a dual-processor server. The approach saves on computing real estate and can increase efficiency because the chip cores can share cache memory or buses.

In asymmetric multiprocessing, the two internal chip cores differ from each other and perform specific functions, offloading work from the central processor. Additionally, you could get "little co-processors that do various tasks now handled by software," such as TCP/IP processing or encryption, Krewell said. Similarly, chip designers could build high-intensity regions into the chip. Intense, high-priority number-crunching could be directed toward transistors supplied with greater amounts of power, Pinfold said. Less significant tasks, meanwhile, could be shunted off to other regions.

The ultimate design and techniques used will depend on whether the chip will go into mobile devices, servers or desktops. Research is being conducted in the company's labs in the United States, but also in Israel and Barcelona, Spain, owing to the diverse nature of the work. "The microprocessor of the future will be much more appropriate to its use," Pinfold said. "We will go to where we can find the best architects."

The processor changes will take place in tandem with increased thread-level parallelism. Under thread-level parallelism, software instructions get separated into individual streams. Once broken down, the streams of an application can be processed in parallel, rather than sequentially, thereby saving time.
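To make the idea of splitting an application into independent streams concrete, here is a minimal sketch in Python. It is an illustration only, not anything described by the chipmakers: the function names (`sum_of_squares`, `split`, `parallel_sum_of_squares`) and the choice of four workers are assumptions made for this example. Note also that standard CPython threads take turns on CPU-bound work because of its global interpreter lock, so the sketch shows the structure of thread-level parallelism (independent streams whose results are combined at the end) rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """One independent stream of work; it shares no state with other chunks."""
    return sum(x * x for x in chunk)


def split(data, parts):
    """Cut `data` into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one leftover element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(data, workers=4):
    """Hypothetical helper: hand each chunk to its own thread, then combine."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is processed as a separate stream; map keeps input order.
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1_000))
    # The threaded result should match the plain sequential computation.
    print(parallel_sum_of_squares(data) == sum(x * x for x in data))
```

The key property is that each chunk can be computed without waiting on any other, which is what lets a multi-core or multithreaded processor work on the streams at the same time.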
Cache misses, which occur when a processor doesn't have the required data in its nearby cache of memory, can hinder computer performance because the processor has to spend cycles digging the data out of main memory. Helper threads could anticipate potential cache misses and retrieve the data before the required calculation, Pinfold said.

Current benchmarks show substantial performance benefits through application threading. Hyper-threading, which takes advantage of threaded applications, was introduced to Intel's Xeon line earlier this year and will soon come to the Pentium line, sources have said.

The increased emphasis on application threading comes at a crucial time. For the past decade or so, designers have squeezed performance out of instruction-level parallelism, which involves juggling the processor's instructions for greater efficiency. But the ceiling is in sight. "We've pretty much mined that vein as far as performance is concerned," Pinfold said.

Although all of the ideas show promise, it's difficult to predict how they will be embodied in commercial products. Some of these chips will need extraordinary amounts of cache to work properly, which will force designers to balance the performance-power equation, said Nathan Brookwood, principal analyst at Insight 64.

Some processors, such as the Itanium chip, already take advantage of application threading, so they don't need to adapt multi-core ideas right away. Still, the mathematics make multi-core chips inevitable in many markets. "Multi-core is a very efficient way to use up transistors and increase performance," Brookwood said.





