Power5, which will be built initially with 130-nanometre (0.13 micron) features, will also offer "simultaneous multithreading," a technology that allows a single processor to act as two. Intel's "hyper-threading" version of this technology adds a modest performance increase -- roughly 20 percent or so, depending on what program the chip is running -- but Arimilli said Power5's multithreading will allow a single processor to behave like two processors running full throttle.

Here, IBM's lead isn't as strong. Not only does Intel have some multithreading abilities today in its Xeon server chip line, but it also inherited the team that was designing the now-cancelled EV8 processor, which had very sophisticated multithreading. "That's likely to show up in Itanium," Glaskowsky said, and there are some comparatively simple ways Intel can speed multithreading in its Pentium and Xeon lines.

Power4 already has two CPUs (central processing units) on each slice of silicon, with four Power4 processors mounted into a large package called a multichip module with thousands of high-speed wires. With Power4, each module has eight CPUs, but the arrival of simultaneous multithreading will increase that to the equivalent of 16, Arimilli said.

Power5 won't be much larger than Power4 in terms of transistor count, Arimilli said. By minimising the increase in circuitry, "we're trying to drive the cost way down," he added.

IBM gets the simultaneous multithreading abilities not through new circuitry but through a different use of existing "execution units," the parts of the chip responsible for digesting and executing instructions. "We didn't grow more units, we just used the existing units more intelligently," Arimilli said.

The new chip also has faster communications channels feeding it, so it isn't starved of data, as well as better sharing of data in high-speed "cache" memory.
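The module figures Arimilli cites can be checked with simple arithmetic. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes Power5 keeps Power4's layout of two CPUs per chip and four chips per module, which the article implies but does not state outright.

```python
def logical_cpus_per_module(cpus_per_chip=2, chips_per_module=4, threads_per_cpu=1):
    """Count the CPUs a multichip module presents to software.

    Defaults follow the Power4 figures quoted in the article:
    two CPUs per chip, four chips per module, one thread per CPU.
    """
    return cpus_per_chip * chips_per_module * threads_per_cpu


# Power4: no simultaneous multithreading, so one thread per CPU.
power4 = logical_cpus_per_module(threads_per_cpu=1)

# Power5 (assumed same physical layout): each CPU acts as two.
power5 = logical_cpus_per_module(threads_per_cpu=2)

print(power4, power5)
```

The doubling comes entirely from the thread count, which matches IBM's claim that the gain is achieved without adding execution units.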
IBM plans several other features in the chip as well:

* The system will come with added circuitry not only to detect when errors have occurred in transmitting data but also to fix those errors, a feature that historically has been reserved for mainframes. It's part of IBM's eLiza initiative to make servers self-healing. "With Power4, we detected a lot of errors and recovered on a significant amount of them. With Power5, we detect errors and recover from almost every one. We're now maybe 95 or 97 percent of a mainframe" in terms of chip technology, Arimilli said.

* Where Power4 was intended for high-end Unix servers, Power5 has a broader mandate, Arimilli said. IBM plans to use it in "blade" servers as well, super-thin servers stacked densely like books in a bookshelf. Glaskowsky said IBM will have to curtail the sizable power consumption and resulting waste heat of Power4 to achieve this target. Power4 consumes 125 watts of power, but a blade processor is constrained to about 25 to 40 watts.

* "Partitioning," the ability to split a single big server into several smaller ones, will improve. Power4 permits a partition as small as a single processor, but Power5 will allow hundreds of partitions, Arimilli said. That hardware move will dovetail with coming versions of AIX -- 5.2 in late 2002 and 5.3 in 2003 -- that increasingly will let hardware resources be easily reassigned to different partitions, Day said.

Fast Path
But the Fast Path acceleration feature wins the spotlight, Glaskowsky said. "We've heard nothing from Intel about application-specific acceleration features in Itanium, and we can see out into the 2004, 2005 timeframe," Glaskowsky said, adding that Sun can't afford to spend as much on chip design as IBM and Intel and that SGI and Hewlett-Packard Unix chips are eventually being phased out.

Arimilli said CPUs tend to spend a large fraction of their time executing a relatively small number of software tasks. It's these tasks that the Fast Path acceleration features offload. IBM selected only mature software processes that don't change often, so it's not a problem when the operation is hardwired into immutable silicon. "We reached around and tried to find some common things customers do within the operating system that get called frequently," Arimilli said of the feature.

The acceleration feature will speed up several communication tasks, including the TCP/IP processing used to read and write data on the Internet and corporate networks. Accelerating TCP/IP makes sense, Glaskowsky said; the software for running a single network connection with a 1 gigabit-per-second transfer capacity soaks up the entire processing power of an UltraSparc processor in a Sun server, he said. However, other chips can handle the task, and indeed companies such as Alacritech and Adaptec are working on special-purpose chips that do so.

Power5 will accelerate other communications processes as well, including the Message Passing Interface (MPI) used to harness clusters of computers into a collective supercomputer, Arimilli said. And the chip will accelerate the virtual memory subsystem, a frequently used operating system feature that manages how higher-speed regular memory can be expanded by using slower but bigger hard drives.

Sun cautions that there can be problems accelerating software functions. "If all you're going to do is a custom operation, then building a custom chip makes sense. But computers are still in the general-purpose range," Kunz said. And designing the processor poorly -- for example, hardwiring specific software operations that aren't used frequently -- wastes silicon real estate, making the chip more costly and power-hungry.

Glaskowsky said a good chip design could intercept requests to the operating system to handle jobs the chip itself can perform faster, but that automation would be difficult for higher-level software.

Arimilli said Power5 and Power6 will be faster for any software, but the acceleration features will require support from software makers. Such support isn't too difficult, Glaskowsky said.

Initially, IBM's version of Unix, AIX, will be able to take advantage of the new chip features, Arimilli said. The company is also working with Linux programmers so that Unix variant can tap into the chip's acceleration resources as well, he said. "We created these interfaces to the silicon accelerators open so the Linux guys could take advantage of it," Arimilli said.





