Jack Palevich's GrammerJack blog
2007/1/4

Berkeley on Many-Core CPU Design, RAMP FPGA Experimental Platform

A hot new topic in computer architecture research is "many core" CPU designs. Recently CPU designers have run into several "brick walls" that prevent them from simply making faster single-core CPUs. As a result, the only clear way to improve system performance is to use multiple cores. We are currently in the "multi core" era, which is loosely defined as 2 to 16 cores on one chip. "Many core" refers to designs with 32 or more cores on one chip. Having many small cores requires a rethink of algorithm design.
 
Here's the main web page of the Berkeley Wiki on this topic:
 
 
One interesting side issue mentioned in these reports is that academic chip designers are now priced out of the chip design business. It simply takes too much money and too much time to design a full-sized chip these days. In an attempt to stay relevant, academics have proposed constructing a standardized FPGA platform, to allow honest simulation of giant chip designs. (You can always simulate your chip design in software, but that's not sexy, it's slower, and it makes it harder to prove that your simulated design could actually be realized in hardware.)
 
The "RAMP" developers hope their $100,000 kit becomes a standard, used by multiple researchers, similar to how a VAX minicomputer was a standard for computer science research in the 1980s. That way many-core CPU designs could be traded back and forth, and research claims could be independently verified.
 
Another suggestion from the Berkeley View wiki is that researchers concentrate on 14 "dwarf" kernels, which are small problems representative of the kinds of code that have to run fast in order for a parallel computer to be useful. The idea is that the 14 dwarves are easier to analyze and understand than whole applications. (Why yes, there were originally only 7 dwarves, but then they looked at more kinds of programs and came up with 7 additional dwarves.) An example dwarf is the "MapReduce" algorithm, much beloved by Google.
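
For readers who haven't seen the pattern, here is a toy, single-threaded sketch of the map-reduce idea in Python. It is not Google's implementation and not code from the Berkeley wiki; the names `map_reduce` and `word_count_mapper` are made up for illustration. The point is only that the map step treats each record independently, which is what makes it easy to spread over many cores.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(records, mapper, reducer):
    """Toy map-reduce: hypothetical helper, runs serially for clarity."""
    # Map phase: each record independently yields (key, value) pairs.
    # On a many-core chip, each record could go to a different core.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: fold each key's values down to a single result.
    return {key: reduce(reducer, values) for key, values in groups.items()}


def word_count_mapper(line):
    # Emit (word, 1) for every word in the line.
    return [(word.lower(), 1) for word in line.split()]


counts = map_reduce(
    ["many core", "multi core"],
    word_count_mapper,
    lambda a, b: a + b,
)
print(counts)
```

A real system would run the mappers in parallel and shuffle the grouped values between machines or cores; this sketch keeps everything in one loop so the two phases are easy to see.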
 
One cute statistic from the Berkeley wiki is that the first microprocessor, the Intel 4004, had around 2,400 transistors, and one of the first RISC CPUs had 40,000 transistors. You could now fit more than 2,400 of those RISC CPUs on a single chip -- more CPUs than the original CPU had transistors! Of course, to be useful, you need to add an MMU, a cache, and inter-core communication to the 40,000-transistor core. But it's still pretty interesting to think about.
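
The arithmetic behind that statistic is easy to check. The sketch below uses only the two round figures quoted above; the constant and function names are my own, and the result counts bare cores only (no MMU, cache, or interconnect).

```python
# Round figures as quoted in the post (not exact datasheet values).
INTEL_4004_TRANSISTORS = 2_400
RISC_CORE_TRANSISTORS = 40_000


def cores_that_fit(transistor_budget, transistors_per_core):
    """Hypothetical helper: whole cores that fit in a transistor budget."""
    return transistor_budget // transistors_per_core


# Transistors needed for as many RISC cores as the 4004 had transistors.
transistors_needed = INTEL_4004_TRANSISTORS * RISC_CORE_TRANSISTORS
print(transistors_needed)  # 96000000

# Going the other way: that budget holds exactly 2,400 bare cores.
print(cores_that_fit(transistors_needed, RISC_CORE_TRANSISTORS))  # 2400
```

So a chip needs on the order of 96 million transistors devoted to cores before it holds "more CPUs than the original CPU had transistors".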
 
Update: Dave Patterson is giving a talk on "The Berkeley View: A New Framework and a New Platform for Parallel Research" at the Stanford University Department of Electrical Engineering Computer Systems Colloquium (EE380).
 
 
The talk will be given on January 31st, 2007, and it will be recorded and made available for anyone to view afterward. (I don't know what the delay period is.)