Cell Architecture Explained Version 2
Part 5: Conclusion and References
Short Overview
The Cell architecture consists of a number of elements:
The Cell Processor
This is a 9 core processor. One of these cores is a PowerPC and acts as a controller; the remaining 8 cores are called SPEs and are very high performance vector processors. Each SPE contains its own block of high speed RAM and is capable of 32 GigaFlops (32 bit). The SPEs are independent processors and can act alone or can be set up to process a stream of data, with different SPEs working on different stages. This ability to act as a "stream processor" gives access to the full processing power of a Cell, which is claimed to be more than 10 times higher than even the fastest desktop processors.
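The stream arrangement described above, where each stage handles one step and hands its output to the next, can be illustrated with a small sketch. This is not Cell code: the stage names (`scale`, `offset`, `clamp`) and the `run_pipeline` helper are hypothetical, and the stages here run one after another in a single process, whereas on a Cell each stage would sit on its own SPE and run concurrently. It only shows the shape of the data flow.

```python
from typing import Callable, Iterable, Iterator, List

# A stage consumes a stream of samples and produces a new stream.
Stage = Callable[[Iterator[float]], Iterator[float]]


def scale(samples: Iterator[float]) -> Iterator[float]:
    """Hypothetical stage 1: double every sample."""
    for s in samples:
        yield s * 2.0


def offset(samples: Iterator[float]) -> Iterator[float]:
    """Hypothetical stage 2: add a constant to every sample."""
    for s in samples:
        yield s + 1.0


def clamp(samples: Iterator[float]) -> Iterator[float]:
    """Hypothetical stage 3: limit every sample to at most 5.0."""
    for s in samples:
        yield min(s, 5.0)


def run_pipeline(data: Iterable[float], stages: List[Stage]) -> List[float]:
    """Chain the stages so each one feeds the next, then drain the stream."""
    stream: Iterator[float] = iter(data)
    for stage in stages:
        stream = stage(stream)
    return list(stream)


result = run_pipeline([0.5, 1.0, 3.0], [scale, offset, clamp])
print(result)  # [2.0, 3.0, 5.0]
```

Because every stage only sees the stream handed to it, stages can be added, removed or reordered without touching the others, which is the property that lets such work be spread over several processors.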
In addition to the raw processing power the Cell includes a high performance multi-channel memory subsystem and a number of high speed interconnects for connecting to other Cells or I/O devices.
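The figure of 32 GigaFlops per SPE is a theoretical peak, and it follows from the arithmetic given in the [GFLOPS] note in the references: clock speed x 4 (32 bit words in a vector) x 2 (a multiply-add counts as 2 operations). A minimal sketch of that calculation, assuming the 4GHz clock used in that note (the function and parameter names are only illustrative):

```python
def peak_gflops(spes: int, clock_ghz: float,
                words_per_vector: int = 4, ops_per_word: int = 2) -> float:
    """Theoretical single precision (32 bit) peak in GFLOPS.

    words_per_vector: 32 bit words held in one vector.
    ops_per_word: a multiply-add is counted as 2 operations.
    """
    return spes * clock_ghz * words_per_vector * ops_per_word


per_spe = peak_gflops(spes=1, clock_ghz=4.0)
all_spes = peak_gflops(spes=8, clock_ghz=4.0)
print(per_spe, all_spes)  # 32.0 256.0
```

These are single precision numbers that assume every SPE issues a vector multiply-add on every cycle, so they are an upper bound and not a measured result.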
Distributing Processing
A software infrastructure under development will allow Cells to work together. While they can be directly connected via the high speed interconnects, they can also be connected in other ways or distributed over a network. The Cells are not gaming or computer specific: they can be in anything from PDAs to TVs, but they can still be used to act as a single system.
Parallel programming is usually complex, but in this case the OS will look at the resources it has and distribute tasks accordingly. This process does not involve any more programming than the initial parallelisation. If you want more processing power you simply add more Cells; you do not need to replace the existing ones, as the new Cells will augment them.
Overall the Cell architecture is an architecture for distributed, parallel processing using very powerful computational engines developed using a highly aggressive design strategy. These devices will be produced in vast numbers, so they will provide vast processing resources at a low cost.
Conclusion
The rule in the PC world has always been “evolution not revolution”, often simply incorporating features from other platforms. The changes are incremental and produce incremental performance boosts.
Cell is a revolution: a completely new microprocessor architecture which, while it may take some time to get used to, promises a vast performance boost over today’s systems. GPUs can already run 10 times faster than desktop CPUs. Cell will not only bring similar performance but will do so for more applications, and it’ll be easier to program.
Being produced in large volumes also means the Cell will be cheap. It will likely see widespread use not just in living rooms but in the realm of industry and science as well. The embedded world is much, much larger than the PC world and often imposes stringent constraints on the components used, the same sort of constraints the Cell has been designed for.
Some have suggested that STI should have gone for a more conventional design such as three PowerPC 970s on a single chip. Such a design would not have addressed the power issues and would, as a result, have had to be driven at a relatively low clock rate. Instead, by using simpler designs which use vectors, the Cell designers have managed to fit 9 cores on a single chip at a higher clock speed. The potential performance is consequently considerably higher.
The Cell is a new architecture and will seem strange and alien to many used to rather more conventional desktop designs. In order to utilise it properly programmers will have to face new problems and devise new ways of solving them. It remains to be seen how much of the Cell’s potential can be achieved and how difficult it is to extract it, but it’s clear that STI are trying to make this as painless as possible.
Many people do not like change; to them Cell represents a threat. For others it represents an opportunity.
Let’s see how many take the opportunity, and what other opportunities the other CPU vendors come up with in response.
Acknowledgements
Many thanks to the people who took the time to review this article and the people who put me in touch with them.
I’d also like to thank the people who responded to the previous version. I wasn’t able to respond to many of the e-mails but I did read them all.
Further Reading
There are numerous other articles and papers around the web which cover the Cell:
Ars Technica
http://arstechnica.com/articles/paedia/cpu/cell-1.ars
http://arstechnica.com/articles/paedia/cpu/cell-2.ars
Microprocessor Report
http://www-306.ibm.com/chips/techlib/techlib.nsf/techdocs/D9439D04EA9B080B87256FC00075CC2D
Real World Tech
http://www.realworldtech.com/page.cfm?ArticleID=RWT021005084318
http://www.realworldtech.com/page.cfm?ArticleID=RWT022805234129
Anandtech
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2379&p=1
IBM Research Paper
http://www.research.ibm.com/cell/
Various other IBM papers
http://www-306.ibm.com/chips/techlib/techlib.nsf/products/Cell
References
[Patent]
The original Cell patent application by Masakazu Suzuoki and Takeshi Yamazaki of Sony Computer Entertainment Inc.
[Kutaragi] Interview with Sony's Ken Kutaragi
http://www.eetasia.com/ARTICLES/2005JUN/C/2005JUN_INT_WK2.HTM
http://techon.nikkeibp.co.jp/english/NEWS_EN/20050407/103542/
[3rd party]
The 3 STI companies can sell Cell to their own customers.
http://www.toshiba.co.jp/about/press/2002_04/pr0201.htm
[GPU10]
GPUs have been measured at 10 times faster than conventional CPUs.
http://www.gpgpu.org/vis2004/A.lefohn.intro.pdf
[GFLOPS]
256 = 8 (SPEs) x 4GHz x 4 (32 bit words in a vector) x 2 (multiply-adds are counted as 2 operations).
The Top500 supercomputer list counts double precision GFLOPS so these are not comparable.
[Amdahl's law]
http://en.wikipedia.org/wiki/Amdahl's_law
[Hyper]
IBM’s Hypervisor used to validate the Cell.
http://www.research.ibm.com/hypervisor/
[The400]
IBM's OS/400 is said to be capable of running on a Cell.
http://www.itjungle.com/tfh/tfh030705-story02.html
[Vector]
What is vector processing?
http://www.nus.edu.sg/Major/SVU/techinfo/vector_processing.html
[GCC]
Vectorisation in GCC
http://gcc.gnu.org/projects/tree-ssa/vectorization.html
[Xbox360]
http://arstechnica.com/articles/paedia/cpu/xbox360-1.ars
http://arstechnica.com/articles/paedia/cpu/xbox360-2.ars
[Stream]
ACM Queue on Streaming Processors
http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=128
[GPU]
Interesting paper on GPUs and clustering them to produce very high performance systems
http://www.cs.sunysb.edu/~vislab/projects/urbansecurity/GPUcluster_SC2004.pdf (Pdf)
[GPGPU]
GPUs can be used for general purpose computations. Most if not all of these applications will also work on a Cell.
[Rambus]
Sony and Toshiba licensed Rambus technology for use in the Cell.
http://www.hardwareanalysis.com/content/article/1576/
[FlexPhase]
http://www.rambus.com/products/xdr/innovations/flexphase.aspx
[Xdimm]
http://www.rambus.com/products/xdr/xdimm.aspx
[Opti]
Optical interconnects in silicon
http://www.theinquirer.net/?article=22153
[AltiVec/VMX] Also known as Velocity Engine or VMX.
http://developer.apple.com/hardware/ve/
[Jobs]
Cell Development Slides.
http://research.scea.com/research/html/CellGDC05/index.html
[BPA] Experimental patch set to introduce Cell to the Linux kernel; it’s referred to here as the “BPA” (Broadband Processor Architecture).
http://www.ussg.iu.edu/hypermail/linux/kernel/0504.3/1150.html
[Dev] Support for development tools is being added.
http://www.gccsummit.org/2005/2005-GCC-Summit-Proceedings.pdf
[CellLinux]
Linux on Cell uses a virtual file system for SPU communications.
http://www-128.ibm.com/developerworks/power/library/pa-cell/
http://www-128.ibm.com/developerworks/power/library/pa-expert4/
[CellDev]
Presentation on programming and an example of large (and fast!) FFTs on Cell (791KB).
www.power.org/news/events/barcelona/11_chow.pdf
[CI]
(see about the author below).
[Comment]
The removed article generated a lot of controversy; this covers the relevant issues.
http://arstechnica.com/news.ars/post/20050629-5054.html
[AC]
Alan Cox
http://en.wikipedia.org/wiki/Alan_Cox
Some good comments on writing good software:
http://www.pingwales.co.uk/software/cox-on-better-software-2.html
[Emma]
Lengthy paper on processor performance limits
http://www.research.ibm.com/journal/rd/413/emma.html
[guTS]
1GHz PowerPC in 1997
http://www.research.ibm.com/arl/projects/guTS.html
http://csdl.computer.org/comp/mags/mi/1998/03/m3066abs.htm
[Ultra]
Ultra high frequency microarchitecture at IBM Research
http://www.research.ibm.com/lowfo4/
[LowIPC]
Evaluating CPU performance (IPC graphs)
www.princeton.edu/~mrm/workshop/oskin.pdf
[x86vsPPC]
The low number of registers in x86 CPUs necessitates more aggressive OOO hardware.
Removing it will thus have a larger performance impact than on PowerPC based CPUs.
http://www.osnews.com/story.php?news_id=3997
[HPCA]
The Cell was designed with low power consumption in mind.
http://www.hpcaconf.org/hpca11/papers/25_hofstee-cellprocessor_final.pdf
Miscellaneous Notes:
Fanboi hyp3.
Contrary to some opinions the first version was not an attempt to hype the Cell processor.
The original version of this was written after I read a debate on the Cell in late 2004. The Cell was clearly not widely understood at the time and I felt I could provide an explanation.
I am a technology enthusiast and when I see some very advanced technology heading in our direction I’m interested and I’m excited. Some of this emotion evidently made it into the first version of the article.
Neither this nor the previous version was requested or paid for by Sony, Toshiba, IBM or anyone else.
About the Author
Nicholas Blachford lives in Paris.
He is currently (slowly) learning French and designing / writing a softsynth for OS X.
He is “talking to” Cell-Industries.
You might find him hanging about at whyzzat.
© Nicholas Blachford 2005.