This article has been updated, see Version 2

Cell Architecture Explained (Version 1) - Part 3: Cellular Computing

The Cell is not a fancy graphics chip; it is intended for general purpose computing. As if to confirm this, the graphics hardware in the PlayStation 3 is being provided by Nvidia [Nvidia]. The APUs are not truly general purpose like normal microprocessors, but the Cell makes up for this by including a PU, which is a normal PowerPC microprocessor.

Cell Applications

As I said in part 1, the Cell is destined for uses other than just the PlayStation 3. But what sort of applications will Cell be good for?

Cell will not work well for everything. Some applications cannot be vectorised at all; for others the system of reading memory blocks could potentially cripple performance. In cases like these I expect the PU will be used, but that's not entirely clear as the patent seems to assume the PU can only be used by the OS.

Games

Games are an obvious target: the Cell was designed for a games console, so if they don't work well there's something wrong! The Cell designers have concentrated on raw computing power and not on graphics, so we will see hardware functions moved into software and much more flexibility being available to developers. Will the PS3 be the first console to get real-time ray traced games?

3D Graphics

Again this is a field the Cell was largely designed for, so expect it to do well here. Graphics is an "embarrassingly parallel", vectorisable and streamable problem, so all the APUs will be in full use; the more Cells you use, the faster the graphics will be. There is a lot of research into different advanced graphics techniques these days and I expect Cells will be used heavily for these and enable these techniques to make their way into the mainstream. If you think graphics are good already you're in for something of a surprise.
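To show what "embarrassingly parallel" means in practice, here is a minimal sketch in Python. It is not Cell code: the worker pool merely stands in for a set of APUs, and the function names, the brightness offset and the worker count are all assumptions made for illustration. The point is the structure, not the speed: each row of the image can be processed without looking at any other row, so the work can be handed out to as many units as you have.

```python
from concurrent.futures import ThreadPoolExecutor


def shade_row(row):
    """Brighten one row of grey-scale pixels, clamping each to 0..255."""
    return [min(255, p + 40) for p in row]


def shade_image(image, workers=8):
    """Shade every row independently; no row needs data from another row.

    The worker count is arbitrary here and only stands in for a set of APUs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the rows in its results.
        return list(pool.map(shade_row, image))


# A small synthetic 16x16 image (hypothetical test data).
image = [[(x * y) % 256 for x in range(16)] for y in range(16)]

parallel = shade_image(image)
serial = [shade_row(row) for row in image]
```

Because no row depends on another, the parallel and serial versions give identical results, and the same split works whether the rows go to one worker or many.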

Video

Image manipulations can be vectorised and this can be shown to great effect in Photoshop. Video processing can similarly be accelerated, and Apple will be using the capabilities of existing GPUs (Graphics Processing Units) to accelerate video processing in "Core Image". Cell will almost certainly be able to accelerate anything GPUs can handle.

Video encoding and decoding can also be vectorised so expect format conversions and mastering operations to benefit greatly from a Cell. I expect Cells will turn up in a lot of professional video hardware.

Audio

Audio is one of those areas where you can never have enough power. Today's electronic musicians have multiple virtual synthesisers, each of which has multiple voices. Then there are traditionally synthesised, sampled and real instruments. All of these need to be handled and have their own processing needs, and that's before you put different effects on each channel. Then you may want global effects, compression per channel and final mixing. Many of these processes can be vectorised. Cell will be an absolute dream for musicians and yet another headache for synthesiser manufacturers who have already seen PCs encroaching on their territory.

DSP (Digital Signal Processing)

The primary algorithm used in DSP is the FFT (Fast Fourier Transform), which breaks a signal up into individual frequencies for further processing. The FFT is a highly vectorisable algorithm and is used so much that many vector units and microprocessors contain instructions especially for accelerating this algorithm.
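As a rough illustration of what the FFT does, here is a minimal sketch of the textbook recursive radix-2 algorithm in plain Python. It is a teaching version, not how a Cell or any DSP library would implement it, and the test signal (two sine waves at bins 5 and 12) is made up for the example. Note that every pass through the inner loop is independent of the others, which is what makes the algorithm so suitable for vector units.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return list(samples)
    even = fft(samples[0::2])  # transform of the even-indexed samples
    odd = fft(samples[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Each "butterfly" below is independent of every other value of k.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Hypothetical test signal: a strong tone at bin 5 plus a weaker one at bin 12.
n = 64
signal = [
    math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 12 * t / n)
    for t in range(n)
]

spectrum = fft(signal)
magnitudes = [abs(c) for c in spectrum[: n // 2]]

# The two largest magnitudes sit at the two frequencies we put in.
peaks = sorted(range(n // 2), key=lambda k: magnitudes[k], reverse=True)[:2]
```

The signal goes in as 64 samples over time and comes out as a set of frequency bins, with the energy concentrated in the two bins that correspond to the original tones.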

There are thousands of different DSP applications and most of them can be streamed, so Cell can be used for many of these applications. Once prices have dropped and power consumption has come down, expect the Cell to be used in all manner of different consumer and industrial devices.

SETI

SETI is a perfect example of a DSP application, again based on FFTs; a Cell will boost my SETI@home [SETI] score no end! As mentioned elsewhere, I estimate a set of 4 Cells will complete a unit in under 5 minutes [Calc]. Numerous other distributed applications will also benefit from the Cell.

Scientific

For conventional (non vectorisable) applications this system will be at least as fast as 4 PowerPC 970s with a fast memory interface. For vectorisable algorithms performance will go onto another planet. A potential problem however will be the relatively limited memory capability (this may be PlayStation 3 only, the Cell may be able to address larger memories). It is possible that even a memory limited Cell could be used perfectly well by streaming data into and out of the I/O unit.

GPUs are already used for scientific computation and Cell will likely be usable in the same areas: "Many kinds of computations can be accelerated on GPUs including sparse linear system solvers, physical simulation, linear algebra operations, partial difference equations, fast Fourier transform, level-set computation, computational geometry problems, and also non-traditional graphics, such as volume rendering, ray-tracing, and flow visualization."[GPU]

Super Computing

Many modern supercomputers use clusters of commodity PCs because they are cheap and powerful. You currently need in the region of 250 PCs to even get onto the top 500 supercomputer list [Top500]. It should take just 8 Cells to get onto the list and 560 to take the lead*. Backwards compatibility is completely unimportant in this area, so it will be one of the first to fall; expect Cell based machines to rapidly take over the Top 500 list from PC based clusters.

There are other super computing applications which require large amounts of interprocess communication and do not run well in clusters. The Top500 list does not measure these separately, but this is an area where big iron systems do well and Cray rules; PC clusters don't even get a look-in. The Cells have high speed communication links and this makes them ideal for such systems, although additional engineering will be required for large numbers of Cells. Cells may not only take over from PC clusters; expect them to do well here too.

If the Cell has a 64 bit Multiply-add instruction (I'd be very surprised if this wasn't present) it'll take 8000 of them to get a PetaFlop**. That record will be very difficult to beat.

** Based on theoretical values, in reality you'd need more Cells depending on the efficiency.
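The PetaFlop figure can be turned into a small calculation. This is only a sketch of the arithmetic: the 125 GFLOPS per Cell is simply what the 8000 figure above implies (one PetaFlop divided by 8000), not a published specification, and the efficiency values in the usage lines are hypothetical.

```python
import math

PETAFLOP = 1e15        # floating point operations per second
CELLS_AT_PEAK = 8000   # the article's figure at theoretical peak

# Implied 64 bit throughput per Cell: 1e15 / 8000 = 1.25e11, i.e. 125 GFLOPS.
per_cell_flops = PETAFLOP / CELLS_AT_PEAK


def cells_needed(target_flops, efficiency):
    """Cells required to sustain target_flops at a given efficiency (0..1)."""
    return math.ceil(target_flops / (per_cell_flops * efficiency))


at_peak = cells_needed(PETAFLOP, 1.0)   # theoretical case
at_half = cells_needed(PETAFLOP, 0.5)   # hypothetical 50% efficiency
```

At theoretical peak this gives back the 8000 Cells quoted above; at a hypothetical 50% efficiency the number doubles to 16000, which is why the footnote matters.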

Servers

This is one area which does not strike me as being terribly vectorisable, indeed XML and similar processing are unlikely to be helped by the APUs at all though the memory architecture may help (which is unusual given how amazingly inefficient XML is). However servers generally do a lot of work in their database backend.

Commercial databases with real life data sets have been studied and found to benefit from running on GPUs. You can also expect these to be accelerated by Cells. So yes, even servers can benefit from Cells.

Stream Processing Applications

A big difference from normal CPUs is the ability of the APUs in a Cell to be chained together to act as a stream processor [Stream]. A stream processor takes a flow of data and processes it in a series of steps. Each of these steps can be performed by a different APU or even different APUs on different Cells.
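The idea of chaining steps together can be sketched with Python generators. This only illustrates the data flow, not how APUs are actually programmed: each generator stands in for one step that could run on its own APU, and the stage names and parameters (offset removal, gain, clipping) are invented for the example.

```python
def source(values):
    """Feed raw samples into the pipeline one at a time."""
    for v in values:
        yield v


def remove_offset(stream, offset):
    """Stage 1: subtract a fixed offset from every sample."""
    for v in stream:
        yield v - offset


def amplify(stream, gain):
    """Stage 2: multiply every sample by a gain factor."""
    for v in stream:
        yield v * gain


def clip(stream, low, high):
    """Stage 3: limit every sample to the range low..high."""
    for v in stream:
        yield max(low, min(high, v))


def run_pipeline(values):
    """Chain the stages; each one consumes the output of the one before."""
    stream = source(values)
    stream = remove_offset(stream, 128)
    stream = amplify(stream, 2)
    stream = clip(stream, -100, 100)
    return list(stream)


result = run_pipeline([128, 130, 100, 250, 0])
# result == [0, 4, -56, 100, -100]
```

Each sample flows through all three stages in turn, and no stage needs to know anything about the others beyond the data it receives. That is what allows the steps to be spread across different APUs, or across different Cells.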

An Example: A Digital TV Receiver

To give an example of stream processing, take a Set Top Box for watching Digital TV. This is a much more complex process than just playing an MPEG movie, as a whole host of additional processes are involved. Here's an outline of what needs to be done before you can watch the latest episode of Star Trek:

These tasks are typically performed using a combination of custom hardware and dedicated DSPs. They can be done in software but it'll take a very powerful CPU, if not several of them, to do all the processing - and that's just for standard definition MPEG2. HDTV with H.264 will require considerably more processing power. General purpose CPUs tend not to be very efficient, so it is generally easier and cheaper to use custom chips; although highly expensive to develop, they are cheap when produced in high volumes and consume minuscule amounts of power.

These tasks are vectorisable and, working in a sequence, are of course streamable. A Cell processor could be set up to perform these operations in a sequence with one or more APUs working on each step. This means there is no need for custom chip development, and new standards can be supported in software. The power of a Cell is such that a single Cell will likely be capable of doing all the processing necessary, even for high definition standards. Toshiba intends to use the Cell for HDTVs.

Cell Stream processing diagram

Non Accelerated Applications

There are going to be many applications which cannot be accelerated by a Cell processor and even those which can may not be ported overnight. I don't for instance expect Cell will even attempt to go after the server market.

But generally PCs either don't need much power or they can be accelerated by the Cell. Intel and AMD will be churning out ever more multi-core x86s, but what's going to happen if Cells deliver vastly more power at what will rapidly become a lower price?

The PC is about to have the biggest fight it has ever had. To date it has won with ease every time; this time it will not be so easy. In Part 4 I look at this forthcoming battle royale.

 



 

© Nicholas Blachford 2005.