Let’s Get Physical: Inside The PhysX Physics Processor

Part 3 - The Alternative And Others

The Other PPU

Part 2 described the second design of the PPU given in the patent. The first design used a completely different DME and FPE.

As we’ve seen, the second version divides the DME into a number of memory control units and the FPE into a number of vector processing units. The first version did not divide the DME at all, and division in the FPE was limited.

The version 1 DME comprised a set of address generators, buses, memories and crossbar switches. It looks nothing even remotely like the memory control units of variant 2. With 6 address generators running simultaneously to control the movement of data, it looks like it would be truly mind-bogglingly complex to program and probably less flexible.

The variant 1 FPE was loosely divided into a number of vector processors but used a complex series of register banks, some of which were shared between the processors and others not. The actual processing was the same hybrid vector-VLIW method described for variant 2, but it used a more traditional 4 element vector and included the ability to do both integer and floating point scalar operations alongside the vector computations.

The “preferred” second variant is conceptually a lot simpler than the first version described in the patent and most likely easier to program. Crucially, however, the second variant will be easier to manufacture than the first. There are always defects on chips which cause parts to fail. If the design is divided into many identical units, the faulty ones can be deactivated and the chips sold with less functionality at a lower price, a practice common in the semiconductor industry.

The first version was divided into sub-parts but not to the same degree as the second. A single fault in the wrong area could conceivably make the chip useless, lowering the number which could be sold.
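The yield argument above can be put into rough numbers. The following is a minimal sketch, assuming a chip made of 16 identical units that each have an independent 95% chance of being fault-free. Both figures are invented for illustration and do not come from the patent or from Ageia.

```python
from math import comb

def sellable_fraction(units, p_unit_ok, min_working):
    """Chance that at least `min_working` of `units` independent units are fault-free."""
    return sum(
        comb(units, k) * p_unit_ok**k * (1 - p_unit_ok) ** (units - k)
        for k in range(min_working, units + 1)
    )

# Hypothetical figures, chosen only to illustrate the argument.
UNITS = 16
P_UNIT_OK = 0.95

# A design where any fault kills the chip: every unit must work.
all_or_nothing = sellable_fraction(UNITS, P_UNIT_OK, UNITS)

# A divided design sold with up to two faulty units switched off.
with_salvage = sellable_fraction(UNITS, P_UNIT_OK, UNITS - 2)

print(f"all units must work:    {all_or_nothing:.1%}")
print(f"two units may be fused: {with_salvage:.1%}")
```

Under these assumed figures fewer than half of the all-or-nothing chips would be sellable, while the large majority of the divided chips could be sold in some form. Real defects are not independent and not evenly spread, so this only shows the direction of the effect.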

It’s quite conceivable the final product will be different again from both versions described in the patent, but I doubt it will change at the highest design level. Lower level changes are very likely, however, as design simulations reveal the system’s behaviour and changes are made accordingly. Changes may also be made after prototypes are built, but these are likely to be relatively minor.

Differences Between The PPU And Desktop Processors

The PPU is really quite different to pretty much any conventional processor and is more akin to a GPU (Graphics Processing Unit). This isn’t surprising given it’s dedicated to physics processing, not general purpose processing.

The programming model for the PPU is also likely to be wildly different from a conventional processor. In a conventional processor you do everything in a single program and a single memory. You move data into on-chip registers, do operations on the data, then write the data back out to memory. You do need to understand program flow when writing the program, but you do not need to explicitly control loading and executing the instructions as this is handled by the processor.

The PPU breaks these operations into parts so there is one program for loading data and another for performing operations on it. A third program running on the PPU Control Engine is responsible for uploading these programs into their respective memories and instructing the processors which address to start executing from.
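The split described above can be pictured with a toy model. This is only a sketch of the division of labour: the function names, the dictionaries standing in for memories and the simple position update are all invented here, and none of it reflects Ageia’s actual instruction sets or interfaces.

```python
# Toy model only: every name here is hypothetical.

# Stands in for the card's external memory.
external_memory = {
    "positions": [0.0, 1.0, 2.0],
    "velocities": [1.0, 1.0, 1.0],
}

# Stands in for the on-chip memory next to the floating point units.
local_memory = {}

def load_program():
    """Data movement side: copy operands from external to local memory."""
    local_memory["pos"] = list(external_memory["positions"])
    local_memory["vel"] = list(external_memory["velocities"])

def compute_program(dt):
    """Floating point side: works only on local memory, never external."""
    local_memory["pos"] = [
        p + v * dt for p, v in zip(local_memory["pos"], local_memory["vel"])
    ]

def store_program():
    """Data movement side: copy results back out to external memory."""
    external_memory["positions"] = list(local_memory["pos"])

def control_program(dt):
    """Control side: decides which program runs where, and in what order."""
    load_program()
    compute_program(dt)
    store_program()

control_program(0.5)
print(external_memory["positions"])
```

In a real chip the load, compute and store stages would presumably overlap on different batches of data rather than run one after another as they do here. The point of the sketch is only that the code doing the arithmetic never fetches its own data.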

While the PPU may sound highly complex to program, this is a problem that will most likely only be faced by Ageia’s driver writers. Normal developers will program the chip via the PhysX / Novodex API, with custom routines most likely being written in a customised language in a similar manner to the way custom shaders are used on GPUs. The complexity will most likely be almost completely hidden - as is the case with GPUs today.

..And The Cell

The PPU is quite different from conventional microprocessors but is similar in some respects to the PS3’s Cell processor. This is not surprising as they are both multicore vector processors. They are different, however, in that the Cell was designed as a more general purpose(ish) processor whereas the PPU has been designed to accelerate a specific type of computation.

There are a number of similarities:

No cache - the memory arrangement is similar to the “Local stores” in the Cell’s SPEs.

Processing routines cannot directly access memory.

No out-of-order processing hardware.

Designed to operate at as close to its theoretical maximum performance as possible.

Very high memory bandwidth, hardware designed to make maximum use of it.

There are on the other hand also some considerable differences between the Cell and the PPU:

The Cell contains a fully fledged Power processor which will run an OS and applications. The PCE is likely to run only a very rudimentary OS and as such will likely be a low end device such as an ARM core.

Being designed for physics only, the instruction set looks likely to be limited.

Processing operations are all 32 bit floating point with 32 bit integer processing seemingly present only for control purposes.

No virtualisation - All the cores on the Cell contain memory management units, at most the PPU might have one in the PCE but even that’s not a given.

The Competition

At the moment no other company has announced plans for a dedicated physics accelerator but that’s not to say there’s no competition.

ATI in particular have been talking about using their GPUs for physics but quite how they’ll manage to do that if the card is already busy with graphics is unknown. GPUs aren’t designed for physics in particular but they are becoming increasingly programmable these days [D3D10] and they certainly have the floating point capabilities. Perhaps we’ll see modified or even re-badged graphics cards sold as physics accelerators.

If other cards do appear on the market they will have the problem of which physics API to support. There is no single standard for physics programming and Ageia are unlikely to support direct competitors, so each card may end up having its own API. There is hope, however, in that a widely supported standards group recently released COLLADA v1.4, which includes a Physics API [COLLADA]. COLLADA could end up as the standard API for physics unless, that is, Microsoft decides to invent one. Interestingly, Ageia’s API is mentioned on the COLLADA site so there appears to be at least some interoperability with the standard.

Consoles are not in direct competition with Ageia but are competing with gaming PCs.

The PS3’s Cell processor and the Xbox 360’s triple core PowerPC were both designed for very high floating point computational capabilities, and in both cases this will give the consoles enhanced physics capabilities. As mentioned above, Cell in particular has a similar architecture to the PhysX chip so should do well on game physics. Indeed Sony seem to be pushing the idea of the PS3 using Cell to do “natural motion” through physics [Natural].

Ageia have been pretty smart in that they are supporting both these consoles through their APIs. Any game being ported from consoles to PCs can thus relatively easily take advantage of the PhysX chip. The same is also true in the other direction.

Non-Gaming Uses

While the PPU is designed for physics calculations it’s quite possible that other uses will be found for it. GPUs are designed for 3D graphics but have been found to be useful for many other types of processing. This is an increasingly popular usage as they can perform many times faster than desktop processors in many instances.

One potential use would be in “physical modelling” synthesisers. These use physics processing to simulate the individual parts of musical instruments and can create strikingly realistic sounds. Physics is also used in engineering, and of course scientists could also potentially find the processor useful.

Conclusion

Consoles will have the raw floating point power necessary for physical simulation, and the PhysX will bring that capability to gaming PCs. It is, however, another card to add, so the price of a gaming rig will move yet further away from that of a games console.

Games are becoming ever more realistic looking and in the future these looks will be accompanied by increasingly realistic behaviour as well. While you can expect many to be initially sceptical of the benefit of physics, I expect this will change as more and more games make use of the technology and the difference becomes obvious.

Whether it becomes a standard piece of equipment for all PCs is another matter. With interfaces beginning to use 3D, everyone can now make use of 3D hardware, but it’s not clear there is a similarly common use which would allow everyone to make use of a physics accelerator.

With their massive computing capabilities consoles were looking like they’d get quite far ahead of gaming PCs. Quad SLI set-ups may produce amazing graphics but without the raw physics processing power the games would, paradoxically, look unrealistic. The PhysX will bring these capabilities to PCs and as such will very likely become a standard part of gaming PCs.

References:

[PhysX]

http://www.ageia.com

[Patent]

http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220050075849%22.PGNR.&OS=DN/20050075849&RS=DN/20050075849

[Car] What Game Designers Need to Know About Physics: (Requires log-in)

http://www.gamasutra.com/resource_guide/20030121/marcus_01.shtml

[D3D10]

http://www.gamedev.net/reference/programming/features/d3d10overview/

[COLLADA]

http://www.khronos.org/collada/

[Natural]

http://www.eetimes.com/news/latest/showArticle.jhtml;jsessionid=A4LONMC0WANHCQSNDBESKHA?articleID=179100739