Compilation on a Cell blade

danieltracy
Posts: 13
Joined: Mon Oct 13, 2008 11:01 pm


Post by danieltracy »

I'm looking to experiment with the Bullet SDK as a physics server on a Cell blade. I have the source to Bullet 2.72 and tried using the compilation instructions provided in BulletSpuOptimized.pdf therein. I get stuck at step 2:

2) Change directory to $(BULLET_ROOT)/Extras/BulletMultiThreaded
run make all
This should create a PPU library (bulletmultithreaded.a) placed in $(BULLET_ROOT)/lib/ibmsdk.
It also creates a SPU executable (spuCollision.elf) placed in
$(BULLET_ROOT)/Extras/BulletMultiThreaded/out

Attempting to link produces many "undefined reference" errors that should resolve within Bullet:

ccache /usr/bin/spu-gcc -DUSE_LIBSPE2 -D__SPU__ -DNDEBUG -W -Wall -Winline -O3 -mbranch-hints -fomit-frame-pointer -ftree-vectorize -finline-functions -ftree-vect-loop-version -ftree-loop-optimize -ffast-math -fno-rtti -fno-exceptions -c -include spu_intrinsics.h -include stdbool.h -I. -I/opt/cell/sysroot/usr/spu/include -I../../src -I./SpuNarrowPhaseCollisionTask -o ./out/btStridingMeshInterface.o ../../src/BulletCollision/CollisionShapes/btStridingMeshInterface.cpp
../../src/BulletCollision/CollisionShapes/btStridingMeshInterface.h:80: warning: unused parameter 'aabbMin'
../../src/BulletCollision/CollisionShapes/btStridingMeshInterface.h:80: warning: unused parameter 'aabbMax'
../../src/BulletCollision/CollisionShapes/btStridingMeshInterface.h:81: warning: unused parameter 'aabbMin'
../../src/BulletCollision/CollisionShapes/btStridingMeshInterface.h:81: warning: unused parameter 'aabbMax'
ccache /usr/bin/spu-gcc -DUSE_LIBSPE2 -D__SPU__ -DNDEBUG -W -Wall -Winline -O3 -mbranch-hints -fomit-frame-pointer -ftree-vectorize -finline-functions -ftree-vect-loop-version -ftree-loop-optimize -ffast-math -fno-rtti -fno-exceptions -c -include spu_intrinsics.h -include stdbool.h -I. -I/opt/cell/sysroot/usr/spu/include -I../../src -I./SpuNarrowPhaseCollisionTask -o ./out/btAlignedAllocator.o ../../src/LinearMath/btAlignedAllocator.cpp
ccache /usr/bin/spu-gcc -o ./out/spuCollision.elf \
./out/SpuTaskFile.o \
./out/SpuFakeDma.o \
./out/SpuContactManifoldCollisionAlgorithm_spu.o \
./out/SpuContactResult.o \
./out/SpuCollisionShapes.o \
./out/SpuGjkPairDetector.o \
./out/SpuMinkowskiPenetrationDepthSolver.o \
./out/SpuVoronoiSimplexSolver.o \
./out/btPersistentManifold.o \
./out/btTriangleCallback.o \
./out/btTriangleIndexVertexArray.o \
./out/btStridingMeshInterface.o \
./out/btAlignedAllocator.o \
-Wl,-N -lstdc++
./out/SpuTaskFile.o: In function `SpuConvexConcaveCollisionAlgorithm(SpuCollisionPairInput&, CollisionTask_LocalStoreMemory&, SpuContactResult&, SpuInternalShape*, SpuInternalShape*, SpuInternalConvexHull*, SpuInternalConvexHull*)':
SpuGatheringCollisionTask.cpp:(.text+0x37c): undefined reference to `btConvexPointCloudShape::btConvexPointCloudShape(btVector3*, int)'
SpuGatheringCollisionTask.cpp:(.text+0x3fc): undefined reference to `btConvexPointCloudShape::btConvexPointCloudShape(btVector3*, int)'
SpuGatheringCollisionTask.cpp:(.text+0x6d4): undefined reference to `btConvexShape::getAabbNonVirtual(btTransform const&, btVector3&, btVector3&) const'
SpuGatheringCollisionTask.cpp:(.text+0x7dc): undefined reference to `btConvexShape::getAabbNonVirtual(btTransform const&, btVector3&, btVector3&) const'
SpuGatheringCollisionTask.cpp:(.text+0xa48): undefined reference to `btConvexPointCloudShape::setPoints(btVector3*, int)'
./out/SpuTaskFile.o: In function `SpuConvexConvexCollisionAlgorithm(SpuCollisionPairInput&, CollisionTask_LocalStoreMemory&, SpuContactResult&, SpuInternalShape*, SpuInternalShape*, SpuInternalConvexHull*, SpuInternalConvexHull*)':
SpuGatheringCollisionTask.cpp:(.text+0xcbc): undefined reference to `btConvexPointCloudShape::btConvexPointCloudShape(btVector3*, int)'
SpuGatheringCollisionTask.cpp:(.text+0xd34): undefined reference to `btConvexPointCloudShape::btConvexPointCloudShape(btVector3*, int)'
SpuGatheringCollisionTask.cpp:(.text+0xd44): undefined reference to `btConvexPointCloudShape::btConvexPointCloudShape(btVector3*, int)'
SpuGatheringCollisionTask.cpp:(.text+0xf40): undefined reference to `vtable for btMinkowskiPenetrationDepthSolver'
SpuGatheringCollisionTask.cpp:(.text+0x1070): undefined reference to `vtable for btGjkEpaPenetrationDepthSolver'
SpuGatheringCollisionTask.cpp:(.text+0x1154): undefined reference to `btGjkPairDetector::btGjkPairDetector(btConvexShape const*, btConvexShape const*, btVoronoiSimplexSolver*, btConvexPenetrationDepthSolver*)'
SpuGatheringCollisionTask.cpp:(.text+0x116c): undefined reference to `btGjkPairDetector::getClosestPoints(btDiscreteCollisionDetectorInterface::ClosestPointInput const&, btDiscreteCollisionDetectorInterface::Result&, btIDebugDraw*, bool)'
SpuGatheringCollisionTask.cpp:(.text+0x11c8): undefined reference to `btConvexPointCloudShape::setPoints(btVector3*, int)'
SpuGatheringCollisionTask.cpp:(.text+0x1200): undefined reference to `btConvexPointCloudShape::setPoints(btVector3*, int)'
./out/SpuTaskFile.o: In function `btConvexShape::~btConvexShape()':
SpuGatheringCollisionTask.cpp:(.text._ZN13btConvexShapeD0Ev[btConvexShape::~btConvexShape()]+0x10): undefined reference to `vtable for btCollisionShape'
./out/SpuTaskFile.o: In function `btConvexShape::~btConvexShape()':
SpuGatheringCollisionTask.cpp:(.text._ZN13btConvexShapeD1Ev[btConvexShape::~btConvexShape()]+0x10): undefined reference to `vtable for btCollisionShape'
./out/SpuTaskFile.o: In function `btTriangleShape::~btTriangleShape()':
SpuGatheringCollisionTask.cpp:(.text._ZN15btTriangleShapeD1Ev[btTriangleShape::~btTriangleShape()]+0x8): undefined reference to `vtable for btPolyhedralConvexShape'
SpuGatheringCollisionTask.cpp:(.text._ZN15btTriangleShapeD1Ev[btTriangleShape::~btTriangleShape()]+0x10): undefined reference to `vtable for btConvexInternalShape'
SpuGatheringCollisionTask.cpp:(.text._ZN15btTriangleShapeD1Ev[btTriangleShape::~btTriangleShape()]+0x1c): undefined reference to `vtable for btCollisionShape'
./out/SpuTaskFile.o: In function `btTriangleShape::~btTriangleShape()':
SpuGatheringCollisionTask.cpp:(.text._ZN15btTriangleShapeD0Ev[btTriangleShape::~btTriangleShape()]+0x8): undefined reference to `vtable for btPolyhedralConvexShape'
SpuGatheringCollisionTask.cpp:(.text._ZN15btTriangleShapeD0Ev[btTriangleShape::~btTriangleShape()]+0x10): undefined reference to `vtable for btConvexInternalShape'
SpuGatheringCollisionTask.cpp:(.text._ZN15btTriangleShapeD0Ev[btTriangleShape::~btTriangleShape()]+0x1c): undefined reference to `vtable for btCollisionShape'
./out/SpuTaskFile.o: In function `spuNodeCallback::processNode(int, int)':
SpuGatheringCollisionTask.cpp:(.text._ZN15spuNodeCallback11processNodeEii[spuNodeCallback::processNode(int, int)]+0x29c): undefined reference to `btPolyhedralConvexShape::btPolyhedralConvexShape()'
./out/SpuTaskFile.o:(.rodata._ZTV15btTriangleShape[vtable for btTriangleShape]+0x14): undefined reference to `btCollisionShape::getBoundingSphere(btVector3&, float&) const'
./out/SpuTaskFile.o:(.rodata._ZTV15btTriangleShape[vtable for btTriangleShape]+0x18): undefined reference to `btCollisionShape::getAngularMotionDisc() const'
./out/SpuTaskFile.o:(.rodata._ZTV15btTriangleShape[vtable for btTriangleShape]+0x24): undefined reference to `btConvexInternalShape::localGetSupportingVertex(btVector3 const&) const'
./out/SpuTaskFile.o:(.rodata._ZTV15btTriangleShape[vtable for btTriangleShape]+0x28): undefined reference to `btConvexInternalShape::getAabbSlow(btTransform const&, btVector3&, btVector3&) const'
./out/SpuTaskFile.o:(.rodata._ZTV15btTriangleShape[vtable for btTriangleShape]+0x2c): undefined reference to `btPolyhedralConvexShape::setLocalScaling(btVector3 const&)'
./out/SpuTaskFile.o:(.rodata._ZTV13btConvexShape[vtable for btConvexShape]+0x14): undefined reference to `btCollisionShape::getBoundingSphere(btVector3&, float&) const'
./out/SpuTaskFile.o:(.rodata._ZTV13btConvexShape[vtable for btConvexShape]+0x18): undefined reference to `btCollisionShape::getAngularMotionDisc() const'
./out/SpuGjkPairDetector.o: In function `SpuGjkPairDetector::getClosestPoints(SpuClosestPointInput const&, SpuContactResult&)':
SpuGjkPairDetector.cpp:(.text+0x6cc): undefined reference to `btVoronoiSimplexSolver::reset()'
SpuGjkPairDetector.cpp:(.text+0x768): undefined reference to `btVoronoiSimplexSolver::reset()'
SpuGjkPairDetector.cpp:(.text+0xa8c): undefined reference to `btVoronoiSimplexSolver::inSimplex(btVector3 const&)'
SpuGjkPairDetector.cpp:(.text+0xb30): undefined reference to `btVoronoiSimplexSolver::inSimplex(btVector3 const&)'
SpuGjkPairDetector.cpp:(.text+0xb34): undefined reference to `btVoronoiSimplexSolver::addVertex(btVector3 const&, btVector3 const&, btVector3 const&)'
SpuGjkPairDetector.cpp:(.text+0xb74): undefined reference to `btVoronoiSimplexSolver::addVertex(btVector3 const&, btVector3 const&, btVector3 const&)'
SpuGjkPairDetector.cpp:(.text+0xb84): undefined reference to `btVoronoiSimplexSolver::closest(btVector3&)'
SpuGjkPairDetector.cpp:(.text+0xc38): undefined reference to `btVoronoiSimplexSolver::backup_closest(btVector3&)'
SpuGjkPairDetector.cpp:(.text+0x1024): undefined reference to `btVoronoiSimplexSolver::compute_points(btVector3&, btVector3&)'
SpuGjkPairDetector.cpp:(.text+0x1054): undefined reference to `btVoronoiSimplexSolver::compute_points(btVector3&, btVector3&)'
SpuGjkPairDetector.cpp:(.text+0x13b0): undefined reference to `btVoronoiSimplexSolver::backup_closest(btVector3&)'
collect2: ld returned 1 exit status
make: *** [spu] Error 1


Sorry for all of the text; I'm just trying to be thorough. My blade is running Linux. I'm using 'make' with the file 'Makefile.original'. There is also a Jamfile, but I don't have jam installed. Thanks for any help.

Daniel
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Compilation on a Cell blade

Post by Erwin Coumans »

Some files need to be added to the Makefile; we haven't kept the IBM Cell SDK makefile up to date.

Currently, we are optimizing BulletMultiThreaded and moving it from Bullet/Extras to the Bullet/src folder, so please wait until we finish this work.

We will fix the Makefile for Bullet 2.73, see:

http://code.google.com/p/bullet/issues/detail?id=112
danieltracy
Posts: 13
Joined: Mon Oct 13, 2008 11:01 pm

Re: Compilation on a Cell blade

Post by danieltracy »

Thank you for the response. I have two follow-up questions.

1. Is there a previous version of Bullet for which the IBM SDK is not broken so I could begin the work? Is it available anywhere?

2. Is there an ETA for 2.73?

Daniel
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Compilation on a Cell blade

Post by Erwin Coumans »

danieltracy wrote:Thank you for the response. I have two follow-up questions.

1. Is there a previous version of Bullet for which the IBM SDK is not broken so I could begin the work? Is it available anywhere?

2. Is there an ETA for 2.73?

Daniel
We try to keep monthly releases, and the target for Bullet 2.73 is next week.

Have you tried adding the missing files to the IBM Makefiles? Is this for IBM Cell SDK 3.0?
There are some older Bullet versions that could compile with IBM Cell SDK 2.x; check out Bullet 2.66 http://bulletphysics.com/Bullet/phpBB3/ ... =18&t=1783

Thanks,
Erwin
danieltracy
Posts: 13
Joined: Mon Oct 13, 2008 11:01 pm

Re: Compilation on a Cell blade

Post by danieltracy »

I have not tried adding or modifying files in 2.72, and we are using SDK 3.0. Is there a version of Bullet we could start with?

Daniel
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Compilation on a Cell blade

Post by Erwin Coumans »

The current trunk now compiles with IBM Cell SDK 3.1 (tested).

Can you help test the latest trunk (r1444 alpha package or later) on Google Code?
http://code.google.com/p/bullet/downloads/list

Build instructions:
cd src/ibm_sdk
make
cd ../BulletMultiThreaded
make -f Makefile.original spu
make -f Makefile.original ppu
cd ../../Demos/CellSpuDemo/ibm_sdk
make

This CellSpuDemo doesn't require OpenGL/GLUT, so it is easier to test on a Cell blade etc. But there are some issues that will be fixed before the release. I hope you can help test it.

Thanks,
Erwin
Joczhen
Posts: 17
Joined: Thu Dec 20, 2007 3:17 pm

Re: Compilation on a Cell blade

Post by Joczhen »

Hi Erwin,

I can confirm that the CellSpuDemo is working on a QS22 blade with Cell SDK 3.1.

Attached you'll find some whitespace and style fixes for this demo. I'm not sure what coding style you use in Bullet, but in this case the style was inconsistent, so I changed it to the kernel coding style. Feel free to discard the patch.
Update: I'm not able to upload the patch; none of .diff, .patch, or .txt are allowed file extensions. How should I upload the patch here?

I'll post a patch later on which makes this demo (CellSpuDemo) more useful.
danieltracy
Posts: 13
Joined: Mon Oct 13, 2008 11:01 pm

Re: Compilation on a Cell blade

Post by danieltracy »

It does indeed compile on a QS22 blade. However, the BasicDemo2 executable output indicates:

num manifolds: 0
num gOverlappingPairs: 0
num gTotalContactPoints: 0

consistently for 1000 iterations. Is this a run-time problem, or is the test not set up properly?

Which subsystems are SPU-accelerated? Specifically, I'm interested in the broadphase,
midphase, and narrowphase of collision detection, and in integration. We're using OBBs and a
trimesh (ConcaveConvex algorithm), and are also considering using convex hulls and a trimesh
instead. Also, what performance should we expect compared to an Intel Core 2 Duo running
a Bullet simulation?

Daniel
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Compilation on a Cell blade

Post by Erwin Coumans »

danieltracy wrote:It does indeed compile on a QS22 blade. However, the BasicDemo2 executable output indicates:

num manifolds: 0
num gOverlappingPairs: 0
num gTotalContactPoints: 0

consistently for 1000 iterations. Either a run-time problem or not a well set up test?
There is an issue with the libspe2 version when calling "spe_image_open"; it needs to be sorted out, and I'm looking into it right now. The SPURS and other versions don't have this problem. Also, when replacing the btAxisSweep3 broadphase with btDbvtBroadphase, it seems to be fine. Can you confirm?
I'd like to ask which subsystems are SPU-accelerated? Specifically among: the broadphase,
mid-phase, and narrow phase of CD, and integration. We're using OBBs and a trimesh
(ConcaveConvex algorithm), and are also considering the possibility of using convex hulls
and a trimesh instead.
Midphase and narrowphase collision detection for box, sphere, cylinder, capsule, convex hull, and static triangle mesh are supported on the SPU, as is the constraint solver. Those are usually the main bottlenecks. The libspe2 version hasn't gotten much attention yet. We have several more SPU optimizations that haven't been open sourced yet.
Jochen wrote: Update: I'm not able to upload the patch, neither .diff nor .patch nor .txt are allowed file name extensions. How shall I upload the patch here?

I'll post a patch later on which makes this demo (CellSpuDemo) more useful.
Thanks Jochen. Patches go in the Bullet Google Code issue tracker. But let me first rename BasicDemo2.cpp to CellSpuDemo.cpp and move the benchmarks into it (see Demos/BenchmarkDemo). Using those benchmarks, it will be easier to compare and measure performance between SPU, PPU and other platforms. I'm also going to add IFF binary file reading/writing, which will make it easier to visualize and test output on other platforms.

Could anyone help provide a basic SPU rasterizer, so we can visualize the results? Gallium3D seems like overkill; we just need a lightweight SPU renderer with texture mapping (no shaders etc.). Perhaps an intern could port a small software renderer from trenki.net to the SPU?

Thanks for the feedback,
Erwin
danieltracy
Posts: 13
Joined: Mon Oct 13, 2008 11:01 pm

Re: Compilation on a Cell blade

Post by danieltracy »

When replacing sweep and prune with the dynamic AABB tree, the
results do indeed change. But it's consistently (for all 1000 cycles):

num manifolds: 239
num gOverlappingPairs: 0
num gTotalContactPoints : 0

Daniel
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Compilation on a Cell blade

Post by Erwin Coumans »

It seems to be an old compiler issue: aligned btVector3 doesn't work with the IBM Cell SDK. I reported it on the IBM forum and hope someone can clarify what is wrong.

See the attached stripped-down reproduction case: m_worldAabbMin/m_worldAabbMax don't have the right values.

Can you try replacing (Bullet/src/LinearMath/btQuadWord.h)

Code:

/**@brief The btQuadWordStorage class is base class for btVector3 and btQuaternion. 
 * Some issues under PS3 Linux with IBM 2.1 SDK, gcc compiler prevent from using aligned quadword.
 */
ATTRIBUTE_ALIGNED16(class) btQuadWordStorage
{
with

Code:

class btQuadWordStorage
{
Hope that helps,
Erwin
Joczhen
Posts: 17
Joined: Thu Dec 20, 2007 3:17 pm

Re: Compilation on a Cell blade

Post by Joczhen »

Erwin Coumans wrote:It seems an old compiler issue, aligned btVector3 doesn't work using the IBM Cell SDK. I reported it on the IBM forum, hope someone can clarify what is wrong.
Yes, I'm having the same problem with the ppu-g++ from SDK 3.1.
I opened an internal Bugzilla entry and will take care of it.

I also verified that it works with g++ provided by Fedora 9.
Erwin Coumans wrote: Thanks Jochen. Patches go in the Bullet Google Code issue tracker. But let me first rename BasicDemo2.cpp to CellSpuDemo.cpp and move the benchmarks into it (see Demos/BenchmarkDemo). Using those benchmarks, it will be easier to compare and measure performance between SPU, PPU and other platforms. I'm also going to add IFF binary file reading/writing, which will make it easier to visualize and test output on other platforms.
OK, I'll wait for you to do that.
I wrote an additional patch that allows passing the number of objects and loops at runtime. Another parameter sets the verbosity of the test case.

Another thing I'd love to do is to enable / disable the graphical output by passing a command line parameter. In this case we could run benchmarks (without graphics) and also verify that the benchmark is doing the right thing (by turning graphics on) ;-)
Erwin Coumans wrote: Could anyone help providing a basic SPU rasterizer, so we can visualize the results? Gallium3D seems a bit of overkill, just a lightweight SPU renderer with texture mapping (we don't need shaders etc). Perhaps an intern could port a small software renderer from trenki.net to SPU?
I'd love to do that. Unfortunately I have to wait for the company to approve the contribution to that project. I looked at trenki.net and it looked really good.
But maybe someone else out there can help us?!
Joczhen
Posts: 17
Joined: Thu Dec 20, 2007 3:17 pm

Re: Compilation on a Cell blade

Post by Joczhen »

Joczhen wrote:
Erwin Coumans wrote:It seems an old compiler issue, aligned btVector3 doesn't work using the IBM Cell SDK. I reported it on the IBM forum, hope someone can clarify what is wrong.
Yes, I'm having the same problem with the ppu-g++ from SDK 3.1.
I opened an internal Bugzilla entry and will take care of it.

I also verified that it works with g++ provided by Fedora 9.
OK, it looks like alignment of stack variables isn't fully supported by the gcc 4.1.1 compiler. The 4.1.1 system gcc of RHEL 5.2 doesn't work either.
You'll find the detailed reason here: IBM Forum
One solution might be to use ppu-gcc43-c++ instead, which also comes with Cell SDK 3.1:
yum install ppu-gcc43
Joczhen
Posts: 17
Joined: Thu Dec 20, 2007 3:17 pm

Re: Compilation on a Cell blade

Post by Joczhen »

Joczhen wrote:
Erwin Coumans wrote: Thanks Jochen. Patches go in the Bullet Google Code issue tracker. But let me first rename BasicDemo2.cpp to CellSpuDemo.cpp and move the benchmarks into it (see Demos/BenchmarkDemo). Using those benchmarks, it will be easier to compare and measure performance between SPU, PPU and other platforms. I'm also going to add IFF binary file reading/writing, which will make it easier to visualize and test output on other platforms.
OK, I'll wait for you to do that.
I wrote an additional patch that allows passing the number of objects and loops at runtime. Another parameter sets the verbosity of the test case.

Another thing I'd love to do is to enable / disable the graphical output by passing a command line parameter. In this case we could run benchmarks (without graphics) and also verify that the benchmark is doing the right thing (by turning graphics on) ;-)
Do you want to integrate the benchmarks from Demos/BenchmarkDemo on your own, or do you need help with that?
How can I help you to move forward with this topic?
danieltracy
Posts: 13
Joined: Mon Oct 13, 2008 11:01 pm

Re: Compilation on a Cell blade

Post by danieltracy »

Erwin Coumans wrote:Can you try replacing (Bullet/src/LinearMath/btQuadWord.h)

Code:

/**@brief The btQuadWordStorage class is base class for btVector3 and btQuaternion. 
 * Some issues under PS3 Linux with IBM 2.1 SDK, gcc compiler prevent from using aligned quadword.
 */
ATTRIBUTE_ALIGNED16(class) btQuadWordStorage
{
with

Code:

class btQuadWordStorage
{
The result is a segmentation fault, presumably due to an unaligned memory access.

Daniel