Physics Simulation Forum

 

All times are UTC




 Post subject: bullet, bsp, performance
PostPosted: Sat Oct 08, 2011 2:31 pm 

Joined: Sat Oct 08, 2011 1:00 pm
Posts: 8
Hi all.
I'm trying to make an Android game with nice physics inside closed environments, so I chose Quake 2 BSP as the base for my levels.
First I was using the old Tokamak physics engine, which is pretty nice. I used its terrain callback methods with a lookup for the closest BSP leaves to generate the indices Tokamak should use for collision computations. This led to pretty nice and fast rigid box collisions.
But this method had a few limitations:
1. I didn't compute cw/ccw triangle indices, so sphere collisions had holes on some polygons (I think Tokamak uses the winding direction to compute normals and then uses those normals for the sphere test) :)
2. The terrain callback uses triangle tests without CCD and the outside of the world isn't solid, which means that at low fps (it happened while I was optimizing the renderer for Android) I had boxes going through walls.
3. Tokamak has no character controller, cloth, etc.
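To illustrate limitation 1: a face normal taken from the cross product of two edges flips when the vertex order flips, so any test that relies on winding sees a wrongly wound triangle as back-facing. Below is a minimal stand-alone sketch of checking and fixing the winding against a known reference normal (for example the BSP face's plane normal). The `Vec3` type and all function names are made up for illustration; this is not Tokamak's API.

```cpp
#include <cassert>

// Hypothetical minimal vector type, for illustration only.
struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Unnormalized face normal; its direction depends on the vertex order.
Vec3 faceNormal(Vec3 a, Vec3 b, Vec3 c) { return cross(sub(b, a), sub(c, a)); }

// True if the triangle's winding produces a normal on the same side as `expected`.
bool windingMatches(Vec3 a, Vec3 b, Vec3 c, Vec3 expected) {
    return dot(faceNormal(a, b, c), expected) > 0.0f;
}

// Swaps two vertices when the winding disagrees with the reference normal.
void fixWinding(Vec3& a, Vec3& b, Vec3& c, Vec3 expected) {
    assert(dot(expected, expected) > 0.0f); // reference normal must be non-zero
    if (!windingMatches(a, b, c, expected)) {
        Vec3 tmp = b;
        b = c;
        c = tmp;
    }
}
```

Running `fixWinding` once per triangle at load time would give every triangle a consistent orientation before the indices are handed to the physics engine.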
That's why I decided to give Bullet a chance.
Compilation was pretty simple and I got it working on Android and PC.
Then I found the BspDemo inside the Bullet SDK, which made me happy 'cuz hell yeah - that's everything I need and more, because no triangle computations are needed, only the original Quake brushes converted by Bullet into physical shapes!
But... the performance is SO BAD, even in an empty room.
I throw around 30 cubes and get 30 fps until they calm down and become static...
With Tokamak I could spawn over 2k cubes on a pretty huge level with high performance even if they didn't calm down, and even though the Tokamak-based code was doing the physics on a per-triangle basis! For each box and its AABB, the triangle list was regenerated from the closest BSP leaves' triangles!
Any ideas how to do something like that with Bullet's convexes or anything else?
Could it be that I'm using some wrong initialization?
The convex generation from Quake brushes is based on BspDemo, and the init is as simple as it could be:
Code:
m_collisionConfiguration = new btDefaultCollisionConfiguration();
m_dispatcher = new btCollisionDispatcher(m_collisionConfiguration);
m_broadphase = new btDbvtBroadphase();
m_solver = new btSequentialImpulseConstraintSolver;
m_dynamicsWorld = new btDiscreteDynamicsWorld(m_dispatcher, m_broadphase, m_solver, m_collisionConfiguration);
m_dynamicsWorld->setGravity(btVector3(0,-10,0));


I can't get Tokamak-speed (my video) results :(
With Bullet I get 20 fps @ 50-100 cubes on the same scene... this is ridiculous on a Core 2 CPU... I don't even want to try that on Android...
Any ideas on optimizations (take BspDemo from the SDK as a base - all brushes are fed into Bullet and processed by Bullet internals)? How can I feed Bullet only the convex static objects in the closest leaves for each rigid object?
And yes, the game won't contain a huge number of rigid bodies, but this is a good example for measuring speed.

PS: The performance loss is right in the m_dynamicsWorld->stepSimulation(...); call. With a high-geometry level I get low fps even if no cubes are spawned.
Cube spawning:
Code:
btRigidBody *addCube(const btVector3 & origin, float size = cubeScale, float mass = 10.0f) {
   btCollisionShape* colShape = new btBoxShape(btVector3(size, size, size));
   m_collisionShapes.push_back(colShape);

   btTransform startTransform;
   startTransform.setIdentity();
   startTransform.setOrigin(origin);

   btVector3 localInertia(0,0,0);
   colShape->calculateLocalInertia(mass, localInertia);

   btDefaultMotionState* myMotionState = new btDefaultMotionState(startTransform);
   btRigidBody::btRigidBodyConstructionInfo rbInfo(mass, myMotionState, colShape, localInertia);

   btRigidBody* body = new btRigidBody(rbInfo);
   m_rigidBodies.push_back(body);
   m_dynamicsWorld->addRigidBody(body);

   return body;
}


This is the part of the BspDemo code that converts BSP brushes. addConvexVerticesCollider is called for each brush in a leaf at loading time:
Code:
btRigidBody *localCreateRigidBody(float mass, const btTransform & startTransform, btCollisionShape *shape) {
   bool isDynamic = (mass != 0.f);

   btVector3 localInertia(0,0,0);
   if (isDynamic)
      shape->calculateLocalInertia(mass,localInertia);
#define USE_MOTIONSTATE 1
#ifdef USE_MOTIONSTATE
   btDefaultMotionState* myMotionState = new btDefaultMotionState(startTransform);
   btRigidBody::btRigidBodyConstructionInfo cInfo(mass,myMotionState,shape,localInertia);
   btRigidBody* body = new btRigidBody(cInfo);
//   body->setContactProcessingThreshold(m_defaultContactProcessingThreshold);
#else
   btRigidBody* body = new btRigidBody(mass,0,shape,localInertia);   
   body->setWorldTransform(startTransform);
#endif//
   m_dynamicsWorld->addRigidBody(body);
   return body;
}

void addConvexVerticesCollider(btAlignedObjectArray<btVector3> & vertices, bool isEntity, const btVector3 & entityTargetLocation)
{
   if (vertices.size() > 0)   {
      float mass = 0.f;
      btTransform startTransform;
      startTransform.setIdentity();
      btCollisionShape* shape = new btConvexHullShape(&(vertices[0].getX()),vertices.size());
      m_collisionShapes.push_back(shape);
      localCreateRigidBody(mass, startTransform, shape);
   }
}


PostPosted: Tue Oct 11, 2011 1:38 am 

Joined: Tue Oct 11, 2011 1:37 am
Posts: 3
Are you sure you are compiling in "release" mode? This makes a big difference in the sample applications I have noticed.


PostPosted: Tue Oct 11, 2011 12:02 pm 

Joined: Sat Oct 08, 2011 1:00 pm
Posts: 8
vspyder, hey, thanks for your answer.
Yep, I've been testing mostly the debug version.
Then I tried release on PC as you suggested - everything became much faster, but there is still a huge impact depending on how many statics are added to Bullet.

Here are some numbers (vsync off, C2D 2.9 GHz, 2 GB RAM, GF8800):
bullet:
'base1' map, 5k bsp leaves:
debug, 27 fps, no cubes
debug, 17 fps, 67 cubes
release, 300 fps, no cubes
release, 180 fps, 67 cubes

'simple' map, 200 bsp leaves:
debug, 1000 fps, no cubes
debug, 20 fps, 67 cubes
release, 3000 fps, no cubes
release, 300 fps, 67 cubes

tokamak (vsync on; I can't test it as thoroughly as bullet because I commented it out and only one old build is left):
release, 'base1' map, 5k bsp leaves:
no cubes: 60 fps
100-1000 cubes (in one room, not calmed down, and still throwing more): still 60 fps (no impact at all)

I think there is something I just don't know about Bullet that could optimize static environment processing,
because it's definitely doing a lot of computation even when there are no dynamic objects in the scene.
This shows up as huge fps differences between big and small maps for Bullet and a big slowdown when adding rigid bodies.
It feels like it does some computations for statics and, moreover, doesn't cull them per dynamic rigid body when I add them to the scene. I know there should be an AABB tree and the lookup must be fast... I probably missed something in init...
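The fps figures above are easier to compare as milliseconds per frame (1000 / fps). Using the release numbers from this post: 'base1' goes from 300 fps (about 3.33 ms) to 180 fps (about 5.56 ms) with 67 cubes, so roughly 2.2 ms for the cubes, or about 0.033 ms per cube; 'simple' goes from 3000 fps (about 0.33 ms) to 300 fps (about 3.33 ms), so roughly 3.0 ms, or about 0.045 ms per cube. The empty-scene gap between the two maps is about 3 ms per frame. These are whole-frame times, so they include rendering as well as physics. A small helper for the arithmetic (the function names are made up for this sketch):

```cpp
#include <cassert>

// Converts frames per second into milliseconds per frame.
double frameMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per frame that each cube adds, estimated from two fps
// readings of the same scene (without cubes and with `cubes` cubes).
double addedMsPerCube(double fpsEmpty, double fpsWithCubes, int cubes) {
    assert(cubes > 0);
    return (frameMs(fpsWithCubes) - frameMs(fpsEmpty)) / cubes;
}
```

For example, `addedMsPerCube(300.0, 180.0, 67)` corresponds to the 'base1' release measurement.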

As for Tokamak's speed, that is just because I send it only those BSP leaves which are close to the current rigid body, so using the BSP I cull a LOT around the cubes.
Any idea how the same can be done for Bullet? Callbacks or what?
The reference manual is a bit complicated :)
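The culling idea described here can be sketched without any physics engine: keep one bounding box per leaf and, for each body, select only the leaves whose box overlaps the body's (slightly enlarged) box. The `Aabb` type, the function names and the linear scan are assumptions made for this sketch; a real BSP would walk the tree instead of scanning every leaf, and none of this is Bullet's or Tokamak's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned bounding box, for illustration only.
struct Aabb {
    float min[3];
    float max[3];
};

// Two boxes overlap unless they are separated along at least one axis.
bool overlaps(const Aabb& a, const Aabb& b) {
    for (int i = 0; i < 3; ++i) {
        if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
    }
    return true;
}

// Grows a box by `margin` on every side, e.g. to cover one step of motion.
Aabb expanded(Aabb box, float margin) {
    assert(margin >= 0.0f);
    for (int i = 0; i < 3; ++i) {
        box.min[i] -= margin;
        box.max[i] += margin;
    }
    return box;
}

// Returns the indices of the leaves whose bounds touch the body's enlarged box.
std::vector<std::size_t> leavesNear(const std::vector<Aabb>& leafBounds,
                                    const Aabb& body, float margin) {
    const Aabb query = expanded(body, margin);
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < leafBounds.size(); ++i) {
        if (overlaps(leafBounds[i], query)) result.push_back(i);
    }
    return result;
}
```

Only the triangles or brushes of the returned leaves would then be tested against that body, which is the per-body culling the Tokamak version relied on.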

PS:
Some statistics from Android and Bullet (Snapdragon 768 MHz, Adreno 200; not exact results, but around those values, I can't really remember):
'simple' map, no cubes, 40 fps
'simple' map, 20 cubes, ~20-25 fps

'base1' map, no cubes, 18 fps
'base1' map, 10+ cubes, 10 fps and less
Not too bad, but I'm sure we can get more.


PostPosted: Fri Oct 14, 2011 5:53 pm 

Joined: Sat Oct 08, 2011 1:00 pm
Posts: 8
I've also noticed a big slowdown at exit of my application when using a big map in debug mode.
Code:
delete m_dynamicsWorld; // takes a lot of time
Pausing the debugger shows execution stopped at:
Code:
void btHashedOverlappingPairCache::processAllOverlappingPairs(btOverlapCallback* callback,btDispatcher* dispatcher)
...with some huge values inside m_overlappingPairArray. Not sure if it makes any sense, but to me it looks really strange that there are so many overlapping pairs in a scene with 3 cubes and 1 capsule (dynamic objects) and 5k static convex hulls.
Well, the statics are probably able to overlap each other on a BSP map, but I'm not sure that Bullet has to make pairs between statics. :)
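On the static-versus-static question: as far as I understand Bullet's default filtering, every broadphase proxy carries a collision group and a mask, a pair is only kept when each object's group is in the other's mask, and bodies added with zero mass get the static group with a mask that excludes other statics. The stand-alone sketch below reimplements that bit test; the constant values are meant to mirror btBroadphaseProxy's filter groups but should be treated as an assumption and checked against your SDK version.

```cpp
#include <cassert>

// Assumed to mirror Bullet's default filter groups; verify against your SDK.
enum FilterGroup : int {
    DefaultFilter = 1,
    StaticFilter  = 2,
    AllFilter     = -1
};

// Hypothetical stand-in for a broadphase proxy: just a group and a mask.
struct Proxy {
    int group;
    int mask;
};

// A pair is kept only if each proxy's group is accepted by the other's mask.
bool needsPair(const Proxy& a, const Proxy& b) {
    return (a.group & b.mask) != 0 && (b.group & a.mask) != 0;
}

// Statics collide with everything except other statics; dynamics with everything.
Proxy makeProxy(bool isStatic) {
    if (isStatic) return {StaticFilter, AllFilter ^ StaticFilter};
    return {DefaultFilter, AllFilter};
}

// Counts how many of the possible pairs pass the filter.
int countAllowedPairs(const Proxy* proxies, int count) {
    assert(count >= 0);
    int allowed = 0;
    for (int i = 0; i < count; ++i) {
        for (int j = i + 1; j < count; ++j) {
            if (needsPair(proxies[i], proxies[j])) ++allowed;
        }
    }
    return allowed;
}
```

Under this rule, 5k statics and 4 dynamic bodies can never produce static-static pairs, so a very large pair array would be worth double-checking (for example, whether the debugger is showing valid data and how the static bodies were added).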
Also, in debug mode, if I set an empty nearCallback:
Code:
void nearCallback(btBroadphasePair& collisionPair, btCollisionDispatcher& dispatcher, const btDispatcherInfo& dispatchInfo) {
   return;
}

...it does not affect execution speed at all... I mean I get the same 17 fps even though no collisions happen and all objects drop through the floor\walls.
In release it gives a huge speedup.
This really has to be something with huge for-loops over arrays of overlapping pairs (if they're really processed somewhere inside Bullet), which makes everything laggy.

I'm still looking for a way to feed Bullet's rigid bodies only their closest BSP brushes\triangles each frame as an alternative.


PostPosted: Fri Oct 14, 2011 10:08 pm 

Joined: Sat Oct 08, 2011 1:00 pm
Posts: 8
Okay, I was playing around and decided to run some experiments.
I've already read the wiki, and it says that btConvexHullShape is the fastest. Okay, I'll ignore that for now and try btBvhTriangleMeshShape for the level statics, just for fun.
So here is the pseudo-code (pseudo because it uses my own BSP structure, which is converted for better mobile rendering, e.g. faces regrouped per texture so I have fewer texture state changes for each leaf).
Here I use the geometry (indexed vertices) held in the leaves, which is used for rendering, NOT the Quake brushes (multiple planes generating a single convex hull) which are used in BspDemo.
Code:
void BSP_GenerateTRIMESHcolliders(BSP *bsp) {
   int numLeaves = bsp->leaves.size();
   for(int i = 0; i<numLeaves; i++) {
      BSPLeaf *leaf = &bsp->leaves[i];
      if(leaf->cluster == -1) // skip unreachable leaves
         continue;
      int numFaces = leaf->faces.size();
      if(numFaces < 1) // a few leaves really are empty, skip them too
         continue;
         continue;
      // create triangle mesh for this leaf
      btTriangleMesh *trimesh = new btTriangleMesh();
      BSPFace *faces = &leaf->faces[0];
      for(int j = 0; j<numFaces; j++) {
         BSPFace *face = &faces[j];
         // convert internal face data into Bullet-compatible data
         // BSPFace is a bit of a misnomer here - it's just indices into the vertex array,
         // forming multiple triangles in one face object; it's a group of vertices using one texture
         for(int k = 0; k<face->indices.size(); k+=3) {
            btVector3 v[3];
            for(int t = 0; t<3; t++) {
               point3f vert = bsp->vertices[face->indices[k + t]].position;
               v[t] = btVector3(vert.x, vert.y, vert.z);
            }
            // insert triangle
            trimesh->addTriangle(v[0], v[1], v[2]);
         }
      }
      // create shape and pass into bullet
      btBvhTriangleMeshShape *shape = new btBvhTriangleMeshShape(trimesh, true);
      btTransform startTransform = btTransform::getIdentity();
      localCreateRigidBody(0,startTransform, shape);   // you can find this in the previous post or in BspDemo in the SDK
      // delete trimesh; // do not delete: this crashes later, because the mesh shape keeps only a pointer to the trimesh and does not copy its data
   }
}

Map: 'base1' from quake 2, 3 rigid cubes and 1 capsule as a player. Vsync: off.
Results:
trimesh release: 700 fps, less than a second to generate meshes.
convexhull release: 300 fps, around 1 second to generate hulls.

trimesh debug: 63 fps, 1 second to build meshes
convexhull debug: 23 fps, around 7 seconds to build hulls

Same map after spawning 150 cubes:
trimesh debug 12fps
convexhull debug 9fps
trimesh release 150fps
convexhull release 100fps

How can it be that the trimesh is much faster at every point: faster generation, faster collision detection?
But as a result there could be a possibility of objects penetrating triangles (I haven't noticed any yet, dunno if it's bulletproof though).
Still can't reach 60 fps when having around 2k cubes :)
But for 500 cubes it doesn't drop lower than 60fps.

I'll try on android later. Currently busy porting some rendering features.
Also, this doesn't really add any optimization tricks and is probably pretty memory intensive, so the question is open and I'm still looking for ideas to solve it.

PS *little update to the post*
Some leaves have fewer than 10 triangles, so I also tried putting every BSP face of the whole 'base1' map into one single huge triangle mesh.
This gave 5-10 fps more, so it looks like the internal AABB tree really works fine in this case.
I have no idea why generating hulls has such an unbelievably huge impact on performance (on huge maps).


PostPosted: Fri Oct 14, 2011 10:54 pm 

Joined: Tue Jun 29, 2010 10:27 pm
Posts: 237
I think btBvhTriangleMeshShapes are optimized for static use, while convex hulls are for general-purpose dynamic/static/whatever use. Maybe that could be why, along with the possibility that the generated convex hulls have a lot more points/faces than you might hope or think. Just some guesses.


PostPosted: Sat Oct 15, 2011 9:19 am 

Joined: Thu Mar 04, 2010 6:55 pm
Posts: 9
Hi gogiii.

I've also found Bullet's speed for static geometry suboptimal.

A few other things I've found out:
- Call setForceUpdateAllAabbs(false); on your world.
- Call setActivationState(DISABLE_SIMULATION); on static collision objects.
- Try to keep the number of static objects small.
- Batch them into a single btBvhTriangleMeshShape if possible; btTriangleIndexVertexArray supports multiple indexed meshes (max speedup for me).

You can call CProfileManager::dumpAll(); after stepSimulation to get an idea where time is wasted. My old thread: viewtopic.php?f=9&t=6397
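If you batch by hand rather than registering several indexed meshes with one btTriangleIndexVertexArray, the merge itself is just concatenating the vertex arrays and rebasing each part's indices by the number of vertices already merged. A minimal stand-alone sketch follows; the `IndexedMesh` struct and `mergeMeshes` are made-up names, and passing the merged arrays to Bullet is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for one static mesh, for illustration only.
struct IndexedMesh {
    std::vector<float> positions; // x, y, z per vertex
    std::vector<int> indices;     // three vertex indices per triangle
};

// Concatenates all parts into one mesh, offsetting each part's indices
// by the number of vertices that were merged before it.
IndexedMesh mergeMeshes(const std::vector<IndexedMesh>& parts) {
    IndexedMesh merged;
    for (const IndexedMesh& part : parts) {
        assert(part.positions.size() % 3 == 0);
        assert(part.indices.size() % 3 == 0);
        const std::size_t partVertexCount = part.positions.size() / 3;
        const int base = static_cast<int>(merged.positions.size() / 3);
        merged.positions.insert(merged.positions.end(),
                                part.positions.begin(), part.positions.end());
        for (int index : part.indices) {
            // Every index must refer to a vertex of its own part.
            assert(index >= 0 && static_cast<std::size_t>(index) < partVertexCount);
            merged.indices.push_back(base + index);
        }
    }
    return merged;
}
```

The merged mesh has to stay alive for as long as any shape built from it is in use, since the mesh shape refers to the data rather than copying it.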


PostPosted: Mon Oct 17, 2011 10:04 am 

Joined: Fri Jun 24, 2011 8:53 am
Posts: 149
NaN wrote:
- Batch them into a single btBvhTriangleMeshShape if possible, btTriangleIndexVertexArray supports multiple indexed meshes(max speedup for me).
As I'm considering squeezing out some more perf, would you mind estimating the difference?
I'm considering mangling all the static meshes for graphics batching and perhaps I will do the same for the physics.

