Erwin and I chatted a bit at the tail end of the Issue below, and he suggested that we continue in the public Bullet forum. The best way to get context is to read the Issue:
http://code.google.com/p/bullet/issues/detail?id=128
"The Bullet broadphase has addPair and removePair. When adding a pair that involves a 'sleeping' proxy, the sleeping object will be activated, the proxy type changes, and an actual pair will be generated. So there is no delay/cycle skipped, all constraints are fine."
Ah! That was what I was missing: Bullet activates bodies when a broad phase overlap occurs. Does it also form contact groups from broad phase overlaps? Our system only activates a body after the narrow phase confirms contact with an active body, and it forms contact groups from narrow phase contacts. I consider that an advantage in many cases, since our environment has a lot of scattered objects that are, at times, fairly densely packed.
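A minimal C++ sketch of the two wake-up policies contrasted above. The `Body`, `BodyState`, `onBroadphaseAddPair`, and `onNarrowphaseContact` names are hypothetical (not Bullet's API), and the `touching` flag stands in for whatever contact result a real narrow phase produces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified body state; not Bullet's actual types.
enum class BodyState { Active, Sleeping };

struct Body {
    BodyState state = BodyState::Sleeping;
};

// Policy 1 (broad phase wake): as soon as a new overlap pair involves an
// active body, both bodies of the pair are made active.
void onBroadphaseAddPair(std::vector<Body>& bodies, std::size_t a, std::size_t b) {
    assert(a < bodies.size() && b < bodies.size());
    if (bodies[a].state == BodyState::Active || bodies[b].state == BodyState::Active) {
        bodies[a].state = BodyState::Active;
        bodies[b].state = BodyState::Active;
    }
}

// Policy 2 (narrow phase wake): a bounding-volume overlap alone does nothing;
// only a confirmed contact with an active body wakes a sleeping one.
void onNarrowphaseContact(std::vector<Body>& bodies, std::size_t a, std::size_t b,
                          bool touching) {
    assert(a < bodies.size() && b < bodies.size());
    if (!touching) {
        return;  // bounds overlap, but no actual contact: keep sleeping
    }
    if (bodies[a].state == BodyState::Active || bodies[b].state == BodyState::Active) {
        bodies[a].state = BodyState::Active;
        bodies[b].state = BodyState::Active;
    }
}
```

Under the first policy, a sleeping box whose bounds merely overlap an active body's bounds wakes immediately; under the second it stays asleep until an actual contact is confirmed.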
"Sleeping objects only consumed the memory of a world transform."
Exactly. And since Bullet is already state-aware, I think this should be internal to Bullet and not require developer modification, provided there's no performance compromise. Four floats per resting object is a beautiful thing (I assume you stored the rotation as a quaternion, since it isn't used continuously). Of course, you typically also need a broad phase proxy, a collision shape pointer, and so on.
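For a sense of scale, here is a hypothetical layout that separates a compact record for resting bodies from the full state of awake ones. It assumes the transform is stored as a quaternion plus a position; `SleepingBody`, `ActiveBody`, and `putToSleep` are illustrative names, not Bullet types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compact record kept for a resting body.
struct SleepingBody {
    float rotation[4];      // orientation as a unit quaternion (x, y, z, w)
    float position[3];      // world-space origin
    std::uint32_t proxyId;  // handle of the body's broad phase proxy
    const void* shape;      // shared collision shape, not owned
};

// Full simulation state, only needed while the body is awake.
struct ActiveBody {
    SleepingBody pose;
    float linearVelocity[3];
    float angularVelocity[3];
    float inverseMass;
    float inverseInertia[3];
};

// Putting a body to sleep keeps only the compact pose record and
// discards the dynamic state.
inline SleepingBody putToSleep(const ActiveBody& body) {
    const float* q = body.pose.rotation;
    const float lengthSquared = q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3];
    assert(lengthSquared > 0.99f && lengthSquared < 1.01f);  // expect a normalized quaternion
    return body.pose;
}
```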
"I got rid of almost all non-active rigid body traversals, except for one, in the island generation. This can also be removed, by storing the 'active' flag in the overlapping pairs (at creation), and incrementally updating this flag when needed."
What we do is use a body's internal state as a partial indication of whether it has been added to the disjoint-subsets structure yet. We start by adding all active bodies. Then we traverse the narrow phase output: if a body in a contact is still resting, it cannot be in the structure yet, so we activate it and add it. So no tags are needed.
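A sketch of that island pass, assuming a plain union-find (disjoint-set forest) over body indices. The forest is preallocated for every body, so "adding" a body amounts to setting its active flag, and that flag is the only membership marker consulted when counting islands. `DisjointSets`, `Contact`, and `buildIslands` are hypothetical names; static geometry such as trimeshes, which should not merge islands, is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Disjoint-set forest with path halving and union by size.
class DisjointSets {
public:
    explicit DisjointSets(std::size_t n) : parent_(n), size_(n, 1) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }

    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }

    void unite(std::size_t a, std::size_t b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        if (size_[a] < size_[b]) std::swap(a, b);
        parent_[b] = a;
        size_[a] += size_[b];
    }

private:
    std::vector<std::size_t> parent_;
    std::vector<std::size_t> size_;
};

struct Contact {
    std::size_t a, b;
};

// Merges narrow phase contacts into islands. The active flag doubles as
// "already in the structure": a resting body in a contact with an active
// body is activated, which is what adds it. Returns the island count.
std::size_t buildIslands(std::vector<bool>& active, const std::vector<Contact>& contacts,
                         DisjointSets& sets) {
    for (const Contact& c : contacts) {
        assert(c.a < active.size() && c.b < active.size());
        if (!active[c.a] && !active[c.b]) continue;  // two resting bodies: skip
        active[c.a] = true;
        active[c.b] = true;
        sets.unite(c.a, c.b);
    }
    std::size_t islands = 0;
    for (std::size_t i = 0; i < active.size(); ++i) {
        if (active[i] && sets.find(i) == i) ++islands;  // one root per island
    }
    return islands;
}
```

For example, with bodies 0 and 4 active and contacts (0,1), (2,3), (1,4), body 1 wakes, bodies 2 and 3 stay asleep, and bodies 0, 1, and 4 end up in one island.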
"In the optimized case, you still need to traverse over all overlapping pairs. It seems best to separate 'active' overlapping pairs and 'non-active' overlapping pairs, so the narrowphase only processes the 'active' overlapping pairs (having at least one active body)."
We never traverse all overlapping pairs. Our broad phase reports overlaps as events, which we use to maintain an overlap graph (a hash table). Then, to feed the narrow phase, we traverse the set of active bodies and look up their overlaps in the graph. So we don't need multiple graphs.
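A minimal sketch of an event-maintained overlap graph, assuming the broad phase reports the start and end of each overlap exactly once. `OverlapGraph`, `onAddPair`, `onRemovePair`, and `pairsForNarrowphase` are hypothetical names. Pairs between two sleeping bodies stay in the table but are never visited, because the walk starts from the active bodies.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using BodyId = std::size_t;

// Overlap graph stored as a hash table from body to overlapping bodies.
class OverlapGraph {
public:
    // Broad phase event: a new overlap began.
    void onAddPair(BodyId a, BodyId b) {
        assert(a != b);
        adjacency_[a].insert(b);
        adjacency_[b].insert(a);
    }

    // Broad phase event: an overlap ended.
    void onRemovePair(BodyId a, BodyId b) {
        eraseEdge(a, b);
        eraseEdge(b, a);
    }

    // Narrow phase input: overlaps touching at least one active body,
    // found by walking only the active bodies, with each pair emitted once.
    std::vector<std::pair<BodyId, BodyId>> pairsForNarrowphase(
        const std::vector<BodyId>& activeBodies, const std::vector<bool>& isActive) const {
        std::vector<std::pair<BodyId, BodyId>> out;
        for (BodyId a : activeBodies) {
            auto it = adjacency_.find(a);
            if (it == adjacency_.end()) continue;
            for (BodyId b : it->second) {
                assert(b < isActive.size());
                // Both active: only the lower id emits the pair.
                if (isActive[b] && b < a) continue;
                out.emplace_back(a, b);
            }
        }
        return out;
    }

private:
    void eraseEdge(BodyId from, BodyId to) {
        auto it = adjacency_.find(from);
        if (it == adjacency_.end()) return;
        it->second.erase(to);
        if (it->second.empty()) adjacency_.erase(it);
    }

    std::unordered_map<BodyId, std::unordered_set<BodyId>> adjacency_;
};
```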
"Of course, in the worst case, all objects are connected and sleeping and a single object wakes up: in that case, you need to traverse all."
That's a pretty bad worst case. Activating the whole group can be made fast, but either way you're then dealing with every object in the simulation being active and part of a single contact group. There's no cure for that except your new parallel constraint solving within an island, which I haven't looked at yet.
"By the way, the time dropped from 25ms to 4ms when creating 40000 sleeping boxes (non-overlapping) in the air. I could clean up the code and move it into trunk if there is enough interest."
That's very good. Some comments:
1. A better test would keep a few random objects awake and moving around in a system of mostly resting bodies.
2. What broad phase are you using in this test? MultiSAP?
3. Where is the overhead now? Contact group (island) generation?
4. I just did a quick test on our system with 240,000 objects (all resting on trimeshes, with massive overlaps): 0.75 ms.
But again, that's all "overhead", and could be eliminated with some early-outs on empty data. A more useful test would include some moving objects in various places.
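Normalizing the quoted timings per object gives a rough sense of scale; it is only a rough comparison, since the machines, broad phases, and scenes differ:

```latex
\begin{aligned}
\frac{25\ \text{ms}}{40\,000\ \text{boxes}} &= 625\ \text{ns per box} \\
\frac{4\ \text{ms}}{40\,000\ \text{boxes}} &= 100\ \text{ns per box} \\
\frac{0.75\ \text{ms}}{240\,000\ \text{objects}} &= 3.125\ \text{ns per object}
\end{aligned}
```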