The biggest problem is this: say I have 4000 objects in my sim, and I want to do anything to a SINGLE object, be it rip a transform, apply an impulse, whatever.
The system only allows so many semaphores, so a lock per object is out and we end up defining a single locking semaphore. But that means if I want to modify a SINGLE object, I stall the ENTIRE physics simulator, because that one lock covers ALL objects.
Now, say I am in a render pass in my main sim ripping transforms: graphics and physics are competing directly for the same semaphore on every read. The alternative is to do batch reads/writes.
That might be the best way: make a single lock for batch operations instead of taking a mutex per atomic operation, since that way there is less competition and less stalling. During a phase where I am requesting graphics information and/or applying operations to objects, the physics simulator should be stopped outright, not just stalled.
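To make the batch idea concrete, here is a rough C++ sketch of what I have in mind. Everything in it (PhysicsWorld, Body, Transform, Impulse, sync) is a made-up name for illustration, not the API of any real physics library, and the integration step is deliberately trivial. The point is only that the main sim takes the world lock ONCE per frame to apply all queued impulses and copy out all transforms, instead of once per object.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical minimal state; a real engine stores far more per body.
struct Transform { float x = 0, y = 0, z = 0; };
struct Body {
    Transform transform;
    float vx = 0, vy = 0, vz = 0;
    float inv_mass = 1;
};

// A queued request from the main sim: "apply this impulse to body `id`".
struct Impulse { std::size_t id; float ix, iy, iz; };

class PhysicsWorld {
public:
    explicit PhysicsWorld(std::size_t count) : bodies_(count) {}

    // One simulation step. The physics thread holds the single world lock
    // for the whole step, so it never runs while a batch sync is in progress.
    void step(float dt) {
        std::lock_guard<std::mutex> guard(world_lock_);
        for (Body& b : bodies_) {
            b.transform.x += b.vx * dt;
            b.transform.y += b.vy * dt;
            b.transform.z += b.vz * dt;
        }
    }

    // Batch exchange: apply every queued impulse and copy out every
    // transform under ONE lock acquisition, not one per object.
    void sync(const std::vector<Impulse>& impulses, std::vector<Transform>& out) {
        std::lock_guard<std::mutex> guard(world_lock_);
        for (const Impulse& imp : impulses) {
            assert(imp.id < bodies_.size());
            Body& b = bodies_[imp.id];
            b.vx += imp.ix * b.inv_mass;
            b.vy += imp.iy * b.inv_mass;
            b.vz += imp.iz * b.inv_mass;
        }
        out.resize(bodies_.size());
        for (std::size_t i = 0; i < bodies_.size(); ++i) {
            out[i] = bodies_[i].transform;
        }
    }

private:
    std::mutex world_lock_;      // the single lock covering ALL objects
    std::vector<Body> bodies_;
};
```

The main sim would queue impulses during the frame, call sync once before rendering, and then draw from its own copy of the transforms without touching the lock again. Physics is fully stopped for the duration of sync, but only once per frame.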
EDIT - I'll look at the code tonight, but I'm at work tonight and don't have access to a dev system, and I'm here for the next 9 hours, so I'm just trying to get as much info into my head now so I'm ready to just dig in.