I have a brief question after reading Erwin's SIGGRAPH 2010 presentation on "optimizing proximity queries". I've just finished implementing a simple broadphase obb bounding volume hierarchy using OpenCL and I'm thinking about how to write the tree traversal kernel. What I am wondering about is how you would store the list of collision pairs generated by the intersection tests with leaf nodes during the traversal, considering that OpenCL/CUDA doesn't allow dynamic data structures.
In our prototyping work on CUDA and OpenCL, we pre-allocated a fixed number of pairs for each object (20 pairs for small objects, and more for large objects).
For a more general solution, you could use DirectCompute Append Buffers. Unfortunately, OpenCL doesn't have this feature, even though I regularly asked for this in the OpenCL working group. As a workaround, you could use global atomics, and increment an integer counter as index in the pair buffer.
With a pre-order traversal the number of possible intersections for each leaf node is far too large an amount to assign to a buffer. I'm also concerned about the lack of synchronization as each thread proceeds with the traversal and finishes work at different times. The breadth-first approach seems more promising, but it would be launching many threads without any work towards the bottom of the tree.
We are currently doing research in this area, but this is still preliminary unpublished work. Let's cross fingers we get some good results that we can share, later this year.