Erwin Coumans wrote:
That is amazingly rapid prototyping!
Could you please make it Zlib license? Then it is more compatible with Bullet, and it allows application in closed source/commercial/console games.
OK - I'll read the zlib license - but it'll probably be OK.
Well, for effect/eye candy physics, that is already enough. We could have a shader that allows some interaction with objects. Havok had some whirl-wind effect that looked amazing with the tons of objects.
Yeah - but particle system approaches are fine if there is no interaction between particles or between particles and the world. We already do this stuff at work (I work in flight simulation - not games - but it's hard to tell the difference sometimes!) - if the position of every particle follows a 100% predictable path then you can compute its position without knowing any past history and you're better off just using 's = ut + 1/2 a t^2' for each particle each frame and just feeding the time into the shader. Even whirlwind type effects have an equation that will tell you the position of every particle from the time and the initial conditions - so you don't need an incremental approach that demands all of this sophistication. We only need it here because the particles are bouncing off the ground (that's why I included that in the demo!).
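To make the 's = ut + 1/2 a t^2' point concrete, here is a minimal CPU-side sketch in C++ (not code from the demo). The `Vec3` alias and the `particlePosition` name are made up for illustration, and a starting position `p0` is added to the formula as an assumption. A shader version would do the same arithmetic per particle with the time passed in as a uniform.

```cpp
#include <array>

// Hypothetical 3-component vector type, used only for this sketch.
using Vec3 = std::array<float, 3>;

// Closed-form position under constant acceleration:
//   s = p0 + u*t + 0.5*a*t^2
// p0 = start position, u = initial velocity, a = acceleration.
// No state is carried from frame to frame, so any time t can be
// evaluated directly without stepping through earlier frames.
Vec3 particlePosition(const Vec3& p0, const Vec3& u, const Vec3& a, float t)
{
    Vec3 s{};
    for (int i = 0; i < 3; ++i)
        s[i] = p0[i] + u[i] * t + 0.5f * a[i] * t * t;
    return s;
}
```

This only works while nothing interrupts the path. As soon as a particle can hit the ground or another particle, the position depends on its history and you are back to the incremental approach.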
I will work on the broadphase very soon, creating a 2d bitmap, encoding overlap between two objects. Also, I will spend some time on a parallel sphere-sphere implementation.
Certainly if we wished to test ONE sphere (or cuboid or triangle) against a huge number of other spheres (or cuboids or triangles) - that could easily be done this way. But to test a huge number of spheres simultaneously against a huge number of spheres will require some thought and some cleverness or it'll require one polygon draw per sphere. For 16,000 spheres tested against 16,000 spheres using mindless brute force, we'd need to draw 16,000 polygons at 128x128 pixels each. That would take a significant amount of GPU time - but it's certainly do-able within a few milliseconds I think.
The interesting question is whether we could do something like classify which spheres were in which 'grid cell' of the world volume in one pass (yes, that's doable) and then only test the spheres from that grid cell against the others in that grid cell...something like that might speed the algorithm up by many orders of magnitude and still keep it mainly inside the GPU. That's the kind of approach I think might work - but I'm no collision detection expert.
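For reference, the grid-cell idea can be sketched on the CPU like this. It is a plain C++ illustration, not a GPU implementation and not code from the demo; `Sphere`, `cellKey` and `overlappingPairs` are names invented here. It assumes the cell size is at least the largest sphere diameter, and it also looks at the 26 neighbouring cells, since a sphere near a cell boundary can touch one whose centre lies next door.

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Sphere { float x, y, z, r; };

// Pack three cell coordinates into one key.
// Assumes each coordinate lies in [-2^20, 2^20).
static std::uint64_t cellKey(int cx, int cy, int cz)
{
    const int bias = 1 << 20;
    return (static_cast<std::uint64_t>(cx + bias) << 42) |
           (static_cast<std::uint64_t>(cy + bias) << 21) |
            static_cast<std::uint64_t>(cz + bias);
}

// Returns index pairs (i, j), i < j, of overlapping spheres.
// Assumes cellSize >= the largest sphere diameter.
std::vector<std::pair<int, int>> overlappingPairs(
    const std::vector<Sphere>& s, float cellSize)
{
    auto cellOf = [cellSize](float v) {
        return static_cast<int>(std::floor(v / cellSize));
    };

    // Pass 1: classify every sphere into the cell holding its centre.
    std::unordered_map<std::uint64_t, std::vector<int>> grid;
    const int n = static_cast<int>(s.size());
    for (int i = 0; i < n; ++i)
        grid[cellKey(cellOf(s[i].x), cellOf(s[i].y), cellOf(s[i].z))].push_back(i);

    // Pass 2: test each sphere only against its own and adjacent cells.
    std::vector<std::pair<int, int>> pairs;
    for (int i = 0; i < n; ++i) {
        const int cx = cellOf(s[i].x), cy = cellOf(s[i].y), cz = cellOf(s[i].z);
        for (int dx = -1; dx <= 1; ++dx)
        for (int dy = -1; dy <= 1; ++dy)
        for (int dz = -1; dz <= 1; ++dz) {
            auto it = grid.find(cellKey(cx + dx, cy + dy, cz + dz));
            if (it == grid.end()) continue;
            for (int j : it->second) {
                if (j <= i) continue;  // report each pair once
                const float ex = s[i].x - s[j].x;
                const float ey = s[i].y - s[j].y;
                const float ez = s[i].z - s[j].z;
                const float rr = s[i].r + s[j].r;
                if (ex * ex + ey * ey + ez * ez <= rr * rr)
                    pairs.emplace_back(i, j);
            }
        }
    }
    return pairs;
}
```

With evenly spread spheres the work drops from roughly n squared tests to n times the handful of spheres in nearby cells. Whether the binning pass maps well onto shader passes is a separate question that this sketch does not answer.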
Thanks. What requirements does it have? Shader model 3?
I'm not really up on the 'Shader model' thing (that's a Microsoft/DirectX kind of thing and I'm strictly a Linux/OpenGL guy). The two specialised features it requires right now are the ability to render to floating point textures and the ability to read floating point textures into both the VERTEX shader and the fragment shader. Both of those are required for a strictly legal GLSL implementation - but older hardware will probably either fall back on software emulation or simply refuse to do it or drop the precision of the float down to a byte or something horrible.
This is definitely advanced stuff - it's not going to work on nVidia 5xxx cards.
Just modified the code so it compiles under Mac OS X. It runs, but asserted in line 60 of fboSupport.cxx. The status is 36054 (didn't check the enum). After commenting that out, it failed later, see below.
OK - you must have added some lines at the top. That would be at the end of the "checkFrameBufferStatus" function? A fail there probably means that render to floating point texture is unsupported. That would be "A Bad Thing" - do you have the latest OpenGL drivers installed?
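For anyone else decoding these numbers by hand: 36054 is 0x8CD6, which the framebuffer object spec defines as GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT (the EXT extension uses the same values with an _EXT suffix). The small C++ helper below is a hypothetical convenience, not part of fboSupport.cxx. It hard-codes a subset of the enum values so it needs no GL headers.

```cpp
#include <string>

// Map a framebuffer status value to its enum name.
// Hypothetical helper; values are taken from the OpenGL framebuffer
// object specification and only a subset is listed here.
std::string framebufferStatusName(unsigned int status)
{
    switch (status) {
        case 0x8CD5: return "GL_FRAMEBUFFER_COMPLETE";
        case 0x8CD6: return "GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT";
        case 0x8CD7: return "GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT";
        case 0x8CDB: return "GL_FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER";
        case 0x8CDC: return "GL_FRAMEBUFFER_INCOMPLETE_READ_BUFFER";
        case 0x8CDD: return "GL_FRAMEBUFFER_UNSUPPORTED";
        default:     return "unknown framebuffer status";
    }
}
```

An incomplete-attachment status is at least consistent with the driver rejecting the floating point texture as a render target, though the status code alone does not prove that is the cause.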
erwin-coumans-computer:~/Downloads/GPUphysics apple$ ./GPU_physics_demo
status : 36054
Compiling:CollisionGenerator Frag Shader -
ERROR: 0:1: '<' : wrong operand types no operation '<' exists that takes a left-hand operand of type 'float' and a right operand of type 'const int' (or there is no acceptable conversion)
OK - that's my bad. On line 191 of GPU_physics_demo.cxx, change '< 0' to '< 0.0'.
I suppose I need to re-insert my freezing nvidia 6800 GT card. Is there another stable nvidia card that works on older AGP slots?
So framebuffer objects is one way of doing GPU shaders. Is there an alternative?
Short Answer: No.
Long Answer: Well, maybe...but I wouldn't try it. If we can't render into a three or four component floating point texture, it's hard to see how else we'd get data out of one pass of the shader and into the next. At work, we played briefly with splitting apart a 'float' and repacking it into four bytes and putting that into the RGBA of the frame buffer - then copying the frame buffer into a texture and recombining the four bytes into a float in the next shader pass. However, that means that you can only output one floating point number in each shader pass - so things like calculating a rotation quaternion take four shader passes instead of one - repeating the same calculations each pass but writing out a different component each time. It's also slow because of copying frame buffer into texture...but we're talking pretty small textures here (128x128 is tiny - and yet gives you 16,000 cubes bouncing around!)...the worst part is that OpenGL only GUARANTEES to support up to 16 texture inputs into a shader. If each floating point number is one texture (instead of having up to four floats in a single texture) - then each step of the physics math can have only 16 floating point numbers going into it instead of 64 if you can use floating point texture targets. Worse still, the packing and unpacking of floating point numbers into RGBA colour bytes is nasty inside a shader (you can't just cast a pointer to a float into a pointer into four bytes because there aren't any pointers in shader-land)....so it has to be done arithmetically - which is insanely painful and will probably grind our performance into the dust.
It's possible that older hardware might be able to do this at 'half-float' precision. My code doesn't support that - but it could be made to do so. I just have doubts that you can do useful physics at 16 bit precision...heck - there are enough limits on object sizes and such already - if we make it any worse, it'll be unusable.
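To show what "done arithmetically" means, here is a rough C++ sketch of one common way to do it. It is not the code we used at work, and the function names are invented. It assumes the value has already been scaled into the range [0, 1), treats it as a base-256 fraction and peels off one byte per colour channel. A shader would do the same multiplies, floors and subtracts, since it has no pointer casts to lean on.

```cpp
#include <array>
#include <cstdint>
#include <cmath>

// Split a value in [0, 1) into four base-256 "digits" (R, G, B, A).
// Assumption: the caller has already scaled and biased the physics
// value into [0, 1); values outside that range are not handled.
std::array<std::uint8_t, 4> packUnitFloat(float v)
{
    std::array<std::uint8_t, 4> rgba{};
    for (int i = 0; i < 4; ++i) {
        v *= 256.0f;                              // shift next digit up
        const float digit = std::floor(v);        // integer part, 0..255
        rgba[i] = static_cast<std::uint8_t>(digit);
        v -= digit;                               // keep the remainder
    }
    return rgba;
}

// Recombine the four bytes into a float, least significant byte first.
float unpackUnitFloat(const std::array<std::uint8_t, 4>& rgba)
{
    float v = 0.0f;
    for (int i = 3; i >= 0; --i)
        v = (v + static_cast<float>(rgba[i])) / 256.0f;
    return v;
}
```

Even in this tidy form that is four multiplies, four floors and four subtracts to write a single number, plus the matching work to read it back, for every value in every pass.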
Like I said - this is really a high-end technique. It's going to be a while before most of the hardware out there can support it. But that's also true of the hardware-physics-engine and both of the other physics-on-GPU solutions. At least we don't demand a second graphics card for running the physics like the nVidia solution does!
I think the plan here has to be to get it working so that it's mature code by the time the hardware catches up.