Degrading performance (binary demo and VIDEO included)

cbuchner1
Posts: 17
Joined: Fri Apr 10, 2009 6:44 pm

Degrading performance (binary demo and VIDEO included)

Post by cbuchner1 »

Hi,

UPDATE: there is a video further down, in the sixth post.

While creating a prototype coin pusher arcade machine based on OSG and Bullet Physics, I've hit a serious performance limitation. I've attached a Win32 binary, based on OpenGL, that demonstrates the issue. The binary depends on OSG 2.8.1, which is not included in order to stay below the allowed size limit. The demo shows a pusher (box) moving forward and backward and interacting with coins that are tossed into the scene once per second.

At about 60-65 coins in the simulation the frame rate drops as if there was no tomorrow. Before that it stays nicely at 60FPS but after that it degrades extremely rapidly - in fact so rapidly that I do not think this is normal scaling behavior for Bullet. The degradation also does not depend on the CPU - it's exactly the same for a slow Athlon MP compared to a Core 2 Duo.

I've also discovered that sometimes, for very brief periods, the frame rate jumps back to 60 FPS, even on the slow machine with the Athlon MP processor. I tend to believe that there is some bug in either OSG or Bullet Physics that munches away all my CPU cycles, because the thing *can* run faster, as I've observed. It appears I have to do some code profiling under Linux to understand where the CPU cycles are burned.

I've got another thread open on this project where I explain what my simulation assumptions and parameters are.
http://bulletphysics.com/Bullet/phpBB3/ ... f=9&t=3623

Any comments welcome,

Christian
Last edited by cbuchner1 on Sun Jun 21, 2009 12:49 am, edited 2 times in total.
pico
Posts: 229
Joined: Sun Sep 30, 2007 7:58 am

Re: Degrading performance (binary demo included)

Post by pico »

Hi,

I recommend displaying the btProfilers output, so you can easily spot where the performance goes.

From your other post I saw that you use a 1/200 s timestep. Do you need such a small timestep? Have you already tried 1/60 s? That would instantly give you about 3 times more performance.

Do you use a cylinder primitive or a convex shape for the coins? Is your ground mesh very highly tessellated?
cbuchner1
Posts: 17
Joined: Fri Apr 10, 2009 6:44 pm

Re: Degrading performance (binary demo included)

Post by cbuchner1 »

The 1/60s timestep resulted in a very bouncy and funny looking behavior of the coins on the ground. Only with 1/200s they would slowly spin to a stop nicely when falling on the face.

I am using a cylinder primitive in Bullet.

In OSG I am using a cylindrical mesh with 32 faces around the edge, as exported from Blender. I assume the flat coin faces are already tessellated into triangles (because otherwise OpenGL would have to tessellate a 32-edged POLY, which is slow).
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Degrading performance (binary demo included)

Post by Erwin Coumans »

Bullet works best if units are in meters and objects are between 10 centimeters (bigger than a pebble) and 10 meters (a house) in size.

Simulating coins with a side of 5 millimeters is possible, but it will require a lot of tweaking effort.

As mentioned before, stick with meter units and try to use a smaller fixed timestep (3rd argument to stepSimulation), smaller collision margin etc.
Thanks,
Erwin
cbuchner1
Posts: 17
Joined: Fri Apr 10, 2009 6:44 pm

Re: Degrading performance (binary demo included)

Post by cbuchner1 »

I will stick with the millimeter units for now (the coins have a height of 1.0 and diameter of 10.0), but I will try to tweak some margin and other parameter settings.

Also I am going to profile the code to see where it slows down so much.
cbuchner1
Posts: 17
Joined: Fri Apr 10, 2009 6:44 pm

Re: Degrading performance (binary demo included)

Post by cbuchner1 »

Here is a high-definition video on YouTube showing the penny pusher prototype. It was captured at a constant 25 FPS, so some motion looks a bit jerky where in the original it was butter smooth (frame rates in the hundreds). This was recorded on a pretty new laptop with a 2.2 GHz Centrino Duo and nVidia 9600M GT graphics on Linux. VSYNC was off.

http://www.youtube.com/watch?v=ubpZUTuO1ec&fmt=22

At about 2 minutes into the clip we hit a point where frame rates collapse catastrophically. It is just amazing how quickly those frames drop from the hundreds to a few tens to just below 10. I have some kind of frame rate control built in that slows down the simulation time to maintain 10 FPS - so things become slow motion when we run out of steam. ;-)

I think I have understood why the frame rate sometimes recovers briefly (not really seen in this video clip though). The coins in front of the pusher sometimes "settle", i.e. become inactive and that is when frame rates jump back.

Now I need about 2-3 times the performance I currently get regarding the total number of coins to create a good pusher arcade machine. So when will the CUDA solver be available to the general public?

Christian
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Degrading performance (binary demo and VIDEO included)

Post by Erwin Coumans »

cbuchner1 wrote: Now I need about 2-3 times the performance I currently get regarding the total number of coins to create a good pusher arcade machine. So when will the CUDA solver be available to the general public?

CUDA isn't likely to help out; the setup needs to be fixed.

Please answer all the following questions:
  • 1) Your test is performed with an optimized (release/-O2) build, right?
  • 2) Please report detailed profile timings during a slow frame, by adding the following lines after the stepSimulation call:

    Code:

    #include "LinearMath/btQuickprof.h"
    
    stepSimulation(deltaTime);
    ///dumpAll prints detailed statistics of Bullet simulation into the standard output (console)
    CProfileManager::dumpAll();
    
  • 3) Provide a code snippet that shows exactly all the parameters you feed to 'stepSimulation'.
  • 4) It could be that the number of overlapping pairs is extreme, so this needs to be checked first:

    Code:

    int numpairs = dynamicsWorld()->getBroadphase()->getOverlappingPairCache()->getNumOverlappingPairs();
    
Thanks,
Erwin
cbuchner1
Posts: 17
Joined: Fri Apr 10, 2009 6:44 pm

Re: Degrading performance (binary demo and VIDEO included)

Post by cbuchner1 »

I built both Bullet and OSG in Release mode on openSUSE 11.1.

Here is the code that calls stepSimulation in this test.

Code:

        // NOTE: dt is time elapsed since rendering last frame

        const float step = 0.005;  // 200 FPS physics
        const int iter = 20;       // 20 iterations per frame
                                   // -> 10 FPS desired frame rate

        // low frame rates < target_fps effectively slow down the simulation
        if (dt > iter * step)
        {
            // compute time slippage
            double diff = dt - iter * step;
            dt        -= diff;
            dt_lost   += diff;
        }

        m_dynamicsWorld->stepSimulation(dt, iter, step);

I am about to post three snapshots of individual frames. The difference between the first and second snapshot is a factor of 10 in total running time. In the first snapshot each frame only computes 2 iterations - in the second snapshot it is 20 iterations.

The difference between the second and third snapshot is a factor of two in running time, which here nicely corresponds to the increase in the number of colliding pairs. In both, the maximum of 20 iterations is used.

So now I think I know why the frame rates drop so quickly and catastrophically. There is one "break-even" point where the elapsed wall clock time to compute a single physics iteration begins to take longer than the duration of this simulation time step. This is when the number of iterations processed per frame rapidly climbs to 20 (the allowed maximum). It causes the total run time per frame to jump into the 100ms range and the frame rates tank rapidly.
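
The break-even effect can be reproduced with a toy model of a fixed-timestep loop. This is a sketch under stated assumptions, not Bullet's actual code: the function name is made up, the 1 ms per-frame overhead is an assumption, and the per-step costs used in the note below it are rounded from the profiler dumps further down (8.915 ms for 2 steps and 100.639 ms for 20 steps).

```cpp
#include <algorithm>

// Toy model of a fixed-timestep loop with a substep cap (all times in ms).
// Every name and default value here is illustrative, not taken from Bullet.
// Returns the number of substeps executed in the last simulated frame.
int substepsAfterSettling(double costPerSubstepMs,
                          double overheadMs = 1.0,  // assumed non-physics cost
                          double stepMs = 5.0,      // 0.005 s fixed timestep
                          int maxSubsteps = 20,
                          int frames = 200)
{
    double dt = 2.0 * stepMs;  // pretend the previous frame took 10 ms
    double accumulator = 0.0;  // simulated time not yet consumed by substeps
    int substeps = 0;

    for (int i = 0; i < frames; ++i) {
        // Same clamp as in the posted code: never feed more than 100 ms.
        dt = std::min(dt, maxSubsteps * stepMs);

        // Consume the elapsed time in whole fixed steps.
        accumulator += dt;
        substeps = std::min(static_cast<int>(accumulator / stepMs), maxSubsteps);
        accumulator -= substeps * stepMs;

        // The wall-clock cost of this frame becomes the next frame's dt.
        dt = overheadMs + substeps * costPerSubstepMs;
    }
    return substeps;
}
```

In this model a cost of 4.46 ms per step (just under the 5 ms timestep) stays at one or two steps per frame, while 5.03 ms per step (just over it) climbs until it hits the cap of 20. Each frame then costs about 100 ms, which is the jump seen between the first two snapshots.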

Bullet is not really to blame for the rapid degradation in frame rates. I simply created an unstable system, so I need to find a control loop that provides a smooth degradation of frame rates with increasing number of coins.

However anything that makes the constraint solver and the discrete collision detection faster (e.g. CUDA) would still help my application.

Code:

Simulated time: 149.01 s
Total coins in the game: 149
Number of colliding pairs: 1013
CProfileManager::dumpAll() output:
----------------------------------
Profiling: Root (total running time: 10.101 ms) ---
0 -- stepSimulation (89.74 %) :: 9.065 ms / frame (1 calls)
Unaccounted: (10.256 %) :: 1.036 ms
...----------------------------------
...Profiling: stepSimulation (total running time: 9.065 ms) ---
...0 -- synchronizeMotionStates (1.53 %) :: 0.139 ms / frame (3 calls)
...1 -- internalSingleStepSimulation (98.35 %) :: 8.915 ms / frame (2 calls)
...Unaccounted: (0.121 %) :: 0.011 ms
......----------------------------------
......Profiling: internalSingleStepSimulation (total running time: 8.915 ms) --
......0 -- updateActivationState (0.13 %) :: 0.012 ms / frame (2 calls)
......1 -- updateActions (0.02 %) :: 0.002 ms / frame (2 calls)
......2 -- integrateTransforms (1.51 %) :: 0.135 ms / frame (2 calls)
......3 -- solveConstraints (39.10 %) :: 3.486 ms / frame (2 calls)
......4 -- calculateSimulationIslands (0.90 %) :: 0.080 ms / frame (2 calls)
......5 -- performDiscreteCollisionDetection (56.23 %) :: 5.013 ms / frame (2 c
......6 -- predictUnconstraintMotion (1.92 %) :: 0.171 ms / frame (2 calls)
......Unaccounted: (0.179 %) :: 0.016 ms
.........----------------------------------
.........Profiling: integrateTransforms (total running time: 0.135 ms) ---
.........0 -- CCD motion clamping (25.93 %) :: 0.035 ms / frame (2 calls)
.........Unaccounted: (74.074 %) :: 0.100 ms
............----------------------------------
............Profiling: CCD motion clamping (total running time: 0.035 ms) ---
............0 -- convexSweepTest (85.71 %) :: 0.030 ms / frame (2 calls)
............Unaccounted: (14.286 %) :: 0.005 ms
.........----------------------------------
.........Profiling: solveConstraints (total running time: 3.486 ms) ---
.........0 -- processIslands (97.53 %) :: 3.400 ms / frame (2 calls)
.........1 -- islandUnionFindAndQuickSort (2.29 %) :: 0.080 ms / frame (2 calls
.........Unaccounted: (0.172 %) :: 0.006 ms
............----------------------------------
............Profiling: processIslands (total running time: 3.400 ms) ---
............0 -- solveGroup (94.59 %) :: 3.216 ms / frame (4 calls)
............Unaccounted: (5.412 %) :: 0.184 ms
...............----------------------------------
...............Profiling: solveGroup (total running time: 3.216 ms) ---
...............0 -- solveGroupCacheFriendlyIterations (68.25 %) :: 2.195 ms / f
...............1 -- solveGroupCacheFriendlySetup (29.23 %) :: 0.940 ms / frame
...............Unaccounted: (2.519 %) :: 0.081 ms
.........----------------------------------
.........Profiling: performDiscreteCollisionDetection (total running time: 5.01
.........0 -- dispatchAllCollisionPairs (96.31 %) :: 4.828 ms / frame (2 calls)
.........1 -- calculateOverlappingPairs (0.36 %) :: 0.018 ms / frame (2 calls)
.........2 -- updateAabbs (3.17 %) :: 0.159 ms / frame (2 calls)
.........Unaccounted: (0.160 %) :: 0.008 ms


Simulated time: 162.15 s
Total coins in the game: 162
Number of colliding pairs: 1091
CProfileManager::dumpAll() output:
----------------------------------
Profiling: Root (total running time: 102.571 ms) ---
0 -- stepSimulation (99.20 %) :: 101.754 ms / frame (1 calls)
Unaccounted: (0.797 %) :: 0.817 ms
...----------------------------------
...Profiling: stepSimulation (total running time: 101.754 ms) ---
...0 -- synchronizeMotionStates (1.05 %) :: 1.069 ms / frame (21 calls)
...1 -- internalSingleStepSimulation (98.90 %) :: 100.639 ms / frame (20 calls)
...Unaccounted: (0.045 %) :: 0.046 ms
......----------------------------------
......Profiling: internalSingleStepSimulation (total running time: 100.639 ms)
......0 -- updateActivationState (0.12 %) :: 0.124 ms / frame (20 calls)
......1 -- updateActions (0.02 %) :: 0.020 ms / frame (20 calls)
......2 -- integrateTransforms (1.42 %) :: 1.433 ms / frame (20 calls)
......3 -- solveConstraints (40.43 %) :: 40.688 ms / frame (20 calls)
......4 -- calculateSimulationIslands (0.86 %) :: 0.862 ms / frame (20 calls)
......5 -- performDiscreteCollisionDetection (55.11 %) :: 55.465 ms / frame (20
......6 -- predictUnconstraintMotion (1.86 %) :: 1.870 ms / frame (20 calls)
......Unaccounted: (0.176 %) :: 0.177 ms
.........----------------------------------
.........Profiling: integrateTransforms (total running time: 1.433 ms) ---
.........0 -- CCD motion clamping (23.66 %) :: 0.339 ms / frame (20 calls)
.........Unaccounted: (76.343 %) :: 1.094 ms
............----------------------------------
............Profiling: CCD motion clamping (total running time: 0.339 ms) ---
............0 -- convexSweepTest (87.91 %) :: 0.298 ms / frame (20 calls)
............Unaccounted: (12.094 %) :: 0.041 ms
.........----------------------------------
.........Profiling: solveConstraints (total running time: 40.688 ms) ---
.........0 -- processIslands (97.73 %) :: 39.764 ms / frame (20 calls)
.........1 -- islandUnionFindAndQuickSort (2.11 %) :: 0.860 ms / frame (20 call
.........Unaccounted: (0.157 %) :: 0.064 ms
............----------------------------------
............Profiling: processIslands (total running time: 39.764 ms) ---
............0 -- solveGroup (95.29 %) :: 37.892 ms / frame (20 calls)
............Unaccounted: (4.708 %) :: 1.872 ms
...............----------------------------------
...............Profiling: solveGroup (total running time: 37.892 ms) ---
...............0 -- solveGroupCacheFriendlyIterations (68.67 %) :: 26.019 ms /
...............1 -- solveGroupCacheFriendlySetup (29.04 %) :: 11.003 ms / frame
...............Unaccounted: (2.296 %) :: 0.870 ms
.........----------------------------------
.........Profiling: performDiscreteCollisionDetection (total running time: 55.4
.........0 -- dispatchAllCollisionPairs (94.56 %) :: 52.447 ms / frame (20 call
.........1 -- calculateOverlappingPairs (0.53 %) :: 0.293 ms / frame (20 calls)
.........2 -- updateAabbs (4.77 %) :: 2.647 ms / frame (20 calls)
.........Unaccounted: (0.141 %) :: 0.078 ms



Simulated time: 264.17 s
Total coins in the game: 264
Number of colliding pairs: 2059
CProfileManager::dumpAll() output:
----------------------------------
Profiling: Root (total running time: 204.656 ms) ---
0 -- stepSimulation (99.42 %) :: 203.468 ms / frame (1 calls)
Unaccounted: (0.580 %) :: 1.188 ms
...----------------------------------
...Profiling: stepSimulation (total running time: 203.468 ms) ---
...0 -- synchronizeMotionStates (0.91 %) :: 1.846 ms / frame (21 calls)
...1 -- internalSingleStepSimulation (99.07 %) :: 201.569 ms / frame (20 calls)
...Unaccounted: (0.026 %) :: 0.053 ms
......----------------------------------
......Profiling: internalSingleStepSimulation (total running time: 201.569 ms)
......0 -- updateActivationState (0.10 %) :: 0.193 ms / frame (20 calls)
......1 -- updateActions (0.01 %) :: 0.018 ms / frame (20 calls)
......2 -- integrateTransforms (1.12 %) :: 2.249 ms / frame (20 calls)
......3 -- solveConstraints (37.20 %) :: 74.992 ms / frame (20 calls)
......4 -- calculateSimulationIslands (0.85 %) :: 1.712 ms / frame (20 calls)
......5 -- performDiscreteCollisionDetection (59.13 %) :: 119.185 ms / frame (2
......6 -- predictUnconstraintMotion (1.51 %) :: 3.048 ms / frame (20 calls)
......Unaccounted: (0.085 %) :: 0.172 ms
.........----------------------------------
.........Profiling: integrateTransforms (total running time: 2.249 ms) ---
.........0 -- CCD motion clamping (19.83 %) :: 0.446 ms / frame (20 calls)
.........Unaccounted: (80.169 %) :: 1.803 ms
............----------------------------------
............Profiling: CCD motion clamping (total running time: 0.446 ms) ---
............0 -- convexSweepTest (89.91 %) :: 0.401 ms / frame (20 calls)
............Unaccounted: (10.090 %) :: 0.045 ms
.........----------------------------------
.........Profiling: solveConstraints (total running time: 74.992 ms) ---
.........0 -- processIslands (97.67 %) :: 73.242 ms / frame (20 calls)
.........1 -- islandUnionFindAndQuickSort (2.23 %) :: 1.674 ms / frame (20 call
.........Unaccounted: (0.101 %) :: 0.076 ms
............----------------------------------
............Profiling: processIslands (total running time: 73.242 ms) ---
............0 -- solveGroup (94.35 %) :: 69.103 ms / frame (20 calls)
............Unaccounted: (5.651 %) :: 4.139 ms
...............----------------------------------
...............Profiling: solveGroup (total running time: 69.103 ms) ---
...............0 -- solveGroupCacheFriendlyIterations (69.37 %) :: 47.940 ms /
...............1 -- solveGroupCacheFriendlySetup (28.39 %) :: 19.617 ms / frame
...............Unaccounted: (2.237 %) :: 1.546 ms
.........----------------------------------
.........Profiling: performDiscreteCollisionDetection (total running time: 119.
.........0 -- dispatchAllCollisionPairs (90.98 %) :: 108.439 ms / frame (20 cal
.........1 -- calculateOverlappingPairs (0.94 %) :: 1.119 ms / frame (20 calls)
.........2 -- updateAabbs (8.01 %) :: 9.545 ms / frame (20 calls)
.........Unaccounted: (0.069 %) :: 0.082 ms