CPU multithreading is working!

kingchurch
Posts: 28
Joined: Sun May 13, 2012 7:14 am

Re: CPU multithreading is working!

Post by kingchurch »

Exciting! Will the MT version work on iOS devices?
c6burns
Posts: 149
Joined: Fri May 24, 2013 6:08 am

Re: CPU multithreading is working!

Post by c6burns »

I am 99% sure it will, since it now passes muster in msvc/gcc as well as being tested on android/windows/linux. I'm still waiting for D&B to clear us so I can sign on to Apple under my company. Once that happens I'll test mac and ios ... or someone else can take it for a whirl on those platforms in the meantime.

PS - Go go lunkhound :lol:
lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound »

The threadsafe version of Bullet relies mainly on C++11 atomics for thread synchronization. This should be widely available on all modern platforms (including iOS devices). If it doesn't work out of the box on iOS it should just be a trivial change to make the detection of atomics support more robust.

I'm not calling it "multithreaded Bullet" because that sort of implies that Bullet would be launching or managing threads in some way. It is more accurate to say that it is "threadsafe" (for certain operations -- 5 currently). All of the actual thread management is left to the client -- one example of which is the MultiThreadedDemo, which uses Intel's Threaded Building Blocks for thread management and task scheduling.

I did it this way because I didn't want to tie Bullet to any particular threading library. Many projects that might be using Bullet will already be using a task scheduler of some kind and will not want Bullet to force a different one on them.

So you'll also need to get a version of TBB for your platform to get the MultiThreadedDemo running "as is". However, it shouldn't be too difficult to convert MultiThreadedDemo to another task-scheduler/threadpool library. All of the TBB specific code is surrounded by "#if USE_TBB", and there isn't that much of it, and it all uses the same idiom -- parallel_for.
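To give a rough idea of how little the client has to supply, here is a sketch of a parallel-for built only on std::thread. The name `parallelFor` and its signature are hypothetical and are not taken from the MultiThreadedDemo; a real port would replace the body with a call into whatever task scheduler or thread pool the project already uses.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for tbb::parallel_for (not Bullet or TBB code).
// Splits [begin, end) into contiguous chunks and runs
// body(chunkBegin, chunkEnd) on each chunk in its own thread.
// Spawning threads on every call is wasteful; a real client would
// hand the chunks to an existing thread pool instead.
inline void parallelFor(int begin, int end, int numThreads,
                        const std::function<void(int, int)>& body)
{
    assert(numThreads > 0);
    if (end <= begin) return;  // empty range: nothing to do

    const int count = end - begin;
    const int chunk = (count + numThreads - 1) / numThreads;  // ceiling division

    std::vector<std::thread> workers;
    for (int start = begin; start < end; start += chunk)
    {
        const int stop = std::min(start + chunk, end);
        workers.emplace_back([&body, start, stop] { body(start, stop); });
    }
    for (std::thread& t : workers) t.join();  // wait for every chunk to finish
}
```

The (begin, end) pair passed to the body plays the same role as the blocked_range in the TBB version: it only tells each task which slice of the loop it owns.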

The MultiThreadedDemo uses the "standard" Bullet components -- btDbvtBroadphase, btDiscreteDynamicsWorld, and btSequentialImpulseConstraintSolver. All 5 threadsafe ops have been tested with those components. If you are using one of the alternative solvers and/or physics worlds, then some of those may not be threadsafe.

The narrowphase should work regardless, as far as I know there is only one dispatcher to choose from.

The parallel island solving won't work with the MultiBodyDynamicsWorld because of a shared array of MultiBodyConstraints. And I have no idea about the soft-body dynamics world.
The MLCP solver is apparently not threadsafe. I don't know about any of the other alternative solvers.

The other 3 areas, predictUnconstraintMotion, createPredictiveContacts, and integrateTransforms are all based on overriding methods on the discrete dynamics world. They might also work on other physics worlds that are also derived from that one, but I don't know.

If you do decide to give it a spin, please post about it in this thread. I'll be glad to help out with any issues you run into.

Much thanks to c6burns for his help getting it working on Android/Linux/GCC, and for the CMake love! :)
Flix
Posts: 456
Joined: Tue Dec 25, 2007 1:06 pm

Re: CPU multithreading is working!

Post by Flix »

I don't usually like to pop in and interrupt an ongoing discussion; however, I'd like to raise a few brief questions/considerations, right after thanking lunkhound for having restored/rewritten/reconsidered/reforked the Bullet Multithreaded branch :) .

Here are my questions/considerations (I still haven't tested the library yet, I'm just taking a look at its code):
1) In the old/legacy Bullet Multithreaded, all the callbacks (e.g. the custom material callback) were forbidden (i.e. no longer usable): does this limitation apply to your version as well?
2) As far as I can understand in your source code the threading stuff is confined to:
a) LinearMath/btThreads.h/cpp: for the basic stuff (e.g. mutex,lock,thread). This is the only part of the code that uses C++11 and defaults to MSVC++ intrinsics if not supported (there is no fallback on other systems).
b) Demos/MultiThreadedDemo/MultiThreadedDemo.h/cpp: for the advanced stuff. This is the only part of the code that uses the TBB library (for tbb::blocked_range AFAICS).
Is this correct?

I was thinking that maybe by using http://tinythreadpp.bitsnbites.eu/ (a subset of C++11 in a single cpp file with a liberal license) we could add a fallback to point a (or, with a clever use of typedefs and using directives, we could even think of switching seamlessly between C++11/boost_thread and tinythreads++ using some precompiler definition, by forcing these three libraries to use the same API).

Maybe for point b we can think about providing an alternative demo using OpenMP as a replacement for TBB (however, I'm not sure if there's something compatible with tbb::blocked_range: I'll have to check the OpenMP API). Probably the best option here would be not to use any additional advanced library at all, and to build some infrastructure using btThreads.h/cpp to have fewer dependencies, but I guess it won't be as easy/efficient: so it's OK for me.

IMPORTANT: These are just some optional ideas in case you find them useful.
I'm already satisfied with your work so far :D Thank you!
c6burns
Posts: 149
Joined: Fri May 24, 2013 6:08 am

Re: CPU multithreading is working!

Post by c6burns »

Flix wrote:I don't usually like to pop-in and interrupt an ongoing discussion
I don't think you're interrupting at all. The more the merrier!

2a) yes, just the locks in btThreads use c++11 atomics
2b) yes, just the code in that demo uses TBB for threading

I think it's smart to have kept threading out of bullet itself. I am ambivalent about threading libraries, but I realize many people are not and this is a good strategy for wider adoption without a threading holy war. For now I am content with adding TBB as a dependency to my project and leveraging what lunkhound has done in that demo. Personally, I just wanted an easy win to push my simulations a bit farther with minimal effort as I am in a fairly late stage of development in my current project :D
lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound »

Flix wrote:I don't usually like to pop-in and interrupt an ongoing discussion: however I'd like to make a few brief questions/considerations, soon after having thanked lunkhound for having restored/rewritten/reconsidered/reforked the Bullet Multithreaded branch :) .

Here are my questions/considerations (I still haven't tested the library yet, I'm just taking a look at its code):
1) In the old/legacy Bullet Multithreaded, all the callbacks (e.g. the custom material callback) were forbidden (i.e. no longer usable): does this limitation apply to your version as well?
No, I can't see any reason why such callbacks would cause a problem. In my own code I'm using the gContactAddedCallback without issues so far. Obviously the callback will have to be threadsafe.
Flix wrote:2) As far as I can understand in your source code the threading stuff is confined to:
a) LinearMath/btThreads.h/cpp: for the basic stuff (e.g. mutex,lock,thread). This is the only part of the code that uses C++11 and defaults to MSVC++ intrinsics if not supported (there is no fallback on other systems).
b) Demos/MultiThreadedDemo/MultiThreadedDemo.h/cpp: for the advanced stuff. This is the only part of the code that uses the TBB library (for tbb::blocked_range AFAICS).
Is this correct?
a - Yes. Pretty much all that is needed for (a) is enough atomic operations to make a basic mutex. I started out using OS provided mutexes (i.e. Windows critical sections), but they were slow compared to the lightweight mutex I ended up with. It should be quite straightforward to add more fallbacks as needed.
b - Yes, the MultiThreadedDemo is where all of the actual thread management takes place. It uses TBB to initialize/cleanup a threadpool, and it uses tbb::parallel_for to send tasks to the thread pool. The blocked_range is just a struct of ints to pass along the begin and end of the for-loop.
Flix wrote: I was thinking that maybe by using http://tinythreadpp.bitsnbites.eu/ (a subset of C++11 in a single cpp file with a liberal license) we could add a fallback to point a (or, with a clever use of typedefs and using directives, we could even think of switching seamlessly between C++11/boost_thread and tinythreads++ using some precompiler definition, by forcing these three libraries to use the same API).
I think since the requirements are so small for what btThreads actually needs (it's really just a compare-and-swap, and atomic load) that it doesn't really make sense to add any extra dependencies on external libraries for it. A few lines of asm or intrinsics for various fallback cases should cover it. The btMutex class is very similar to tinythreads::fast_mutex. Although I don't like the fact that the fast_mutex header includes other headers for the sake of inlining its member functions.
Flix wrote:Maybe for point b we can think about providing an alternative demo using OpenMP as a replacement for TBB (however, I'm not sure if there's something compatible with tbb::blocked_range: I'll have to check the OpenMP API). Probably the best option here would be not to use any additional advanced library at all, and to build some infrastructure using btThreads.h/cpp to have fewer dependencies, but I guess it won't be as easy/efficient: so it's OK for me.
I was thinking of OpenMP as well (as an alternative to TBB). I think it does have parallel_for, but I haven't actually used it. It would be nice to have at least 2 choices for task schedulers to underscore the fact that this isn't tied to the architecture of any one particular task scheduler.
TBB works very well, but I don't like how bloated it is. The code for it is incomprehensible -- just layers upon layers of templates scattered across dozens and dozens of headers. But it seems to be the "standard" and is cross-platform, and doesn't require any special compiler support.
Another option here would be something like JobSwarm which is a simple task scheduler in just a few source files. It just needs a bit of work to make it more cross platform friendly.
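To make the "compare-and-swap plus atomic load" point concrete, here is a minimal sketch of a spin mutex in standard C++11. It is not the actual btMutex code; the class name `SpinMutex` and its method names are made up for illustration, and a non-C++11 fallback would swap the std::atomic calls for the equivalent intrinsics.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative spin mutex; NOT Bullet's btMutex implementation.
class SpinMutex
{
    std::atomic<int> mLocked{0};  // 0 = free, 1 = held

public:
    bool tryLock()
    {
        int expected = 0;
        // Compare-and-swap: succeeds only if the mutex was free.
        return mLocked.compare_exchange_strong(expected, 1,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed);
    }

    void lock()
    {
        while (!tryLock())
        {
            // Wait on a plain atomic load until the mutex looks free,
            // then go around and retry the compare-and-swap.
            while (mLocked.load(std::memory_order_relaxed) != 0)
                std::this_thread::yield();
        }
    }

    void unlock()
    {
        assert(mLocked.load(std::memory_order_relaxed) == 1);  // must be held
        mLocked.store(0, std::memory_order_release);
    }
};
```

A lock like this only pays off when critical sections are very short, which appears to be the situation described above; for long waits an OS mutex that puts the thread to sleep is usually the better choice.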
lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound »

Alright, now you can choose between TBB and OpenMP.

TBB is the default. To switch to OpenMP, open up MultiThreadedDemo.cpp and set USE_TBB to 0 and USE_OPENMP to 1.
Then make sure your compiler options are set to OpenMP mode.

Performance between the two seems about the same on my machine.

Oh, I noticed that right now the MultiThreadedDemo is hardcoding the number of threads to 4. If you are doing performance testing, make sure to set numThreads to match your hardware!
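For anyone who hasn't used OpenMP: its counterpart to tbb::parallel_for is a pragma placed on an ordinary for loop. The sketch below is a generic example and not code from the demo; `scaleAll` is a hypothetical function. If the compiler isn't in OpenMP mode (for example -fopenmp on GCC or /openmp on MSVC), the pragma is simply ignored and the loop runs serially.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical example function, not part of the MultiThreadedDemo.
inline void scaleAll(std::vector<float>& values, float factor)
{
    // A signed int loop index is used because older OpenMP
    // implementations (e.g. MSVC's) require a signed loop variable.
    assert(values.size() <= static_cast<std::size_t>(INT_MAX));
    const int n = static_cast<int>(values.size());

    // Each iteration touches a distinct element, so no locking is needed.
#pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        values[i] *= factor;
    }
}
```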
Basroil
Posts: 463
Joined: Fri Nov 30, 2012 4:50 am

Re: CPU multithreading is working!

Post by Basroil »

lunkhound wrote: The MLCP solver is apparently not threadsafe. I don't know about any of the other alternative solvers.
I'll double check my code again (apparently thread-safe now, but still want to make sure), and then I'll post a patch on that issue.

lunkhound wrote:Oh, I noticed that right now the MultiThreadedDemo is hardcoding the number of threads to 4. If you are doing performance testing, make sure to set numThreads to match your hardware!
Yup, on Windows it's actually easy enough; you just need to use some derivative of:

Code: Select all

#include <windows.h> // declares SYSTEM_INFO and GetSystemInfo
SYSTEM_INFO sysinfo;
GetSystemInfo(&sysinfo);
numThreads = sysinfo.dwNumberOfProcessors; // logical processor count; not recommended for multiprocessor though
c6burns
Posts: 149
Joined: Fri May 24, 2013 6:08 am

Re: CPU multithreading is working!

Post by c6burns »

Since we've already gone down the c++11 road, you could use:

Code: Select all

std::thread::hardware_concurrency()
Otherwise it's a big pain to write a cross platform method. There is one in Ogre 2.0, but I like the 1 line c++11 method :lol:
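One caveat with the one-liner: the standard allows std::thread::hardware_concurrency() to return 0 when the value can't be determined, so it is worth guarding. A small sketch, where `chooseNumThreads` and its fallback parameter are hypothetical names:

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: picks a worker count for the task scheduler.
inline unsigned int chooseNumThreads(unsigned int fallback = 4)
{
    assert(fallback > 0);
    // hardware_concurrency() is only a hint and may return 0
    // if the number of hardware threads is not computable.
    const unsigned int hw = std::thread::hardware_concurrency();
    return hw != 0 ? hw : fallback;
}
```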
lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound »

It turns out both TBB and OpenMP have similar functions to query the number of hardware threads. For OpenMP it's:

Code: Select all

omp_get_max_threads();
and for TBB it's:

Code: Select all

tbb::task_scheduler_init::default_num_threads();
So I removed the hardcoding to 4 threads. My machine is a 4-core with hyperthreading, so both of those report 8 threads on mine.

What's interesting is that with TBB, I see a noticeable performance improvement using 8 threads vs 4. I didn't really expect that.. hyperthreading is actually good for something! :o

With OpenMP, on the other hand, performance gets wildly inconsistent and really bad using 8 threads vs 4. There is something really strange going on there -- the profiling indicates that these performance spikes are coming ONLY from predictUnconstraintMotion. It's kinda baffling.

Oh and the demo now lets you dial the number of threads up or down using '+' and '-' keys.
Flix
Posts: 456
Joined: Tue Dec 25, 2007 1:06 pm

Re: CPU multithreading is working!

Post by Flix »

:D ! Just to say I'm very happy I can use callbacks and the OpenMP version!

Only one additional thing:
On Linux 64bit in LinearMath/btThreads.cpp, I had to change:

Code: Select all

bool btIsAligned( const void* ptr, unsigned int alignment )
{
    btAssert( ( alignment & ( alignment - 1 ) ) == 0 ); // alignment should be a power of 2
    return ( ( (unsigned int) ptr )&( alignment - 1 ) ) == 0;
}
to:

Code: Select all

bool btIsAligned( const void* ptr, unsigned int alignment )
{
    btAssert( ( alignment & ( alignment - 1 ) ) == 0 ); // alignment should be a power of 2
    return ( ( (size_t) ptr )&( alignment - 1 ) ) == 0;
}
otherwise the cast loses precision and the code won't compile without using the -fpermissive flag on gcc.
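As a side note, size_t works on the common platforms, but std::uintptr_t from <cstdint> is the integer type meant for holding a pointer value. A standalone sketch of the same check under that assumption; it is named `isAligned` and uses plain assert so it isn't mistaken for the actual Bullet function:

```cpp
#include <cassert>
#include <cstdint>

// Standalone variant of the alignment check; not the code in btThreads.cpp.
inline bool isAligned(const void* ptr, std::uintptr_t alignment)
{
    // alignment must be a non-zero power of 2
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // uintptr_t is wide enough to hold the pointer value, so the cast
    // does not truncate on 64-bit targets.
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```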
lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound »

Flix wrote::D ! Just to say I'm very happy I can use callbacks and the OpenMP version!

Only one additional thing:
On Linux 64bit in LinearMath/btThreads.cpp, I had to change:

Code: Select all

bool btIsAligned( const void* ptr, unsigned int alignment )
{
    btAssert( ( alignment & ( alignment - 1 ) ) == 0 ); // alignment should be a power of 2
    return ( ( (unsigned int) ptr )&( alignment - 1 ) ) == 0;
}
to:

Code: Select all

bool btIsAligned( const void* ptr, unsigned int alignment )
{
    btAssert( ( alignment & ( alignment - 1 ) ) == 0 ); // alignment should be a power of 2
    return ( ( (size_t) ptr )&( alignment - 1 ) ) == 0;
}
otherwise the cast loses precision and the code won't compile without using the -fpermissive flag on gcc.
Good find, thanks! I'll fix it. Glad to know it's been tested on 64-bit!

[edit:] fixed.
Granyte
Posts: 77
Joined: Tue Dec 27, 2011 11:51 am

Re: CPU multithreading is working!

Post by Granyte »

EDIT: This is working great. Any chance this will get merged back into the main trunk?
lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound »

Granyte wrote:EDIT: This is working great. Any chance this will get merged back into the main trunk?
I'm planning to make a pull request soon. I'm just waiting to see if any bugs crop up and hopefully get a few more reports from people using it.

I'm still very interested in hearing about how it works on various platforms/OSes/compilers. As far as I've heard thus far, it has only been tested with MSVC 2013/Windows, GCC/Android, and GCC/Linux.
And so far no reports about the performance aside from what I reported.

Glad to hear it's working great. Can you elaborate on that a bit? :)

[edit] I went ahead and made a pull request. I figure it may take a while, and may as well get it started.
Granyte
Posts: 77
Joined: Tue Dec 27, 2011 11:51 am

Re: CPU multithreading is working!

Post by Granyte »

Well, so far it's working well, but I have other issues with Bullet Physics that, until resolved, prevent me from testing more in depth.