SSE SIMD Vector Math Library only has 3 digits of precision

projectileman · Post by **projectileman** » Mon Oct 08, 2007 12:21 pm

Hi.
I'm checking whether I can use Sony's VectorMath library with the SSE configuration in my applications.
Its performance is pretty good compared with LinearMath from Bullet.
But checking the results, I've noticed that vector normalization gives precision errors from the fourth decimal digit onward. That means its inverse sqrt function only gives 3 digits of precision.

That's not serious, except when you need predictability in your simulation.
I've replaced the SSE rsqrt function with the old sqrt, which gives better results but degrades performance.
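
To make the comparison concrete, here is a minimal sketch of the two variants and a way to measure their error against a double-precision reference. The names `rsqrt_approx`, `rsqrt_exact` and `max_rel_error` are hypothetical helpers written for this illustration; they are not taken from VectorMath or from the attached testbed.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_rsqrt_ps, _mm_sqrt_ps, ...
#include <cmath>        // std::sqrt, std::fabs

// Fast path: hardware estimate of 1/sqrt(v) in each of the four lanes.
static inline __m128 rsqrt_approx(__m128 v)
{
    return _mm_rsqrt_ps(v);
}

// Slow path: full-precision square root followed by a divide.
static inline __m128 rsqrt_exact(__m128 v)
{
    return _mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(v));
}

// Hypothetical helper: largest relative error of f over four positive inputs,
// measured against a double-precision reference.
static float max_rel_error(__m128 (*f)(__m128), const float in[4])
{
    float out[4];
    _mm_storeu_ps(out, f(_mm_loadu_ps(in)));
    double worst = 0.0;
    for (int i = 0; i < 4; ++i) {
        const double ref = 1.0 / std::sqrt(static_cast<double>(in[i]));
        const double err = std::fabs(out[i] - ref) / ref;
        if (err > worst) worst = err;
    }
    return static_cast<float>(worst);
}
```

Calling `max_rel_error(rsqrt_approx, in)` and `max_rel_error(rsqrt_exact, in)` on the same inputs shows the gap. As far as I know, Intel documents the relative error of `_mm_rsqrt_ps` as at most 1.5 * 2^-12 (about 3.7e-4), which matches the 3 digits seen above.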

I only need 32-bit float precision, so I could use approximations for sqrt, as long as they are predictable.
Focusing on the x86 architecture, what do you suggest I do to solve this problem?

Here is my testbed application.

simdmath_testbed.zip

Erwin Coumans · Post by **Erwin Coumans** » Mon Oct 08, 2007 8:26 pm

Yeah, the SSE rsqrt is an approximation. Let's add a Newton-Raphson iteration to improve the accuracy.
See http://softwarecommunity.intel.com/arti ... g/1520.htm:

Jacco Bikker wrote: The SIMD code is already quite a bit faster than the scalar code (which needs to be executed four times, to process the same amount of data), but it can still be improved considerably. The SSE instruction set has a specialized instruction for calculating the reciprocal of a square root: _mm_rsqrt_ps. This instruction uses a hardware look-up table, which limits its accuracy somewhat. This accuracy can be improved by performing a single Newton-Raphson iteration on the table-supplied approximation. Using scalar code, this could be written as:

Code:

float approx = sqrtLUT[v];  // table-supplied estimate of 1/sqrt(v)
float muls = v * approx * approx;
return (approx * 0.5f) * (3.0f - muls);

This would be translated to the following SIMD code:

Code:

static __forceinline __m128 fastrsqrt( const __m128 v )
{
    const __m128 approx = _mm_rsqrt_ps( v );
    const __m128 muls = _mm_mul_ps(_mm_mul_ps(v, approx), approx);
    return _mm_mul_ps(_mm_mul_ps(_half4, approx), _mm_sub_ps(_three, muls) );
}

Jacco Bikker wrote: Where ‘_half4’ is the vector (0.5, 0.5, 0.5, 0.5) and ‘_three’ is the vector (3, 3, 3, 3). The performance of the fast reverse square root function is timed with the following code, where ‘TCOUNT’ is set to 20M and ‘QCOUNT’ to 5M. With these settings, these loops normalize 20 million vectors stored in arrays.
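
For anyone who wants to try this outside the library, a self-contained variant of the snippet above might look like the following. This is only a sketch under a few assumptions: the local constants `half4` and `three` stand in for the article's `_half4` and `_three`, `static inline` replaces the MSVC-specific `__forceinline`, and `normalize3` is a hypothetical helper for illustration, not the VectorMath implementation.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_rsqrt_ps, ...

// 1/sqrt(v) per lane: hardware estimate refined by one Newton-Raphson step,
//   y1 = 0.5 * y0 * (3 - v * y0 * y0)
static inline __m128 fastrsqrt(const __m128 v)
{
    const __m128 half4  = _mm_set1_ps(0.5f);
    const __m128 three  = _mm_set1_ps(3.0f);
    const __m128 approx = _mm_rsqrt_ps(v);
    const __m128 muls   = _mm_mul_ps(_mm_mul_ps(v, approx), approx);
    return _mm_mul_ps(_mm_mul_ps(half4, approx), _mm_sub_ps(three, muls));
}

// Hypothetical helper: normalize a 3-component vector in place.
// The length must be non-zero: rsqrt of 0 yields infinity and the
// refinement step then produces NaN.
static inline void normalize3(float xyz[3])
{
    const float  len2 = xyz[0] * xyz[0] + xyz[1] * xyz[1] + xyz[2] * xyz[2];
    const __m128 v    = _mm_set_ps(0.0f, xyz[2], xyz[1], xyz[0]);
    float out[4];
    _mm_storeu_ps(out, _mm_mul_ps(v, fastrsqrt(_mm_set1_ps(len2))));
    xyz[0] = out[0];
    xyz[1] = out[1];
    xyz[2] = out[2];
}
```

One Newton-Raphson step roughly squares the relative error of the estimate, so the result should be close to full single precision. Normalizing (3, 4, 0), for example, should give values very close to (0.6, 0.8, 0).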

Note that this only affects the SSE version. The Cell/SPU/PPU versions are fine.
Thanks,
Erwin

projectileman · Post by **projectileman** » Mon Oct 08, 2007 9:59 pm

Hey thanks a lot Erwin. That solves my problem!!

But I have another question for anyone who knows.

How could I truncate the digits of precision in SIMD, just to get more predictable results?

imtrobin · Post by **imtrobin** » Thu Feb 28, 2008 5:08 am

Hmm, I did some testing too. Even vector normalize has only 3 digits of precision. If that is the case, is it precise enough to use?

http://www.gamedev.net/community/forums ... _id=482212

Erwin Coumans · Post by **Erwin Coumans** » Thu Feb 28, 2008 7:11 am

You can add more precision by adding a Newton-Raphson iteration.

Attached are the modified files to get more precision. Unzip them in the Extras\vectormathlibrary\include\vectormath\SSE\cpp folder. The change has been committed to SVN, so it will be in Bullet 2.67.

Let us know if that helps,
Erwin

imtrobin · Post by **imtrobin** » Fri Feb 29, 2008 9:06 am

Cool, thanks.
