[PlanetCCRMA] Pentium-4 and denormal numbers on planetccrma

Steve Harris S.W.Harris@ecs.soton.ac.uk
Wed Jan 12 03:41:01 2005


On Mon, Jan 10, 2005 at 11:07:55 +0100, andersvi@extern.uio.no wrote:
> I see in the list-archives there have been issues around denormal
> numbers, but it seems limited to the LADSPA tap-plugin.

They affect most things that do serious DSP work.
 
> Recently found that planetccrma's dist of pd & externals gave
> some trouble related to "denormal numbers" on a pentium-4 based
> laptop here.
> 
> Problem was it went up to 90+% cpu, and stayed there, while same
> patch on athlon-based machine (half the speed) ran at 20%.
> 
> It seems that these numbers - very small numbers - are optimized
> to zero automatically on some cpu's, while others, like my
> P4-based here, need special handling when compiling apps.

It's not exactly that. Athlons can process denormal numbers much more
efficiently than P4s, but they don't zero them implicitly; that would
break the IEEE FP spec.
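
To make "denormal" concrete, here is a minimal sketch of how a decaying
value (think of a filter or reverb tail fading out) ends up in the
denormal range. The function name is made up for illustration, and it
assumes the default FP settings, i.e. nothing has turned on flush-to-zero:

```c
#include <float.h>

/*
 * Hypothetical helper: halve 1.0f until it drops below FLT_MIN, the
 * smallest normal float, and return how many halvings that took.
 * With default FP settings the value is then a denormal (non-zero but
 * smaller than FLT_MIN), and every further operation on it takes the
 * slow path on a P4.
 */
int halvings_until_denormal(void)
{
    /* volatile forces each result to be rounded and stored as a float */
    volatile float y = 1.0f;
    int n = 0;

    while (y >= FLT_MIN) {
        y = y * 0.5f;
        n++;
    }
    return n;
}
```

A feedback loop with a gain just under 1.0 does the same thing, only more
slowly, which is why the CPU load jumps some time after the input goes
silent rather than straight away.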
 
> The fix was to recompile pd and the percolate-lib with
> 
>   -mcpu=pentium4 -mfpmath=sse -msse
> 
> included in CFLAGS (not sure whether "-mcpu=pentium4" actually
> was necessary).
> 
> I've started recompiling various externals and other things with
> the flags included.  I'm not sure, but wouldn't this be a problem
> with all apps running loops calculating numbers around zero
> somewhere?  Would it be a solution to compile coming upgrades at
> planetccrma with these same flags?

You don't need -mcpu=pentium4. Building with -mfpmath=sse -msse will help
P4s, and Athlon XPs and PIIIs to a lesser extent, but code built that way
will not work on pre-XP Athlons or Pentium IIs and older, or chips by most
other manufacturers.
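
If you want to check that the flags actually reached a given source file,
GCC defines __SSE__ when -msse is in effect and __SSE_MATH__ when
-mfpmath=sse is. A small sketch (the function name is just an example,
not something from pd or the plugins):

```c
/*
 * Hypothetical helper: returns 1 if this file was compiled to do its
 * float math in SSE registers, 0 if it still uses the x87 unit (or is
 * being built for a non-x86 target).
 */
int built_with_sse_math(void)
{
#if defined(__SSE_MATH__)
    return 1;
#else
    return 0;
#endif
}
```

Printing the result at startup is a cheap way to spot a Makefile that
quietly ignores CFLAGS.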

NB: using the SSE instruction set gets you a more efficient denormal
handler (still about 40x slower than processing a normal number), but it
still doesn't zero them. To get that, you also need to call this function
I hacked up when the program starts:

#include <stdio.h>
#include <stdlib.h>

#ifdef __SSE__
#include <xmmintrin.h>
#endif

void set_denormal_flags(void)
{
#ifdef __SSE__
    unsigned int a, b, c, d;
    unsigned int sig, features;

    /* CPUID leaf 1: processor signature in EAX, feature flags in EDX */
    asm("cpuid": "=a" (sig), "=b" (b), "=c" (c), "=d" (features) : "a" (1));
    if (features & (1 << 25)) { /* bit 25, SSE support */
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

        /* CPUID leaf 0: first four bytes of the vendor string in EBX */
        asm("cpuid": "=a" (a), "=b" (b), "=c" (c), "=d" (d) : "a" (0));
        if (b == 0x756e6547) { /* "Genu", i.e. it's an Intel */
            int stepping, model, family, extfamily;

            /* These fields come from the leaf 1 signature, not leaf 0 */
            family = (sig >> 8) & 0xf;
            extfamily = (sig >> 20) & 0xff;
            model = (sig >> 4) & 0xf;
            stepping = sig & 0xf;
            /* Early P4 steppings: leave the denormals-are-zero bit alone */
            if (family == 15 && extfamily == 0 && model == 0 && stepping < 7) {
                return;
            }
        }
        if (features & (1 << 26)) { /* bit 26, SSE2 support */
            _mm_setcsr(_mm_getcsr() | 0x40); /* denormals-are-zero bit */
        }
    } else {
        fprintf(stderr, "This code has been built with SSE support, but your processor does not support\nthe SSE instruction set.\nexiting\n");
        exit(1);
    }
#endif
}

With that called at startup, the FPU will zero any denormals when it
encounters them (for SSE math, that is; x87 code is unaffected).
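
A rough way to see the flush-to-zero part working is the sketch below.
The helper names are made up, and it only proves anything when the file
is built with SSE math (otherwise it returns -1). It saves and restores
the control register, so it doesn't disturb the rest of the program:

```c
#include <float.h>

#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* Halve the smallest normal float; volatile stops compile-time folding. */
static float halve_flt_min(void)
{
    volatile float x = FLT_MIN;

    x = x * 0.5f;
    return x;
}

/*
 * Hypothetical check: returns 1 if the result is a denormal with
 * flush-to-zero off and exactly zero with it on, 0 if not, and -1 if
 * this file was not built with SSE math.
 */
int check_flush_to_zero(void)
{
#if defined(__SSE_MATH__)
    unsigned int saved = _mm_getcsr();
    float r;
    int denormal_when_off, zero_when_on;

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
    r = halve_flt_min();
    denormal_when_off = (r > 0.0f && r < FLT_MIN);

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    r = halve_flt_min();
    zero_when_on = (r == 0.0f);

    _mm_setcsr(saved); /* put the control register back as we found it */
    return denormal_when_off && zero_when_on;
#else
    return -1;
#endif
}
```

Note this only exercises flush-to-zero, which acts on results. The 0x40
bit set in the function above handles denormal inputs instead.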

- Steve