PGI Compiler Bug
I ran across another PGI compiler bug that bears noting because it was so annoying to track down. Here’s the code:
static inline uint64_t qthread_cas64(
    volatile uint64_t *operand,
    const uint64_t newval,
    const uint64_t oldval)
{
    uint64_t retval;
    __asm__ __volatile__ ("lock; cmpxchg %1,(%2)"
        : "=&a"(retval) /* store from RAX */
        : "r"(newval),
          "r"(operand),
          "a"(oldval) /* load into RAX */
        : "cc", "memory");
    return retval;
}
Now, both GCC and the Intel compiler produce the code you would expect; something like this:
mov 0xffffffffffffffe0(%rbp),%r12
mov 0xffffffffffffffe8(%rbp),%r13
mov 0xfffffffffffffff0(%rbp),%rax
lock cmpxchg %r12,0x0(%r13)
mov %rax,0xfffffffffffffff8(%rbp)
In essence, that’s:
- copy the newval into register %r12 (almost any register is fine)
- copy the operand into register %r13 (almost any register is fine)
- copy the oldval into register %rax (as I specified with “a”)
- execute the ASM I wrote (the compare-and-swap)
- copy register %rax to the variable I specified
Here’s what PGI produces instead:
mov 0xffffffffffffffe0(%rbp),%r12
mov 0xffffffffffffffe8(%rbp),%r13
mov 0xfffffffffffffff0(%rbp),%rax
lock cmpxchg %r12,0x0(%r13)
mov %eax,0xfffffffffffffff8(%rbp)
Notice the problem? The last step now stores %eax instead of %rax, so only the lower 32 bits of my 64-bit CAS result get returned!
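A truncation like this is easy to catch with a check that uses a value whose upper 32 bits are nonzero. The following is only a sketch under a few assumptions: the names `cas64_fn`, `reference_cas64`, `truncating_cas64`, and `cas64_returns_full_width` are hypothetical, the reference version relies on the GCC-style `__sync_val_compare_and_swap` builtin being available, and `truncating_cas64` merely models the symptom described above rather than reproducing what PGI emits.

```c
#include <assert.h>
#include <stdint.h>

/* Signature shared by every CAS implementation under test (hypothetical). */
typedef uint64_t (*cas64_fn)(volatile uint64_t *, uint64_t, uint64_t);

/* Stand-in reference CAS. Assumption: GCC-style __sync builtins exist. */
static uint64_t reference_cas64(volatile uint64_t *operand,
                                uint64_t newval, uint64_t oldval)
{
    return __sync_val_compare_and_swap(operand, oldval, newval);
}

/* Model of the symptom: the swap is correct, but the returned value
 * is cut down to its lower 32 bits. */
static uint64_t truncating_cas64(volatile uint64_t *operand,
                                 uint64_t newval, uint64_t oldval)
{
    return (uint32_t)__sync_val_compare_and_swap(operand, oldval, newval);
}

/* Returns 1 if cas hands back all 64 bits of the previous value and
 * performs the swap, 0 otherwise. */
static int cas64_returns_full_width(cas64_fn cas)
{
    const uint64_t before = UINT64_C(0xDEADBEEF00000001); /* high bits set */
    const uint64_t after  = UINT64_C(0x0123456789ABCDEF);
    volatile uint64_t cell = before;
    uint64_t seen = cas(&cell, after, before);
    return seen == before && cell == after;
}

/* Aborts if the given CAS loses the upper half of its return value. */
static void check_cas64_width(cas64_fn cas)
{
    assert(cas64_returns_full_width(cas));
}
```

Passing `qthread_cas64` to `check_cas64_width` at startup (or in a configure-time test) should flag a compiler that drops the upper half of the result.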
The workaround is to do something stupid: be more explicit. Like so:
static inline uint64_t qthread_cas64(
    volatile uint64_t *operand,
    const uint64_t newval,
    const uint64_t oldval)
{
    uint64_t retval;
    __asm__ __volatile__ ("lock; cmpxchg %1,(%2)\n\t"
                          "mov %%rax,(%0)"
        :
        : "r"(&retval), /* address that RAX is stored to */
          "r"(newval),
          "r"(operand),
          "a"(oldval) /* load into RAX */
        : "cc", "memory");
    return retval;
}
This is stupid because it requires an extra register; it becomes this:
mov 0xfffffffffffffff8(%rbp),%rbx
mov 0xffffffffffffffe0(%rbp),%r12
mov 0xffffffffffffffe8(%rbp),%r13
mov 0xfffffffffffffff0(%rbp),%rax
lock cmpxchg %r12,0x0(%r13)
mov %rax,(%rbx)
Obviously, not a killer (since it can be worked around), but annoying nevertheless.
A similar error happens in this code:
uint64_t retval;
__asm__ __volatile__ ("lock xaddq %0, (%1)"
    :"+r" (retval)
    :"r" (operand)
    :"memory");
It would appear that PGI completely ignores the bit width of output operands!
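The same kind of check works for the fetch-and-add case. Again, this is a sketch with hypothetical names (`fetch_add64_fn`, `reference_fetch_add64`, `fetch_add64_returns_full_width`), and the reference version assumes the GCC-style `__sync_fetch_and_add` builtin rather than the xaddq snippet above. The starting value is chosen so that both the returned old value and the carry into the upper half depend on bits above 32.

```c
#include <assert.h>
#include <stdint.h>

/* Signature for a 64-bit fetch-and-add under test (hypothetical). */
typedef uint64_t (*fetch_add64_fn)(volatile uint64_t *, uint64_t);

/* Stand-in reference. Assumption: GCC-style __sync builtins exist. */
static uint64_t reference_fetch_add64(volatile uint64_t *operand,
                                      uint64_t incr)
{
    return __sync_fetch_and_add(operand, incr);
}

/* Returns 1 if fadd returns the full 64-bit previous value and the
 * addition carries correctly past bit 31, 0 otherwise. */
static int fetch_add64_returns_full_width(fetch_add64_fn fadd)
{
    const uint64_t before = UINT64_C(0x00000001FFFFFFFF); /* bit 32 set */
    volatile uint64_t cell = before;
    uint64_t seen = fadd(&cell, 1);
    return seen == before && cell == UINT64_C(0x0000000200000000);
}

/* Aborts if the given fetch-and-add loses the upper half of its result. */
static void check_fetch_add64_width(fetch_add64_fn fadd)
{
    assert(fetch_add64_returns_full_width(fadd));
}
```

A wrapper around the inline-assembly version can be passed in place of `reference_fetch_add64` to test the code the compiler actually generates.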