More Compiler Complaints: Sparc Edition
Unlike my previous whining about compilers, this one I have no explanation for. It’s not me specifying things incorrectly, it’s just the compiler being broken.
So, here’s the goal: atomically increment a variable. On a Sparc (specifically, SparcV9), the function looks something like this:
static inline int atomic_inc(int * operand)
{
    register uint32_t oldval, newval;
    newval = *operand;
    do {
        oldval = newval;
        newval++;
        __asm__ __volatile__ ("cas [%1], %2, %0"
                              : "=&r" (newval)
                              : "r" (operand), "r"(oldval)
                              : "cc", "memory");
    } while (oldval != newval);
    return oldval+1;
}
Seems trivial, right? We use the CAS instruction (compare and swap). Conveniently, cas always loads the previous contents of *operand into its destination register (i.e. %0 aka newval), whether or not the comparison succeeds. So when the comparison fails, we already have the fresh value in hand, and there are no extraneous memory operations in this little loop. Right? Right. Does it work? NO.
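To pin down what the loop relies on, here is a minimal sketch of the cas semantics in plain C. It is a non-atomic model for illustration only (the real instruction does all of this as one indivisible step), and the names cas_model and inc_with_model are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative, NON-atomic model of `cas [addr], cmp, swap`.
 * The hardware performs these steps as a single atomic operation. */
static uint32_t cas_model(uint32_t *addr, uint32_t cmp, uint32_t swap)
{
    uint32_t old;
    assert(addr != 0);
    old = *addr;          /* the current memory value is always read */
    if (old == cmp)
        *addr = swap;     /* the store happens only on a match */
    return old;           /* on hardware this lands in the swap register */
}

/* The increment loop from above, written against the model. */
static uint32_t inc_with_model(uint32_t *operand)
{
    uint32_t oldval, newval = *operand;
    do {
        oldval = newval;
        /* a failed attempt returns the fresh value, so nothing is re-read */
        newval = cas_model(operand, oldval, oldval + 1);
    } while (oldval != newval);
    return oldval + 1;
}
```

The point of the model is the return value: success and failure both yield the old memory contents, and the caller tells them apart by comparing against oldval.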
Let’s take a look at the assembly that the compiler (gcc) generates with -O2 optimization:
save %sp, -0x60, %sp
ld [%i0], %i5 /* newval = *operand; */
mov %i0, %o1 /* operand is copied into %o1 */
mov %i5, %o2 /* oldval = newval; */
cas [%o1], %o2, %o0 /* o1 = operand, o2 = oldval, o0 = ? */
ret
restore %i5, 0x1, %o0
Say what? Does that have ANYTHING to do with what I told it? Nope! %o0 is never even initialized, but somehow it gets used anyway! What about the increment? Nope! It was optimized out, apparently (which, in fairness, is probably because we didn’t explicitly list it as an input). Of course, gcc is awful, you say! Use Sun’s compiler! Sorry, it produces the exact same output.
But let’s be a bit more explicit about the fact that the newval register is an input to the assembly block:
static inline int atomic_inc(int * operand)
{
    register uint32_t oldval, newval;
    newval = *operand;
    do {
        oldval = newval;
        newval++;
        __asm__ __volatile__ ("cas [%1], %2, %0"
                              : "=&r" (newval)
                              : "r" (operand), "r"(oldval), "0"(newval)
                              : "cc", "memory");
    } while (oldval != newval);
    return oldval+1;
}
Now, Sun’s compiler complains: warning: parameter in inline asm statement unused: %3. Well gosh, isn’t that useful; way to recognize that the "0" constraint ties this input to the same register as output %0! But at least gcc leaves the add operation in:
save %sp, -0x60, %sp
ld [%i0], %i5 /* oldval = *operand; */
mov %i0, %o1 /* operand is copied to %o1 */
add %i5, 0x1, %o0 /* newval = oldval + 1; */
mov %i5, %o2 /* oldval is copied to %o2 */
cas [%o1], %o2, %o0
ret
restore %i5, 0x1, %o0
Yay! The increment made it in there, and %o0 is now initialized to something! But what happened to the do{ }while() loop? Sorry, that was optimized away, because gcc doesn’t recognize that newval can change values, despite the fact that it’s listed as an output!
Sun’s compiler will at least leave the while loop in, but will often use the WRONG REGISTER for comparison (such as %i2 instead of %o0).
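For what it's worth, gcc's extended asm also has a read-write constraint, "+r", which says in a single operand that a register is both consumed and produced by the instruction. The sketch below uses it in place of the separate "=&r" output and "0" input. Whether it changes anything on the compilers discussed here is untested, so treat that as an assumption. The name atomic_inc_rw is hypothetical, and the non-SPARC branch is only a stand-in (it assumes a gcc new enough to provide __sync_val_compare_and_swap) so that the sketch builds on other machines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant: newval is declared read-write with "+r". */
static inline uint32_t atomic_inc_rw(uint32_t *operand)
{
    uint32_t oldval, newval;
    assert(operand != 0);
    newval = *operand;
    do {
        oldval = newval;
        newval = oldval + 1;
#if defined(__sparc__) || defined(__sparc)
        /* Assumes a SPARC V9 target, where cas is available.
         * %0 = newval (in and out), %1 = operand, %2 = oldval. */
        __asm__ __volatile__ ("cas [%1], %2, %0"
                              : "+r" (newval)
                              : "r" (operand), "r" (oldval)
                              : "cc", "memory");
#else
        /* Stand-in for other targets: returns the old memory value,
         * just as cas leaves it in its destination register. */
        newval = __sync_val_compare_and_swap(operand, oldval, newval);
#endif
    } while (oldval != newval);
    return oldval + 1;
}
```

The operand numbering is unchanged from the original template, so the cas string itself stays the same.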
But check out this minor change:
static inline int atomic_inc(int * operand)
{
    register uint32_t oldval, newval;
    do {
        newval = *operand;
        oldval = newval;
        newval++;
        __asm__ __volatile__ ("cas [%1], %2, %0"
                              : "=&r" (newval)
                              : "r" (operand), "r"(oldval), "0"(newval)
                              : "cc", "memory");
    } while (oldval != newval);
    return oldval+1;
}
See the difference? Rather than using the output of the cas instruction (newval), we’re throwing it away and re-reading *operand no matter what. And guess what suddenly happens:
save %sp, -0x60, %sp
ld [%i0], %i5 /* oldval = *operand; */
add %i5, 0x1, %o0 /* newval = oldval + 1; */
mov %i0, %o1 /* operand is copied to %o1 */
mov %i5, %o2 /* oldval is copied to %o2 */
cas [%o1], %o2, %o0
cmp %i5, %o0 /* if (oldval != newval) */
bne,a,pt %icc, atomic_inc+0x8 /* then go back and try again */
ld [%i0], %i5
ret
restore %i5, 0x1, %o0
AHA! The while loop returns! And best of all, both GCC and Sun’s compiler suddenly, magically, and consistently use the correct registers for the loop comparison! It’s amazing! For some reason this change reminds the compilers that newval is an output!
It’s completely idiotic. So, we can get it to work… but we have to be inefficient in order to do it, because otherwise (inexplicably) the compiler refuses to acknowledge that our output register can change.
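The loop shape I actually wanted, where a failed compare-and-swap hands back the fresh value and nothing is re-read, can be sketched with C11's <stdatomic.h>. This assumes a C11 compiler, which neither of the compilers listed below is, and atomic_inc_c11 is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical C11 version of the increment; returns the new value. */
static inline uint32_t atomic_inc_c11(_Atomic uint32_t *operand)
{
    uint32_t expected;
    assert(operand != 0);
    expected = atomic_load(operand);   /* the only explicit read */
    /* On failure, expected is overwritten with the value currently in
     * *operand, so the retry needs no separate load. */
    while (!atomic_compare_exchange_weak(operand, &expected, expected + 1)) {
        /* retry with the refreshed expected value */
    }
    return expected + 1;
}
```

Here the library call, rather than an asm constraint, is what tells the compiler that expected may change on every attempt.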
In case you’re curious, the gcc version is:
sparc-sun-solaris2.10-gcc (GCC) 4.0.4 (gccfss)
and the Sun compiler is:
cc: Sun C 5.9 SunOS_sparc 2007/05/03