I found a great quote:
Locks are like tanks - powerful, slow, safe, expensive, and prone to getting you stuck.
Creating atomic increments (and avoiding locks) on various architectures is absurdly difficult, far harder than it should be. And portable? HA! You said “portable.” I’ve been looking into it, and here’s my little collection of implementations.
So, there’s a small amount of information out there about doing it on PowerPC. In GCC inline assembly on Linux (using gcc 4.1.2), it looks like this:
void atomic_incr(int * operand, int incr)
{
    asm volatile (
        "loop:\n\t"             /* repeat until this succeeds */
        "lwarx 6,0,%0\n\t"      /* reserve the operand */
        "add 6,6,%1\n\t"        /* add incr to it */
        "stwcx. 6,0,%0\n\t"     /* put the sum back, and release it */
        "bne- loop"             /* start over on failure */
        :
        : "r" (operand), "r" (incr)
        : "r6", "cc", "memory"  /* stwcx. sets cr0, and *operand changes */
    );
}
For some idiotic reason, this is different on Mac OS X:
void atomic_incr(int * operand, int incr)
{
    asm volatile (
        "loop:\n\t"             /* repeat until this succeeds */
        "lwarx r6,0,%0\n\t"     /* reserve the operand */
        "add r6,r6,%1\n\t"      /* add incr to it */
        "stwcx. r6,0,%0\n\t"    /* put the sum back, and release it */
        "bne- loop"             /* start over on failure */
        :
        : "r" (operand), "r" (incr)
        : "r6", "cc", "memory"  /* stwcx. sets cr0, and *operand changes */
    );
}
Note the addition of r’s in front of all of the register numbers.
But that’s just fine. It makes some sense, and above all: it works!
Actually, here’s an even better one (more optimizable, and more cross-platform):
void atomic_incr(int * operand, int incr)
{
    int tmp;
    asm volatile (
        "loop:\n\t"             /* repeat until this succeeds */
        "lwarx %0,0,%1\n\t"     /* reserve the operand */
        "add %0,%0,%2\n\t"      /* add incr to it */
        "stwcx. %0,0,%1\n\t"    /* put the sum back, and release it */
        "bne- loop"             /* start over on failure */
        : "=&r" (tmp)           /* earlyclobber: tmp gets written before the
                                   inputs are done being read */
        : "r" (operand), "r" (incr)
        : "cc", "memory"
    );
}
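If you want to convince yourself any of these actually work, a quick pthreads pounding does the trick. Here’s a throwaway harness, more or less what I use (a sketch, nothing rigorous; the thread and iteration counts are arbitrary):

#include <pthread.h>
#include <stdio.h>

extern void atomic_incr(int * operand, int incr);

static int counter = 0;

static void * worker(void * arg)
{
    int i;
    for (i = 0; i < 1000000; i++)
        atomic_incr(&counter, 1);
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    int i;
    for (i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    /* with a correct atomic_incr this always prints 4000000 */
    printf("%d\n", counter);
    return 0;
}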
Now, show me the same thing for Intel processors. What’s that? You can’t? Of course not. Intel loves mucking with this stuff, so each of their processors uses a slightly different syntax to get the best speed. On top of that, getting it into GCC syntax is a challenge of its own, because some genius keeps changing the GCC assembler semantics.
Here’s a basic atomic increment that works on most x86 machines (NOT i386’s):
void atomic_add(int * operand, int incr)
{
    asm volatile (
        "lock xaddl %0, %1\n"   // atomically add incr into *operand
        : "+r" (incr), "+m" (*operand) // xaddl writes BOTH operands, so both are read-write
    );
}
It uses xaddl so that the add happens directly on the memory operand; the old value comes back in the register, which is why both operands have to be marked read-write ("+") above. The following also works, but may not be as fast (or it may be equivalent):
void atomic_add(int * operand, int incr)
{
    asm volatile (
        "lock xaddl %0, (%1)\n" // add incr to *operand through the pointer register
        : "+r" (incr)           // xaddl hands the old value back here
        : "r" (operand)
        : "memory"              // gcc can't see the store through (%1), so be safe
    );
}
Here’s a version that works on 386’s too (xaddl only exists on the 486 and newer). I don’t know which is “faster” on every x86:
void atomic_add(int * operand, int incr)
{
    asm volatile (
        "lock addl %1, %0\n"    // add incr to operand (the old value is discarded)
        : "+m" (*operand)
        : "r" (incr)
    );
}
But does ANY of that work on fancy Itanium (ia64) machines? Hell no!
You may be wondering how the Linux kernel implements atomic increments on ia64 machines. Like this:
#define atomic_add(i,v) \
({ \
    int __ia64_aar_i = (i); \
    (__builtin_constant_p(i) \
     && (   (__ia64_aar_i ==  1) || (__ia64_aar_i ==   4) \
         || (__ia64_aar_i ==  8) || (__ia64_aar_i ==  16) \
         || (__ia64_aar_i == -1) || (__ia64_aar_i ==  -4) \
         || (__ia64_aar_i == -8) || (__ia64_aar_i == -16))) \
        ? ia64_fetch_and_add(__ia64_aar_i, &(v)->counter) \
        : ia64_atomic_add(__ia64_aar_i, v); \
})
Yeah, pretty sweet, no? Separate functions for the handful of constant increments the hardware handles directly versus everything else. Here’s a simple increment:
#include <stdint.h> /* for uint64_t */

void atomic_oneup(int * operand)
{
    uint64_t res; // this MUST be 64 bits
    asm volatile (
        "fetchadd4.rel %0=%1,%2"
        : "=r" (res)
        : "m" (*operand), "i" (1)
        : "memory"
    );
}
Note that it HAS to return something, because it’s a fetch instruction. But it’s also conveniently atomic. And that increment doesn’t have to be 1, obviously, but can be any of the constants listed in that ugly #define. But what about adding arbitrary numbers? For that we must use cmpxchg. Here’s how Linux uses it (tidied up a LOT):
static __inline int ia64_atomic_add (int i, atomic_t *v)
{
    __s32 old, new;
    do {
        old = atomic_read(v);
        new = old + i;
    } while (ia64_cmpxchg(acq, v, old, new, sizeof(atomic_t)) != old);
    return new;
}
How can you not love something like that? Boiling it down, that call to ia64_cmpxchg turns into ia64_cmpxchg4_acq((__u16*)v, new, o); (that o variable is a place-holder). THAT, in turn, is this:
#define ia64_cmpxchg4_acq(ptr, new, old) \
({ \
    __u64 ia64_intri_res; \
    asm volatile ("mov ar.ccv=%0;;" :: "rO" (old)); \
    asm volatile ("cmpxchg4.acq %0=[%1],%2,ar.ccv" \
                  : "=r" (ia64_intri_res) : "r" (ptr), "r" (new) : "memory"); \
    ia64_intri_res; \
})
Just so we don’t go insane here, here’s a condensed loop:
#include <stdint.h> /* for int64_t */

void atomic_incr(int * operand, int incr)
{
    int64_t res, old, new;
    do {
        old = *operand;         // atomic if operand is 32-bit aligned
        new = old + incr;
        asm volatile ("mov ar.ccv=%0;;" :: "rO" (old));
        // these are separate asms because the compiler inserts stuff in between
        asm volatile ("cmpxchg4.acq %0=[%1],%2,ar.ccv"
                      : "=r" (res) : "r" (operand), "r" (new) : "memory");
    } while (res != old);       // if res != old, someone beat us to it; retry
}
Ugly much? Just a bit.
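For what it’s worth, gcc eventually grew atomic builtins that make most of this moot. A sketch, assuming a gcc new enough (4.1-ish) and a target that actually implements them:

/* The same retry loop, using GCC's __sync builtins (gcc >= 4.1 where
 * supported). __sync_val_compare_and_swap returns the old value, just
 * like cmpxchg4.acq hands back above. */
void atomic_incr_builtin(int * operand, int incr)
{
    int old, sum;
    do {
        old = *operand;
        sum = old + incr;
    } while (__sync_val_compare_and_swap(operand, old, sum) != old);
}

And for the simple case, __sync_fetch_and_add(operand, incr) does the whole job in one call.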
Something else I’ve discovered, which is kinda critical. Despite what you might expect, there is supposed to be NO difference between
asm volatile ( /* stuff */ );
and
__asm__ __volatile__ ( /* stuff */ );
__volatile__ is simply GCC’s alternate-keyword spelling of volatile, the same way __asm__ is the alternate spelling of asm. The underscored spellings exist so the keywords still work when the plain ones conflict with something (compiling with -ansi, for instance, where plain asm isn’t recognized at all). What actually matters is that some spelling of volatile is there, because that’s what tells gcc not to delete, move, or otherwise muck with the assembly when it feels it would be a good idea.
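A tiny demonstration of when the underscored spellings earn their keep (the nop is just a stand-in for real assembly):

/* Compile with: gcc -ansi -c spelling.c
 * Under -ansi (strict C89), plain `asm` is not a keyword, so this line
 * would be a syntax error if uncommented:
 *     asm volatile ("nop");
 * The alternate-keyword spelling compiles in any mode: */
void demo(void)
{
    __asm__ __volatile__ ("nop");
}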
Comments (2)
With gcc 4.1.2 you can use __sync_fetch_and_add instead of writing custom assembly, I believe.
http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
Posted by mwgamera | September 24, 2009 12:36 AM
I found that 4.1.2 didn't *reliably* support those builtins, particularly on OSX. But yes, those builtins (when available) are quite nice.
Posted by Kyle | September 25, 2009 12:17 AM