sbl:
You said six multiplies per pixel. I used a significant optimization over this a while back in a DOS program I wrote. You may have already thought of a reason not to do things this way, but here goes:
The optimization comes in three parts:
a.) Re-arrange the equation: a*alpha + b*(1-alpha) becomes b + ((a - b) * alpha). This reduces the number of multiplies by half.
b.) Operate on the red and blue components at the same time. The catch is that this reduces the precision of the blend (the alpha value) to 6 bits instead of 8. Since the red and blue components are only 5 bits each, this shouldn't degrade the quality very much, if at all. It also reduces the number of shifts and masks you have to perform.
c.) Since (I think) all the PPC processors have single-cycle multiplies, this one won't help much there, but on machines with slower multiply instructions it might. Basically, you take the second step a little further: you shift and mask the green component into the top 16 bits of the integer. The integer would then hold the color as follows: 00000gggggg00000rrrrr000000bbbbb. The zero bits are padding for the multiply. On the PPC the additional shifts and masks might cancel out the gain from removing one multiply. Also, this would
Like I said, you may have already tried this and decided it doesn't work. For that matter, I might not be taking into account some obvious limitation of this approach (I do that a lot). Still, even if you don't use it, someone else might see it and find it useful. And if it does work, two multiplies is a lot faster than six.
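In case it helps, here is a rough C sketch of the three parts above. It is not the code from my old DOS program, just an illustration under some assumptions of mine: the pixels are 16-bit RGB565 (rrrrrggggggbbbbb), alpha is an integer weight for pixel a, and the function names are made up for this post. The packed versions rely on unsigned wraparound so that (a - b) can go "negative" without breaking anything, and the result is masked afterwards to throw away the fraction bits.

```c
#include <stdint.h>

/* Assumed pixel format (not stated in the post): 16-bit RGB565. */

/* Part (a): one channel, b + (a - b) * alpha, with alpha in 0..32.
   b is pre-scaled by 32 so the sum never goes negative before the shift. */
static int blend_channel(int a, int b, int alpha)
{
    return ((b << 5) + (a - b) * alpha) >> 5;
}

/* Part (b): red and blue share one multiply, green gets its own.
   alpha is in 0..64 (6 bits), which is all the gap between the
   blue and red fields leaves room for. */
static uint16_t blend565_two_mul(uint16_t a, uint16_t b, uint32_t alpha)
{
    uint32_t a_rb = a & 0xF81Fu, b_rb = b & 0xF81Fu;  /* red + blue */
    uint32_t a_g  = a & 0x07E0u, b_g  = b & 0x07E0u;  /* green      */

    /* Unsigned wraparound makes (a - b) * alpha + b * 64 come out exact. */
    uint32_t rb = (((b_rb << 6) + (a_rb - b_rb) * alpha) >> 6) & 0xF81Fu;
    uint32_t g  = (((b_g  << 6) + (a_g  - b_g)  * alpha) >> 6) & 0x07E0u;
    return (uint16_t)(rb | g);
}

/* Part (c): all three channels in one multiply.
   Layout after expansion: 00000gggggg00000rrrrr000000bbbbb.
   The narrowest padding here is 5 bits, so alpha is limited to 0..32. */
static uint16_t blend565_one_mul(uint16_t a, uint16_t b, uint32_t alpha)
{
    uint32_t ea = (a | ((uint32_t)a << 16)) & 0x07E0F81Fu;
    uint32_t eb = (b | ((uint32_t)b << 16)) & 0x07E0F81Fu;

    uint32_t r = (((eb << 5) + (ea - eb) * alpha) >> 5) & 0x07E0F81Fu;
    return (uint16_t)(r | (r >> 16));  /* fold green back into bits 5..10 */
}
```

For example, blending white over black at half strength, blend565_one_mul(0xFFFF, 0x0000, 16), should give 0x7BEF, and blend565_two_mul(0xFFFF, 0x0000, 32) should give the same value. Whether the one-multiply version actually beats the two-multiply version depends on the machine, since it trades a multiply for extra shifts and masks.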