[Daily update - Wednesday, April 23]
I have taken the time to rewrite most of the core operations, such as BltFast, AlphaBltFast and FillRect. BltFast and AlphaBltFast are now around 20-40% faster, depending on the data. FillRect has been "cleaned up" to allow for some nice cache reuse.
Rewriting the core blit operations had been planned for a long time, but I always felt that touching the core features would risk introducing errors in the rendering pipeline. It did not suit a 1.05 or a 1.1 release. With the release of a 2.0 version, there could be no better time to do it. Thankfully it is very easy to test graphics code like this, and GapiDraw has never had any visual anomalies in previous releases.
The improved AlphaBltFast (when using separate surfaces for image and alpha) is now also significantly faster than the new RGBA surfaces... <sigh>. To simplify the use of the "old" alpha way I added a flag to CreateSurface: GDSURFACE_ALPHASURFACE. If you load a PNG image to a surface and set the flag GDSURFACE_ALPHASURFACE, the PNG loader will extract the alpha information stored in the image and store it in the surface as color values between RGB(0,0,0) and RGB(255,255,255). In other words, there is no longer any need to use the "alpha split" macro, even if you use separate surfaces for image and alpha.
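To illustrate the idea only, here is a minimal sketch of turning the alpha channel of a decoded image into gray color values. It is not the GapiDraw PNG loader: the names `Rgb` and `ExtractAlphaAsGray` are made up for this example, and it assumes the PNG has already been decoded into interleaved 8-bit R, G, B, A bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel type for this sketch (not a GapiDraw type).
struct Rgb {
    std::uint8_t r, g, b;
};

// Hypothetical helper: takes interleaved 8-bit RGBA pixels and returns one
// gray pixel per input pixel, where alpha 0 becomes RGB(0,0,0) and
// alpha 255 becomes RGB(255,255,255). The color channels are ignored.
std::vector<Rgb> ExtractAlphaAsGray(const std::vector<std::uint8_t>& rgba)
{
    assert(rgba.size() % 4 == 0);  // expects whole RGBA pixels

    std::vector<Rgb> alphaSurface;
    alphaSurface.reserve(rgba.size() / 4);

    for (std::size_t i = 0; i + 3 < rgba.size(); i += 4) {
        const std::uint8_t a = rgba[i + 3];  // fourth byte is alpha
        alphaSurface.push_back(Rgb{a, a, a});
    }
    return alphaSurface;
}
```

The result could then be used as the separate alpha surface, with the color data kept in its own image surface.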
Now it's time for some serious testing of all the rewritten blit operations before I post the first 2.0 beta tomorrow.
stuff: It should be enough to capture WM_ACTIVATE. You should never actually have to deal with WM_SETFOCUS or WM_KILLFOCUS. Please test the next build tomorrow and see how it works on your device.
*UPDATE* The RGBA surfaces are actually much faster when doing AlphaBlt (stretch), so the work done on creating them is not wasted after all! Here are some figures:
--DESKTOP--
10 000 alpha blended sprites, AlphaBltFast (200x200)
2844 - 16bit RGBA aligned
2437 - 32bit RGBA aligned
2609 - 1.04 routine aligned
1984 - new routine aligned
2015 - new routine destination unaligned
1 000 alpha blended sprites, AlphaBlt (200x200)
797 - 16bit RGBA
688 - 32bit RGBA
1062 - separate image and alpha surfaces
--POCKETPC--
1 000 alpha blended sprites, AlphaBltFast (200x200)
4827 - 16bit RGBA aligned
5034 - 32bit RGBA aligned
5326 - 1.04 routine aligned
4030 - new routine aligned
4113 - new routine destination unaligned
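As a rough way to read these figures, the sketch below computes how much less time one routine takes than another. It assumes the numbers are elapsed times where lower is better (the unit is not stated above, but this reading matches the comments about which routine is faster). The function name `TimeReductionPercent` is made up for this example.

```cpp
#include <cassert>

// Hypothetical helper: percentage by which newTime is lower than oldTime.
// Assumes both values are elapsed times in the same (unspecified) unit.
double TimeReductionPercent(double oldTime, double newTime)
{
    assert(oldTime > 0.0);  // avoid dividing by zero
    return (oldTime - newTime) / oldTime * 100.0;
}
```

With the aligned AlphaBltFast figures, the new routine comes out at roughly 24% less time than the 1.04 routine on both platforms (2609 vs 1984 on the desktop, 5326 vs 4030 on the Pocket PC).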