
Posted:
Feb 4, 2003 @ 4:32am
by Johan
Yes, the current releases (1.04 and 1.10 beta 2) use single-pixel writes for color keyed surfaces. There are several reasons for this, including both performance across several device configurations and expandability. Let me guess that those 450 sprites are not color keyed. Static surfaces in GapiDraw 1.1 will support fast multiple pixel-writes using RLE compression as long as alpha blends are not used...
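To illustrate why RLE helps with color-keyed surfaces, here is a minimal C++ sketch. It is not GapiDraw's implementation; the `Run` record and the `encodeRow` and `blitRow` helpers are hypothetical names made up for this example. A row is encoded once into alternating transparent and opaque runs, and drawing then copies each opaque run as a block instead of testing every pixel against the color key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical run record: skip `skip` transparent pixels, then copy `copy` opaque ones.
struct Run {
    std::uint16_t skip;
    std::uint16_t copy;
};

// Encode one row of 16-bit pixels into runs, treating `key` as the transparent color.
std::vector<Run> encodeRow(const std::uint16_t* row, std::size_t width, std::uint16_t key) {
    assert(width <= 0xFFFF);  // run lengths are stored in 16 bits in this sketch
    std::vector<Run> runs;
    std::size_t x = 0;
    while (x < width) {
        Run r{0, 0};
        while (x < width && row[x] == key) { ++r.skip; ++x; }
        while (x < width && row[x] != key) { ++r.copy; ++x; }
        runs.push_back(r);
    }
    return runs;
}

// Draw an encoded row: each opaque run is copied in one block, with no per-pixel key test.
void blitRow(std::uint16_t* dst, const std::uint16_t* src, const std::vector<Run>& runs) {
    std::size_t x = 0;
    for (const Run& r : runs) {
        x += r.skip;
        std::memcpy(dst + x, src + x, r.copy * sizeof(std::uint16_t));
        x += r.copy;
    }
}
```

The encoding cost is paid once, which is why this only suits surfaces that do not change. It also explains the restriction on alpha blends: a blended pixel has to be read and combined individually, so it cannot be part of a block copy.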

Posted:
Feb 5, 2003 @ 2:30am
by efortier
Wow... And are static surfaces present in 1.10 beta, or will they be available in 1.1 release?
--Eric

Posted:
Feb 11, 2003 @ 4:59pm
by Johan
warmi: I just remembered why I never had the time to implement a 32-bit blit. Basically one has to check for odd word alignment of the top-left pixel of both the source and the destination surface. If both are odd or both are even, then it's easy to copy in 32-bit chunks (after taking care of the odd start and a possible odd width). If only one is odd, then a special routine has to be written that reads and writes in 32-bit blocks that overlap each other. Not really that big a deal, but it would take at least a full day of work. If the performance increase is indeed almost 2x, then I guess it could be worth the effort...
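The alignment check described here can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: `copyRow16` and `isWordAligned` are hypothetical helpers, not GapiDraw functions, and the mixed-alignment case simply falls back to single-pixel writes rather than the overlapping 32-bit routine mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True when the pointer sits on a 4-byte boundary.
static bool isWordAligned(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 3u) == 0;
}

// Copy `count` 16-bit pixels from src to dst (hypothetical helper).
void copyRow16(std::uint16_t* dst, const std::uint16_t* src, std::size_t count) {
    assert(count == 0 || (dst != nullptr && src != nullptr));

    if (isWordAligned(dst) != isWordAligned(src)) {
        // Only one side is odd: this sketch falls back to single-pixel writes.
        for (std::size_t i = 0; i < count; ++i) dst[i] = src[i];
        return;
    }

    // Both odd: copy one pixel so that both pointers become 32-bit aligned.
    if (count > 0 && !isWordAligned(dst)) {
        *dst++ = *src++;
        --count;
    }

    // Copy two pixels per iteration. memcpy of 4 bytes avoids aliasing
    // problems and typically compiles to one 32-bit load and store.
    for (std::size_t pairs = count / 2; pairs > 0; --pairs) {
        std::uint32_t two;
        std::memcpy(&two, src, sizeof two);
        std::memcpy(dst, &two, sizeof two);
        src += 2;
        dst += 2;
    }

    // Odd width: one trailing pixel remains.
    if (count & 1) *dst = *src;
}
```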
Fast Blit

Posted:
Feb 11, 2003 @ 5:41pm
by Guest
Yeah, precisely. In fact I have three kinds of "subsections" for each blit: one for when destination and source are either both aligned or both unaligned (the fastest case), and two others for when only the source or only the destination is aligned.
The inner loop does require some ORing and shifting. It can get ugly, as in the example below (an inner loop for a source-aligned, non-keyed blit), but in the end it is much faster.
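blitNormalSourceAligned_octcopy:
    ldmia r1!, {r4-r11}          @ load 8 words (16 pixels) from the aligned source
    strh  r4, [r0], #2           @ store pixel 0; destination is now word aligned
    mov   lr, r5, lsl #16
    orr   r4, lr, r4, lsr #16    @ r4 = pixel 1 | pixel 2 << 16
    mov   lr, r6, lsl #16
    orr   r5, lr, r5, lsr #16
    mov   lr, r7, lsl #16
    orr   r6, lr, r6, lsr #16
    mov   lr, r8, lsl #16
    orr   r7, lr, r7, lsr #16
    mov   lr, r9, lsl #16
    orr   r8, lr, r8, lsr #16
    mov   lr, r10, lsl #16
    orr   r9, lr, r9, lsr #16
    mov   lr, r11, lsl #16
    orr   r10, lr, r10, lsr #16
    stmia r0!, {r4-r10}          @ store 7 realigned words (pixels 1-14)
    mov   r11, r11, lsr #16
    strh  r11, [r0], #2          @ store pixel 15
    subs  r3, r3, #1             @ r3 counts 16-pixel blocks
    bne   blitNormalSourceAligned_octcopy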
I am cleaning up my code right now (basically translating from GAS (GNU assembler) to EVC++ style assembly) and then I will send you the example blits working with the current GAPI interface (sometime by the end of this week).
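For readers who do not read ARM assembly, the shift-and-OR loop above can be modelled in C++ roughly like this. It is a sketch, not the code that was sent around: `blitSourceAligned` is a hypothetical name, and it assumes a little-endian CPU, a word-aligned source, and a destination that starts one 16-bit pixel past a word boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy `blocks` blocks of 16 pixels (8 source words each).
// Assumes little-endian byte order, as on the ARM devices discussed here.
void blitSourceAligned(std::uint16_t* dst, const std::uint32_t* src, std::size_t blocks) {
    // The destination must sit 2 bytes past a 4-byte boundary.
    assert((reinterpret_cast<std::uintptr_t>(dst) & 3u) == 2u);

    while (blocks-- > 0) {
        std::uint32_t w[8];
        for (int i = 0; i < 8; ++i) w[i] = src[i];  // like ldmia r1!, {r4-r11}
        src += 8;

        // The first pixel goes out alone, which makes dst word aligned.
        *dst++ = static_cast<std::uint16_t>(w[0]);

        // Each output word takes the high half of one source word and the
        // low half of the next one (the mov/orr pairs in the assembly).
        for (int i = 0; i < 7; ++i) {
            std::uint32_t merged = (w[i + 1] << 16) | (w[i] >> 16);
            std::memcpy(dst, &merged, sizeof merged);
            dst += 2;
        }

        // The last pixel of the block is the high half of the final word.
        *dst++ = static_cast<std::uint16_t>(w[7] >> 16);
    }
}
```

The assembly wins by holding all eight words in registers and writing seven of them with a single `stmia`; the C++ version only shows the data movement.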
Ok - mystery has been solved

Posted:
Feb 18, 2003 @ 8:13am
by warmi
I think I solved the problem with slow GapiDraw color-keyed blits.
The good news is that there is no problem at all :-)
I based my original observation on the MFC demo included in the GapiDraw distribution which, to me at least, looks like it was intended to be a benchmark for how many color-keyed sprites can be displayed on the screen while keeping the frame rate at 30 fps.
On my iPAQ 3835 it maxed out at around 130 sprites, which seemed like a low number to me.
After investigating it further I finally figured out where the problem lies.
It has nothing to do with the blitting code but rather with the position/acceleration calculation and update loop (can you believe that?).
Using CList for keeping coordinates and other sprite-specific variables is extremely slow and actually causes the entire frame update process to fall below 30 fps at around 130 sprites. (I am not all that familiar with CList and all this stuff; I haven't touched Windows programming since 1997 or so.)
When I replaced it with a bunch of simple arrays, the number of sprites I was able to display on the screen went up to 230.
One hell of a CList if you ask me :-)
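A minimal sketch of what an array-based update loop might look like is shown below. The demo's actual fields and movement rules are not given in this thread, so `SpriteArrays`, `updateSprites`, the bounce logic and the 512-sprite limit are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Arbitrary upper bound chosen for this sketch.
const std::size_t kMaxSprites = 512;

// One plain array per sprite property instead of one list node per sprite.
struct SpriteArrays {
    int x[kMaxSprites];
    int y[kMaxSprites];
    int vx[kMaxSprites];
    int vy[kMaxSprites];
    std::size_t count;
};

// Advance every sprite by one frame and reverse its velocity at the screen edges.
void updateSprites(SpriteArrays& s, int screenW, int screenH) {
    assert(s.count <= kMaxSprites);
    for (std::size_t i = 0; i < s.count; ++i) {
        s.x[i] += s.vx[i];
        s.y[i] += s.vy[i];
        if (s.x[i] < 0 || s.x[i] >= screenW) s.vx[i] = -s.vx[i];
        if (s.y[i] < 0 || s.y[i] >= screenH) s.vy[i] = -s.vy[i];
    }
}
```

The loop walks contiguous memory with a plain index, whereas CList is MFC's doubly linked list and has to be traversed node by node.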
Anyway, going back to my ASM code: after incorporating it into the MFC example and after "freeing" GapiDraw from the evils of CList, I noticed that the performance gap, which originally was something like 200-300%, shrank to a mere 20-30%. That is pretty much what one can expect when using ASM to optimize C code that is already optimized.
Example timings for some of the blitting routines (running on an iPAQ 3835):
GapiDraw's BltFast with GDBLTFAST_KEYSRC: max 230 sprites at 30 fps.
Equivalent ASM blit: max 290 sprites at 30 fps.
GapiDraw's FillRect with 50% (fast) alpha: max 290 32x32 rectangles at 30 fps.
My ASM-based RectFill with 50% alpha: max 420 32x32 rectangles at 30 fps.
etc.
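For context, a 50% alpha fill is cheap because averaging two pixels needs only a mask, a shift and an add. The sketch below shows the usual trick for 16-bit pixels. It assumes an RGB565 layout and is not taken from GapiDraw or from the ASM code; `blend50` and `fillRect50` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Average two RGB565 pixels. The mask 0xF7DE clears the lowest bit of each
// channel so that the shift cannot leak a bit into the neighboring channel.
inline std::uint16_t blend50(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(((a & 0xF7DEu) >> 1) + ((b & 0xF7DEu) >> 1));
}

// Blend `color` at 50% over a w x h rectangle. `pitchPixels` is the number
// of pixels from the start of one surface row to the start of the next.
void fillRect50(std::uint16_t* surface, std::size_t pitchPixels,
                std::size_t x, std::size_t y, std::size_t w, std::size_t h,
                std::uint16_t color) {
    assert(surface != nullptr);
    for (std::size_t row = 0; row < h; ++row) {
        std::uint16_t* p = surface + (y + row) * pitchPixels + x;
        for (std::size_t col = 0; col < w; ++col) {
            p[col] = blend50(p[col], color);
        }
    }
}
```

Dropping the low bit of each channel costs a little precision, which is the price of the "fast" variant compared with a general alpha blend.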
Basically, the GapiDraw code is pretty much as optimized as it can be using C (never say never though :-) ), and a further increase in speed can only be obtained using other tricks like RLE and perhaps a span-based z-buffer (basically avoiding drawing whenever it makes sense).
Originally I wrote my asm code to create a GapiDraw-like framework for the Zaurus (a PDA produced by Sharp in Japan, very similar to the iPAQs), but since both the Pocket PC and the Zaurus use an ARM CPU, this code runs just as well on Pocket PCs.
If anyone is interested in a bunch of asm blitting code, it can be downloaded at
It contains a modified MFC example from the GapiDraw distro, tailored to run with my ASM code (read ReadMe.txt before you attempt to do anything with this code).
Walter Rawdanik