by Guest » Feb 11, 2003 @ 5:41pm
Yeah, precisely. In fact I have three kinds of "subsections" for each blit: one for when destination and source are either both aligned or both unaligned (the fastest case), and two others for when only the source or only the destination is aligned.
The inner loop does require some ORing and shifting. It can get ugly, as in the example below (an inner loop for a source-aligned, non-keyed blit), but in the end it is much faster.
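For illustration, choosing between those three cases could look roughly like this in C. This is only a sketch: the names `BlitPath` and `choose_blit_path` are invented here, and it assumes 16-bit pixels, so a pointer is either on a 32-bit boundary or exactly one halfword off it.

```c
#include <stdint.h>

/* Hypothetical names: the post does not show its dispatch code. */
typedef enum {
    BLIT_SAME_ALIGNMENT,  /* both word-aligned, or both off by one halfword */
    BLIT_SOURCE_ALIGNED,  /* only the source is word-aligned */
    BLIT_DEST_ALIGNED     /* only the destination is word-aligned */
} BlitPath;

static BlitPath choose_blit_path(const void *src, const void *dst)
{
    /* Assumption: 16-bit pixels, so every pointer is at least
       halfword-aligned and bit 1 of the address says whether it
       sits on a 32-bit boundary. */
    unsigned src_off = (unsigned)((uintptr_t)src & 2u);
    unsigned dst_off = (unsigned)((uintptr_t)dst & 2u);

    if (src_off == dst_off)
        return BLIT_SAME_ALIGNMENT;
    return src_off == 0 ? BLIT_SOURCE_ALIGNED : BLIT_DEST_ALIGNED;
}
```

When both pointers are off by the same amount, a single leading halfword copy brings them onto a word boundary together, which is why that case needs no shifting.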
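blitNormalSourceAligned_octcopy:
    ldmia   r1!, {r4-r11}          @ load 16 pixels (8 words) from the aligned source
    strh    r4, [r0], #2           @ store pixel 0; destination is now word-aligned
    mov     lr, r5, lsl #16        @ pixel 2 into the high half
    orr     r4, lr, r4, lsr #16    @ r4 = pixels 1,2
    mov     lr, r6, lsl #16
    orr     r5, lr, r5, lsr #16    @ r5 = pixels 3,4
    mov     lr, r7, lsl #16
    orr     r6, lr, r6, lsr #16    @ r6 = pixels 5,6
    mov     lr, r8, lsl #16
    orr     r7, lr, r7, lsr #16    @ r7 = pixels 7,8
    mov     lr, r9, lsl #16
    orr     r8, lr, r8, lsr #16    @ r8 = pixels 9,10
    mov     lr, r10, lsl #16
    orr     r9, lr, r9, lsr #16    @ r9 = pixels 11,12
    mov     lr, r11, lsl #16
    orr     r10, lr, r10, lsr #16  @ r10 = pixels 13,14
    stmia   r0!, {r4-r10}          @ store pixels 1-14 as seven aligned words
    mov     r11, r11, lsr #16      @ pixel 15 into the low half
    strh    r11, [r0], #2          @ store pixel 15
    subs    r3, r3, #1             @ decrement the loop counter
    bne     blitNormalSourceAligned_octcopy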
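A rough C model of one pass through the loop above may help if the register shuffling is hard to follow. It is a sketch under assumptions, not the original code: `copy16_source_aligned` is an invented name, and it assumes little-endian memory and 16-bit pixels, with a word-aligned source and a destination that is one halfword off a word boundary.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 16-pixel pass (little-endian, 16-bit pixels). */
static void copy16_source_aligned(uint16_t *dst, const uint32_t *src)
{
    uint32_t w[8];
    memcpy(w, src, sizeof w);          /* plays the role of ldmia r1!,{r4-r11} */

    *dst++ = (uint16_t)w[0];           /* leading halfword; dst is now word-aligned */

    for (int i = 0; i < 7; i++) {
        /* High half of the current word becomes the low half of the
           output; low half of the next word becomes the high half. */
        uint32_t merged = (w[i] >> 16) | (w[i + 1] << 16);
        memcpy(dst, &merged, sizeof merged);  /* one of the seven stmia words */
        dst += 2;
    }

    *dst = (uint16_t)(w[7] >> 16);     /* trailing halfword */
}
```

Each pass still writes 32 bytes, but as one halfword, seven full words, and one halfword, so the bulk of the traffic goes out as aligned word stores.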
Right now I am cleaning up my code (basically translating from GAS (GNU assembler) to EVC++ style assembly), and then I will send you the example blits working with the current GAPI interface (sometime by the end of this week).