On a desktop machine, accessing graphics memory is much slower than accessing system memory, so flip() will seem slower (at least that used to be the case with graphics memory).
PocketPC, on the other hand, has no such concept: framebuffer memory is basically the same as your main memory, so flip() should run exactly as fast as a blit from one surface to another.
Anyway, flip() on PPC runs at 5-6 ms, which basically corresponds to a memcpy() of 320x240x2 bytes (153,600 bytes).
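To make the arithmetic concrete, here is a minimal sketch of what an unrotated flip boils down to, assuming a 320x240 screen at 16 bits per pixel and a framebuffer that is ordinary memory. The names `frame_bytes` and `flip_copy` are made up for illustration and are not GapiDraw functions.

```c
#include <stdint.h>
#include <string.h>

enum { SCREEN_W = 320, SCREEN_H = 240, BYTES_PER_PIXEL = 2 };

/* Bytes moved by one full-screen copy: 320 * 240 * 2 = 153600. */
static size_t frame_bytes(void)
{
    return (size_t)SCREEN_W * SCREEN_H * BYTES_PER_PIXEL;
}

/* Hypothetical unrotated flip: one straight copy of the back buffer
   into the framebuffer, since both live in the same kind of memory. */
static void flip_copy(uint16_t *framebuffer, const uint16_t *backbuffer)
{
    memcpy(framebuffer, backbuffer, frame_bytes());
}
```

At 5-6 ms per copy that works out to roughly 25-30 MB/s of effective copy throughput.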
GapiDraw uses prerotated images (memory buffers) to match the device screen layout.
This saves some time on the final blit but causes performance problems for people who want to access memory buffers directly: either the data has to be rotated (on lock), or people have to know the surface layout when accessing it directly.
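As a rough illustration of what "knowing the surface layout" means in practice, here is a sketch that addresses a logical pixel (x, y) in a prerotated 16-bit buffer through explicit x and y pitches. The `SurfaceLayout` struct and both helpers are hypothetical and not part of GapiDraw, and the 90 degree clockwise rotation is just an assumed example layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout description; not a GapiDraw structure. */
typedef struct {
    ptrdiff_t origin;   /* byte offset of logical pixel (0,0) */
    ptrdiff_t x_pitch;  /* bytes to step when x increases by 1 */
    ptrdiff_t y_pitch;  /* bytes to step when y increases by 1 */
} SurfaceLayout;

/* Address of logical pixel (x, y) in a 16-bit surface with the given layout. */
static uint16_t *pixel_at(void *base, const SurfaceLayout *l, int x, int y)
{
    unsigned char *p = (unsigned char *)base
                     + l->origin + x * l->x_pitch + y * l->y_pitch;
    return (uint16_t *)p;
}

/* Assumed example: a logical surface of height h stored rotated
   90 degrees clockwise, so each stored row holds h pixels. */
static SurfaceLayout rotated_cw_layout(int h)
{
    SurfaceLayout l;
    l.origin  = (ptrdiff_t)(h - 1) * 2; /* (0,0) sits at the end of the first stored row */
    l.x_pitch = (ptrdiff_t)h * 2;       /* moving right = one stored row down */
    l.y_pitch = -2;                     /* moving down = one pixel back in the stored row */
    return l;
}
```

Every direct pixel access has to go through something like `pixel_at`, which is the extra trouble the prerotated approach pushes onto the caller.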
Another option would be to keep internal memory buffers (surfaces) in the default unrotated state and just do a rotated blit in the flip() call.
It would not be much slower, since a rotated flip() takes around 7 ms (as opposed to 5-6 ms for unrotated). Not much of a difference, but it saves a lot of trouble (IMHO of course).
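For what it's worth, a rotated blit is not much code either. Below is a naive sketch, assuming 16-bit pixels and a screen that is rotated 90 degrees clockwise relative to the surface; the function name `rotated_flip` and the rotation direction are my assumptions, not how GapiDraw does it, and a real version would probably be tuned for cache behaviour.

```c
#include <stdint.h>

/* Copy an unrotated w x h surface into a framebuffer that is laid out
   rotated 90 degrees clockwise (h pixels per row, w rows).
   Source pixel (x, y) lands at row x, column (h - 1 - y). */
static void rotated_flip(uint16_t *dst, const uint16_t *src, int w, int h)
{
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            dst[x * h + (h - 1 - y)] = src[y * w + x];
        }
    }
}
```

With this in flip(), surfaces stay in plain row-major order, so direct access is just `src[y * w + x]` and nobody needs to care about the device layout.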