by Digby » Jan 9, 2002 @ 4:42am
You only cache the values for the edge that is shared. You don't walk all the edges, save all the values in an array, and then go back and walk the array again to perform the fill. Instead, you write to the array as you step down the edge and perform the fill. The next triangle reads from that array for the shared edge, and if it in turn shares another edge with the triangle after it, it caches the values as it steps down that edge. You toggle the edge buffers between triangles, much like flipping a chain of frame buffers.<br><br>Is it faster? To tell you the truth, I don't know, because I haven't implemented it both ways and measured the results. I'm doing perspective-correct texture mapping, and that requires a division for every pixel along that edge. You did know that the ARM has no divide instruction, right? One nice thing about the ARM, though, is its load/store multiple instructions (LDM/STM). In one instruction you can read or write all the values saved in that array to or from registers.<br><br>Speaking of the ARM... You can use 14 of the 16 ARM registers. The other two are the stack pointer and the program counter, and the link register is only free if you save it first. That might be enough registers for interpolating r, g, and b, but not once you add alpha, z, u, and v. Unless I've missed something, you'll need two additional values per interpolated value (its steps in x and y). You'll also need a pointer to the render target buffer, a pointer to the texture map buffer, and a register to pack r, g, and b into a 565 value (or to pack two pixels into a 32-bit DWORD so you don't pay a write penalty), and throw in an extra register for performing alpha blending. How many registers is that? I've lost count.<br><br>
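For what it's worth, here is a rough C sketch of the edge-buffer flipping described at the top of this post. It is not my actual rasterizer: the names (EdgeSample, EdgeBuffer, EdgeCache, step_edge), the 16.16 fixed-point layout, and the 240-row limit are all made up for illustration, and the span fill itself is left as a comment.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_EDGE_ROWS 240  /* assumed maximum edge height in scanlines */

/* Values cached per scanline while stepping down an edge.
   Assumed 16.16 fixed point; u and v would be the already
   perspective-divided texture coordinates. */
typedef struct {
    int32_t x, u, v;
} EdgeSample;

typedef struct {
    EdgeSample rows[MAX_EDGE_ROWS];
    int y_top;   /* first scanline covered by this edge */
    int count;   /* number of scanlines stored */
} EdgeBuffer;

/* Two buffers that swap roles between triangles, like a flip chain. */
typedef struct {
    EdgeBuffer buf[2];
    int write_index;
} EdgeCache;

/* Buffer the current triangle writes its shared edge into. */
static EdgeBuffer *cache_write(EdgeCache *c) { return &c->buf[c->write_index]; }

/* Buffer holding the edge cached by the previous triangle. */
static EdgeBuffer *cache_read(EdgeCache *c) { return &c->buf[c->write_index ^ 1]; }

/* Call once per triangle: last triangle's output becomes this one's input. */
static void cache_flip(EdgeCache *c) { c->write_index ^= 1; }

/* Step down an edge, storing one sample per scanline as we go.
   In a real inner loop the span for each row would be filled in this
   same pass, so there is no second walk over the array. */
static void step_edge(EdgeBuffer *out, int y_top, int rows,
                      EdgeSample start, EdgeSample step)
{
    assert(rows >= 0 && rows <= MAX_EDGE_ROWS);
    out->y_top = y_top;
    out->count = rows;
    for (int i = 0; i < rows; i++) {
        out->rows[i] = start;   /* cache for the next triangle */
        /* ... fill the span for scanline y_top + i here ... */
        start.x += step.x;
        start.u += step.u;
        start.v += step.v;
    }
}
```

The idea is that triangle N calls step_edge on cache_write() for the edge it shares with triangle N+1, you call cache_flip(), and triangle N+1 reads the same samples back through cache_read() instead of redoing the per-row divides. On the ARM, each EdgeSample could be pulled into registers with a single LDM.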