This shows up as missing pixels in textures, vertices running off to odd places, or similar glitches. The GU reads data out of the PSP’s system RAM. If something was written to memory recently, it may still be sitting in the CPU’s data cache and not yet in RAM, so the GU will read out-of-date data instead. Fix this by calling sceKernelDcacheWritebackAll() between the time you write to memory and the time the GU tries to retrieve that memory. In practice this means before sceGuCopyImage(...) or sceGuTexImage(...). If it becomes important to flush as little as possible, consider using sceKernelDcacheWritebackRange(...) instead.
For more information on the cache see http://www.goop.org/psp/cache-howto.html.
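As a rough sketch of the range-based approach, the helper below computes how many bytes a texture occupies, so that only that region has to be written back. The name `texture_size_bytes` is hypothetical and not part of the PSPSDK; the on-device call appears only in a comment because it needs the PSP headers.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the PSPSDK): size in bytes of a
 * texture that is width x height pixels at bits_per_pixel bits each. */
static size_t texture_size_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    return (width * height * bits_per_pixel) / 8;
}

/* On the PSP, after filling `pixels` and before sceGuTexImage(...),
 * the intended use would look something like:
 *
 *     sceKernelDcacheWritebackRange(pixels, texture_size_bytes(32, 32, 32));
 */
```

For a 32×32 32-bit texture this gives 4096 bytes, which is far less than the whole data cache that sceKernelDcacheWritebackAll() writes back.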
Internally, the GE processes textures as blocks of 16 bytes by 8 rows (independent of the actual pixel format, so a 32×32 32-bit texture is a 128×32 texture from the swizzling’s point of view). When you are not swizzling, this means it has to do scattered reads from the texture as it moves the block into its texture cache, which has a big impact on performance. To improve on this, you can re-order your textures into these blocks so that the GE can fetch one entire block by reading sequentially.
000102030405060708090A0B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V
101112131415161718191A1B1C1D1E1F1G1H1I1J1K1L1M1N1O1P1Q1R1S1T1U1V
202122232425262728292A2B2C2D2E2F2G2H2I2J2K2L2M2N2O2P2Q2R2S2T2U2V
303132333435363738393A3B3C3D3E3F3G3H3I3J3K3L3M3N3O3P3Q3R3S3T3U3V
404142434445464748494A4B4C4D4E4F4G4H4I4J4K4L4M4N4O4P4Q4R4S4T4U4V
505152535455565758595A5B5C5D5E5F5G5H5I5J5K5L5M5N5O5P5Q5R5S5T5U5V
606162636465666768696A6B6C6D6E6F6G6H6I6J6K6L6M6N6O6P6Q6R6S6T6U6V
707172737475767778797A7B7C7D7E7F7G7H7I7J7K7L7M7N7O7P7Q7R7S7T7U7V
The block above is a texture block of 32 bytes by 8 lines (so it could be an 8×8 32-bit block, or a 16×8 16-bit block). Each byte is represented here by two characters: the first is the vertical index, 0-7, and the second is the horizontal index, 0-V. When reorganizing this for swizzling, we order the data so that when the GE needs to read something in the first 16×8 block, it can just fetch that entire block, instead of offsetting into the texture for each line it has to read. The resulting swizzled portion looks like this:
000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F
404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5D5E5F
606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F
0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V1G1H1I1J1K1L1M1N1O1P1Q1R1S1T1U1V
2G2H2I2J2K2L2M2N2O2P2Q2R2S2T2U2V3G3H3I3J3K3L3M3N3O3P3Q3R3S3T3U3V
4G4H4I4J4K4L4M4N4O4P4Q4R4S4T4U4V5G5H5I5J5K5L5M5N5O5P5Q5R5S5T5U5V
6G6H6I6J6K6L6M6N6O6P6Q6R6S6T6U6V7G7H7I7J7K7L7M7N7O7P7Q7R7S7T7U7V
Notice how the rectangular 16×8 blocks have ended up as sequential data, ready for direct reading by the GE.
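Because the blocks are measured in bytes rather than pixels, it can help to convert a texture's pixel dimensions first. The following is a minimal sketch with hypothetical helper names (`row_bytes`, `block_count`); it assumes the row size is a multiple of 16 bytes and the height is a multiple of 8 rows.

```c
/* Hypothetical helpers, for illustration only. */

/* Width of one texture row in bytes, which is the "width" that
 * swizzling works with. */
static unsigned int row_bytes(unsigned int width_px, unsigned int bits_per_pixel)
{
    return (width_px * bits_per_pixel) / 8;
}

/* Number of 16-byte by 8-row blocks in the texture. Assumes width_bytes
 * is a multiple of 16 and height is a multiple of 8. */
static unsigned int block_count(unsigned int width_bytes, unsigned int height)
{
    return (width_bytes / 16) * (height / 8);
}
```

A 32×32 32-bit texture has 128-byte rows, so it is made up of 8 blocks across and 4 blocks down, 32 blocks in total. The 32-byte by 8-line example above consists of 2 blocks.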
Example code to re-order a texture into swizzled format:
void swizzle(u8* out, const u8* in, unsigned int width, unsigned int height)
{
    /* width is the row size in bytes, height is the number of rows */
    unsigned int i, j;
    unsigned int rowblocks = (width / 16);

    for (j = 0; j < height; ++j)
    {
        for (i = 0; i < width; ++i)
        {
            /* which 16x8 block this byte belongs to */
            unsigned int blockx = i / 16;
            unsigned int blocky = j / 8;

            /* position of the byte inside that block */
            unsigned int x = (i - blockx*16);
            unsigned int y = (j - blocky*8);

            unsigned int block_index = blockx + ((blocky) * rowblocks);
            unsigned int block_address = block_index * 16 * 8;

            out[block_address + x + y * 16] = in[i+j*width];
        }
    }
}
Or, as an alternative, here’s an optimized version that doesn’t do any heavy math in the inner loop:
void swizzle_fast(u8* out, const u8* in, unsigned int width, unsigned int height)
{
    unsigned int blockx, blocky;
    unsigned int j;

    unsigned int width_blocks = (width / 16);
    unsigned int height_blocks = (height / 8);

    /* u32 words to skip to reach the same block column on the next row */
    unsigned int src_pitch = (width-16)/4;
    /* bytes in one row of blocks (8 texture rows) */
    unsigned int src_row = width * 8;

    const u8* ysrc = in;
    u32* dst = (u32*)out;

    for (blocky = 0; blocky < height_blocks; ++blocky)
    {
        const u8* xsrc = ysrc;
        for (blockx = 0; blockx < width_blocks; ++blockx)
        {
            const u32* src = (const u32*)xsrc;
            for (j = 0; j < 8; ++j)
            {
                /* copy one 16-byte block row as four 32-bit words */
                *(dst++) = *(src++);
                *(dst++) = *(src++);
                *(dst++) = *(src++);
                *(dst++) = *(src++);
                src += src_pitch;
            }
            xsrc += 16;
        }
        ysrc += src_row;
    }
}
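One way to convince yourself that a swizzle routine produces the layout from the diagrams is to run it on the 32-byte by 8-line example and compare the result with the expected block order. The sketch below does that with a compact reference routine; `swizzle_ref` and `check_example_layout` are hypothetical names, and `u8` is defined locally so the example builds without the PSPSDK headers.

```c
#include <string.h>

typedef unsigned char u8;  /* stand-in for the PSPSDK type */

/* Reference swizzle over a texture `width` bytes wide and `height` rows
 * high. Assumes width is a multiple of 16 and height a multiple of 8. */
static void swizzle_ref(u8* out, const u8* in, unsigned int width, unsigned int height)
{
    unsigned int rowblocks = width / 16;
    unsigned int i, j;

    for (j = 0; j < height; ++j) {
        for (i = 0; i < width; ++i) {
            unsigned int block_index = (i / 16) + (j / 8) * rowblocks;
            out[block_index * 16 * 8 + (i % 16) + (j % 8) * 16] = in[i + j * width];
        }
    }
}

/* Returns 1 if swizzling the 32x8 example yields the block-sequential
 * layout (left 16x8 block first, then the right one), 0 otherwise. */
static int check_example_layout(void)
{
    u8 in[32 * 8], out[32 * 8];
    unsigned int block, row, col, k = 0;

    /* Encode each byte as (row << 5) | column so its origin is recoverable. */
    for (row = 0; row < 8; ++row)
        for (col = 0; col < 32; ++col)
            in[row * 32 + col] = (u8)((row << 5) | col);

    memset(out, 0, sizeof out);
    swizzle_ref(out, in, 32, 8);

    for (block = 0; block < 2; ++block)
        for (row = 0; row < 8; ++row)
            for (col = 0; col < 16; ++col, ++k)
                if (out[k] != (u8)((row << 5) | (block * 16 + col)))
                    return 0;
    return 1;
}
```

The same comparison can be pointed at any other swizzle implementation by swapping the call to `swizzle_ref`.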
To use a swizzled texture pass GU_TRUE into the swizzled argument of sceGuTexMode:
sceGuTexMode(GU_PSM_8888,0,0,GU_TRUE);
sceGuTexImage(0,width,height,width,p_swizzled_data);
The swizzle function is fairly simple. If you look at the offset into the texture like this (bit 0 on the right):
31                          v lg2(width)       0
by...by by by by by my my my bx...bx mx mx mx mx
bx and by are the coordinates of the 16×8 block within the texture, with the width measured in bytes. bx has log2(width)-4 bits, and by has the remaining 32-log2(width)-3 bits of a 32-bit offset (i.e., all the MSBs). mx and my are the coordinates within the block.
The swizzle function rotates the my-bx group left by 3 bits, giving:
by...by by by by by bx...bx my my my mx mx mx mx
leaving by and mx unchanged in the offset.
Unswizzling is identical, except you rotate the my-bx group right by 3 bits.
This gives the following pair of functions:
unsigned swizzle(unsigned offset, unsigned log2_w)
{
    if (log2_w <= 4)
        return offset;

    unsigned w_mask = (1 << log2_w) - 1;

    unsigned mx = offset & 0xf;
    unsigned by = offset & (~7 << log2_w);
    unsigned bx = offset & w_mask & ~0xf;
    unsigned my = offset & (7 << log2_w);

    return by | (bx << 3) | (my >> (log2_w - 4)) | mx;
}

unsigned unswizzle(unsigned offset, unsigned log2_w)
{
    if (log2_w <= 4)
        return offset;

    unsigned w_mask = (1 << log2_w) - 1;

    unsigned mx = offset & 0xf;
    unsigned by = offset & (~7 << log2_w);
    /* bx sits at bit 7 and is log2_w - 4 bits wide in a swizzled offset */
    unsigned bx = offset & ((w_mask >> 4) << 7);
    unsigned my = offset & 0x70;

    return by | (bx >> 3) | (my << (log2_w - 4)) | mx;
}
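The bit-rotation view and the block-address arithmetic used by the buffer swizzle earlier should describe the same mapping, and that can be checked by brute force. The sketch below is one way to do it, not a definitive test suite. The names `swizzle_offset`, `unswizzle_offset`, `block_address` and `check_offsets` are hypothetical; the copies here use unsigned constants, and the bx mask in the unswizzle copy is written as `(w_mask >> 4) << 7` so that it covers exactly the log2(width)-4 block-column bits.

```c
/* Rotate the my/bx group left by 3 bits (linear offset -> swizzled offset). */
static unsigned swizzle_offset(unsigned offset, unsigned log2_w)
{
    if (log2_w <= 4)
        return offset;
    unsigned w_mask = (1u << log2_w) - 1;
    unsigned mx = offset & 0xf;
    unsigned by = offset & (~7u << log2_w);
    unsigned bx = offset & w_mask & ~0xfu;
    unsigned my = offset & (7u << log2_w);
    return by | (bx << 3) | (my >> (log2_w - 4)) | mx;
}

/* Rotate the same group right by 3 bits (swizzled offset -> linear offset). */
static unsigned unswizzle_offset(unsigned offset, unsigned log2_w)
{
    if (log2_w <= 4)
        return offset;
    unsigned w_mask = (1u << log2_w) - 1;
    unsigned mx = offset & 0xf;
    unsigned by = offset & (~7u << log2_w);
    unsigned bx = offset & ((w_mask >> 4) << 7);
    unsigned my = offset & 0x70;
    return by | (bx >> 3) | (my << (log2_w - 4)) | mx;
}

/* Block-based address of byte (x, y), using the same arithmetic as the
 * buffer swizzle. Assumes width is a multiple of 16 bytes. */
static unsigned block_address(unsigned x, unsigned y, unsigned width)
{
    unsigned block_index = (x / 16) + (y / 8) * (width / 16);
    return block_index * 16 * 8 + (x % 16) + (y % 8) * 16;
}

/* Returns 1 if both formulations agree and the offsets round-trip for
 * every byte of a (1 << log2_w) by height texture, 0 otherwise.
 * Intended for log2_w >= 4 and height a multiple of 8. */
static int check_offsets(unsigned log2_w, unsigned height)
{
    unsigned width = 1u << log2_w;
    unsigned x, y;

    for (y = 0; y < height; ++y) {
        for (x = 0; x < width; ++x) {
            unsigned linear = y * width + x;
            unsigned s = swizzle_offset(linear, log2_w);
            if (s != block_address(x, y, width))
                return 0;
            if (unswizzle_offset(s, log2_w) != linear)
                return 0;
        }
    }
    return 1;
}
```

Calling `check_offsets(5, 16)` covers a 32-byte-wide texture, which is the narrowest case where bx is non-empty, while `check_offsets(4, 16)` exercises the early-return path where the swizzle is the identity.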