Well, after a fun weekend of tracking down various numerical errors in my luminance shaders (doing integer math in floats is fun!), I've got my motion compensation output matching FFmpeg's for the most part. Here are a couple of pics from the current source I'm working with. I haven't yet fully debugged the chroma shaders.
Motion-compensated luma-image:
Reconstructed luma-image:
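As an aside on the "integer math in floats" problem mentioned above, here's a rough sketch of the kind of thing that bites. It uses the H.264 half-pel luma filter (taps 1, -5, 20, 20, -5, 1, then `(sum + 16) >> 5` and a clip to 0..255). The function names are made up for illustration, and Python floats are doubles rather than shader floats, so this only shows the rounding rule, not my actual shader code. The point is that the shift has to be emulated with an explicit floor, which also gets negative sums right:

```python
import math

# H.264 6-tap half-sample luma filter taps.
TAPS = (1.0, -5.0, 20.0, 20.0, -5.0, 1.0)

def half_pel_int(p):
    """Reference integer path: p holds six neighbouring 8-bit samples."""
    acc = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5]
    return min(255, max(0, (acc + 16) >> 5))

def half_pel_float(p):
    """Float-only path (hypothetical shader-style emulation).

    (acc + 16) >> 5 is an arithmetic shift, i.e. floor((acc + 16) / 32),
    so the float version needs an explicit floor before clamping.
    """
    acc = sum(t * float(x) for t, x in zip(TAPS, p))
    return int(min(255.0, max(0.0, math.floor((acc + 16.0) / 32.0))))

if __name__ == "__main__":
    samples = [10, 20, 30, 40, 50, 60]
    print(half_pel_int(samples), half_pel_float(samples))
```

Comparing the two paths over a pile of inputs is a cheap way to catch off-by-one rounding differences before blaming the rest of the pipeline.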
However, all is not well in the land of GPU-assisted H.264. The H.264 spec allows intra blocks to predict off of P/B macroblocks (I constructed the video above to not use I-blocks in P/B frames, but that isn't a common option). Unfortunately, if motion compensation is done on the GPU, the P/B macroblocks are in GPU memory. For the CPU to do intra prediction off of these blocks, it would require a readback from GPU memory, something that'll destroy performance. This means I'll have to implement intra prediction on the GPU. This isn't an impractical solution. The Xbox 360 paper linked in the first post, for example, makes use of it. It even outlines the method they used for 4x4 I blocks, and I'm sure I can extrapolate to the other block sizes.
As a small aside, intra prediction doesn't map very well to the GPU. This is due to the sheer number of modes (9 4x4 modes, 9 8x8 modes, 4 16x16 modes) as well as the high interdependency of the blocks. However, I am optimistic that it can be done in a reasonable manner. I'll update once I get some more work into these shaders.
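To give a feel for the interdependency, here's a minimal sketch of three of the nine 4x4 luma modes (vertical, horizontal and DC), assuming 8-bit samples. The function name and the string mode labels are hypothetical, and this is a CPU-side illustration, not the Xbox 360 paper's method or my shader. Every mode reads already-reconstructed pixels from the block above and/or to the left, so a block can't be predicted until those neighbours are finished:

```python
def predict_4x4(mode, top=None, left=None):
    """Predict a 4x4 block from reconstructed neighbour samples.

    top  : the 4 samples directly above the block, or None if unavailable
    left : the 4 samples directly to the left, or None if unavailable
    Returns a list of 4 rows with 4 samples each.
    """
    if mode == "vertical":
        if top is None:
            raise ValueError("vertical prediction needs the block above")
        # Each column repeats the sample above it.
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        if left is None:
            raise ValueError("horizontal prediction needs the block to the left")
        # Each row repeats the sample to its left.
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":
        # Rounded mean of whichever neighbours are available.
        if top is not None and left is not None:
            dc = (sum(top) + sum(left) + 4) >> 3
        elif top is not None:
            dc = (sum(top) + 2) >> 2
        elif left is not None:
            dc = (sum(left) + 2) >> 2
        else:
            dc = 128  # mid-grey for 8-bit samples when nothing is available
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("mode not covered by this sketch")

if __name__ == "__main__":
    for row in predict_4x4("dc", top=[10, 20, 30, 40], left=[50, 60, 70, 80]):
        print(row)
```

The remaining six 4x4 modes are directional and also pull in the top-left and top-right neighbours, which only makes the dependency chain worse.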
Oh, and the deblocking filter will be more fun.