Gamester17 Wrote:Another GSoC student (for the X.org project) is trying this summer to implement GPU hardware-accelerated MPEG-2 video decoding by adding XvMC front-end support to the Gallium 3D framework. When finished, any hardware-specific Gallium 3D back-end device driver that supports XvMC should be able to take advantage of this, as long as the MPEG-2 software video decoder supports XvMC. You should keep up with his blog for reference:
Sounds as if Younes Manton (the GSoC student for the X.org project) is making quite good progress with his project to use OpenGL GLSL shaders to accelerate video decoding. Check out his blog post from the day before yesterday:
Quote:Yes I'm still decoding video using shaders
It's been a while since I've said much about my video decoding efforts, but there are two pieces of good news to share. Both are improvements to Nouveau in general, not specific to video decoding.
First, we can now load 1080p clips. Thanks to a very small addition to Gallium and a few lines of code in the Nouveau winsys, a lot of brittle code was removed from the state tracker and memory allocations for incoming data are now dynamic and only done as necessary. The basic situation is we allocate a frame-sized buffer, map it, fill it, unmap it, and use it. On the next frame we map it again, fill it again, and so on. But what if the GPU is still processing the first frame? The second time we attempt to map it the driver will have to stall and wait until the GPU is done before it can let us overwrite the contents of the buffer.
But do we have to wait? Not really. We don't need the previous contents of the buffer, we're going to overwrite the whole thing anyway, so we just need a buffer that we can map immediately. To get around this we were allocating N buffers at startup and rotating between them; filling buffer 0, then 1, and so on, which reduced the likelihood of hitting a busy buffer. The problem with that is obvious: for high res video we need a ton of extra space, most of it not being used most of the time. Now if we try to map a busy buffer, the driver will allocate a new buffer under the covers if possible and point our buffer to it, deleting the old buffer when the GPU is done with it. If the GPU is fast enough and processes buffers before you attempt to map them again, everything is good and you'll have the minimum number of buffers at any given time. If not, you'll get new buffers as necessary, in the worst case until you run out of memory, in which case you'll get stalls when mapping. The best of both worlds.
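To make the map-on-busy behaviour described above concrete, here is a minimal Python simulation. It is only a sketch of the idea, not Nouveau or Gallium code: the names `Buffer`, `RenamingAllocator`, `map_for_overwrite`, `submit`, and the `max_buffers` limit (standing in for "running out of memory") are all hypothetical, and a stall is modelled by simply counting it and marking the buffer idle.

```python
class Buffer:
    """Hypothetical stand-in for a GPU buffer; busy means the GPU still reads it."""
    def __init__(self, size):
        self.size = size
        self.busy = False


class RenamingAllocator:
    """Sketch of 'allocate a fresh buffer if the current one is busy'."""
    def __init__(self, size, max_buffers):
        self.size = size
        self.max_buffers = max_buffers   # assumed memory limit
        self.current = Buffer(size)
        self.retired = []                # old storage the GPU may still be using
        self.stalls = 0

    def live_buffers(self):
        return 1 + len(self.retired)

    def map_for_overwrite(self):
        # Delete retired buffers that the GPU has finished with.
        self.retired = [b for b in self.retired if b.busy]
        if self.current.busy:
            if self.live_buffers() < self.max_buffers:
                # Point our handle at new storage; keep the old one until idle.
                self.retired.append(self.current)
                self.current = Buffer(self.size)
            else:
                # No memory left: wait for the GPU (modelled as a counted stall).
                self.stalls += 1
                self.current.busy = False
        return self.current

    def submit(self):
        # Hand the filled buffer to the GPU.
        self.current.busy = True


alloc = RenamingAllocator(size=1024, max_buffers=3)
first = alloc.map_for_overwrite()
alloc.submit()
second = alloc.map_for_overwrite()   # first is still busy, so we get new storage
print(second is not first, alloc.live_buffers(), alloc.stalls)
```

If the simulated GPU keeps up (buffers are marked idle before the next map), `live_buffers()` stays at 1; only when it falls behind does the count grow, and stalls appear only once the assumed limit is reached.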
The second bit of good news is that we've managed to figure out how to use swizzled surfaces, which gave a very large performance boost. Up to now we've been using linear surfaces everywhere, which are not very cache or prefetch friendly. Rendering to swizzled surfaces during the motion compensation stage lets my modest Athlon XP 1.5 GHz + GeForce 6200 machine handle 720p with plenty of CPU to spare. 1080p still bogs the GPU down, but the reason for that is pretty clear: we still render to a linear back buffer and copy to a linear front buffer. We can't swizzle our back or front buffers, so the next step will be to figure out how to get tiled surfaces working, which are similar, but can be used for back and front buffers. Hopefully soon we can tile the X front buffer and DRI back buffers and get a good speed boost everywhere, but because of the way tiled surfaces seem to work (on NV40 at least) I suspect it will require a complete memory manager to do it neatly.
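As a rough illustration of why swizzled surfaces are more cache friendly than linear ones, the sketch below compares a linear address calculation with Z-order (Morton) addressing, one common swizzling scheme. The blog post does not say which layout the hardware actually uses, so treat the Morton layout and the helper names `linear_offset`, `part1by1`, and `swizzled_offset` as assumptions made for the example.

```python
def linear_offset(x, y, width):
    """Row-major layout: vertical neighbours are a whole row apart."""
    return y * width + x


def part1by1(n):
    """Spread the low 16 bits of n so there is a zero bit between each."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def swizzled_offset(x, y):
    """Z-order layout: interleave the bits of x (even bits) and y (odd bits)."""
    return part1by1(x) | (part1by1(y) << 1)


# Offsets touched by a 4x4 block at the origin of a 1920-pixel-wide surface.
block = [(x, y) for y in range(4) for x in range(4)]
linear = [linear_offset(x, y, 1920) for x, y in block]
swizzled = [swizzled_offset(x, y) for x, y in block]
print(max(linear) - min(linear), max(swizzled) - min(swizzled))
```

In the linear layout the 16 pixels of the block are spread over several thousand elements, while in the Z-order layout they occupy 16 consecutive offsets, which is the kind of locality that block-based work such as motion compensation benefits from.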
Beyond that there are still a few big optimizations that we can implement for video decoding (conditional tex fetching, optimized block copying, smarter vertex pos/texcoord generation, etc), but the big boost we got from swizzling gives me a lot of optimism that using shaders for at least part of the decoding process can be a big win. It probably won't beat dedicated hardware, but for formats not supported by hardware, or for decoding more than one stream at a time, we can probably do a lot of neat things in time.
I've also been looking at VDPAU, which seems like a nice API but will require a lot of work to support on cards that don't have dedicated hardware. More on that later maybe.
...and in related news, it now also seems like the Gallium 3D framework will be merged into Mesa 3D mainline code sooner rather than later:
Quote:Gallium3D To Enter Mainline Mesa Code
Posted by Michael Larabel on January 12, 2009
As we shared late last week, Mesa 7.3 is getting ready for release with the first release candidate having arrived. Mesa 7.3 will feature improved GLSL 1.20 support, support for the Graphics Execution Manager, and Direct Rendering Infrastructure 2 integration. The stabilized version of Mesa 7.3 will then go to make Mesa 7.4.
Beyond Mesa 7.4 we have learned some details as to what's next: merging Gallium3D to Mesa's master branch. Gallium3D, the new graphics architecture developed by Tungsten Graphics, has been in development for quite a while but is nearing a point of stabilization. If all goes according to plan, Gallium3D will see the light of day in Mesa 7.5. Brian Paul announced on the Mesa3D development mailing list that the gallium-0.2 branch will be merged to master following the Mesa 7.4 branching.