2008-12-24, 03:53
I have read the article. It's all generalities, hand-waving, "experiences" and no details (guess why).
Gamester17 Wrote:This other GSoC student (for the X.org project) is this summer trying to implement GPU hardware-accelerated video decoding of MPEG-2 by adding XvMC front-end support to the Gallium 3D framework. The end result, when finished, should be that any hardware-specific Gallium 3D back-end device driver that supports XvMC will be able to take advantage of this, as long as the MPEG-2 software video decoder supports XvMC. You should keep up with his blog for reference:

Sounds as if Younes Manton (the GSoC student for the X.org project) is making quite good progress with his project to use OpenGL GLSL shaders to accelerate video decoding; check out his blog post from the day before yesterday:
http://www.bitblit.org/gsoc/gallium3d_xvmc.shtml
Quote:Yes I'm still decoding video using shaders
It's been a while since I've said much about my video decoding efforts, but there are two pieces of good news to share. Both are improvements to Nouveau in general, not specific to video decoding.
First, we can now load 1080p clips. Thanks to a very small addition to Gallium and a few lines of code in the Nouveau winsys, a lot of brittle code was removed from the state tracker, and memory allocations for incoming data are now dynamic and done only as necessary. The basic situation is: we allocate a frame-sized buffer, map it, fill it, unmap it, and use it. On the next frame we map it again, fill it again, and so on. But what if the GPU is still processing the first frame? The second time we attempt to map it, the driver will have to stall and wait until the GPU is done before it can let us overwrite the contents of the buffer.
But do we have to wait? Not really; we don't need the previous contents of the buffer, since we're going to overwrite the whole thing anyway, so we just need a buffer that we can map immediately. To get around this we were allocating N buffers at startup and rotating between them, filling buffer 0, then 1, and so on, which reduced the likelihood of hitting a busy buffer. The problem with that is obvious: for high-res video we need a ton of extra space, most of it unused most of the time. Now if we try to map a busy buffer, the driver will allocate a new buffer under the covers if possible and point our buffer at it, deleting the old buffer when the GPU is done with it. If the GPU is fast enough and processes buffers before you attempt to map them again, everything is good and you'll have the minimum number of buffers at any given time. If not, you'll get new buffers as necessary, in the worst case until you run out of memory, at which point you'll get stalls when mapping. The best of both worlds.
The second bit of good news is that we've managed to figure out how to use swizzled surfaces, which gave a very large performance boost. Up to now we've been using linear surfaces everywhere, which are not very cache- or prefetch-friendly. Rendering to swizzled surfaces during the motion compensation stage lets my modest Athlon XP 1.5 GHz + GeForce 6200 machine handle 720p with plenty of CPU to spare. 1080p still bogs the GPU down, but the reason for that is pretty clear: we still render to a linear back buffer and copy to a linear front buffer. We can't swizzle our back or front buffers, so the next step will be to figure out how to get tiled surfaces working, which are similar but can be used for back and front buffers. Hopefully soon we can tile the X front buffer and DRI back buffers and get a good speed boost everywhere, but because of the way tiled surfaces seem to work (on NV40 at least), I suspect it will require a complete memory manager to do it neatly.
http://nouveau.freedesktop.org/wiki/Surface_Layouts
Beyond that there are still a few big optimizations that we can implement for video decoding (conditional tex fetching, optimized block copying, smarter vertex pos/texcoord generation, etc), but the big boost we got from swizzling gives me a lot of optimism that using shaders for at least part of the decoding process can be a big win. It probably won't beat dedicated hardware, but for formats not supported by hardware, or for decoding more than one stream at a time, we can probably do a lot of neat things in time.
I've also been looking at VDPAU, which seems like a nice API but will require a lot of work to support on cards that don't have dedicated hardware. More on that later maybe.
Quote:Gallium3D To Enter Mainline Mesa Code
Posted by Michael Larabel on January 12, 2009
As we shared late last week, Mesa 7.3 is getting ready for release, with the first release candidate having arrived. Mesa 7.3 will feature improved GLSL 1.20 support, support for the Graphics Execution Manager, and Direct Rendering Infrastructure 2 integration. The stabilized version of Mesa 7.3 will then go on to become Mesa 7.4.
Beyond Mesa 7.4 we have learned some details as to what's next: merging Gallium3D to Mesa's master branch. Gallium3D, the new graphics architecture developed by Tungsten Graphics, has been in development for quite a while but is nearing a point of stabilization. If all goes according to plan, Gallium3D will see the light of day in Mesa 7.5. Brian Paul announced on the Mesa3D development mailing list that the gallium-0.2 branch will be merged to master following the Mesa 7.4 branching.
digitalhigh Wrote:So a realistic solution isn't that far off, eh? Perhaps a couple of months?

Please understand that Younes Manton's (the GSoC student for X.org) project has nothing to do with XBMC, nor anything that will directly help XBMC accelerate H.264 video decoding.
Gamester17 Wrote:Please understand that Younes Manton's (the GSoC student for X.org) project has nothing to do with XBMC, nor anything that will directly help XBMC accelerate H.264 video decoding.
Younes Manton is only working on XvMC support for Gallium 3D (XvMC only supports MPEG-2, and there are only a very few drivers for Gallium 3D, none of them mature). And no, Younes Manton's work is not just a couple of months away from being usable by normal users anyway.
I think you misunderstood my intent with that post; I only posted it as a reference so that a skilled developer such as Rudd (or someone picking up where Rudd left off) could get a few more ideas they might be able to use to further this development of GPU-assisted H.264 decoding via OpenGL shaders.
PS! @everyone, please do not try to turn this into a discussion about VDPAU; such posts in this thread from non-developers will be deleted.
kasbah Wrote:@ Rudd
I am interested in the work you did for this, as I am planning a university project on H.264 decoding using shaders. Anything you did at all would be useful. Please get in touch.

Hi kasbah, Robert (Rudd) has unfortunately gone M.I.A. and has not been in touch with us at XBMC in many months. His GSoC branch is still available:
# svn checkout http://xbmc.svn.sourceforge.net/svnroot/xbmc/branches/gsoc-2008-rudd
Gamester17 Wrote:PS! The VA API (Video Acceleration API) is now the most interesting option for abstracting this, now that FFmpeg supports it:
http://en.wikipedia.org/wiki/VAAPI

Unfortunately, that would limit it to Linux only.
svn checkout http://xbmc.svn.sourceforge.net/svnroot/xbmc/branches/gsoc-2008-rudd/sources/dvdplayer/ffmpeg
gpu/h264gpu.c|201| error: ‘Picture’ has no member named ‘gpu_dpb’
Quote:The Video Decode Acceleration framework is a C programming interface providing low-level access to the H.264 decoding capabilities of compatible GPUs such as the NVIDIA GeForce 9400M, GeForce 320M or GeForce GT 330M. It is intended for use by advanced developers who specifically need hardware accelerated decode of video frames.