Gamester17
Team-XBMC Forum Moderator
Posts: 10,595
Joined: Sep 2003
Reputation: 9
Location: Sweden
|
My guess is that bitstream processing (CAVLC and CABAC entropy decoding) probably cannot be done efficiently using CUDA either, same as for pixel shaders.
http://wiki.xbmc.org/?title=Hardware_Acc...o_Decoding
Quote:
* CABAC entropy decoding is probably not possible to offload on GPU via pixel shader.
* NVIDIA and ATI/AMD GPUs use dedicated hardware blocks for entropy decoding.
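To make the serial nature of entropy decoding concrete, here is a minimal sketch in Python. It does not implement CAVLC or CABAC. As an assumption for illustration it uses unsigned Exp-Golomb codes, the simpler variable-length code that H.264 uses for many header syntax elements, and the function name `decode_ue_stream` is hypothetical. The point is only that the start of each codeword is unknown until the previous one has been decoded, which is what makes this kind of work hard to spread across many GPU threads.

```python
def decode_ue_stream(bits):
    """Decode a string of '0'/'1' characters as back-to-back unsigned Exp-Golomb codes."""
    values = []
    pos = 0
    while pos < len(bits):
        # Count leading zeros; the count determines the codeword length.
        zeros = 0
        while pos + zeros < len(bits) and bits[pos + zeros] == "0":
            zeros += 1
        end = pos + 2 * zeros + 1
        if end > len(bits):
            raise ValueError("truncated codeword")
        # The value bits start at the first '1' and span zeros + 1 bits.
        values.append(int(bits[pos + zeros:end], 2) - 1)
        # Only now is the start of the next codeword known.
        pos = end
    return values


print(decode_ue_stream("101001100100"))  # [0, 1, 2, 3]
```

Each loop iteration depends on `pos` from the previous one, so the loop cannot simply be split into independent chunks without first knowing where the codewords begin.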
I hope that I am proven wrong, but I think that the things that could be looked at first are:
Quote:
* Motion compensation (mo comp)
* Inverse Discrete Cosine Transform (iDCT)
** Inverse Telecine 3:2 and 2:2 pull-down correction
* Inverse modified discrete cosine transform (iMDCT)
* In-loop deblocking filter
* Intra-frame prediction
* Inverse quantization (IQ)
* Variable-Length Decoding (VLD), more commonly known as slice level acceleration
* Spatial-Temporal De-Interlacing, (plus automatic interlace/progressive source detection)
|
rmie
Junior Member
Posts: 1
Joined: Apr 2008
Reputation: 0
|
I spent some time a year ago thinking about how to implement some GLSL acceleration using ffmpeg's DSP API. The DSP API is there to provide SSE2 and MMX implementations of e.g. the IDCT.
I came to the conclusion that it's not useful, as the CPU cycles needed to prepare and switch the OpenGL context far outweigh the gain. Additionally, the CPU is blocked while waiting for the GPU to finish.
From my understanding, first of all one needs a completely restructured, maybe multithreaded, h.264 decoder that is able to offload huge tasks to the GPU, e.g. running the deblocking filter for a whole frame. Unfortunately my understanding of h.264 internals is limited, and my spare time didn't allow me to dig into it. As an exercise I've implemented a GLSL-based YUV->RGB converter for ffmpeg, and got a video scaler nearly for free as well :-).
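For readers who have not seen one, the per-pixel work of a YUV->RGB converter can be sketched as below. This is not the GLSL converter described above. It is a plain Python illustration that assumes full-range BT.601 coefficients (the matrix the original shader used is not stated), and the name `yuv_to_rgb` is hypothetical. Each output pixel depends only on its own Y, Cb and Cr samples, which is why this step fits a fragment shader so well.

```python
def yuv_to_rgb(y, cb, cr):
    """Convert one 8-bit Y'CbCr sample to 8-bit RGB (assumed: full-range BT.601)."""
    def clamp(v):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, int(round(v))))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


# Neutral chroma (Cb = Cr = 128) gives a grey pixel with R = G = B = Y.
print(yuv_to_rgb(128, 128, 128))  # (128, 128, 128)
```

On a GPU the same arithmetic runs once per fragment, and sampling the source planes through the texture unit is what provides the scaling as a side effect.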
I'd like to open a technical discussion on how "h.264 on GPU" can be realized. Even if GSoC didn't accept the project, I think it's worth starting it :-)
BTW: some of the steps listed above are nearly "for free" when using a GPU, e.g. "motion compensation" is basically "texture mapping" (maybe with a post-processing shader).
|
spiff
Grumpy Bastard Developer
Posts: 12,185
Joined: Nov 2003
Reputation: 82
|
we already do csc and scaling in hw, even on xbox.
problem with most of these is that they are extremely branchy, which makes them very ill suited for vectorization, which is where the punch of a gpu really is. i've seen some tests claiming a 10x increase on mocomp (mpeg2), which accounts for roughly 30% of the decoding processing needed. that still yields rather mediocre speed improvements, certainly nothing that approaches running hi resolution vids on significantly weaker hw. h.264 is even worse, in particular due to stuff like cabac, which takes extreme amounts of processing and, as it's a bit-based compression, is extremely branchy (if my understanding is correct; take that last part with a grain of salt). this is why vendors include specific hw to accelerate these things. with hw vendors being the paranoid d*icks they are, they won't let the FOSS community tap into the resources :/
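The "mediocre" overall gain follows from Amdahl's law. As a small worked example in Python, using only the two figures quoted above (a 10x speedup on a stage that is roughly 30% of the decode time) and assuming the remaining 70% is unchanged and transfer overhead is ignored; the function name `overall_speedup` is hypothetical:

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: total speedup when `fraction` of the work runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# 30% of the decode time accelerated 10x.
print(round(overall_speedup(0.3, 10), 2))  # 1.37

# Upper bound if that 30% became effectively free.
print(round(overall_speedup(0.3, 1e12), 2))  # 1.43
```

So even an infinitely fast mocomp stage would cap the whole decoder at about 1.43x under these assumptions.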
also integrating things with ffmpeg is rather hard, as you point out, since the hooks are too low level, i.e. very small operations which makes the context overhead kill any gains
|
elan
Team Plex
Posts: 276
Joined: Dec 2007
Location: Maui
|
I wonder if better usage of SSE3 and SSE4.1 might help quite a bit and be more portable?
-elan
|