Plans for H.264 decode acceleration?

Gamester17
Team-XBMC Forum Moderator
Posts: 10,595
Joined: Sep 2003
Reputation: 9
Location: Sweden
Post: #11
My guess is that bitstream processing (CAVLC and CABAC entropy decoding) probably cannot be done efficiently with CUDA either, just as with pixel shaders.
http://wiki.xbmc.org/?title=Hardware_Acc...o_Decoding
Quote:
* CABAC entropy decoding is probably not possible to offload to the GPU via pixel shaders.
* NVIDIA and ATI/AMD GPUs use dedicated hardware blocks for entropy decoding.
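Why bitstream parsing resists parallelisation can be seen even in H.264's simplest variable-length code, the unsigned Exp-Golomb code ue(v) used for many header syntax elements. The sketch below is not CABAC (which additionally updates an adaptive context model after every context-coded bin), but it shows the same property: where each code word starts depends on how long the previous one was, so the bits must be consumed strictly in order. The `BitReader` type and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *data;
    size_t size; /* buffer length in bytes */
    size_t pos;  /* read position in bits */
} BitReader;

/* Returns the next bit (0 or 1), or -1 once the buffer is exhausted. */
static int read_bit(BitReader *br) {
    if (br->pos >= br->size * 8)
        return -1;
    int bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Decodes one unsigned Exp-Golomb code word ue(v):
 * count leading zero bits, consume the terminating 1, then read that many
 * suffix bits; value = 2^leading_zeros - 1 + suffix.
 * Returns -1 on truncated input. This sketch caps leading zeros at 31. */
static int64_t read_ue(BitReader *br) {
    int leading_zeros = 0;
    int bit;
    /* Serial dependency: the code word's length is only known after this loop. */
    while ((bit = read_bit(br)) == 0)
        leading_zeros++;
    if (bit < 0 || leading_zeros > 31)
        return -1;
    uint64_t suffix = 0;
    for (int i = 0; i < leading_zeros; i++) {
        bit = read_bit(br);
        if (bit < 0)
            return -1;
        suffix = (suffix << 1) | (uint64_t)bit;
    }
    return (int64_t)((UINT64_C(1) << leading_zeros) - 1 + suffix);
}
```

For example, the bit string 1 010 011 00100 00111 (packed as the bytes 0xA6 0x43 0x80) decodes to 0, 1, 2, 3, 6, and none of those boundaries can be found without decoding everything before it.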
I hope I am proven wrong, but I think the things that could be looked at first are:
Quote:
* Motion compensation (MoComp)
* Inverse Discrete Cosine Transform (iDCT)
** Inverse Telecine 3:2 and 2:2 pull-down correction
* Inverse modified discrete cosine transform (iMDCT)
* In-loop deblocking filter
* Intra-frame prediction
* Inverse quantization (IQ)
* Variable-Length Decoding (VLD), more commonly known as slice level acceleration
* Spatial-temporal de-interlacing (plus automatic interlaced/progressive source detection)
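Unlike entropy decoding, the inverse transform in the list above is branch-free integer arithmetic, which is why it maps well onto SIMD units and shaders. A minimal sketch of the H.264 4x4 inverse integer transform (H.264's integer approximation of the iDCT): a row pass and a column pass of adds and shifts, followed by (x + 32) >> 6 rounding. It assumes the coefficients are already dequantised and that `>>` on negative values is an arithmetic shift, as on common compilers; the function name is hypothetical.

```c
#include <stdint.h>

/* H.264 4x4 inverse integer transform.
 * d holds dequantised coefficients in row-major order; on return it holds
 * the residual after the final (x + 32) >> 6 rounding. No branches anywhere. */
static void idct4x4_h264(int32_t d[16]) {
    int32_t tmp[16];
    /* Horizontal (row) pass. */
    for (int i = 0; i < 4; i++) {
        const int32_t *r = &d[4 * i];
        int32_t e0 = r[0] + r[2];
        int32_t e1 = r[0] - r[2];
        int32_t e2 = (r[1] >> 1) - r[3];
        int32_t e3 = r[1] + (r[3] >> 1);
        tmp[4 * i + 0] = e0 + e3;
        tmp[4 * i + 1] = e1 + e2;
        tmp[4 * i + 2] = e1 - e2;
        tmp[4 * i + 3] = e0 - e3;
    }
    /* Vertical (column) pass plus rounding. */
    for (int j = 0; j < 4; j++) {
        int32_t f0 = tmp[j], f1 = tmp[4 + j], f2 = tmp[8 + j], f3 = tmp[12 + j];
        int32_t g0 = f0 + f2;
        int32_t g1 = f0 - f2;
        int32_t g2 = (f1 >> 1) - f3;
        int32_t g3 = f1 + (f3 >> 1);
        d[j]      = (g0 + g3 + 32) >> 6;
        d[4 + j]  = (g1 + g2 + 32) >> 6;
        d[8 + j]  = (g1 - g2 + 32) >> 6;
        d[12 + j] = (g0 - g3 + 32) >> 6;
    }
}
```

A block with only a DC coefficient of 64 comes out as sixteen 1s; every output sample is computed the same way, so many blocks can be processed in lockstep.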

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
rmie
Junior Member
Posts: 1
Joined: Apr 2008
Reputation: 0
Post: #12
About a year ago I spent some time thinking about how to implement GLSL acceleration using ffmpeg's DSP API. The DSP API exists to provide SSE2 and MMX implementations of routines such as the IDCT.

I came to the conclusion that it isn't worth it, as the CPU cycles needed to prepare and switch the OpenGL context far exceed the gain. Additionally, the CPU is blocked while waiting for the GPU to finish.
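The trade-off described above can be put as a back-of-the-envelope formula: if every GPU submission carries a fixed overhead (context switch, upload, synchronisation) and saves some time per block, offloading only pays off once a submission carries more than overhead / per-block saving blocks. The sketch below assumes that simple linear cost model, which ignores pipelining and the CPU stall just mentioned; the function name and all figures are hypothetical, not measurements.

```c
#include <stdint.h>

/* Minimum number of blocks per GPU submission for offloading to break even,
 * under a hypothetical linear cost model (all times in nanoseconds):
 *   cost_cpu = n * cpu_ns_per_block
 *   cost_gpu = overhead_ns + n * gpu_ns_per_block
 * Returns 0 when the GPU is not faster per block, i.e. it never pays off. */
static uint64_t break_even_blocks(uint64_t overhead_ns,
                                  uint64_t cpu_ns_per_block,
                                  uint64_t gpu_ns_per_block) {
    if (gpu_ns_per_block >= cpu_ns_per_block)
        return 0;
    uint64_t saving = cpu_ns_per_block - gpu_ns_per_block;
    return (overhead_ns + saving - 1) / saving; /* ceiling division */
}
```

With made-up figures of 50 µs overhead and 200 ns versus 20 ns per block, the threshold is 278 blocks. A per-block hook called one block at a time loses badly, while a whole-frame pass over a 1920x1080 frame (32,400 8x8 blocks) would clear it comfortably.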

From my understanding, one first needs a completely restructured, possibly multithreaded, H.264 decoder that is able to offload large tasks to the GPU, e.g. running the deblocking filter over a whole frame. Unfortunately my understanding of H.264 internals is limited, and my spare time didn't allow me to dig into it. As an exercise I've implemented a GLSL-based YUV->RGB converter for ffmpeg, and got a video scaler nearly for free as well :-).
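The per-pixel math a YUV->RGB shader evaluates is small and independent for every pixel, which is why it suits the GPU so well. Below is a CPU reference sketch, assuming limited-range BT.601 input and the widely used 8-bit fixed-point coefficients; a shader would do the same in floating point, with the texture unit's filtering providing the scaling. This is not rmie's code, and the names are hypothetical.

```c
#include <stdint.h>

/* Clamps an intermediate result to the 0..255 range of an 8-bit channel. */
static uint8_t clamp_u8(int v) {
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Converts one limited-range BT.601 YCbCr sample to 8-bit RGB.
 * Coefficients are the usual ones scaled by 256; +128 rounds before >> 8. */
static void yuv_to_rgb_bt601(uint8_t y, uint8_t u, uint8_t v,
                             uint8_t *r, uint8_t *g, uint8_t *b) {
    int c = (int)y - 16;  /* luma offset: black is 16 */
    int d = (int)u - 128; /* Cb centred on zero */
    int e = (int)v - 128; /* Cr centred on zero */
    *r = clamp_u8((298 * c + 409 * e + 128) >> 8);
    *g = clamp_u8((298 * c - 100 * d - 208 * e + 128) >> 8);
    *b = clamp_u8((298 * c + 516 * d + 128) >> 8);
}
```

For 4:2:0 video the same (u, v) pair is shared by a 2x2 group of luma samples; a shader simply samples the chroma planes at the pixel's position and lets the texture unit handle that.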

I'd like to open a technical discussion on how "H.264 on GPU" can be realized. Even if GSoC didn't accept the project, I think it's worth starting :-)

BTW: some of the steps listed above come nearly "for free" on a GPU, e.g. "motion compensation" is basically "texture mapping" (perhaps with a post-processing shader).
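The "motion compensation is texture mapping" point is most literal for chroma: H.264 interpolates chroma samples bilinearly at 1/8-sample precision, the same weighted sum of four neighbours that a texture unit computes. Luma is different, since half-sample positions use a 6-tap filter (1, -5, 20, 20, -5, 1), so plain bilinear texture filtering is not bit-exact there and would need a shader. GPU filtering precision may also differ from the exact integer rounding below. A minimal sketch of the chroma case; the function name is hypothetical and edge padding is omitted.

```c
#include <stdint.h>

/* H.264 chroma sample interpolation at 1/8-sample precision.
 * ref: reference plane, stride: bytes per row, (x, y): integer sample position,
 * (dx, dy): fractional motion-vector part in 1/8 units (0..7).
 * The caller must ensure (x + 1, y + 1) lies inside the plane. */
static uint8_t chroma_mc_sample(const uint8_t *ref, int stride,
                                int x, int y, int dx, int dy) {
    const uint8_t *p = ref + y * stride + x;
    int a = p[0], b = p[1], c = p[stride], d = p[stride + 1];
    /* Bilinear weights sum to 64; +32 rounds before the >> 6. */
    return (uint8_t)(((8 - dx) * (8 - dy) * a + dx * (8 - dy) * b +
                      (8 - dx) * dy * c + dx * dy * d + 32) >> 6);
}
```

On a 2x2 plane {10, 20, 30, 40}, sampling halfway in both directions (dx = dy = 4) gives 25, the rounded-down average of the four neighbours.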
spiff
Grumpy Bastard Developer
Posts: 12,185
Joined: Nov 2003
Reputation: 82
Post: #13
We already do CSC (colour-space conversion) and scaling in hardware, even on the Xbox.

The problem with most of these is that they are extremely branchy, which makes them very ill-suited for vectorization, and vectorization is where the real punch of a GPU is. I've seen some tests claiming a 10x speedup on mocomp (MPEG-2), which accounts for roughly 30% of the decoding work. That still yields a rather mediocre overall improvement, certainly nothing that approaches playing high-resolution video on significantly weaker hardware. H.264 is even worse, in particular due to things like CABAC, which takes an extreme amount of processing and, being bit-based compression, is extremely branchy (if my understanding is correct; take that last part with a grain of salt). This is why vendors include dedicated hardware to accelerate these things. With hardware vendors being the paranoid d*icks they are, they won't let the FOSS community tap into those resources :/
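The "mediocre" result follows from Amdahl's law: if a fraction p of the work is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A small sketch using the figures quoted in the post (p = 0.3, s = 10); the function name is hypothetical.

```c
/* Amdahl's law: overall speedup when a fraction p (0..1) of the total work
 * is made s times faster and the remaining (1 - p) is unchanged. */
static double amdahl_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}
```

amdahl_speedup(0.3, 10.0) gives about 1.37x overall, and even an infinitely fast mocomp stage would cap out at 1 / 0.7, about 1.43x.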

Also, integrating things with ffmpeg is rather hard, as you point out, since the hooks are too low-level: they are very small operations, so the context-switch overhead kills any gains.

elan
Team Plex
Posts: 276
Joined: Dec 2007
Location: Maui
Post: #14
I wonder whether better use of SSE3 and SSE4.1 might help quite a bit and be more portable?
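SSE2 already covers many of the byte-parallel inner loops of a decoder. As an example, default (unweighted) H.264 bi-prediction averages two prediction blocks with (p0 + p1 + 1) >> 1, which is exactly what the SSE2 intrinsic `_mm_avg_epu8` computes for 16 bytes at a time. The sketch below (function name hypothetical) uses it when the compiler defines `__SSE2__` and falls back to scalar code otherwise. SIMD helps regular, data-parallel stages like this one; it does nothing for the branchy CABAC loop. SSE4.1 adds further integer instructions (for example `_mm_mullo_epi32` and `_mm_packus_epi32`) that help with 32-bit intermediate arithmetic.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Default (unweighted) bi-prediction average: dst[i] = (p0[i] + p1[i] + 1) >> 1.
 * n is the number of pixels. Uses SSE2 for 16-pixel chunks when available. */
static void avg_bipred(uint8_t *dst, const uint8_t *p0, const uint8_t *p1,
                       size_t n) {
    size_t i = 0;
#ifdef __SSE2__
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(p0 + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(p1 + i));
        /* _mm_avg_epu8 computes (a + b + 1) >> 1 per unsigned byte, no overflow. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_avg_epu8(a, b));
    }
#endif
    /* Scalar tail, or the whole loop when SSE2 is unavailable. */
    for (; i < n; i++)
        dst[i] = (uint8_t)((p0[i] + p1[i] + 1) >> 1);
}
```

Both paths produce identical results, so the same function can be built for non-x86 targets.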

-elan