[LINUX] hw decode support for ARM/OMAP..
#1
... continuing a thread from HERE...

davilla Wrote:There are a few ways to handle hw decode depending on how integrated the hw decoders are.

If the audio and video hw decoders are tied together via a common pts clock, then the best way to handle this is with a new internal player (aka what dvdplayer does). It's just too hard to try to integrate this into the existing dvdplayer model. Take a look at what boxee had to do for the intel cex41xx. Icky, icky, ifdef city.

If the video hw decoder can run standalone, then clone crystalhd, vda or vtb. Those are all hw video codecs that dvdplayer can use. One thing to note is that to fit the dvdplayer model, you must be able to drop picture frames or dvdplayer will not be able to keep sync with audio...

hw video decoder can run standalone, thankfully. We'd just use normal sw audio decode; there should be plenty of MHz to go around for that..

When you talk of dropping frames, are you meaning *before* the decoder? The video decoder hw on omap4 is pretty strong, very high bitrate/profile 1080p h264 still has headroom to spare.. so decoding itself is not likely to be a bottleneck.

davilla Wrote:You can either pass up decoded picture frames or you can 'bypass' this by various means, iOS does a 'bypass' using a opaque corevideoref which is rendered in LinuxRendererGLES.

Another swing point is whether hw decoded pictures are auto-rendered on a separate video plane, or just decoded to argb/yuv so that GLES needs to render them. GLES shader conversion of yuv to argb can be very slow depending on the GPU, so you might have to use neon to do the yuv to argb. There's existing code to do that (for iOS).

There are a couple options here.. and rendering is probably the first thing to figure out. Decoding could probably be handled by some existing gst patches (https://github.com/topfs2/xbmc/commits/gstreamer) with a few enhancements, assuming those aren't too much out of date.

For rendering, does auto-render on a separate layer/overlay mean I have to do my own A/V sync? Or is xbmc just going to call me with the frame (handle?) to display at the right point in time? It is possible for the hw to scale and blend YUV with an ARGB gfx layer. If the renderer class is just passed the OSD (or whatever should appear in front of the video) as a separate surface and is allowed to combine the two layers as it sees fit, this could be an option.

If going with GL based rendering, instead of hw video overlay, I probably prefer some way to use IMG texture streaming (rather than generic GLES shader) for best performance. Either way, ideally I'd be passing NV12 (not I420) from decoder straight to renderer, along with some cropping coordinates. Raw buffer from decoder will have "codec borders" that need to be cropped out in display.
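To make the cropping point concrete, here is a minimal sketch (names and struct layout are hypothetical, not actual XBMC or gst-ducati code) of how crop coordinates could translate into byte offsets into an NV12 decoder buffer, assuming the luma and interleaved CbCr planes share one stride:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical crop rectangle: x/y are the top-left corner of the
// visible region inside the padded decoder buffer ("codec borders").
struct CropRect { int x, y, w, h; };

// For NV12 the luma plane is full resolution and the interleaved CbCr
// plane is subsampled 2x2, so the chroma row index is halved and the
// x offset must stay aligned to a CbCr byte pair.
struct Nv12View {
  size_t luma_offset;    // byte offset of first visible luma sample
  size_t chroma_offset;  // byte offset of first visible CbCr pair
};

Nv12View CropNv12(const CropRect &c, size_t stride) {
  Nv12View v;
  v.luma_offset   = static_cast<size_t>(c.y) * stride + c.x;
  v.chroma_offset = static_cast<size_t>(c.y / 2) * stride + (c.x & ~1);
  return v;
}
```

Passing offsets like these to the renderer avoids the extra memcpy a pre-crop pass would otherwise cost.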

In the worst case, there should be enough memory bandwidth to make an extra pass through memory with the GPU, converting YUV->RGB into something that can be used as a texture.. the existing x11/dri2 path for rendering video in totem involves more copies (when running windowed). And I guess most aren't too concerned about running off a battery..

davilla Wrote:If you are really serious about this, please create a new thread in the development section of the xbmc forums. I'm sure others will chime in.

I've worked on a few 'embedded' flavors of xbmc. iOS is one, sigma is another, and there are a few more that I can't talk about yet. Some use a hw decoder together with the dvdplayer model; others have dvdplayer replaced with a custom internal player to handle the hw playback details. If I knew more about exactly what the panda can do with respect to hw video decode, I could point you in the right direction.

btw, completely unrelated question.. is there some reason why xbmc appears to want to recompile the world every time I run 'make'?
#2
I thought a/v sync is done by dropping frames?

Why not just use the gstreamer dvdplayer overlay with a pipeline containing gst-ducati?

btw, why didn't the gstreamer overlay reach mainline?
#3
robclark Wrote:When you talk of dropping frames, are you meaning *before* the decoder? The video decoder hw on omap4 is pretty strong, very high bitrate/profile 1080p h264 still has headroom to spare.. so decoding itself is not likely to be a bottleneck.

Before or after does not make a difference, as the hw decoder is assumed to be much, much faster than the frame interval. The main thing is to get a picture frame dropped rather than presented for an entire frame interval.

robclark Wrote:For rendering, does auto-render on separate layer/overlay mean I have to do my own A/V sync? Or is xbmc just going to call me with the frame (handle?) to display at the right point in time? It is possible for hw to scale and blend YUV with an ARGB gfx layer. If the renderer class is just passed OSD or whatever should appear in front of video as a separate surface and allowed to combine the two layers as it sees fit, this could be an option.

Depends. If you can drop video frames when dvdplayervideo says to drop, then dvdplayer can maintain sync. If not, then you are better off constructing a replacement for dvdplayer, as it's too hacky to try to deal with how dvdplayer works.

A separate video plane means there is at least a global mixer with alpha control between layers. Since the GUI is on top (we hope), its alpha determines how much of the video layer below shows through.

If on same layer as GUI, then rendering is blended as directed. GUI elements on top of video.
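The global-mixer case is just a per-pixel alpha blend between the two planes. A minimal sketch (hypothetical names, not actual mixer hardware behavior on any particular SoC):

```cpp
#include <cassert>

// Hypothetical per-pixel mix for a global layer mixer: the GUI layer's
// alpha controls how much of the video plane below shows through.
struct Rgb { float r, g, b; };

Rgb MixLayers(const Rgb &gui, float gui_alpha, const Rgb &video) {
  // gui_alpha = 1.0 -> opaque GUI, video hidden;
  // gui_alpha = 0.0 -> GUI transparent, video fully visible.
  Rgb out;
  out.r = gui.r * gui_alpha + video.r * (1.0f - gui_alpha);
  out.g = gui.g * gui_alpha + video.g * (1.0f - gui_alpha);
  out.b = gui.b * gui_alpha + video.b * (1.0f - gui_alpha);
  return out;
}
```

On real display hardware this blend happens in the overlay mixer rather than in software; the sketch only illustrates the math.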


robclark Wrote:If going with GL based rendering, instead of hw video overlay, I probably prefer some way to use IMG texture streaming (rather than generic GLES shader) for best performance. Either way, ideally I'd be passing NV12 (not I420) from decoder straight to renderer, along with some cropping coordinates. Raw buffer from decoder will have "codec borders" that need to be cropped out in display.

First, be warned: I've never seen an ARM GPU that can handle converting YUV -> RGB with shaders. They all run much too slow. Maybe our shaders suck for GLES :) Even with iOS for sw decode, we take I420 from ffmpeg, convert that to rgba with a neon-based routine, and render the rgba. The neon code will easily beat our GLES GPU shader.
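For reference, the arithmetic such a conversion routine performs is the standard video-range BT.601 transform. A scalar sketch of one pixel (illustrative only; the real neon routine vectorizes this, and the exact coefficients in XBMC's code may differ):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Scalar reference for a video-range (16-235) BT.601 YUV -> RGB
// conversion of one pixel; a neon implementation processes many
// pixels at once but computes exactly this.
struct Rgb8 { uint8_t r, g, b; };

static uint8_t Clamp255(float v) {
  return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, v)));
}

Rgb8 Yuv2Rgb(uint8_t y, uint8_t u, uint8_t v) {
  float yf = 1.164f * (y - 16);
  float uf = u - 128.0f;
  float vf = v - 128.0f;
  Rgb8 out;
  out.r = Clamp255(yf + 1.596f * vf);
  out.g = Clamp255(yf - 0.391f * uf - 0.813f * vf);
  out.b = Clamp255(yf + 2.018f * uf);
  return out;
}
```

The per-pixel multiply/clamp load is what makes this expensive for a weak GPU shader at 1080p, and what neon's 8/16-lane integer ops absorb easily.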

IMG textures are fine, they become an opaque item that gets passed up, that's how iOS works with CVBufferRefs. DVDPlayer does not have a clue what they are and passes a ref to them up to renderer. Renderer knows how to deal with them.

robclark Wrote:btw, completely unrelated question.. is there some reason why xbmc appears to want to recompile the world every time I run 'make'?

Depends what you touch :) some include headers go all over. platformdefs.h is one of those. Touch that and it recompiles just about everything.
#4
overflowed Wrote:i thought a/v sync is done by dropping frames?

Yeah, but what I meant was: if I render via an overlay, do I have to have my own logic to decide whether to drop frames? I.e. is it bypassing xbmc's A/V sync?

Anyways, for now I'm sticking with the normal GL based rendering. I've pulled the gst player patches from topfs2's tree and made a few tweaks (we have added a crop event to avoid a frame copy after the decoder). The main issue right now on the decoder side of things is that there is no flow control, and the decoder is running circles around the renderer, so we pretty quickly run out of YUV buffer memory. I've a small hack for that, but it would probably be less of an issue if rendering were faster. Right now video rendering is maybe 15fps at best, but nearly all the time is spent in texture upload, so I'm trying to add support for GL_OES_EGL_image_external (I've implemented, but not tested yet, an extension in our GLES stack to create/map an eglImage from a raw byte array), which should drastically help the situation. Still need to figure out what to do with m_pYUVShader; I guess I create a new subclass which uses the samplerExternalOES stuff..
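A shader subclass for the external-image path ends up almost trivial, since the driver does the YUV sampling behind samplerExternalOES. A sketch of what the fragment source could look like (uniform/varying names are made up, not XBMC's m_pYUVShader conventions):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical fragment shader source for a samplerExternalOES-based
// renderer subclass: the driver converts YUV when sampling the
// external eglImage, so the shader is a plain texture fetch.
static const char *kExternalOesFrag =
    "#extension GL_OES_EGL_image_external : require\n"
    "precision mediump float;\n"
    "uniform samplerExternalOES u_tex;\n"
    "varying vec2 v_coord;\n"
    "void main()\n"
    "{\n"
    "  gl_FragColor = texture2D(u_tex, v_coord);\n"
    "}\n";
```

The texture itself would be bound with target GL_TEXTURE_EXTERNAL_OES and attached to the decoder's buffer via glEGLImageTargetTexture2DOES, which is what removes the upload copy.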

overflowed Wrote:Why not only use the gstreamer dvdplayer overlay with a pipeline containing gst-ducati?

btw, why didn't the gstreamer overlay reach mainline?
#5
davilla Wrote:before or after, does not make a difference as the hw decoder is assumed to be much, much faster than the frame interval. The main thing is to get a picture frame dropped rather than presented for an entire frame interval.



Depends, if you can drop video frames when dvdplayervideo says to drop, then dvdplayer can maintain sync. If not, then you are better off constructing a replacement for dvdplayer as it's too hacky to try to deal with how dvdplayer works.

How does dvdplayervideo tell me to drop a frame? Dropping after the decoder is pretty easy. Before the decoder is a bit harder, because I then need to skip to the next IDR frame.

Basically I'm doing a hack now of usleep()'ing if I have more than 4 buffers in the queue back from the decoder which XBMC hasn't consumed yet, in order to throttle the decoder (so it isn't just running open-loop).
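The back-pressure check behind a hack like that is simple; a sketch (constant and names hypothetical, matching the 4-buffer threshold described above):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical back-pressure check for the decoder output queue:
// stop pulling new output buffers once more than kMaxQueuedBuffers
// are waiting on the renderer, instead of letting the decoder run
// open-loop until YUV buffer memory is exhausted.
static const size_t kMaxQueuedBuffers = 4;

bool ShouldThrottleDecoder(size_t buffers_queued) {
  return buffers_queued > kMaxQueuedBuffers;
}
```

The decode loop would then spin something like `while (ShouldThrottleDecoder(queue.size())) usleep(5000);` before handing the decoder another output buffer; a condition variable signaled by the renderer would be the cleaner non-polling version.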

davilla Wrote:A separate video plane means there at least a global mixer that has alpha control between layers. Since GUI is on top (we hope), then its alpha determines how much of the video is seen from the below layer.

If on same layer as GUI, then rendering is blended as directed. GUI elements on top of video.




First, be warned: I've never seen an ARM GPU that can handle converting YUV -> RGB with shaders. They all run much too slow. Maybe our shaders suck for GLES :) Even with iOS for sw decode, we take I420 from ffmpeg, convert that to rgba with a neon-based routine, and render the rgba. The neon code will easily beat our GLES GPU shader.

If I can get eglimageexternal working, then I think rendering should be no problem.. not sure about the earlier IMG cores, but sgx540 and later should easily be able to keep up rendering to a 1080p sized display (with texture streaming, not sure about with generic GLSL shaders..).

davilla Wrote:IMG textures are fine, they become an opaque item that gets passed up, that's how iOS works with CVBufferRefs. DVDPlayer does not have a clue what they are and passes a ref to them up to renderer. Renderer knows how to deal with them.



Depends what you touch :) some include headers go all over. platformdefs.h is one of those. Touch that and it recompiles just about everything.

yeah, that might have been a fluke (bad clock time / ntp time server not working?? I don't have a battery backed RTC on the panda), or I might have touched some header. It seems to be better behaved for recompiles now.
#6
robclark Wrote:How does dvdplayervideo tell me to drop a frame? Dropping after the decoder is pretty easy. Before the decoder is a bit harder, because I then need to skip to the next IDR frame.

Basically I'm doing a hack now of usleep()'ing if I have more than 4 buffers in the queue back from the decoder which XBMC hasn't consumed yet, in order to throttle the decoder (so it isn't just running open-loop)

void CDVDVideoCodecVDA::SetDropState(bool bDrop)

It's part of the DVDCodec API. If bDrop is true, drop the next frame.

CDVDVideoCodecVDA is a good template. It maintains an internal picture queue and also throttles to keep from sucking down DVDPlayerVideo's demux buffer. When CDVDVideoCodecVDA is told to drop a frame, it just pops the next one off the queue.
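The queue-and-drop contract described above can be sketched like this (a minimal illustration, not the real CDVDVideoCodecVDA, whose picture type is a DVDVideoPicture rather than an int):

```cpp
#include <cassert>
#include <deque>

// Minimal sketch of the drop contract: the codec keeps an internal
// picture queue, and when DVDPlayerVideo requests a drop via
// SetDropState(true), the next queued picture is popped and
// discarded instead of being returned for presentation.
class HwCodecSketch {
public:
  void SetDropState(bool bDrop) { m_drop = bDrop; }

  void QueuePicture(int pic) { m_queue.push_back(pic); }

  // Returns true and fills *pic when a picture should be presented;
  // a dropped picture is consumed without reaching the caller.
  bool GetPicture(int *pic) {
    while (!m_queue.empty()) {
      int next = m_queue.front();
      m_queue.pop_front();
      if (!m_drop) { *pic = next; return true; }
      m_drop = false;  // drop exactly one frame per request
    }
    return false;
  }

private:
  std::deque<int> m_queue;
  bool m_drop = false;
};
```

Because the drop happens after decode, no IDR-seeking is needed: the decoder keeps its reference frames intact and only the presentation of one picture is skipped.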
#7
davilla Wrote:void CDVDVideoCodecVDA::SetDropState(bool bDrop)

It's part of the DVDCodec API. If bDrop is true, drop the next frame.

CDVDVideoCodecVDA is a good template. It maintains an internal picture queue and also throttles to keep from sucking down DVDPlayerVideo's demux buffer. When CDVDVideoCodecVDA is told to drop a frame, it just pops the next one off the queue.

ok, cool, thx
