libstagefright - Experimental hardware video decoding builds

Thread Closed
Koying Offline
Team-XBMC Member
Posts: 1,833
Joined: Sep 2008
Reputation: 36
Location: Brussels, Belgium
Post: #361
(2013-02-23 09:59)fun_ Wrote:  I thought the buffer is prepared in main memory by software (your code). Then you can prepare a tweaked buffer with MODxx width/height for RK (you can calculate it from the metadata) and pass it as an output buffer to the hardware decoder.
That would be, more or less, the "pure" OMX way.
libstagefright is one level above and does all the buffer allocation. So, if Rockchip requires specific quirks, they should be in there...
Herman.Chen Offline
Junior Member
Posts: 10
Joined: Feb 2013
Reputation: 4
Post: #362
(2013-02-23 05:43)Koying Wrote:  Hi Herman,

Thanks for joining the thread.

We are passing a native window to OMXCodec::Create, so rendering is done entirely inside libstagefright and we don't have any way to correct the frame size.

Related to this, it looks like the frame size passed in the metadata after a read is not correct, i.e. it is not mod16/mod32 but the published one.

Could it be that solving the metadata bug would also solve the rendering bug?
OMXCodec::Create has a flags argument. We mainly use two values:
kSoftwareCodecsOnly = 8,
kHardwareCodecsOnly = 16,
If kHardwareCodecsOnly is used, the OMX component converts the video frame from YUV to RGBA and copies it to a GraphicBuffer allocated from the NativeWindow. The disadvantage is that the colour conversion consumes bandwidth, and conversion and display run serially.
If kSoftwareCodecsOnly is used, the OMX component outputs a private VPU_FRAME structure and SoftwareRenderer displays the YUV frame directly on the framebuffer through hwc.
Both paths have a limitation on the NativeWindow: the buffer width has to be 32-aligned and the height 16-aligned, set with native_window_set_buffers_geometry. Then use the crop function native_window_set_crop to display correctly.
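
The alignment rule above can be sketched as plain arithmetic. This is a minimal illustration, not Rockchip's code: alignUp, alignedGeometry, visibleCrop and the two structs are hypothetical names. The aligned size is what one would presumably hand to native_window_set_buffers_geometry, and the crop rectangle to native_window_set_crop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types for illustration; not part of libstagefright.
struct BufferGeometry {
    int32_t width;   // allocated buffer width (aligned)
    int32_t height;  // allocated buffer height (aligned)
};

struct CropRect {
    int32_t left, top, right, bottom;  // visible region; right/bottom treated as exclusive here
};

// Round value up to the next multiple of alignment (must be a power of two).
inline int32_t alignUp(int32_t value, int32_t alignment) {
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Buffer size under the constraint from the post: width mod 32, height mod 16.
inline BufferGeometry alignedGeometry(int32_t videoWidth, int32_t videoHeight) {
    return BufferGeometry{alignUp(videoWidth, 32), alignUp(videoHeight, 16)};
}

// Crop that hides the padding and shows only the published frame size.
inline CropRect visibleCrop(int32_t videoWidth, int32_t videoHeight) {
    return CropRect{0, 0, videoWidth, videoHeight};
}
```

For a 1920x1080 stream this gives a 1920x1088 buffer with a 1920x1080 crop; 1280x720 is already aligned and stays unchanged.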

There was an error when the OMX component created the GraphicBuffer from the NativeWindow: the NativeWindow width and height were not set correctly. We fixed the bug and attached the libstagefright.so below. Please download it and check whether 1080p plays normally.
libstagefright.so
fun_ Offline
Junior Member
Posts: 15
Joined: Feb 2013
Reputation: 0
Post: #363
(2013-02-23 10:44)Koying Wrote:  
(2013-02-23 09:59)fun_ Wrote:  I thought the buffer is prepared in main memory by software (your code). Then you can prepare a tweaked buffer with MODxx width/height for RK (you can calculate it from the metadata) and pass it as an output buffer to the hardware decoder.
That would be, more or less, the "pure" OMX way.
libstagefright is one level above and does all the buffer allocation. So, if Rockchip requires specific quirks, they should be in there...

I thought the native window (passed as the last argument of OMXCodec::Create() at https://github.com/koying/xbmc/blob/32e9...o.cpp#L886) has a buffer for output. It's in your code, not in libstagefright.

But I probably misunderstood it. Sorry.
(This post was last modified: 2013-02-23 11:05 by fun_.)
Herman.Chen Offline
Junior Member
Posts: 10
Joined: Feb 2013
Reputation: 4
Post: #364
(2013-02-23 09:34)Koying Wrote:  We don't use a "real" surface, but a SurfaceTexture, which wraps a GL texture.
Behind the scenes, GraphicBuffers are used, which are HW buffers that the HW decoder fills with a decoded frame. On top of those, one EGLImageKHR is mapped to each native window buffer.
"SurfaceTexture::updateTexImage" just maps the latest EGLImageKHR to the GL texture.

Fact is, on the other platforms, if there is a MODxx tweak, the frame size metadata reflects this and all is well.
On rk3066, the returned frame size is not MODxx and I assume this (wrong) frame size is used to build the EGLImage -> crashes.
So, really, for this to work on rk3066, the HW buffers size should be MODxx.
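
To get a feel for why a size mismatch matters, here is a rough sketch of the byte counts involved. It assumes a tightly packed planar YUV 4:2:0 layout (12 bits per pixel, no stride padding) and the 32/16 alignment mentioned elsewhere in this thread; the real RK3066 buffer layout may differ. roundUpTo, yuv420FrameBytes and alignmentPaddingBytes are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round value up to the next multiple of alignment (must be a power of two).
inline int32_t roundUpTo(int32_t value, int32_t alignment) {
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Bytes in one tightly packed planar YUV 4:2:0 frame:
// a full-resolution Y plane plus two quarter-resolution chroma planes.
inline std::size_t yuv420FrameBytes(int32_t width, int32_t height) {
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    const std::size_t luma =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    return luma + luma / 2;
}

// Extra bytes the decoder would write if it uses MOD32 x MOD16 dimensions
// while the consumer sized its buffer from the published dimensions.
inline std::size_t alignmentPaddingBytes(int32_t width, int32_t height) {
    const std::size_t published = yuv420FrameBytes(width, height);
    const std::size_t aligned =
        yuv420FrameBytes(roundUpTo(width, 32), roundUpTo(height, 16));
    return aligned - published;
}
```

Under those assumptions a 1920x1080 frame takes 3,110,400 bytes and the padded 1920x1088 frame 3,133,440 bytes, so a consumer sized for the published dimensions would be 23,040 bytes short.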

Note that using SW renderer, i.e. libstagefright thumbnail mode, just crashes the mediaserver/vpu:
Code:
F/libc    (   90): Fatal signal 11 (SIGSEGV) at 0x40112000 (code=2), thread 1960 (mediaserver)
I/DEBUG   (   85): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
I/DEBUG   (   85): Build fingerprint: 'rk30sdk/rk30sdk/rk30sdk:4.1.1/JRO03H/eng.root.20130116.110927:eng/test-keys'
I/DEBUG   (   85): pid: 90, tid: 1960, name: mediaserver  >>> /system/bin/mediaserver <<<
I/DEBUG   (   85): signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 40112000
I/DEBUG   (   85):     r0 40112000  r1 42a75020  r2 001fcfe0  r3 40c37000
I/DEBUG   (   85):     r4 41b89a48  r5 41b9b420  r6 00000440  r7 00000780
I/DEBUG   (   85):     r8 42678000  r9 41b900e0  sl 41b9b690  fp 41b8d428
I/DEBUG   (   85):     ip 40111000  sp 42877ca8  lr 40f600d7  pc 4013f798  cpsr 20000030
I/DEBUG   (   85):     d0  1010101010101010  d1  1010101010101010
I/DEBUG   (   85):     d2  1010101010101010  d3  1010101010101010
I/DEBUG   (   85):     d4  0000000000000000  d5  0000002800000000
I/DEBUG   (   85):     d6  4220000041300000  d7  3f8000003debc8c1
I/DEBUG   (   85):     d8  0000000000000000  d9  0000000000000000
I/DEBUG   (   85):     d10 0000000000000000  d11 0000000000000000
I/DEBUG   (   85):     d12 0000000000000000  d13 0000000000000000
I/DEBUG   (   85):     d14 0000000000000000  d15 0000000000000000
I/DEBUG   (   85):     d16 00000000000099cf  d17 7e37e43c8800759c
I/DEBUG   (   85):     d18 4000000000000000  d19 bf66c16be38d5283
I/DEBUG   (   85):     d20 3fc5555533bce6df  d21 3e66376972bea4d0
I/DEBUG   (   85):     d22 3ff0000000000000  d23 bf6376c7f8038f6c
I/DEBUG   (   85):     d24 3ff009bb63fc01c8  d25 0000000000000000
I/DEBUG   (   85):     d26 0000000000000000  d27 0000000000000000
I/DEBUG   (   85):     d28 0000000000000000  d29 0000000000000000
I/DEBUG   (   85):     d30 0000000000000000  d31 0000000000000000
I/DEBUG   (   85):     scr 60000010
I/DEBUG   (   85):
I/DEBUG   (   85): backtrace:
I/DEBUG   (   85):     #00  pc 0000c798  /system/lib/libc.so
I/DEBUG   (   85):     #01  pc 000fece4  <unknown>
I/DEBUG   (   85):
I/DEBUG   (   85): stack:
I/DEBUG   (   85):          42877c68  ec7a2d80
I/DEBUG   (   85):          42877c6c  42a74000  anon_inode:ion_share_fd
I/DEBUG   (   85):          42877c70  40c37000  /system/lib/libvpu.so
I/DEBUG   (   85):          42877c74  00000000
I/DEBUG   (   85):          42877c78  00000000
I/DEBUG   (   85):          42877c7c  40c37000  /system/lib/libvpu.so
I/DEBUG   (   85):          42877c80  00000000
I/DEBUG   (   85):          42877c84  40c2f514  /system/lib/libvpu.so (VPUMemInvalidate+200)
I/DEBUG   (   85):          42877c88  42877cdc
I/DEBUG   (   85):          42877c8c  42877c90
I/DEBUG   (   85):          42877c90  00000000
I/DEBUG   (   85):          42877c94  41b89a48  [heap]
I/DEBUG   (   85):          42877c98  41b9b420  [heap]
I/DEBUG   (   85):          42877c9c  00000440
I/DEBUG   (   85):          42877ca0  df0027ad
I/DEBUG   (   85):          42877ca4  00000000
I/DEBUG   (   85):     #00  42877ca8  42877cdc
I/DEBUG   (   85):          42877cac  42877ce8
I/DEBUG   (   85):     #01  42877cb0  00000000
I/DEBUG   (   85):          42877cb4  00000000
I/DEBUG   (   85):          42877cb8  00000000
I/DEBUG   (   85):          42877cbc  00000000
I/DEBUG   (   85):          42877cc0  41b9b6a0  [heap]
I/DEBUG   (   85):          42877cc4  00000000

Hmm, that is expected, because we have modified SoftwareRenderer to work with the VPU_FRAME structure.
This is the patch we made:

diff --git a/media/libstagefright/include/SoftwareRenderer.h b/media/libstagefright/include/SoftwareRenderer.h
index 7ab0042..994c2e1 100644
--- a/media/libstagefright/include/SoftwareRenderer.h
+++ b/media/libstagefright/include/SoftwareRenderer.h
@@ -21,7 +21,8 @@
 #include <media/stagefright/ColorConverter.h>
 #include <utils/RefBase.h>
 #include <system/window.h>
-
+#include <utils/Vector.h>
+#include "vpu_global.h"
 namespace android {

 struct MetaData;
@@ -48,7 +49,13 @@ private:
     int32_t mWidth, mHeight;
     int32_t mCropLeft, mCropTop, mCropRight, mCropBottom;
     int32_t mCropWidth, mCropHeight;
+    Vector<VPU_FRAME*> mStructId;

+    uint32_t mLastbuf;
+    bool init_Flag;
+    int32_t rga_fd;
+    int32_t power_fd;
+    int32_t mHttpFlag;
     SoftwareRenderer(const SoftwareRenderer &);
     SoftwareRenderer &operator=(const SoftwareRenderer &);
 };
fun_ Offline
Junior Member
Posts: 15
Joined: Feb 2013
Reputation: 0
Post: #365
(2013-02-23 10:51)Herman.Chen Wrote:  There was an error when the OMX component created the GraphicBuffer from the NativeWindow: the NativeWindow width and height were not set correctly. We fixed the bug and attached the libstagefright.so below. Please download it and check whether 1080p plays normally.
libstagefright.so

Is this libstagefright.so for both 4.0 and 4.1?
Does the RK2918 have the same hardware decoder and the same bug?
(This post was last modified: 2013-02-23 11:10 by fun_.)
fun_ Offline
Junior Member
Posts: 15
Joined: Feb 2013
Reputation: 0
Post: #366
(2013-02-23 11:03)Herman.Chen Wrote:  Hmm, that is expected, because we have modified SoftwareRenderer to work with the VPU_FRAME structure.

Can you provide vpu_global.h, or the "typedef struct tVPU_FRAME { ... } VPU_FRAME;" part, under a free/open source license?

I can see it in hardware/rk29/libon2/ in RK's ICS source, but the license is not clear.
EDIT: Ah, sorry, it also exists in your libvpu.tar, but the license is not clear there either...

----
By the way, I'm not sure this kind of device-specific hack can be merged into XBMC. Koying?
(This post was last modified: 2013-02-23 12:05 by fun_.)
Koying Offline
Team-XBMC Member
Posts: 1,833
Joined: Sep 2008
Reputation: 36
Location: Brussels, Belgium
Post: #367
(2013-02-23 10:51)Herman.Chen Wrote:  
(2013-02-23 05:43)Koying Wrote:  Hi Herman,

Thanks for joining the thread.

We are passing a native window to OMXCodec::Create, so rendering is done entirely inside libstagefright and we don't have any way to correct the frame size.

Related to this, it looks like the frame size passed in the metadata after a read is not correct, i.e. it is not mod16/mod32 but the published one.

Could it be that solving the metadata bug would also solve the rendering bug?
OMXCodec::Create has a flags argument. We mainly use two values:
kSoftwareCodecsOnly = 8,
kHardwareCodecsOnly = 16,
If kHardwareCodecsOnly is used, the OMX component converts the video frame from YUV to RGBA and copies it to a GraphicBuffer allocated from the NativeWindow. The disadvantage is that the colour conversion consumes bandwidth, and conversion and display run serially.
If kSoftwareCodecsOnly is used, the OMX component outputs a private VPU_FRAME structure and SoftwareRenderer displays the YUV frame directly on the framebuffer through hwc.
Both paths have a limitation on the NativeWindow: the buffer width has to be 32-aligned and the height 16-aligned, set with native_window_set_buffers_geometry. Then use the crop function native_window_set_crop to display correctly.

There was an error when the OMX component created the GraphicBuffer from the NativeWindow: the NativeWindow width and height were not set correctly. We fixed the bug and attached the libstagefright.so below. Please download it and check whether 1080p plays normally.
libstagefright.so
The good news is that your updated libstagefright.so prevents the vpu from crashing :)
The bad news is that performance is very poor above 720p :(
Doesn't the rk3066 have HW YUV -> RGB conversion?

We do use kHardwareCodecsOnly to indicate that we only want HW-accelerated codecs. I don't think setting this flag should influence the way the frame is rendered, though.

On other platforms, if you don't supply an ANativeWindow to OMXCodec::Create, the buffers are created inside libstagefright rather than using the ones of the surface. You then get access to the frame content via MediaBuffer::data() rather than MediaBuffer::graphicBuffer(). Those have to be in the original color format, e.g. yuv420, which might solve the color conversion performance issue, as we are using GL shaders to blit the YUV content.
Does your SoftwareRenderer patch mean you're returning an opaque struct rather than the actual YUV data?
Koying Offline
Team-XBMC Member
Posts: 1,833
Joined: Sep 2008
Reputation: 36
Location: Brussels, Belgium
Post: #368
(2013-02-23 11:26)fun_ Wrote:  By the way, I'm not sure this kind of device-specific hack can be merged into XBMC. Koying?
It would fall outside the scope of having common Android HW decoding support, for sure.
It could be made into a Rockchip-specific "codec", as was done for AmLogic, but that would be another story...
fishpepper Offline
Junior Member
Posts: 1
Joined: Feb 2013
Reputation: 0
Post: #369
device: Hardkernel ODROID-U2:
- 1.7GHz Exynos4412 Prime Cortex-A9 Quad-core processor
with PoP (Package on Package) 2Gbyte LPDDR2 880Mega Data Rate
- Mali-400 Quad Core 440MHz

running cyanogenmod 10.1 (android 4.2)

http://www.xbmclogs.com/show.php?id=1132

issues:
stuttering playback of HD files
(for example H264 - MPEG-4 (part 10) (H264), 1280x720, 50fps, planar 4:2:0 YUV, Audio: MPEG 1/2/3 192kbit/s)
It looks very strange. It could be described as playing some frames slowly, then some frames faster, then slowly again. If there is constant motion in a scene (e.g. the camera moves), the motion appears to alternate between slow and fast.
Really strange.

regards,
simon
(This post was last modified: 2013-02-23 14:06 by fishpepper.)
CruNcher Offline
Member
Posts: 79
Joined: Jan 2013
Reputation: 0
Post: #370
I pushed the libstagefright.so onto this tablet and it could no longer boot (though that was on an older sdk 3 firmware from November). I will try it with a December firmware.

Yep, the December firmware works.

Finally no crash anymore on the Zelda Medley, even with both cores active. Not sure about the performance; it stalls a little here and there.

But at least it's stable now, and decent playback is possible at YouTube 1080p complexity. Will test more :)

The second run also looks better, with fewer stalls in the GL render pipeline for this tablet firmware :)

Yep, 4 runs now, no crash :)

And yes, MX Player's H/W+ mode also benefits from this bugfix. We can just hope that it finds its way to tablets and other RK3066 devices fast ;)
After seeking it takes some seconds to get stable again, though not with MX Player's H/W+ mode, where seeking is very performant (better frame buffering) and instant.

I pushed it hard, seeking back and forth like crazy: no crash. Still, for performance it's clear that you should prefer the native HW mode (lowest overhead) whenever possible ;)
Especially when trying to save battery on the go, don't use XBMC or MX Player's H/W+ mode; it's crazy to do so in that scenario anyway (too much overhead).

So this fixes the most annoying issues for RK3066:

+ 1080p crashes
+ modxx render issues

Perfect so far in terms of outside stability; performance is another issue ;)
(This post was last modified: 2013-02-23 16:02 by CruNcher.)
Herman.Chen Offline
Junior Member
Posts: 10
Joined: Feb 2013
Reputation: 4
Post: #371
(2013-02-23 11:09)fun_ Wrote:  
(2013-02-23 10:51)Herman.Chen Wrote:  There was an error when the OMX component created the GraphicBuffer from the NativeWindow: the NativeWindow width and height were not set correctly. We fixed the bug and attached the libstagefright.so below. Please download it and check whether 1080p plays normally.
libstagefright.so

Is this libstagefright.so for both 4.0 and 4.1?
Does the RK2918 have the same hardware decoder and the same bug?

This libstagefright.so is for 4.1, not for 4.0.
The RK2918 uses a different GPU and does not have the same bug. The bug is caused by a GPU limitation, not by the hardware decoder.


(2013-02-23 11:26)fun_ Wrote:  Can you provide vpu_global.h, or the "typedef struct tVPU_FRAME { ... } VPU_FRAME;" part, under a free/open source license?

I can see it in hardware/rk29/libon2/ in RK's ICS source, but the license is not clear.
EDIT: Ah, sorry, it also exists in your libvpu.tar, but the license is not clear there either...

----
By the way, I'm not sure this kind of device-specific hack can be merged into XBMC. Koying?

The VPU_FRAME structure was written by Rockchip. It has no specified license. We decided to open it, but we have not added a license to it so far.
The VPU_FRAME structure is a private path, and it is a hack for the combination of the Android framework and our hardware. In my opinion as a SW engineer it is not very good for general-purpose software. I know this private path is quite ugly, but it works.

(2013-02-23 12:43)Koying Wrote:  The good news is that your updated libstagefright.so prevents the vpu from crashing :)
The bad news is that performance is very poor above 720p :(
Doesn't the rk3066 have HW YUV -> RGB conversion?

We do use kHardwareCodecsOnly to indicate that we only want HW-accelerated codecs. I don't think setting this flag should influence the way the frame is rendered, though.

On other platforms, if you don't supply an ANativeWindow to OMXCodec::Create, the buffers are created inside libstagefright rather than using the ones of the surface. You then get access to the frame content via MediaBuffer::data() rather than MediaBuffer::graphicBuffer(). Those have to be in the original color format, e.g. yuv420, which might solve the color conversion performance issue, as we are using GL shaders to blit the YUV content.
Does your SoftwareRenderer patch mean you're returning an opaque struct rather than the actual YUV data?

Glad to hear the lib helps resolve your problem.
RK3066 has RGA to do YUV -> RGB conversion. It has very good performance and an easy interface.
Both kHardwareCodecsOnly and kSoftwareCodecsOnly use HW to decode video; the difference is in the output. One is a standard MediaBuffer with a GraphicBuffer (in 4.1). The other is a private path that renders directly to the FB, which can make 1080p smoother.
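
As described in this post, the two flags select different output paths on RK3066 even though both decode in hardware. A minimal sketch of that mapping follows. The enum values are the ones quoted earlier in the thread; OutputPath and rk3066OutputPath are hypothetical names used only for illustration, and the behaviour is taken from the description here, not from the libstagefright source.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the two flag values quoted in the thread (illustration only).
enum CreationFlags : uint32_t {
    kSoftwareCodecsOnly = 8,
    kHardwareCodecsOnly = 16,
};

// Hypothetical labels for the two output paths described for RK3066.
enum class OutputPath {
    GraphicBufferRgba,  // YUV converted to RGBA and copied into a GraphicBuffer
    PrivateVpuFrame,    // private VPU_FRAME handed to the patched SoftwareRenderer
    Unspecified,        // neither flag set; the thread does not cover this case
};

// Map the flags to the output path as described in the post.
// Assumption of this sketch: the two flags are mutually exclusive.
inline OutputPath rk3066OutputPath(uint32_t flags) {
    const bool hw = (flags & kHardwareCodecsOnly) != 0;
    const bool sw = (flags & kSoftwareCodecsOnly) != 0;
    assert(!(hw && sw));
    if (hw) return OutputPath::GraphicBufferRgba;
    if (sw) return OutputPath::PrivateVpuFrame;
    return OutputPath::Unspecified;
}
```

Note that on this platform the flag names are misleading: according to the post, kSoftwareCodecsOnly still decodes in hardware and only changes how the frame reaches the display.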

MediaBuffer::data() is for 4.0, and MediaBuffer::graphicBuffer() first appeared in 4.1 for a more direct connection between surface and decoder.

Yes, you are right. The SoftwareRenderer patch indeed skips the color conversion and directly uses hwc to composite YUV data to the FB. The opaque struct contains the physical address of the video frame and some other information that makes direct blending possible. In the end it is zero-copy. The early version of the Samsung code used a similar strategy.
fun_ Offline
Junior Member
Posts: 15
Joined: Feb 2013
Reputation: 0
Post: #372
(2013-02-23 16:12)Herman.Chen Wrote:  
(2013-02-23 11:09)fun_ Wrote:  Is this libstagefright.so for both 4.0 and 4.1?
Does the RK2918 have the same hardware decoder and the same bug?

This libstagefright.so is for 4.1, not for 4.0.
The RK2918 uses a different GPU and does not have the same bug. The bug is caused by a GPU limitation, not by the hardware decoder.

I see. Thank you!

(2013-02-23 16:12)Herman.Chen Wrote:  The VPU_FRAME structure was written by Rockchip. It has no specified license. We decided to open it, but we have not added a license to it so far.

"Decided to open it" is really great. It will help open source communities like XBMC. People want to get Rockchip devices to use XBMC :)

Then, about the license, I think it should be specified. Some open source projects may not accept code whose license is unclear.
I recommend the Apache license, as used by Android: http://source.android.com/source/licenses.html
Koying Offline
Team-XBMC Member
Posts: 1,833
Joined: Sep 2008
Reputation: 36
Location: Brussels, Belgium
Post: #373
Let's try to recap...
(2013-02-23 16:12)Herman.Chen Wrote:  RK3066 has RGA to do YUV -> RGB conversion. It has very good performance and an easy interface.
I'm a bit lost. Then why did you say that the YUV420 -> RGB color conversion reduced performance? Does that happen in CPU memory?

(2013-02-23 16:12)Herman.Chen Wrote:  Both kHardwareCodecsOnly and kSoftwareCodecsOnly use HW to decode video; the difference is in the output. One is a standard MediaBuffer with a GraphicBuffer (in 4.1). The other is a private path that renders directly to the FB, which can make 1080p smoother.
MediaBuffer::data() is for 4.0, and MediaBuffer::graphicBuffer() first appeared in 4.1 for a more direct connection between surface and decoder.
So, it is not possible to use the private path together with kHardwareCodecsOnly, right? Or can we still get a tVPU_FRAME via MediaBuffer::data() if we don't specify a native window?

(2013-02-23 16:12)Herman.Chen Wrote:  Yes, you are right. The SoftwareRenderer patch indeed skips the color conversion and directly uses hwc to composite YUV data to the FB. The opaque struct contains the physical address of the video frame and some other information that makes direct blending possible. In the end it is zero-copy. The early version of the Samsung code used a similar strategy.
Would you mind publishing the diff on SoftwareRenderer.cpp, too?
fun_ Offline
Junior Member
Posts: 15
Joined: Feb 2013
Reputation: 0
Post: #374
(2013-02-23 16:12)Herman.Chen Wrote:  RK3066 has RGA to do YUV -> RGB conversion. It has very good performance and an easy interface.

(2013-02-23 16:12)Herman.Chen Wrote:  Yes, you are right. The SoftwareRenderer patch indeed skips the color conversion and directly uses hwc to composite YUV data to the FB. The opaque struct contains the physical address of the video frame and some other information that makes direct blending possible. In the end it is zero-copy. The early version of the Samsung code used a similar strategy.

Is the decoded frame always converted and composited into the RGB framebuffer in the zero-copy path?

I think most stick/STB products with RK3066 use a 1280x720 framebuffer. So are 1080p videos downscaled to 720p for the framebuffer and then upscaled to 1080p for a 1080p TV?

Or can a 1920x1080 YUV framebuffer be overlaid on top of the 1280x720 RGB (UI) framebuffer?

(This question is off-topic, but I want to know how "normal video player apps" work on RK3066, because I want to understand the difference between other apps and XBMC...)
(This post was last modified: 2013-02-23 19:16 by fun_.)
CruNcher Offline
Member
Posts: 79
Joined: Jan 2013
Reputation: 0
Post: #375
Hmm, there is an issue, Chen: an AVC Blu-ray stream (.m2ts) shows strange frame jumping (a reorder issue) at the beginning for around 35 seconds, with either H/W+ or XBMC.
There are no frame jumps with the kSoftwareCodecsOnly path (RKPlayer, MX Player Hardware Mode, Archos Video Player). Maybe it's a parser issue; I guess both players use the libav parser in these modes, not yours.
It only happens on the first load, though; after seeking back, the frame jumps no longer appear.


An MPEG-2 Blu-ray stream (.m2ts) still falls back to ff-mpeg2video. Is the vpu MPEG-2 decoder not accessible via libstagefright, given that MX Player cannot use it in H/W+ mode either?

Also, .m2ts and .mkv VC-1 fall back to ff-vc1.

So currently only H.264 seems to work accelerated on RK3066, which at least covers most web-based plugins for now :)

Tested and working now:

Vimeo 1080p
Youtube 1080p
Apple Quicktime Trailer 1080p

A lot of Apple Quicktime 1080p content still won't play well enough, though, especially game promo videos that use very high frame rates.

So most H.264-based web platforms should work now.

MPEG-4 Part 2 ASP 1080p 23.976 with stf-mpeg4 also works, though with frame drop issues.
(This post was last modified: 2013-02-23 22:08 by CruNcher.)