GPU assisted video decoding in XBMC, like motion compensation, idct, and deblocking?
#31
so the best option would probably be to write a dedicated, heavily xbox-optimised, non-mplayer-based core for mpeg-2/ts playback?
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
#32
hello.

i have a little question, just out of curiosity...
the xbox has specific hardware: a pentium iii with
a specific geforce chipset and a specific dsp/sound
processor, like every game console. that is the big difference
from a pc: the hardware is known and fixed.

do you, in xbmc, use this fact to do hardware-specific
optimisation in assembler code?
it's hard to really do this kind of optimisation when you
write a pc program, because you never know what the
hardware will be, so you're forced to use "standard" interfaces
like directx, which go through the drivers.
this is of course very useful, but it adds more and more software
layers.
on the xbox you don't have this problem: the hardware is
known and fixed, which means you can, in theory, write
very optimised code that accesses the hardware directly
and uses everything it offers (sse/mmx
instructions on the piii for example, use of the known dsp/sound
processor, direct access to and control of the
video chipset).

so, my question is: do you do this kind of thing?

when you import video/audio codecs whose source
is available, do you apply piii-specific optimisations
to them? that might improve the playback capacity
of xbmc. i mean, with the hardware known, you might in theory
be able to play videos in xbmc that would be a little
too heavy for a standard pc with a piii 700mhz. of course
you can't work miracles, and a pc with a p4 3ghz will
still be barely enough for some videos!

if you don't do this kind of optimisation, is there a reason
why?
#33
xbmc's mplayer core is already optimized for sse/mmx (though i don't know if that covers all codecs?). currently xbmc does not use the nvidia gpu to assist with video decoding, see the discussion here (but i guess a few visualizations use the gpu?). the xbox dsp is currently only used for ac3 en-/de-coding where possible; however, the dsp has much more potential (and every little helps), see the dsp development discussion here
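for anyone wondering what "motion compensation" actually buys you: the decoder copies a block from the reference frame, displaced by a motion vector, and adds the decoded residual on top. a minimal python sketch of the idea (illustrative only - not xbmc/mplayer code, and the names are made up):

```python
def motion_compensate(ref, mv, residual):
    """Reconstruct one block: prediction = reference block shifted by the
    motion vector (dx, dy); output = prediction + residual, clamped to 8 bits."""
    dx, dy = mv
    out = []
    for y, res_row in enumerate(residual):
        row = []
        for x, res in enumerate(res_row):
            pred = ref[y + dy][x + dx]          # motion-compensated prediction
            row.append(max(0, min(255, pred + res)))
        out.append(row)
    return out
```

in mpeg-2 this runs for every macroblock of every frame, which is why offloading it to the gpu (as dxva does on a pc) is such a win for a slow cpu.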
#34
directx on the xbox is not the same as directx on a pc - the xbox directx is a much thinner layer over the hardware. we do have some hardware-specific code in xbmc, such as the texture formats used and so on. also, the yv12->rgb conversion and movie scaling are done entirely in hardware.
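for reference, the yv12->rgb step the hardware takes over is just a per-pixel colour-space conversion. a sketch of the bt.601 limited-range maths (assumed coefficients; the actual hardware constants may differ slightly):

```python
def yv12_to_rgb(y, u, v):
    """Convert one limited-range BT.601 YUV sample to 8-bit RGB."""
    c = 1.164 * (y - 16)      # scale luma from 16..235 up to full range
    d = u - 128               # centre the chroma samples around zero
    e = v - 128

    def clamp(x):
        return max(0, min(255, int(round(x))))

    return (clamp(c + 1.596 * e),
            clamp(c - 0.392 * d - 0.813 * e),
            clamp(c + 2.017 * d))
```

doing this on the gpu as a texture operation saves several multiply-adds per pixel per frame that would otherwise land on the cpu.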
#35
it can play 720p files, but it will drop frames.
the xbox cannot play 1080i because it takes a p4 to play it!
but if someone could code mplayer to work with the gpu, it might be able to do this..

the dvico fusionhdtv 3 gold card does this.. look at the system requirements:
pentium 3 750mhz or celeron 900mhz cpu with 128mb memory (with an ati radeon series or nvidia mx440/fx series dxva-capable vga)
pentium 4 1.6ghz with 128mb ddr266 or faster memory for a non-dxva vga

http://www.htpcnews.com/main.php?id=fusion3_1

it uses the gpu to render the frames and takes a load off the cpu.
anyone think this is possible?

here's the pr bull about dxva:


the combination of the compression and quality innovations in windows media "corona" and new features of directx, which now enables video processing in hardware, alleviates the burden video places on the pc's central processing unit (cpu). this results in video playback that will be possible at hdtv resolutions as high as 1,080p, six times the resolution (number of pixels) of today's dvd-quality playback from a dvd player (480p) and the highest resolution full-motion video playback ever attained on a pc.


"the pc is entering a new realm as an entertainment device," said steve kleynhans, vice president at meta group inc. "this development could serve as the catalyst that elevates the pc to a mainstream role in providing a high-quality home-theater experience."

"windows media ‘corona’ is validation that microsoft shares our goal of advancing the pc as an entertainment device," said dan vivoli, vice president of marketing at nvidia. "‘corona’ is designed to harness the performance power of our graphics processor units (gpus), resulting in a seamless home theater-quality experience on any desktop or notebook pc."

ati and nvidia, some of the leading innovators of graphic chips and video cards, will support advanced hardware acceleration for de-interlacing and playback of hdtv-quality video in windows media "corona." this support is enabled with the advancement of the directx video acceleration (dxva) interfaces in the microsoft windows operating system. dxva interfaces allow video processing, including windows media video decoding and de-interlacing, to occur on graphics hardware, freeing the pc’s cpu for other tasks and enabling lower-power pc processors to render much higher-quality video than was ever thought possible.

support for advanced video acceleration technologies in dxva together with windows media "corona" will be offered in the following ways:

ati's video immersion technologies, an essential element within the family of ati's radeon™ graphics chips, together with embedded decoding of windows media "corona" video, will deliver the best video quality on desktop and notebook pcs, thanks to enhanced adaptive de-interlacing and temporal filtering enabled by ati's industry-leading video drivers. nvidia is planning in the next year to include embedded support for dxva and windows media "corona" video decoding as key features of future versions of its graphics processor chips and video cards.

directx video acceleration

developed in conjunction with industry partners and released in windows xp, directx video acceleration provides a common interface for hardware and software developers to use for the acceleration of video processing routines. dxva has gained widespread acceptance as the standard interface for accelerating mpeg-2 playback for dvds in windows. now video de-interlacing with dxva brings highly advanced hardware line doubling and scaling to windows, rivaling the best picture quality possible on any consumer device at any price. dxva de-interlacing with ati radeon 8500 and nvidia’s geforce4 graphics hardware was on display last week at the windows hardware engineering conference (winhec) 2002 and is scheduled to be available to consumers this fall.

the addition of windows media "corona" video to the list of codecs supported by directx video acceleration enables hdtv-quality playback at a fraction of the cpu requirement. common windows media video operations are off-loaded from the main cpu to the graphics hardware to effectively double the video processing power available on the pc. support for acceleration of windows media video will be available with the final release of windows media "corona."
#36
little more about the nv2a gpu in the xbox

the x-igp forms the north bridge of the x-box and is essentially an nforce 420 core with an enhanced gpu core. the gpu runs at a clock speed of 233mhz and features 4 pipelines with 2 texture blocks per pipeline, giving maximum fill rates of 1.86gtexels/s or 933mpixels/s. the graphics pipeline also features two programmable vertex shaders (compared to one in the geforce3), which should provide a significant performance improvement in games that make use of vertex shaders - e.g. by implementing dot3 bumpmapping. thanks to the second vertex shader, the geometry throughput of the x-igp is stated as 116.5mtriangles/s!

like the geforce3/nv20 core, the nv2a features programmable pixel and vertex shaders along with occlusion detection, which removes overdraw and increases the effective fill rate to up to 4x the actual fill rate. in a move away from fixed-function hardware, which restricts developers' creativity, the gpu's vertex and pixel shaders are fully programmable. the vertex shader is a programmable simd engine which, in addition to transformation, clipping and lighting, can perform user-definable tasks, which may include vertex blending, morphing, skinning, reflection maps etc. up to 128 instructions can be executed on 192 quadwords of data. the pixel shader is a programmable pixel processor which allows up to 9 instructions to be executed on each pixel.

although the available memory bandwidth of 6.4gb/s shared between cpu and graphics chip (the cpu can take a maximum of 1.06gb/s of this bandwidth over its 133mhz fsb, leaving the gpu with 5.34gb/s) could be seen as a bottleneck, the lower resolution of x-box games along with the bandwidth-saving features of the gpu will mean that this shouldn't be a major problem. x-box supports tv-out at 480i, 480p, 720p and 1080i (where i = interlace, p = progressive scan and the number is the vertical resolution). the majority of games will be targeted at the non-hdtv resolution of 640x480.
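the fill-rate and bandwidth figures above can be sanity-checked with some quick arithmetic (assuming a 64-bit fsb transfer per clock and the quoted 233mhz gpu clock; the article's 933mpixel/s figure appears to round from a slightly higher clock):

```python
# sanity-check the quoted NV2A numbers
clock_hz = 233e6
pipelines = 4
textures_per_pipe = 2

pixel_fill = clock_hz * pipelines               # pixels/s the pipelines can emit
texel_fill = pixel_fill * textures_per_pipe     # texels/s with both texture units busy

fsb_bytes_per_s = 133e6 * 8                     # 133 MHz FSB x 8 bytes per transfer
gpu_bandwidth = 6.4e9 - fsb_bytes_per_s         # what is left of shared memory for the GPU
```

this gives ~932 mpixels/s, ~1.86 gtexels/s, ~1.06 gb/s for the cpu, and ~5.34 gb/s left for the gpu - consistent with the numbers quoted above.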
#37
given that xbmc already uses cpu & ram for the skins etc, would it be possible to have a dedicated, basic, gui-less hd app (with an option in the xbmc menu saying something like 'play hd') which, when executed, frees up the available resources to dedicate as much as possible to playback only? i'm sure people would put up with text-only screens if it meant being able to play back hd content.
#38
from: http://www.tomshardware.com/hardnews/200...35943.html
Quote:audio supercomputer hidden in your graphics card?

by wolfgang gruener, senior editor

september 2, 2004 - 13:59 est

cambridge (ma) - nvidia's graphic cards may have much more to offer than simply drawing pixels on the screen: a startup company has found a way to translate audio signals into graphics, run them through the graphics card and overcome a common issue of limited audio effect processing performance in computers.

it is not unusual for professional music artists to run into performance barriers even with the most powerful computers today. multi-track recording is still a challenging and sometimes frustrating task. james cann from bionicfx in massachusetts, however, noticed that audio processing does not have to happen only on the cpu. his audio video exchange technology (avex) converts digital audio into graphics data and then performs effect calculations using the 3d architecture of nvidia gpus. compared to the capability of just six gflops of a typical cpu, nvidia's chips can reach more than 40 gflops, according to cann.


"this technology allows music hobbyists and professional artists to run studio quality audio effects at high sample rates on their desktop computer," he said. cann's invention is purely software-based and is not capable of substituting for a sound chip. the approach exploits the video card's 3d chip, which is usually idle when users are working with multi-track recording software. "it's a great resource to use as a coprocessor," cann said. "avex is designed to reduce the cpu load by moving the processing to the video card for certain types of audio effects when making music." cann said that the technology is purely targeted at music enthusiasts and at this time brings no advantages for applications such as gaming.

but if cann is right, audio effect processing might be just a starting point for how a gpu could be used in other applications. he believes that several other software types could be greatly enhanced in the same way, such as genomics or seti. "the gpu has some numeric precision issues that need to be worked out for scientific applications to be possible, but the thought of performing the computations on a resource theoretically capable of 50 and more gflops of the gpu instead of five gflops of the cpu is exciting," he said.

so far cann cannot take as much performance away from the gpu as he would like. "right now, getting the data back from the video card is very slow, so the overall performance isn't even close to the theoretical max of the card. i am hoping that the pci express architecture will resolve this. this will mean more instances of effects running at higher sample rates," he said.

still, there is a significant performance boost and a reduced cpu load for people who are using applications such as cubase, ableton live, and other vst-compatible hosts. cann's first commercial application will be bionicreverb, which is expected to go into public and free beta in october. the final version is scheduled to be released at the winter namm conference in january 2005.

bionicreverb is an impulse response reverberation effect that runs as a plug-in inside vst compatible multi-track recording software. the audio effect is generated by combining an impulse response file with digital audio. impulse response files are created by firing a starter pistol inside a location, such as carnegie hall, and recording the echoing sound waves. combining the two files through mathematical convolution is a cpu intensive process that is reduced by moving expensive calculations onto the gpu. amateur and professional guitarists, singers, pianists, and other musicians will be able to create performances in their home or studio that sound exactly like they were recorded in famous locations around the world, according to cann.

at this time, cann plans to only support nvidia graphics cards. "when i started, ati had a problem with floating point data. i have heard they have resolved it, but i won't have time to purchase and research their newest cards until after this is released," he said.

pricing was not announced yet, but cann says he will make his technology available for "far less" than the cost of professional studio dsp solutions which can run into the high five-figure range. he estimates the price will be somewhere between $200-$800.
too bad they didn't decide to make it open source huh? :tear: well, maybe someone will find a way to reverse-engineer or replicate it Wink
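the "mathematical convolution" the article refers to is conceptually simple; the cost is that it is O(n*m) in the signal and impulse-response lengths. a direct-form python sketch (illustrative only; bionicfx's actual gpu implementation is not public):

```python
def convolve(signal, impulse_response):
    """Direct convolution: every output sample is a weighted sum of input
    samples, with the impulse response providing the weights."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out
```

with a multi-second impulse response at 44.1khz, that inner multiply-accumulate loop is exactly the kind of data-parallel work a gpu's pixel pipelines are good at (real reverb plug-ins also use fft-based convolution to cut the cost further).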
update: the story above from tom's hardware has now been 'slashdotted' (link) and a lot of people replied with their ideas & input.

previous slashdot.org stories that are related:
http://developers.slashdot.org/article....tid=156
http://books.slashdot.org/article....2&tid=6
http://developers.slashdot.org/article....2&tid=8

another related article from http://www.tomshardware.com/hardnews/200...61353.html
Quote:graphics processors supercharge everyday apps

by scott fulton

june 30, 2005 - 16:13 est

chapel hill (nc) - originally developed to remove a massive processing workload from the cpu, graphics processors are now being examined by scientists for their ability to accelerate non-graphics applications as well. the geometric algorithms for modeling, motion, and animation (gamma) research group at the university of north carolina at chapel hill reported this week that nvidia's 7800 gtx reference card increased the speed of test applications by up to 35x.

researchers with gamma said they found enormous performance capabilities in nvidia's newest graphics card, which substantially outpaces its predecessor, the geforce 6800 ultra. compared to the 6800, the 7800 tripled performance; compared to running without the help of a graphics chip, the speed gains ranged from 8x up to 35x. the discovery points to the removal of a bandwidth bottleneck, which may lead to the unimpeded development of co-processing libraries and software development kits (sdks) for everyday applications, such as spreadsheets and database management systems.

"it seems to me that the floating-point bandwidth on the new hardware is much more than on a 6800 ultra," reported naga k. govindaraju, research assistant professor at unc's department of computer science, in an interview with tom's hardware guide. "on a 6800 ultra, we are, in some manners, very limited...since the bandwidth is not good enough on the card, we were still not able to use the full performance of the card. on a 7800 gtx, it seems to me that the floating-point bandwidth is much higher."

the trick to exploiting the latent power of the graphics processor while it isn't producing scenery for 3d games, unc professor dinesh manocha told us, is to rephrase everyday operations as though they were specific two-dimensional graphics functions, like texture mapping. while everyday cpus work with threads, prof. manocha pointed out, graphics processors deal with streams capable of performing single instructions on multiple data elements simultaneously, through pipelines. by comparison, cpu-based parallelism divides instruction threads among multiple cores, producing what prof. manocha calls a "von neumann bottleneck." the nvidia 7800 gtx utilizes 24 pixel pipelines and eight vertex units for its implementation of the single instruction / multiple data (simd) architecture.

this technique of essentially pretending everything is a game, stated prof. govindaraju, reduces the critical elements of such everyday functions as sorting algorithms to a single instruction, which the graphics processor then applies to multiple pipelines at once. recent test results presented by the gamma team compared the performance of their gpusort algorithm to a traditional linear quicksort algorithm, compiled first under microsoft visual c++, then under intel's c++ compiler, which is optimized for hyperthreading. for sorting an array of 18 million elements, the visual c++ routine required about 21 seconds to accomplish what the gpusort routine produced in under 2 seconds. hyperthreading and the intel compiler boosted quicksort performance to about 17 seconds.

one of the purposes of the gamma team's work is to demonstrate the extent to which computing power in everyday pcs lies dormant, especially with regard to mere productivity applications as opposed to computation-rich 3d games. profs. manocha and govindaraju agree that general purpose computation libraries for such programs as excel and matlab could be the first step to the future development of sdks that make full-time use of graphics chips as math coprocessors.

but what prof. manocha also pointed out is that the performance increase in gpus is exceeding the rate of cpus. "if you look at [both] computation power and rasterization power," stated prof. manocha, "in the last six years, [performance for] pc graphics cards has grown at a [factor] of 2 or 2.25 per year, whereas cpus are barely doubling every 18 months." he added that he expects this trend to continue as both ati and nvidia produce their next generations of graphics cards in 2006.
a good dedicated site on the subject is gpgpu (link), the general-purpose computation using graphics hardware site (quote: "with the increasing programmability of commodity graphics processing units (gpus), these chips are capable of performing more than the specific graphics computations for which they were designed. they are now capable coprocessors, and their high speed makes them useful for a variety of applications. the goal of this page is to catalog the current and historical use of gpus for general-purpose computation.")



#39
microsoft directx video acceleration (directx va) - could it be possible to use it in xbmc on the xbox?
(this feature suggestion is an extension of the hardware-accelerated mpeg-2/ts decoding request.)

hope the xbmc developer(s) can look into whether this could be used for mplayer and/or maybe our new libmpeg2 core for the dvd-player library?

summary: microsoft directx video acceleration (dxva) allows software applications to accelerate video playback directly on graphics processors. if your graphics processor supports dxva and has built-in technology to accelerate dvd and mpeg-2 playback, then dxva can provide (gpu) hardware acceleration. dxva is directshow-based, but maybe it could be accessed in xbmc somehow anyway? (if dxva could be supported then it could also help dvd-menu support, as it's via dxva that microsoft does dvd-video menu support in windows.)

links:
- enabling directx video acceleration in a custom player
- how decoders use iamvideoaccelerator
- mapping directx video acceleration to iamvideoaccelerator
- dxva api/ddi specification (rev 1.0) (directx 8.1 c++ archive)
- microsoft developer information on directx video acceleration
- directshow directx video acceleration video subtypes
- microsoft development network search results on dxva
- directx video acceleration motion compensation callbacks
- calling the deinterlace ddi from a user-mode component
- per-pixel alpha blending (directx 8.1 c++ archive)

xbox related: i believe nvidia's hdvp (high definition video processor) / vpe (video processing engine) supports dxva, and as that hardware comes with, for example, all of nvidia's geforce2 gpus and all of nvidia's nforce igp chipsets, my guess is that it also comes in nvidia's gpu in the xbox. i collected much more on this and the xbox here (link, contains info + urls to nvidia development documents and sdks).

an example of the above dxva technology in use is the newly announced "nvidia dvd decoder".
nvidia's directx video acceleration (dxva) support provides mpeg-2 acceleration for:
- inverse quantization (iq)
- inverse discrete cosine transform (idct)
- motion compensation (mo comp)
and it also enables advanced de-interlacing and decodes high-definition mpeg-2.
http://www.nvidia.com/object/dvd_decoder.html
http://www.nvidia.com/object/decoder_faq.html

dxva is an application programming interface (api) and a corresponding motion compensation device driver interface (ddi) for the acceleration of digital video decoding. additional ddis are also provided as part of dxva: a deinterlacing ddi for deinterlacing and frame-rate conversion of video content, and a procamp ddi to support control and postprocessing of video content.
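of the three stages dxva accelerates, the idct is usually the most compute-heavy one in a software mpeg-2 decoder. a reference (deliberately unoptimised) 8x8 idct sketch following the standard definition, just to show what the gpu would be taking over (real decoders use fast factorisations such as aan):

```python
import math

def idct_8x8(coeffs):
    """Naive 2-D inverse DCT of an 8x8 block of frequency coefficients."""
    def c(k):
        # normalisation factor: 1/sqrt(2) for the DC basis function
        return 1.0 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            s = 0.0
            for u in range(8):
                for v in range(8):
                    s += (c(u) * c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = s / 4.0
    return out
```

every coded 8x8 block in every frame goes through this transform, which is why moving iq/idct/mocomp onto the gpu frees up so much cpu time.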



#40
iirc, the main problem is that the hardware is undocumented and differs on the xbox gpu (something with the enumeration of dma channels) compared to 'normal' nvidia-based gpus.

so the dev who was up to the task didn't know how to feed the data to the gpu. somebody would have to break a non-disclosure agreement to make this possible, and i seriously doubt that will happen.. :/
#41
just wondering how the video works in xbmc. with mplayer, wouldn't that just be using the p3 in the xbox?

i was considering what the gpu might be able to do to accelerate it. or some asm optimisation?

could this lead to better playback of avc media?
#42
mplayer is already one of the most optimised video players around (not for avc though; coreavc's codec is the best there). a gpu-accelerated decoder would be possible, though everything has to be done using pixel shader 1.2 / vertex shaders, which is a bit of a pain.. it's just a huge amount of work to take on. we had one developer who attempted it, but he has gone awol.
#43
but regarding the possibility of using the gpu: did anybody ever try to use the documentation gamester17 dug up?
i understand mpeg4 hd is beyond the xbox's capabilities, but maybe hi-res mpeg2 ts files... are easier? less compression, easier decoding?

if it's a stupid question, please bear with me. Smile
For troubleshooting and bug reporting please make sure you read this first (usually it's enough to follow instructions in the second post).
#44
those apis are, with 99.9% certainty, not available on the xbox.
#45
has anyone read this paper?

accelerate video decoding with generic gpu

they claim:

"we have achieved real-time playback of high definition video on a pc with an intel pentium iii 667-mhz cpu and an nvidia geforce3 gpu."

similar hardware specs, although the processor mentioned isn't a celeron. not too sure about the memory requirements either.

-george


