XBMC Community Forum
script.module.urlresolver development - Printable Version


- t0mm0 - 2011-07-31 21:51

hi,

rogerthis Wrote:Would it be possible to add a 'working' flag, as each module could stop working at any time and need to be updated?

So in my addon I could have something like this
Code:
import re
# 'working' and 'exists' are the proposed per-plugin flags, not existing urlresolver attributes
if urlresolver.novamov.working and urlresolver.novamov.exists:
    novamov = re.compile(r'href="http://www\.novamov\.com/video/(.+?)"').findall(html)

This would enable my addon to only scrape working links.

i don't think there is any need for that. the module would have to be updated to say it wasn't working, and in that case it may as well just get fixed, or if it was too hard to fix quickly then valid_url() could be changed to just return False. that way the addon doesn't have to care (or add the extra checking code).

as it stands, addons don't directly access the resolver plugins, they just send a url to resolve() and it automatically picks the correct resolver to use (soon using user changeable priorities which will define the order that resolver plugins are tried). i think this makes it more flexible and means there is less code required in the addons themselves (i'm trying to make both the addons and the resolver plugins as simple as possible so they are easier to write, and any complicated stuff should be in urlresolver)
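
a minimal sketch of that dispatch idea, assuming python 3. the `Plugin` class, its `priority` and `working` fields and the placeholder return value are made up for illustration and are not the real urlresolver plugin interface. it shows resolve() trying plugins in priority order, and a broken plugin opting out simply by returning False from valid_url(), so the addon needs no extra checks:

```python
from urllib.parse import urlsplit


class Plugin(object):
    """Toy resolver plugin (hypothetical, for illustration only)."""

    def __init__(self, name, domain, priority, working=True):
        self.name = name
        self.domain = domain
        self.priority = priority  # lower number = tried first
        self.working = working

    def valid_url(self, url):
        # A plugin that is known to be broken just says "no" to everything.
        if not self.working:
            return False
        host = urlsplit(url).netloc.lower()
        return host == self.domain or host.endswith('.' + self.domain)

    def resolve(self, url):
        # Placeholder: a real plugin would visit the hoster site here.
        return 'resolved-by-%s:%s' % (self.name, url)


def resolve(url, plugins):
    """Try plugins in priority order; return False if none accepts the url."""
    for plugin in sorted(plugins, key=lambda p: p.priority):
        if plugin.valid_url(url):
            return plugin.resolve(url)
    return False
```

the addon only ever calls resolve(); which plugin ends up handling the url is decided inside the module.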

there is already filter_urls() that takes a list of urls and only returns those that can be resolved, and i'm thinking of adding another method that simply takes raw html and finds all the resolvable urls (so you wouldn't have to do any scraping at all). of course there is no way to tell if the files actually exist until you visit each hoster site....
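
a rough sketch of what such a raw-html method might look like, assuming python 3. `find_resolvable` and the `can_resolve` callback are hypothetical names and nothing like this exists in urlresolver yet. it collects every href from <a> tags and keeps the ones the callback accepts, without visiting any hoster site:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)


def find_resolvable(html, can_resolve):
    """Return the links in html, in page order, that can_resolve() accepts."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if can_resolve(link)]
```

in practice `can_resolve` would be whatever check the resolver plugins already provide, so the addon would not need any site-specific regexes for hoster links.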

thoughts anyone? (assuming any of that made sense!)

t0mm0


- rogerthis - 2011-07-31 23:51

So, a page like this http://www1.zmovie.tv/movies/view/captain-america-the-first-avenger-2011 that has multiple links, it will find all the links. It wouldn't really want to resolve the final playable url until you select which one you wanted. That would be too much work (too slow), wouldn't it?

Also there are sites like this http://www.fastpasstv.ms/tv/flashpoint/season-4/episode-4/ that have redirects to itself before you get to the video site url. Would it be able to handle this?


- t0mm0 - 2011-08-01 00:53

rogerthis Wrote:So, a page like this http://www1.zmovie.tv/movies/view/captain-america-the-first-avenger-2011 that has multiple links, it will find all the links. It wouldn't really want to resolve the final playable url until you select which one you wanted. That would be too much work (too slow), wouldn't it?

filter_urls() just takes a list of urls scraped from the page and checks them against the valid_url() method of the resolver plugins. all this does is check whether any of the resolver plugins say they can resolve those urls. it doesn't actually do any resolving until you call resolve() so this is fast. obviously it doesn't know whether the files actually exist or not, just whether there is a resolver plugin that says it can resolve urls from that particular host site.
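
to make that concrete, here is a toy re-implementation of the idea, not the module's actual code. the host patterns are assumptions made up for the example. note that nothing here touches the network, which is why it is fast:

```python
import re

# Hypothetical host patterns; real resolver plugins define their own checks.
HOST_PATTERNS = {
    'novamov': re.compile(r'https?://(www\.)?novamov\.com/video/\w+'),
    'vidreel': re.compile(r'https?://(www\.)?vidreel\.com/video/\w+'),
}


def valid_url(url):
    """True if any known pattern claims the url (pure string matching)."""
    return any(pattern.match(url) for pattern in HOST_PATTERNS.values())


def filter_urls(urls):
    """Keep only the urls some resolver says it can handle, in order."""
    return [url for url in urls if valid_url(url)]


scraped = [
    'http://www.novamov.com/video/abc123',
    'http://example.com/some/page',
    'http://vidreel.com/video/OTM3NDM0/',
]
print(filter_urls(scraped))
```

a url passing this filter only means a plugin recognises the host; the file may still have been removed.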

rogerthis Wrote:Also there are sites like this http://www.fastpasstv.ms/tv/flashpoint/season-4/episode-4/ that have redirects to itself before you get to the video site url. Would it be able to handle this?

those links look suspiciously base64-ish...

anyway, you could either write a simple resolver plugin that accepted fastpasstv redirect links (like the tubeplus.me one in git), or work out the real url in the addon, or possibly you could do a HEAD request and see if there is a redirect in place (which would be faster than grabbing the whole page)
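
the HEAD request option could look something like this, assuming python 3's standard library (http.client does not follow redirects by itself, which is what we want here). the function names are made up for the example, and whether fastpasstv actually answers HEAD requests with a redirect is untested:

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_target(status, location, base_url):
    """Return the absolute redirect target, or None if this is no redirect."""
    if status in REDIRECT_CODES and location:
        return urljoin(base_url, location)
    return None


def head_redirect(url, timeout=10):
    """Send a HEAD request for url and report where it redirects to, if anywhere."""
    parts = urlsplit(url)
    if parts.scheme == 'https':
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or '/'
        if parts.query:
            path += '?' + parts.query
        conn.request('HEAD', path)
        response = conn.getresponse()
        return redirect_target(response.status, response.getheader('Location'), url)
    finally:
        conn.close()
```

only the headers come back, so this is cheaper than downloading the whole page. it will not help if the site redirects with javascript or a meta refresh rather than an http status code.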

well, in case anyone thinks i've been slacking ;) i've been busy splitting out a load of generic stuff from the test addon and the urlresolver module into two new modules. one provides a bunch of handy functions for addons (like adding items to the directory list) and some lower level stuff like handling html entities. the other wraps up urllib2 to make it easy to make http requests with gzip support and properly decoded character encodings (maybe caching and proxy stuff too). should end up in git sometime tomorrow. then i can release the test addon and it might be more obvious how the urlresolver works.

t0mm0


- t0mm0 - 2011-08-01 19:10

hi all,

head on over to the github page and you'll notice i just added a load of stuff.

script.module.t0mm0.common - has a bunch of useful code that is common to lots of addons. currently there is a class (Addon) that wraps around various xbmc addon functionality, and another one (Net) that wraps urllib2 to make it easy to deal with cookies, proxies etc. and tries its best to return proper unicode no matter what character set is used on the web page.

thoughts (or patches!) on more useful stuff to add are welcomed (already got a few ideas...)

plugin.video.t0mm0.test - demos the functionality of both urlresolver and t0mm0.common. includes a test section which has a test link for each resolver plugin currently available, and an example of scraping tubeplus.me (i don't like this site as it is slow, has lots of incorrect links, and lists every show ever made even if there are no links available, but it is ok to test stuff with - anyone know of a better candidate for testing?)

hopefully this will make it a little clearer how everything works, but please ask if it's not. there are still no docs as i'm not done changing stuff around yet. there is not much error handling in some places either, so expect some 'script errors' (eg when there is no resolvable link or if the file has been removed from the hoster site)

please give it a test and look at the code (especially the test addon, which shows you how to use stuff, and the plugins directory of urlresolver which shows how to make a plugin) and let me know what you think. my goal is to make writing an addon as simple as possible requiring as little code as possible.

thanks,

t0mm0.


- rogerthis - 2011-08-01 19:40

Big THANK YOU for this.

Had a quick look and it's looking really good.

You don't have this in your repository.
Is the plan to get the two scripts script.module.t0mm0.common and script.module.urlresolver into the official xbmc repository?


- t0mm0 - 2011-08-01 20:19

rogerthis Wrote:Big THANK YOU for this.

Had a quick look and it's looking really good.

thanks!

rogerthis Wrote:You don't have this in your repository.
Is the plan to get the two scripts script.module.t0mm0.common and script.module.urlresolver into the official xbmc repository?

yes, it is not in a repo at the moment because it is under heavy development. once it stabilises a bit i'll bung it in my repo, and hopefully once it's tested by lots of people it can go into the official one. i'm wary of distributing it too widely while it is still in such a state ;)

t0mm0


- rogerthis - 2011-08-02 21:43

I have been trying to make some url resolvers but I'm failing miserably.
What guides did you use to get started? Are there any apps other than wireshark that you use?


- t0mm0 - 2011-08-02 22:00

rogerthis Wrote:I have been trying to make some url resolvers but I'm failing miserably.
What guides did you use to get started? Are there any apps other than wireshark that you use?

i hardly ever need to use wireshark. you can work out what's going on on most sites with the developer tools in chrome (or firebug in firefox) and that is much more user friendly too. you only need to resort to wireshark for low level protocol sniffing which is hardly ever required!

if you want to give us a clue about the site you are trying to work on we might be able to point you in the direction of what to look for, or you could always try looking at a site for which code already exists and try and work out how the dev got to the answer.

t0mm0


- rogerthis - 2011-08-02 22:42

Thanks for getting back so quick.

The site is vidreel
http://vidreel.com/video/OTM3NDM0/ which redirects to http://vidreel.com/human/OTM3NDM0/ straight away. I know that I have to include
Code:
"name" : "watch" and "action" : "#"
but how do I know what else needs to be included eg cookies, Referer. Is it trial and error?

Here is the code for the video.
Code:
<script type='text/javascript'>
var so = new SWFObject('../../9.swf','ply','600','340','9','#ffffff');
so.addParam('allowfullscreen','true');
so.addParam('allowscriptaccess','always');
so.addParam('wmode','transparent');
so.addVariable('file','08022eb20345a3d9fd77b34f087a97f8.mp4');
so.addVariable("skin", "../../dangdang.swf");
so.addVariable('bufferlength','5');
so.write('mediaspace');
</script>

Is this the link?
Code:
http://vidreel.com/9.swf:08022eb20345a3d9fd77b34f087a97f8.mp4

Do I need to include the other parameters for the file to play?


- t0mm0 - 2011-08-03 00:11

so here is what i just did when taking a quick look at this site.....

rogerthis Wrote:Thanks for getting back so quick.

The site is vidreel
http://vidreel.com/video/OTM3NDM0/ which redirects to http://vidreel.com/human/OTM3NDM0/ straight away. I know that I have to include
Code:
"name" : "watch" and "action" : "#"
but how do I know what else needs to be included eg cookies, Referer. Is it trial and error?

in an incognito window of chrome (so you don't get any old cookies), open dev tools (ctrl+shift+i) then load the page. as you say it is diverted to the /human page. right click on the 'continue to video' button and choose 'inspect element' and you will be taken to the html code for the button. you'll see that it is in an html form, but that the button itself is just an <a> tag which just links back to the original page. weird. unless there is some fancy javascript going on, the form isn't being submitted.

click on the button and look at the top entry in the network tab of the dev tools. you'll see that sure enough, it is just a GET request. you'll notice that there are some cookies sent: 'videohuman' and sometimes 'video' (you can ignore all the tracking ones like '__utma', '__utmb' etc., which you'll notice appear on lots of pages). maybe these are all that is required?

so i test in python. in the 'lib' directory of script.module.t0mm0.common i type python to enter the interactive interpreter and use the Net class with the debug flag enabled to load the page (this is easier than straight urllib2 as it handles cookies etc. with hardly any code required). it gets diverted to the /human page but you can see the cookies being set. so i load the page again, and this time it doesn't get diverted! so all you have to do is load the url twice: the first time the cookies get set and the second time it loads the real page. you can see my python session here
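
for anyone without the Net class to hand, roughly the same "load it twice" trick can be sketched with the plain python 3 standard library. `fetch_past_gate` and the '/human/' marker are assumptions based on this one site, not a general recipe:

```python
import http.cookiejar
import urllib.request


def make_opener():
    """Build an opener that remembers cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_past_gate(url, opener, gate_marker='/human/', max_tries=2):
    """Load url until we are no longer diverted to the gate page.

    The first request is diverted but sets the cookies; with those cookies
    in the jar the next request should reach the real page.
    """
    for _ in range(max_tries):
        response = opener.open(url, timeout=15)
        final_url = response.geturl()  # where we ended up after any redirects
        body = response.read()
        if gate_marker not in final_url:
            return final_url, body
    raise RuntimeError('still diverted after %d tries: %s' % (max_tries, url))
```

usage would be something like `opener, jar = make_opener()` followed by `fetch_past_gate('http://vidreel.com/video/OTM3NDM0/', opener)`, after which `jar` can be inspected to see which cookies were set.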

stage one done!

rogerthis Wrote:Here is the code for the video.
Code:
<script type='text/javascript'>
var so = new SWFObject('../../9.swf','ply','600','340','9','#ffffff');
so.addParam('allowfullscreen','true');
so.addParam('allowscriptaccess','always');
so.addParam('wmode','transparent');
so.addVariable('file','08022eb20345a3d9fd77b34f087a97f8.mp4');
so.addVariable("skin", "../../dangdang.swf");
so.addVariable('bufferlength','5');
so.write('mediaspace');
</script>

Is this the link?
Code:
http://vidreel.com/9.swf:08022eb20345a3d9fd77b34f087a97f8.mp4

Do I need to include the other parameters for the file to play?

getting close.

as you noticed, the javascript you posted above shows the file name but not the server or protocol - this means it is probably embedded in the swf file so we need to do more work.
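
pulling the file name (and the swf path, which may matter later) out of that javascript is straightforward with a couple of regexes. a small sketch, assuming the page always uses the SWFObject pattern shown above; the function names are made up for the example:

```python
import re

# Matches so.addVariable('file','<name>') with either quote style.
FILE_VAR = re.compile(r"""addVariable\(\s*['"]file['"]\s*,\s*['"]([^'"]+)['"]\s*\)""")
# Matches the first argument of new SWFObject('<path>', ...).
SWF_PATH = re.compile(r"""new\s+SWFObject\(\s*['"]([^'"]+)['"]""")


def extract_file(html):
    """Return the value of the 'file' flash variable, or None if absent."""
    match = FILE_VAR.search(html)
    return match.group(1) if match else None


def extract_swf(html):
    """Return the swf path passed to SWFObject, or None if absent."""
    match = SWF_PATH.search(html)
    return match.group(1) if match else None
```

this only gets the play path and player; the server address still has to come from somewhere else.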

pressing the play button on the flash player gives no entries in the network tab of the chrome dev tools so it is probably not just downloading the video via http. that makes it a little tougher.

i decided to use rtmpsrv which comes with rtmpdump and lets you intercept calls to rtmp servers and gives you an rtmpdump command line to use.

(this is on linux, dunno about other os's) first off i run the iptables command that diverts traffic for port 1935 (the rtmp port) to localhost. then run rtmpsrv and press play on the flash player. now don't forget to remove the iptables rule (or prepare to get very confused later!). my terminal output from this process is here.

copy and paste the rtmpdump command line to test - it works!

this gives you all the needed information to play the link in xbmc. the rtmp command line translates to
Code:
rtmp://213.163.74.245/vod/mp4:08022eb20345a3d9fd77b34f087a97f8.mp4 swfUrl=http://vidreel.com/7.swf pageUrl=http://vidreel.com/video/OTM3NDM0/
in xbmc talk. bung that in a .strm file to test and.... yay! that works too ;)
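
putting the pieces together in code might look like this. it is only a sketch: the helper names are made up, and the server address and 'vod' app come from this single test video, so they may well differ for other videos:

```python
def build_rtmp_item(server, app, playpath, swf_url, page_url):
    """Join the rtmp parts into the single space-separated string shown above."""
    return 'rtmp://%s/%s/%s swfUrl=%s pageUrl=%s' % (
        server, app, playpath, swf_url, page_url)


def write_strm(path, item):
    """Write the item to a .strm file so it can be tested from the file browser."""
    with open(path, 'w') as handle:
        handle.write(item + '\n')


item = build_rtmp_item(
    server='213.163.74.245',
    app='vod',
    playpath='mp4:08022eb20345a3d9fd77b34f087a97f8.mp4',
    swf_url='http://vidreel.com/7.swf',
    page_url='http://vidreel.com/video/OTM3NDM0/',
)
print(item)
```

calling `write_strm('vidreel-test.strm', item)` would then give a file to try in xbmc.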

now all you need to do is try a bunch of different videos to see if they all use the same rtmp server, and if not work out how to tell the difference (hint: maybe the different numbered swf files refer to different servers? can you just pick one at random or do you need to use a specific server for a specific video? so many questions - this is where the fun of testing comes in ;) )

hope that helps - shout if you don't follow any of that. obviously it isn't written as a nice tutorial but is literally just notes i took as i was looking at the site. maybe we can turn it into a proper tutorial as part of the docs for this module.....

t0mm0.