Get Embedded URLs Not in HTML Source Code
#1
I have been trying to build an addon for what was TVShack.
So far I have the menus etc. working to the point where I can click a movie name and then scrape the page for a link.
However, some/most of the links are embedded in the page and I don't know how to get to them.
E.g. if you go here and select the 3rd link down, 'videobb':
http://tvsearch.co/movies/watch/D5M7T6NX...r/8LWNWGVP

If I search the source code I cannot find the link I require.

If I use Firefox to get the Page Info and select Media, the embedded URL is there,
e.g. http://www.videobb.com/e/caEDLb9n945s

Is there a way in Python to get this info as well, so I can then scrape what I require?

Thanks
#2
k_zeon Wrote: I have been trying to build an addon for what was TVShack.
So far I have the menus etc. working to the point where I can click a movie name and then scrape the page for a link.
However, some/most of the links are embedded in the page and I don't know how to get to them.
E.g. if you go here and select the 3rd link down, 'videobb':
http://tvsearch.co/movies/watch/D5M7T6NX...r/8LWNWGVP

If I search the source code I cannot find the link I require.

If I use Firefox to get the Page Info and select Media, the embedded URL is there,
e.g. http://www.videobb.com/e/caEDLb9n945s

Is there a way in Python to get this info as well, so I can then scrape what I require?

Thanks

The link you posted opens a Megavideo video for me.
HTML code:
Code:
http://wwwstatic.megavideo.com/mv_player.swf?image=http://tvshack.bz/images/splash.jpg&v=4ZRZD56M

The URL for the Megavideo video is megavideo.com/?v=4ZRZD56M

When you click on the videobb link it opens the videobb page. I guess you need the cookie for this next page to load, so that you can access the videobb code.

Code:
<embed src="http://www.videobb.com/e/caEDLb9n945s" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="940" height="599">
#3
So this is what I would do.. guessing without trying anything in code :)

From that page you can scrape the list of available sources; for this video you will have 3:

Megaupload
Megaupload
Videobb

Display those sources for user to select

User selects VideoBB so you would then.... EDIT - deleted the rest, read next post
#4
Ok, think I got it..

When you click on a source it runs a script

Code:
onclick="switchMirror('XG6RWJVN')"

Which sends the id to this page

Code:
http://tvsearch.co/i/application/ajax.php?op=switchMirror&vid=

Grab the video ID of each source and append it to the end of the above link; the response should be the HTML fragment that contains the video link you need

eg.
MegaVideo - http://tvsearch.co/i/application/ajax.ph...d=XG6RWJVN
MegaUpload - http://tvsearch.co/i/application/ajax.ph...d=XJVS5MXB
VideoBB - http://tvsearch.co/i/application/ajax.ph...d=8LWNWGVP
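
A rough sketch of that step, untested against the live site. It assumes the mirror IDs always appear inside switchMirror('...') handlers as in the snippet above. AJAX_BASE is the URL from this post; the function names find_mirror_ids and mirror_url and the sample HTML are made up for illustration.

```python
import re

# Base of the AJAX endpoint quoted above; the mirror ID is appended to it.
AJAX_BASE = 'http://tvsearch.co/i/application/ajax.php?op=switchMirror&vid='

def find_mirror_ids(html):
    """Return every ID passed to switchMirror('...') in the page source."""
    return re.findall(r"switchMirror\('([^']+)'\)", html)

def mirror_url(vid):
    """Build the URL that should return the HTML fragment for one mirror."""
    return AJAX_BASE + vid

# Hypothetical sample: two source links shaped like the ones on the page
sample = ('<a href="#/mirror/XG6RWJVN" onclick="switchMirror(\'XG6RWJVN\')">Megavideo</a>'
          '<a href="#/mirror/8LWNWGVP" onclick="switchMirror(\'8LWNWGVP\')">Videobb</a>')

for vid in find_mirror_ids(sample):
    print(mirror_url(vid))
```

Each printed URL can then be fetched the same way as the main page and scraped for the embed link.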
#5
Thanks Eldorado. This is what I managed to think of as well.
Got it working for Megavideo and Videobb, but for the life of me I cannot get the Putlocker ID.

Go to
http://tvsearch.co/movies/watch/Q7DWB9PZ/

Get the source code and then do a findall with the following:

import re
import urllib2

url = 'http://tvsearch.co/movies/watch/Q7DWB9PZ/'

# Send a browser-like User-Agent with the request
req = urllib2.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')

response = urllib2.urlopen(req)
link = response.read()
response.close()

# Try to capture the mirror ID of the Putlocker link
match = re.compile('<li class="searchList"><a href="#/mirror/(.+?)" onclick="switchMirror.+?">Putlocker</a>').findall(link)
print match

It ignores that I have Putlocker in the regex and always returns the ID of the first link instead.
#6
I gave it a try at - http://www.myregextester.com and it worked fine..

Also ran your script and it returns as expected:

Code:
['9B8TZTWG']
#7
Yes, but that is the wrong code. It's the first one, for Megaupload.

Here is the bit I am scraping:

<li class="searchList"><a href="#/mirror/9B8TZTWG" onclick="switchMirror('9B8TZTWG')">Megaupload</a><div id="rep-9B8TZTWG" onclick="reportLink('9B8TZTWG')" class="rateReport">Report</div><div id="up-9B8TZTWG" onclick="likeLink('9B8TZTWG')" class="rateUp">Rate Up</div></li><li class="searchList"><a href="#/mirror/L6S6DXL2" onclick="switchMirror('L6S6DXL2')">Putlocker</a><div id="rep-L6S6DXL2" onclick="reportLink('L6S6DXL2')" class="rateReport">Report</div><div id="up-L6S6DXL2" onclick="likeLink('L6S6DXL2')" class="rateUp">Rate Up</div></li>

As you can see below, the ref L6S6DXL2 is what I want returned:

<li class="searchList"><a href="#/mirror/L6S6DXL2" onclick="switchMirror('L6S6DXL2')">Putlocker</a>

Any ideas?
#8
Ah completely missed that

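I'm guessing it's because of the 2nd .+? It can keep matching past the end of the Megaupload link until it finally reaches ">Putlocker</a>, so the match still starts at the first <li> and the capture group returns that first ID.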

Code:
<li class="searchList"><a href="#/mirror/(.+?)" onclick="switchMirror.+?">Putlocker</a>

If I take out Putlocker and replace it with .+? it grabs both values, which is what you want to do anyway, right? There should be no need to find specifically just the Putlocker link; you should be grabbing all of them in one pass.

Code:
<li class="searchList"><a href="#/mirror/(.+?)" onclick="switchMirror.+?">.+?</a>
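
For what it's worth, here is a small sketch of the difference, run against the two list items quoted earlier in the thread (trimmed to the <a> tags). The "strict" pattern is only one possible fix and is my own suggestion, not something from the site: it swaps .+? for negated character classes so a match cannot spill out of the current attribute or tag.

```python
import re

# The two list items quoted earlier in the thread, trimmed to the <a> tags
html = ('<li class="searchList"><a href="#/mirror/9B8TZTWG" '
        'onclick="switchMirror(\'9B8TZTWG\')">Megaupload</a></li>'
        '<li class="searchList"><a href="#/mirror/L6S6DXL2" '
        'onclick="switchMirror(\'L6S6DXL2\')">Putlocker</a></li>')

# Original pattern: the second .+? is free to run past the Megaupload link
loose = re.compile('<li class="searchList"><a href="#/mirror/(.+?)" '
                   'onclick="switchMirror.+?">Putlocker</a>')

# Tighter pattern: [^"]+ and [^>]* cannot leave the current attribute / tag
strict = re.compile('<li class="searchList"><a href="#/mirror/([^"]+)" '
                    'onclick="switchMirror[^>]*>Putlocker</a>')

loose_ids = loose.findall(html)
strict_ids = strict.findall(html)
print(loose_ids)
print(strict_ids)
```

The loose pattern reports the Megaupload ID, while the strict one only accepts the <li> whose link text really is Putlocker.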

Do you have code you can post up on github?
#9
I don't have a GitHub account yet; not sure how to create one.
I do have software installed for using GitHub etc.

I was trying to get the name as well as the URL because I need to differentiate between them.

I have this at present:

elif mode == 'GetMovieSource':
    html = net.http_GET(addon.queries['url']).content
    try:
        match=re.compile('<li class="searchList"><a href="#/mirror/(.+?)" onclick="switchMirror.+?">(.+?)</a>').findall(html)
        for url,name in match:
            if 'Megaupload' in name:
                addon.add_directory({'mode' : 'GetMegaVideoLink', 'url' : 'http://tvsearch.co/i/application/ajax.php?op=switchMirror&vid=' + url , 'img' : '' }, 'Mega', '')
            if 'Videobb' in name:
                addon.add_directory({'mode' : 'GetVideobbLink', 'url' : 'http://tvsearch.co/i/application/ajax.php?op=switchMirror&vid=' + url , 'img' : '' }, 'Videobb' , '')
    except:
        pass

This works well, and I managed to get the if statements to work.
I then pass a Megaupload URL to my next menu, but it can be one of 2 types, i.e. wwwstatic.blah blah or http://www.megaupload.com etc.

I am really finding it hard to use 'if something in something:' as I keep getting errors in the following:

elif mode == 'GetMegaVideoLink':
html = net.http_GET(addon.queries['url']).content
try:
if 'wwwstatic.megavideo.com' in html:

match=re.compile('<embed src="(.+?)"').findall(html)
for url in match:
if 'wwwstatic.megavideo.com' in url: <<<<HERE
addon.add_video_item(url.replace('http://wwwstatic.megavideo.com/mv_player.swf?image=http://tvshack.bz/images/splash.jpg&v=','http://www.megavideo.com/?v=') ,{'title': url.replace('http://wwwstatic.megavideo.com/mv_player.swf?image=http://tvshack.bz/images/splash.jpg&v=','http://www.megavideo.com/?v=')})
if 'www.megaupload.com' in url:
addon.add_video_item(url ,{'title': url})

if 'www.megaupload.com' in html: <<<<HERE

except:
pass

If you take the lines marked HERE out, it runs. If they are left in, the script errors out.

P.S. If you could help me set up a GitHub account, that would be great.
#10
I'm wondering why you are checking if 'Megaupload' and 'VideoBB' are in the name rather than just using the name for the directory?

eg

Code:
elif mode == 'GetMovieSource':
    html = net.http_GET(addon.queries['url']).content
    try:
        match = re.compile('<li class="searchList"><a href="#/mirror/(.+?)" onclick="switchMirror.+?">(.+?)</a>').findall(html)
        for url, name in match:
            # Use the scraped source name as the directory label
            addon.add_directory({'mode' : 'videolink', 'url' : 'http://tvsearch.co/i/application/ajax.php?op=switchMirror&vid=' + url , 'img' : '' }, name, '')
    except:
        pass

I think for the rest you are making it harder than it needs to be. With some regex work you could probably get the video ID you need much more easily.. stay away from hardcoding so much.. but it would be much easier if I could see your complete code.
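
As one example of avoiding the hardcoded prefix, the Megavideo ID could be read from the v parameter of the player URL instead of using a long replace(). This is only a sketch: megavideo_page_url is a made-up helper name, and it assumes the ID is always carried in v as in the URL posted earlier. The import fallback is there because XBMC ships Python 2.

```python
try:
    from urllib.parse import urlparse, parse_qs   # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs       # Python 2

def megavideo_page_url(player_url):
    """Turn an mv_player.swf URL into a megavideo.com/?v=... page URL.

    Returns None when the URL carries no 'v' parameter.
    """
    query = parse_qs(urlparse(player_url).query)
    video_ids = query.get('v')
    if not video_ids:
        return None
    return 'http://www.megavideo.com/?v=' + video_ids[0]

# Player URL quoted earlier in the thread
player = ('http://wwwstatic.megavideo.com/mv_player.swf'
          '?image=http://tvshack.bz/images/splash.jpg&v=4ZRZD56M')
print(megavideo_page_url(player))
```

This keeps working if the splash image part of the URL changes, which the replace() approach would not.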

GitHub is actually rather simple: sign up for an account and they have a very easy to follow how-to to get you started, much better than I would be able to explain it :)

For time being you can post your code here, but wrap it in [CODE] tags like I have done to make it easier to read.. (see the '#' above the text editor, click that)
#11
I have hardcoded things to be able to check URLs in different ways.
I have only just started to learn Python and am not very good.

I appreciate your advice.

thanks

Code:
import xbmc, xbmcgui, urllib
import os
import re
import string
import sys
from t0mm0.common.addon import Addon
from t0mm0.common.net import Net
import urlresolver

addon = Addon('plugin.video.kzeon', sys.argv)
net = Net()

logo = os.path.join(addon.get_path(), 'art','logo.jpg')

base_url = 'http://www.tvsource.co'

mode = addon.queries['mode']
play = addon.queries.get('play', None)

if play:
    stream_url = urlresolver.resolve(play)
    addon.resolve_url(stream_url)

elif mode == 'resolver_settings':
    urlresolver.display_settings()



elif mode == 'GetVideobbLink':
    html = net.http_GET(addon.queries['url']).content
    try:
        match=re.compile('<embed src="(.+?)"').findall(html)
        for url in match:

             addon.add_video_item(url,{'title': url})
            
    except:
        pass  

elif mode == 'GetPutLockerLink':
    html = net.http_GET(addon.queries['url']).content
    try:
        match=re.compile('<a href="(.+?)" target').findall(html)
        for url in match:

             addon.add_video_item(url,{'title': url})
            
    except:
        pass

elif mode == 'GetMegaVideoLink':
    html = net.http_GET(addon.queries['url']).content
    try:
        match = re.compile('<embed src="(.+?)"').findall(html)
        for url in match:
            if 'wwwstatic.megavideo.com' in url:
                # Swap the Flash player prefix for the normal Megavideo page URL
                page_url = url.replace('http://wwwstatic.megavideo.com/mv_player.swf?image=http://tvshack.bz/images/splash.jpg&v=', 'http://www.megavideo.com/?v=')
                addon.add_video_item(page_url, {'title': page_url})
            if 'www.megaupload.com' in url:
                addon.add_video_item(url, {'title': url})
    except:
        pass



elif mode == 'GetMovieSource':
    html = net.http_GET(addon.queries['url']).content
    try:
        match=re.compile('<li class="searchList"><a href="#/mirror/(.+?)" onclick="switchMirror.+?">(.+?)</a>').findall(html)
        for url,name in match:
           if 'Megaupload' in name:
            addon.add_directory({'mode' : 'GetMegaVideoLink', 'url' : 'http://tvsearch.co/i/application/ajax.php?op=switchMirror&vid=' + url , 'img' : '' }, 'Mega', '')
           if 'Videobb' in name:
            addon.add_directory({'mode' : 'GetVideobbLink', 'url' :  'http://tvsearch.co/i/application/ajax.php?op=switchMirror&vid=' + url , 'img' : '' },  'Videobb' , '')

            
    except:
        pass


    
elif mode == 'GetMovieTitle':
    html = net.http_GET(addon.queries['url']).content
    try:
        match=re.compile('<li class="searchList"><a href="(.+?)">(.+?)</a>').findall(html)
        for url,name in match:
          
            addon.add_directory({'mode' : 'GetMovieSource', 'url' : url , 'img' : '' }, name, '')
    except:
        pass


elif mode == 'AtoZ':
    html = net.http_GET(addon.queries['url']).content
    try:
        match=re.compile('<a href="(.+?)" class="atz">(.+?)</a>').findall(html)
        for url,name in match:
            
            addon.add_directory({'mode' : 'GetMovieTitle', 'url' : url , 'img' : '' }, name, '')
    except:
        pass


elif mode == 'main':
    addon.show_small_popup('k_zeon addon', 'Is now loaded enjoy', 3000,
                           logo)
    addon.add_directory({'mode' : 'AtoZ', 'url' : 'http://tvsearch.co/movies/'}, 'Movies')
    addon.add_directory({'mode': 'resolver_settings'}, 'resolver settings',
                        is_folder=False)

if not play:
    addon.end_of_directory()
#12
Got GitHub working.

https://github.com/kzeonG/TVShack
#13
I can see why you are doing it this way. I only mentioned it because you introduce too many extra clicks for the user,

e.g. the user clicks on a movie, then clicks on a source, then has to click again to play the movie.

Why not make either the movie itself a video item or, at the very least, the sources? Then handle the scraping of the final video URL within your 'if play:' block.

Take a peek at my Project Free TV addon for an idea - https://github.com/Eldorados/eldorado-xbmc-addons

The Red Letter Media addon works the same way as well.

In my case I needed to know some extra info about what kind of video it was, so I created my own 'add_video_item' method that passes some extra params. t0mm0 is working on adding this into his common library.

If I do a regex like this I can find the video id each time:
Code:
r = re.search('="http://[a-z/.&?_=:]+[/=](.+?)"',html)
print r.group(1)
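
To show what that pattern picks up, here is a quick check against the two URLs quoted earlier in this thread. The <embed> tags around them are made-up test input (only the videobb one was posted in that form), and the None check is my addition since search() returns None when nothing matches.

```python
import re

# Pattern from the post above, written as a raw string
VIDEO_ID = re.compile(r'="http://[a-z/.&?_=:]+[/=](.+?)"')

# Hypothetical <embed> tags built around the two URLs quoted in this thread
samples = [
    '<embed src="http://www.videobb.com/e/caEDLb9n945s" width="940">',
    '<embed src="http://wwwstatic.megavideo.com/mv_player.swf'
    '?image=http://tvshack.bz/images/splash.jpg&v=4ZRZD56M" width="940">',
]

ids = []
for html in samples:
    found = VIDEO_ID.search(html)
    # search() returns None when nothing matches, so check before using it
    ids.append(found.group(1) if found else None)

print(ids)
```

It relies on the part of the URL before the ID containing only lowercase letters and a few symbols, so a host with digits or capitals in its path would need a different pattern.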

You then just need to know what to do with it and which host the ID is from, so that you can format a URL to send into urlresolver. t0mm0 is also working on a way of passing in just a host name and video ID, with no need to form a full URL.. that way your script won't need to know or care what the source is.
#14
I had a look at your ProjectFreeTV and I sort of follow the code.

As I am still learning, I don't understand how all the bits work yet, and I seem to have problems when introducing if 'Megaupload' = name: or something similar.

I start to get script errors. Is it because of CRLF where it should be LF? For the life of me, as soon as I add if 'Megaupload' = name: it fails.

Re your suggestion about too many clicks: yes, I know. I will try to make it so I click a movie, then show the links, then play the movie, but at the moment I am just learning the syntax.

I use Python IDLE on Windows to edit code. Should I be using something else?

Also, where I have

elif mode == 'GetMovieSource':
    html = net.http_GET(addon.queries['url']).content
    try:
        match=re.compile('<li class="searchList"><a href="#/mirror/(.+?)" onclick="switchMirror.+?">(.+?)</a>').findall(html)
        for url,name in match:
            if 'Megaupload' in name:
                addon.add_directory({'mode' : 'GetMegaVideoLink', 'url' : 'http://tvsearch.co/i/application/ajax.php?op=switchMirror&vid=' + url , 'img' : '' }, 'Mega', '')
            if 'Videobb' in name:
                addon.add_directory({'mode' : 'GetVideobbLink', 'url' : 'http://tvsearch.co/i/application/ajax.php?op=switchMirror&vid=' + url , 'img' : '' }, 'Videobb' , '')
    except:
        pass

If I take out the try, except and pass and then run it, it fails... Why?
#15
Can you post error logs?
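
If the log shows nothing useful, that may be because a bare except: pass swallows every error. Below is a rough sketch of capturing the traceback instead, using only the standard library. The names risky_step and run_and_report are made up, and the empty queries dict just stands in for a missing 'url' parameter; in the addon you would write the text to the XBMC log rather than print it.

```python
import traceback

def risky_step():
    # Stand-in for the scraping code: a missing 'url' key raises KeyError
    queries = {}
    return queries['url']

def run_and_report(step):
    """Run step(); return the traceback text instead of hiding the error."""
    try:
        step()
        return None
    except Exception:
        return traceback.format_exc()

report = run_and_report(risky_step)
print(report)
```

The last line of the report names the exception type and the lines above it point at the failing statement, which is usually enough to see what went wrong.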

Are you using a double == when writing the if statements?
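
A quick way to check this kind of thing in IDLE without running the whole addon is to ask Python whether a snippet even compiles. The sketch below is just an illustration (compiles is a made-up helper): a single = inside an if is a syntax error, and so is an if with nothing indented under it, like the lines marked HERE in the earlier post.

```python
def compiles(source):
    """Return True if the snippet is valid Python syntax, False otherwise."""
    try:
        compile(source, '<snippet>', 'exec')
        return True
    except SyntaxError:   # IndentationError is a subclass of SyntaxError
        return False

assignment = "if 'Megaupload' = name:\n    pass\n"   # single = is assignment
comparison = "if 'Megaupload' == name:\n    pass\n"  # == compares for equality
membership = "if 'Megaupload' in name:\n    pass\n"  # substring test
empty_body = "if 'Megaupload' in name:\nx = 1\n"     # nothing indented under the if

for snippet in (assignment, comparison, membership, empty_body):
    print(compiles(snippet))
```

A syntax error stops the whole script from loading, so wrapping the code in try/except does not help with it.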

I use IDLE as well to try out little chunks of code; it's a great way to test small pieces instead of trying to debug through XBMC.