[Release] Parsedom and other functions

newatv2user Offline
Fan
Posts: 300
Joined: May 2011
Reputation: 27
Post: #81
Thanks.
_Pierre_ Offline
Junior Member
Posts: 33
Joined: Nov 2009
Reputation: 0
Post: #82
I saw that you updated parseDOM to version 0.9.2, so I thought I would try it out.
I still have problems with fetchPage when posting data:

Code:
T:3232  NOTICE: [SoundCloud] fetchPage : 'called for : 'https://soundcloud.com/connect/login''
20:44:39 T:3232  NOTICE: [SoundCloud] fetchPage : 'Posting data: username=*******&redirect_uri=plugin%3A%2F%2Fplugin.audio.soundcloud%2Foauth_callback&response_type=token&client_id=hijuflqxoOqzLdtr6W4NA&scope=non-expiring&password=******&display=popup'
20:44:39 T:3232  NOTICE: [SoundCloud] fetchPage : 'Added refering url: http://soundcloud.com'
20:44:39 T:3232  NOTICE: [SoundCloud] fetchPage : 'connecting to server...'
20:44:39 T:3232  NOTICE: [SoundCloud] fetchPage : 'URLError : <urlopen error unknown url type: plugin>'

This is driving me crazy, grr.
takoi Offline
Fan
Posts: 511
Joined: Oct 2009
Reputation: 6
Location: Norway
Post: #83
It would be nice if you could remove the xbmc and xbmcgui dependencies from functions that don't use them. It's annoying to have to write xbmc = None and xbmcgui = None in the interpreter every time I try to use this outside XBMC.
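
One common way to make such a dependency optional is a guarded import. This is only a minimal sketch of that pattern, not how CommonFunctions.py is actually written; the log helper is a hypothetical name used for illustration.

```python
# Guarded import: xbmc only exists inside XBMC's embedded interpreter.
try:
    import xbmc
except ImportError:
    xbmc = None  # running in a plain Python interpreter


def log(message):
    """Hypothetical helper: log through XBMC when available, else print."""
    if xbmc is not None:
        xbmc.log(message)
    else:
        print(message)


log("helpers loaded")
```

With this in place, functions that never touch xbmc can be imported and tried out in a normal interpreter without any manual setup.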
takoi Offline
Fan
Posts: 511
Joined: Oct 2009
Reputation: 6
Location: Norway
Post: #84
I found a bug in how the ret value is parsed. I think it has to do with tab characters:

Code:
html = '<div id="player" class="loading tv " \r \tdata-media="http://nordond25a-f.akamaihd.net/z/no/open/db/db70c9ca4be6c56b4813f550d822b27e77116bd9/db70c9ca4be6c56b4813f550d822b27e77116bd9_,141,316,563,1266,2250,.mp4.csmil/manifest.f4m" \r \tdata-timezoneoffset="2" \r \tdata-startingbitrateindex="3"\r \tdata-streamingerrormessageurl="/streamingerror"\r \tdata-outoflivebuffermessageurl="/outoflivebuffer"\r \t\t\t\t data-subtitlesurl = "/programsubtitles/koid21008710"\r \t\t\t data-IsRatedR = "False"\r >\r\n\t<!--googleoff: all-->\r\n\t\t\t<div id="nrkFlashContainer">\r\n\t\t\t\t<div class="msg-board">\r\n\t\t\t\t\t\r\n\t<img width="960" \r \t\t class=""\r \t\t alt="" \r \t\t src="http://gfx.nrk.no/iiUIuSEgJNUZ5ESHnHRXHgpqjzVQx3q0AqWf4v5n3sEQ" />\r\n\t<div class="msg no-js-msg">\r\n\t\t<h2><strong class="heading">Ooops, Javascript mangler!</strong></h2>\r\n\t\t<p>\r\n\t\t\tVi kan ikke se at du har aktivert Javascript p\xc3\xa5 din PC, dette m\xc3\xa5 v\xc3\xa6re aktivert for at v\xc3\xa5r videoavspiller skal fungere.<br />\r\n\t\t\t<a href="http://www.nrk.no/some/support/page" target="_blank">Les mer<span class="offscreen"> om hvorfor vi krever javascript</span></a> p\xc3\xa5 v\xc3\xa5re hjelpesider.\r\n\t\t</p>\r\n\t</div>\r\n\r\n\t\t\t\t\t<div class="msg no-flash-msg">\r\n\t\t\t\t\t\t    <h2><strong class="heading">Ooops, vi har problemer med \xc3\xa5 laste Flash for avspilling!</strong></h2>\r\n\t\t\t\t\t\t    <p>\r\n\t\t\t\t\t\t\t\t<a href="http://get.adobe.com/flashplayer" target="_blank">Klikk her for \xc3\xa5 installere Flash p\xc3\xa5 din maskin.</a><br /><br />\r\n\t\t\t\t\t\t\t\tVirker det fortsatt ikke?<br/>\r\n\t\t\t\t\t\t\t    <a href="/hjelp/1.7916314">Les mer<span class="offscreen"> om flash og hvorfor vi krever det</span></a> p\xc3\xa5 v\xc3\xa5re hjelpesider.\r\n\t\t\t\t\t\t    </p>\r\n\t\t\t\t\t</div>\r\n\t\t\t\t</div>\r\n\t\t\t</div>\r\n\t<!--googleon: all-->\r\n</div>\r\n\r\n\r\n\r\n\r\n\t<section id="programMetaData" class="container 
tight">\r\n\t\t<aside id="episode2" class="span-5 clearfix">\t\t\r\n\r\n\t<img width="300" \r \t\t class="episode-image"\r \t\t alt="Verda vi skaper" \r \t\t src="http://gfx.nrk.no/iiUIuSEgJNUZ5ESHnHRXHgeDOYjDhHYN0qWf4v5n3sEQ" />\r\n\t\t\t<!--googleoff: snippet-->\r\n\t\t\t<ul class="infolist clearfix">\r\n\t\t\t\r\n\t\t\t\t\t<li><mark class="age-restriction"><span>A</span></mark> Tillatt for alle</li>\r\n\r\n\t\t\t\t<li><strong>Tilgjengelig til:</strong> \r\n<time datetime="2012-06-16T16:25:00+02:00">16.06.2012</time>\r\n\t\t\t\t</li>\r\n\t\t\t</ul>\r\n\t\t\r\n\t\t\t\r\n\t\t\t<ul class="sharethis clearfix">\r\n\t\t\t\t<li><a href="http://twitter.com/home?status=\r \t\t\t\t\t\t\t\tSe+%27Verda+vi+skaper%27+p%c3%a5+NRK+TV+http%3a%2f%2ftv.nrk.no%2​fserie%2fverda-vi-skaper%2fkoid21008710%2fsesong-1%2fepisode-6"\r \t\t\t\t\t   target="_blank" title="Del/tips på Twitter"><img src="http://psfil.nrk.no/content/images/tweet.png?1.1.4533.14084a" alt="Del/tips på Twitter" /></a></li>\r\n\t\t\t\t<li><a href="http://www.facebook.com/sharer.php?u=http://tv.nrk.no/serie/verda-vi-skaper/koid21008710/sesong-1/episode-6"\r \t\t\t\t\t   target="_blank" title="Del/tips på Facebook"><img src="http://psfil.nrk.no/content/images/facebook.png?1.1.4533.14084a" alt="Del/tips på Facebook" /></a></li>\r\n\t\t\t</ul>\r\n\t\t\t<!--googleon: snippet-->\r\n\t\t</aside>\r\n\r\n\t\t<article id="episode" class="span-10 last">\r\n\t\t\t<hgroup>\r\n\t\t\r\n\t\t\t\t\t<h2><a href="http://tv.nrk.no/serie/verda-vi-skaper">Verda vi skaper</a>  \r\n\t\t\t\t\t</h2>\r\n\t\t\t\t<h1>\r\n\t\t\t\t\tVerda vi skaper \r\n\t\t\t\t\t\t<span class="small">6:8</span> \t\t\r\n\t\t\t\t</h1>\t\t\r\n\t\t\t</hgroup>\r\n\t \r\n\t\t\t<section id="taglist" class="stack-links">\r\n\t\t\t\t<strong>Emner:</strong>\r\n\t\t\t\r\n<a href="/kategori/dokumentar-og-fakta" title="Vis flere programmer i kategori &quot;Dokumentar og fakta&quot;">Dokumentar og fakta</a>, <a class="thin" 
href="/sok?m=tv&amp;q=Kenya&amp;filter=rettigheter&amp;side=1" title="Vis flere programmer tagget med &quot;Kenya&quot;">Kenya</a>, <a class="thin" href="/sok?m=tv&amp;q=Slettelandet&amp;filter=rettigheter&amp;side=1" title="Vis flere programmer tagget med &quot;Slettelandet&quot;">Slettelandet</a>, <a class="thin" href="/sok?m=tv&amp;q=rovdyr&amp;filter=rettigheter&amp;side=1" title="Vis flere programmer tagget med &quot;rovdyr&quot;">rovdyr</a>, <a class="thin" href="/sok?m=tv&amp;q=urbefolkning&amp;filter=rettigheter&amp;side=1" title="Vis flere programmer tagget med &quot;urbefolkning&quot;">urbefolkning</a>, <a class="thin" href="/sok?m=tv&amp;q=tilpasning&amp;filter=rettigheter&amp;side=1" title="Vis flere programmer tagget med &quot;tilpasning&quot;">tilpasning</a>, <a class="thin" href="/sok?m=tv&amp;q=kultur&amp;filter=rettigheter&amp;side=1" title="Vis flere programmer tagget med &quot;kultur&quot;">kultur</a>\r\n\t\t\t</section>\r\n\t\t\r\n\t\t\r\n\t\t\t<div class="tab">\r\n\t\t\t\t<ul class="tab-nav line-sep clearfix">\r\n\t\t\t\t\t<li class="active"><h2><a href="#information">Programinformasjon</a></h2></li>\r\n\t\t\r\n\t\t\t\t\t\t<li><a href="/programreview/koid21008710" id="reviewLink" rel="nofollow">Omtale</a>\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t</li>\r\n\t\t\t\t\t\t<li><a href="/programsubtitles/koid21008710/html" id="subtitlesLink" rel="nofollow">Teksting</a></li>\r\n\t\t\t\t</ul>\r\n\t\t\t\t<div class="tab-panels">\r\n\t\t\t\t\t<section id="information" class="tab-panel">\r\n\t\t\t\t\t\t<div class="mod toggle closed">\r\n\t\t\t\t\t\t\t<p>\r\n\t\t\t\t\t\t\t\tBr. naturserie. På slettelandet veks gras som gir mat til dyr og menneske. Men nokre gonger er kampen for føda farleg. Dorobo-folket i Kenya må jage vekk svoltne løver for å skaffe levebrød. Mennesket og dyra lever tett saman på stepper over heile kloden. Norsk kommentar: Ola Bøe. 
(Human Planet: Grasslands) (6:8)\r\n\t\t\t\t\t\t\t\t<a href="#" class="control hide-when-open" title="Vis mer om Verda vi skaper">Vis mer</a>\r\n\t\t\t\t\t\t\t</p>\r\n\t\t\t\t\t\t\t<div class="details">\r\n\t\t\t\t\t\t\t\t<dl class="infolist">\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t<dt>Tilgjengelig i:</dt> <dd>Norge</dd>\r\n\t\t\t\t\t\t\t\t\t\t<dt>Første gang sendt:</dt> <dd>    <strong></strong> 08.06.2012 20:05</dd>\r\n\t\t\t\t\t\t\t\t\t\t<dt>Siste gang sendt:</dt> <dd>    <strong></strong> 08.06.2012 20:05</dd>\r\n\t\t\t\t\t\t\t\t\t\t<dt>Planlagt sendt:</dt> <dd>    <strong></strong> 09.06.2012 16:25</dd>\r\n\t\t\t\t\t\t\t\t</dl>\r\n\t\t\t\t\t\t\t\t<dl class="infolist">\r\n\t\t\t\t\t\t\t\t\t\t<dt>Serietittel:</dt> <dd>Verda vi skaper</dd>\r\n\r\n\t\t\t\t\t\t\t\t\t\t<dt>Episodetittel:</dt><dd>Verda vi skaper 6:8</dd>\r\n\r\n\t\t\t\t\t\t\t\t\t\t<dt>Orginal episodetittel:</dt> <dd>Human Planet</dd>\r\n\t\t\t\t\t\t\t\t\t\t<dt>Varighet:</dt> <dd>48 minutter</dd>\r\n\t\t\t\t\t\t\t\t</dl>\r\n\t\t\t\t\t\t\t\t\r\n\r\n\r\n\t\t\t\t\t\t\t\t\t<h3>Seriebeskrivelse:</h3><p>Britisk dokumentarserie</p>\r\n\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t<a href="#" class="control" title="Vis mindre om Verda vi skaper">Vis mindre</a>\r\n\t\t\t\t\t\t\t</div>\r\n\t\t\t\t\t\t</div>\r\n\t\t\t\t\t</section>\r\n\t\t\t\t</div>\r\n\t\t\t</div>\r\n\t\t</article>\r\n\r\n\t</section>'

This returns an empty list:
parseDOM(html, 'div', {'id':'player'}, ret='data-media')

However, parseDOM(html, 'div', {'id':'player'}, ret='\tdata-media') works, even though \t should not be treated as part of the attribute name.
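
For comparison, here is a minimal sketch of attribute extraction that tolerates tabs, carriage returns, and spaces around the attribute name and the '=' sign. It is not parseDOM's implementation; get_attribute is a hypothetical helper, the sample tag is a shortened stand-in for the real page, and a plain regex like this can still be fooled by an attribute name that appears inside another attribute's value.

```python
import re


def get_attribute(tag_html, name):
    """Return the value of attribute `name` in one opening tag, or None."""
    # (?<![\w-]) keeps 'media' from matching inside 'data-media'.
    # \s*=\s* allows spaces, tabs, and line breaks around '='.
    pattern = r'(?<![\w-])' + re.escape(name) + r'\s*=\s*(["\'])(.*?)\1'
    match = re.search(pattern, tag_html, re.DOTALL)
    return match.group(2) if match else None


# Shortened sample with the same whitespace quirks as the real page.
tag = ('<div id="player" \r \tdata-media="http://example.invalid/manifest.f4m"'
       ' \r \t\t data-subtitlesurl = "/programsubtitles/abc"\r >')

print(get_attribute(tag, 'data-media'))
print(get_attribute(tag, 'data-subtitlesurl'))
```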
takoi Offline
Fan
Posts: 511
Joined: Oct 2009
Reputation: 6
Location: Norway
Post: #85
More examples of 'ret' not working correctly:

Code:
>>> html ='<div id="player" class="loading tv " \r \tdata-media="http://nordond2b-f.akamaihd.net/z/no/open/1e/1ee465d30cdea83ac036714a0d4e7c7ff7a1095d/1ee465d30cdea83ac036714a0d4e7c7ff7a1095d_,141,316,563,1266,2250,.mp4.csmil/manifest.f4m" \r \tdata-timezoneoffset="2" \r \tdata-startingbitrateindex="3"\r \tdata-streamingerrormessageurl="/streamingerror"\r \tdata-outoflivebuffermessageurl="/outoflivebuffer"\r \t\t\t\t data-subtitlesurl = "/programsubtitles/mkds61000910"\r \t\t\t data-IsRatedR = "False"\r >dsgdsfsdf</div>'

>>> parseDOM(html, 'div', {'id':'player'}, ret='\tdata-outoflivebuffermessageurl')
['/outoflivebuffer"\r \t\t\t\t data-subtitlesurl = "/programsubtitles/mkds61000910"\r \t\t\t data-IsRatedR = "False']

>>> parseDOM(html, 'div', {'id':'player'}, ret='data-subtitlesurl')
[]
newatv2user Offline
Fan
Posts: 300
Joined: May 2011
Reputation: 27
Post: #86
Is the debug feature not available with parsedom anymore? I am not able to get it to give me any debug message.
(This post was last modified: 2012-07-04 07:05 by newatv2user.)
stacked Offline
Skilled Python Coder
Posts: 792
Joined: Jun 2007
Reputation: 17
Post: #87
I've been using buggalo to track errors, and I've noticed two common issues with the fetchPage function.

1. Here it looks like the connection times out at line 399 of CommonFunctions.py, and there is no exception handler for the socket timeout.

Log:
Code:
Type    <class 'socket.timeout'>
Message    timed out
Stacktrace     File "/home/xbmc/.xbmc/addons/plugin.video.revision3/default.py", line 335, in <module>
    build_main_directory(url)
File "/home/xbmc/.xbmc/addons/plugin.video.revision3/default.py", line 51, in build_main_directory
    html = common.fetchPage({"link": url})['content']
File "/home/xbmc/.xbmc/addons/script.module.parsedom/lib/CommonFunctions.py", line 399, in fetchPage
    ret_obj["content"] = con.read()
File "/usr/lib/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
File "/usr/lib/python2.7/httplib.py", line 541, in read
    return self._read_chunked(amt)
File "/usr/lib/python2.7/httplib.py", line 592, in _read_chunked
    value.append(self._safe_read(amt))
File "/usr/lib/python2.7/httplib.py", line 647, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)


2. For some odd reason, fetchPage returns without a 'content' key. I believe this happens when the condition at line 398 of CommonFunctions.py is false.

Log:
Code:
Type    <type 'exceptions.KeyError'>
Message    'content'
Stacktrace     File "/storage/sdcard0/Android/data/org.xbmc.xbmc/files/.xbmc/addons/plugin.video.revision3/default.py", line 339, in <module>
    get_video(url, name, plot, studio, episode, thumb, date)
File "/storage/sdcard0/Android/data/org.xbmc.xbmc/files/.xbmc/addons/plugin.video.revision3/default.py", line 231, in get_video
    result = common.fetchPage({"link": url})['content']


Sorry, I don't have the full debug logs. Thanks again for this script.
takoi Offline
Fan
Posts: 511
Joined: Oct 2009
Reputation: 6
Location: Norway
Post: #88
What did you expect to happen? When it times out, it times out. If you have a way to recover, then catch it.
(This post was last modified: 2012-08-23 11:08 by takoi.)
stacked Offline
Skilled Python Coder
Posts: 792
Joined: Jun 2007
Reputation: 17
Post: #89
(2012-08-23 11:08)takoi Wrote:  What did you expect to happen? When it times out, it times out. If you have a way to recover, then catch it.

I did create a way to recover. I was just trying to point out the problem so it can be corrected within the fetchPage function.

Anyway, here is what I did. I created a function that wraps fetchPage. If there is an error in fetchPage, the function will attempt to load the page again. If it still fails after 3 retries, buggalo will catch the error.
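
A wrapper along those lines might look roughly like the sketch below. It is not the actual add-on code: fetch_with_retry and max_attempts are hypothetical names, the page-fetching function is passed in so the sketch runs outside XBMC (in an add-on it would be common.fetchPage), and max_attempts is assumed to count total attempts. A result without a 'content' key is treated as a failure too.

```python
import socket


def fetch_with_retry(fetch_page, params, max_attempts=3):
    """Call fetch_page(params) until the result contains 'content'.

    Re-raises the last error once max_attempts attempts have failed,
    so an outer handler (buggalo, for instance) can report it.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            result = fetch_page(params)
            if "content" in result:
                return result
            # fetchPage came back without a body; remember that as the failure.
            last_error = KeyError("content")
        except (socket.timeout, IOError) as error:
            # socket.timeout and URL errors are both IOError subclasses.
            last_error = error
    raise last_error
```

Usage inside an add-on would then be something like html = fetch_with_retry(common.fetchPage, {"link": url})["content"].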
(This post was last modified: 2012-08-23 22:44 by stacked.)
mrstealth Offline
Junior Member
Posts: 4
Joined: Sep 2012
Reputation: 0
Post: #90
Thank you for the great script. It helps speed up add-on development, I really like it, and I use it in all my XBMC add-ons.

But as of today all my plugins are broken, and I get the following error:

Code:
response = common.fetchPage({"link": url})
File ".../Library/Application Support/XBMC/addons/script.module.parsedom/lib/CommonFunctions.py", line 410, in fetchPage
ret_obj["content"] = inputdata.decode("utf-8")
File "/Applications/XBMC.app/Contents/Frameworks/lib/python2.6/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 275-276: invalid data

I investigated the issue and found that version 1.2.0 sets the default encoding to 'utf-8' (Fetchpage should decode binary to utf-8), but this causes a crash for pages that are not UTF-8.

Is it possible to provide some configuration option like: common.encoding = 'cp1251'?
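
Until something like that exists, a fallback can be sketched on the caller's side, assuming the raw bytes of the page are available. decode_page is a hypothetical helper, not part of the parsedom module, and the order of the encodings list is an assumption that should match the sites being scraped.

```python
def decode_page(data, encodings=("utf-8", "cp1251")):
    """Decode raw page bytes, trying each encoding in order."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Last resort: decode with the first encoding and substitute bad bytes.
    return data.decode(encodings[0], "replace")


# A Russian word encoded as cp1251 is not valid UTF-8, so the fallback is used.
sample = u"\u041f\u0440\u0438\u0432\u0435\u0442"
print(decode_page(sample.encode("cp1251")) == sample)
```

UTF-8 is tried first because cp1251 accepts almost any byte sequence, so putting it first would silently mis-decode UTF-8 pages.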

Many thanks in advance for your help. I hope this will be fixed in the next add-on version.
(This post was last modified: 2012-09-19 18:24 by mrstealth.)