Filmweb scraper

C-Quel Offline
Retired Team-XBMC Member
Posts: 1,378
Joined: Aug 2004
Reputation: 0
Post: #21
Well, it looks like you don't repeat the thumb expression anyway.

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


If scraper related please always grab the latest XML relevant to the content you are trying to grab info for from this link https://xbmc.svn.sourceforge.net/svnroot...m/scrapers

spiff Offline
Grumpy Bastard Developer
Posts: 12,187
Joined: Nov 2003
Reputation: 82
Post: #22
scrap will only show you the last XML that was output.

In XBMC, the actors from each returned XML are appended to a list.
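
To illustrate the difference, here is a minimal Python sketch of the accumulation described above. It is not XBMC's actual code: the function name `collect_actors`, the sample fragments, and the element names `details`, `actor`, `name` and `thumb` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def collect_actors(xml_fragments):
    """Append the actor names from every returned XML fragment to one list.

    Hypothetical helper: it only mimics the "push to a list for each
    returned xml" behaviour described in the post.
    """
    actors = []
    for fragment in xml_fragments:
        root = ET.fromstring(fragment)
        for actor in root.iter("actor"):
            name = actor.findtext("name")
            if name:
                actors.append(name)
    return actors


# Two made-up fragments, as if the scraper had returned XML twice.
fragments = [
    "<details><actor><name>Actor A</name></actor></details>",
    "<details><actor><name>Actor B</name>"
    "<thumb>http://example.invalid/b.jpg</thumb></actor></details>",
]

all_actors = collect_actors(fragments)        # what the accumulated list holds
last_only = collect_actors(fragments[-1:])    # what a "last XML only" view shows
```

Looking only at the last fragment loses the actors from earlier ones, which is why the output of the test tool can look incomplete even when the accumulated result is fine.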

smuto Offline
Senior Member
Posts: 241
Joined: Sep 2004
Reputation: 2
Post: #23
Thanks a lot.

filmweb.xml with actor thumbs is now 100% working.

But it has become extremely slow: to collect the thumb URLs, the scraper sometimes visits more than 20 pages.

So if someone wants to use it, just grab it from here.

One more question.

I edited the TheTVDB.com scraper to match Polish strings first:
tvdb-pl.xml
I tried to set the encoding to ISO-8859-2 in the scraper, but without success.

The GUI charset in langinfo.xml is:
<charsets>
<gui unicodefont="false">CP1250</gui>
<subtitle>CP1250</subtitle>
</charsets>

The Polish XBMC language strings are in UTF-8.
Polish subtitles are mostly in CP1250.

When I change the GUI charset to
<gui >ISO-8859-2</gui>
the tvdb-pl scraper works perfectly.

What is the GUI charset for?
smuto

spiff Offline
Grumpy Bastard Developer
Posts: 12,187
Joined: Nov 2003
Reputation: 82
Post: #24
If the returned XML is not UTF-8, it is assumed to be in the GUI charset and is converted from that to UTF-8.
Is this the best behaviour? Not sure.

As for the scraper being slow: there is not much we can do about that as long as the site is organized the way it is.
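
The charset fallback described above can be sketched in Python roughly as follows. This is only an illustration of the stated behaviour, not XBMC's implementation; the function name `to_utf8` and the sample string are made up for the example.

```python
def to_utf8(raw: bytes, gui_charset: str = "cp1250") -> bytes:
    """Return `raw` as UTF-8 bytes.

    If `raw` already decodes as UTF-8 it is returned unchanged; otherwise
    it is assumed to be in `gui_charset` and converted from that.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(gui_charset).encode("utf-8")


text = "gęślą jaźń"                      # sample Polish text (assumption)
page = text.encode("iso-8859-2")          # a page served as ISO-8859-2

# GUI charset matches the page: the text survives the conversion.
ok = to_utf8(page, "iso-8859-2").decode("utf-8")

# GUI charset is CP1250 but the page is ISO-8859-2: some letters come out wrong.
garbled = to_utf8(page, "cp1250").decode("utf-8")

# Input that is already UTF-8 passes through untouched.
untouched = to_utf8(text.encode("utf-8"), "cp1250")
```

Under this model, a site that serves ISO-8859-2 only converts cleanly when the GUI charset is also ISO-8859-2, which matches what was observed with tvdb-pl.xml.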

(This post was last modified: 2007-12-11 15:27 by spiff.)
smuto Offline
Senior Member
Posts: 241
Joined: Sep 2004
Reputation: 2
Post: #25
Update: added scraper settings.

One week for tests before we add this to SVN.

filmweb.xml with settings

I have a problem with the encoding of the labels in the scraper file:
[Image: scraper_settings.jpg]
Is there a way to add the "Automatically grab actor thumbs" setting to the scraper settings window?

smuto

C-Quel Offline
Retired Team-XBMC Member
Posts: 1,378
Joined: Aug 2004
Reputation: 0
Post: #26
Just add a setting to the XML: label="Auto Grab Actor Thumbs" id="autograb" type="bool" default="false"

Duplicate your ActorLink. Give one copy conditional="autograb" (this one outputs the thumb),

and give the other copy of ActorLink conditional="!autograb", without outputting <thumb></thumb>.

EDIT: line 104, pos 239: change &nbsp to &amp;nbsp;

(This post was last modified: 2008-01-19 13:52 by C-Quel.)
spiff Offline
Grumpy Bastard Developer
Posts: 12,187
Joined: Nov 2003
Reputation: 82
Post: #27
I don't think it fits as a scraper setting. You see, if you do it in the scraper, it means you won't return the URLs at all. The global setting controls whether or not to actually grab the thumbs, not whether or not to grab the URLs. It is a small but important difference: if you disable it at scraper level, you cannot grab the thumbs manually either. Hence dual settings make sense to me.

smuto Offline
Senior Member
Posts: 241
Joined: Sep 2004
Reputation: 2
Post: #28
I am trying to update the <nfourl>.
Can someone help me?

For now I only use the link with the id:
http://www.filmweb.pl/Film?id=999999

I am trying to add the link with the movie title to <nfourl>:
http://movie.title.filmweb.pl/

This is my WIP:
PHP Code:
    <NfoUrl dest="3">
        <RegExp input="$$1" output="http://www.filmweb.pl/Film?id=\1" dest="3">
            <expression noclean="1">Film.id=([0-9]*)</expression>
        </RegExp>
        <RegExp input="$$1" output="http://\1.filmweb.pl" dest="3+">
            <expression noclean="1">http://([^\/]+).filmweb.pl</expression>
        </RegExp>
    </NfoUrl>

But the movie title regexp matches both URLs.
How can I force the scraper to use the id if it's present?

spiff Offline
Grumpy Bastard Developer
Posts: 12,187
Joined: Nov 2003
Reputation: 82
Post: #29
Easiest solution (I don't have time to analyze the regexps):

output XML, i.e. <url>theurl</url>

The first url block will take priority.
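
As a rough illustration of the "first url block wins" idea, here is a Python sketch rather than scraper XML. The function names `nfo_urls` and `pick_url` are hypothetical, and the two patterns are slightly tightened versions of the ones in the WIP above (escaped dots, at least one digit), which is an assumption of this example rather than part of the original scraper.

```python
import re

# Patterns adapted from the WIP <NfoUrl> block (tightened for the example).
ID_PATTERN = re.compile(r"Film.id=([0-9]+)")
TITLE_PATTERN = re.compile(r"http://([^/]+)\.filmweb\.pl")


def nfo_urls(nfo_text):
    """Build <url> blocks in priority order: id link first, title link second."""
    urls = []
    match = ID_PATTERN.search(nfo_text)
    if match:
        urls.append(f"<url>http://www.filmweb.pl/Film?id={match.group(1)}</url>")
    match = TITLE_PATTERN.search(nfo_text)
    if match:
        # This pattern also matches the id link (with "www" as the "title"),
        # which is why it has to come second.
        urls.append(f"<url>http://{match.group(1)}.filmweb.pl</url>")
    return urls


def pick_url(nfo_text):
    """Mimic the consumer: only the first <url> block is used."""
    urls = nfo_urls(nfo_text)
    return urls[0] if urls else None


by_id = pick_url("http://www.filmweb.pl/Film?id=999999")
by_title = pick_url("http://movie.title.filmweb.pl/")
```

Because the id block is emitted first, it is chosen whenever an id is present, and the title block only matters when it is the sole match.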

smuto Offline
Senior Member
Posts: 241
Joined: Sep 2004
Reputation: 2
Post: #30
Thanks a lot, it's working.

Added as a patch to SVN.
