multiple <url function="myFunction" > in single <fanart>
#1
hi,

i'm working right now on two almost full-featured scrapers, one for the romanian site http://www.cinemagia.ro, and the second for the russian http://www.kinopoisk.ru

the problem is that for the second scraper, kinopoisk.ru, there are multiple pages with posters, fanart and wallpapers. The posters are no problem. For the fanart, i have a function which returns something like

<fanart>
  <thumb preview="...">...</thumb>
  ...
</fanart>

for all images from the first page. i also have a function which returns only <thumb ...>...</thumb> results for the rest of the fanart pages, and i'm calling it from the function which returns the fanart from the first page.

In this case, my final results are something like:

<fanart>
  <thumb preview="...">...</thumb>
  ...
  <url function="GetKPRUFanartPage">...</url>
  ...
</fanart>

I should also say that i tested all the functions i've made and they work well, so the problem isn't in the functions, it's in how xbmc parses the final results. For the poster pages xbmc does very well, but when it finds a <url function="..."> inside <fanart>...</fanart> tags, xbmc doesn't parse it.

i tried to find a solution in this forum, but i didn't find one :(

so i need somebody's help to solve this problem, so that i can post my scrapers :)
#2
I solved this problem in my movieplayer-it-film.xml scraper by using multiple urls in the "Get searchresults" section:
Code:
        <!-- Get Results for Movieplayer Fanarts -->
        <RegExp conditional="Fan" input="$$7" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results sorted=&quot;yes&quot;&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\3\4&lt;/title&gt;&lt;url&gt;http://www.movieplayer.it/film/\1/\2/&lt;/url&gt;&lt;url&gt;http://www.movieplayer.it/film/\1/\2/gallery-e-trailer/wallpaper/1/&lt;/url&gt;&lt;url&gt;http://www.movieplayer.it/film/\1/\2/gallery-e-trailer/promozionali/1/&lt;/url&gt;&lt;url&gt;http://www.movieplayer.it/film/\1/\2/gallery-e-trailer/foto-di-scena/1/&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;/entity&gt;" dest="7">
                <expression repeat="yes">&lt;a href=&quot;http://www\.movieplayer\.it/film/([0-9]+)/([^/]*)/[^&gt;]*&gt;([^\(]*)([^-]*)[^=]*&lt;/a&gt;</expression>
            </RegExp>
            <expression clear="yes" noclean="1" trim="1"/>
        </RegExp>


Then the Fanart part:

Code:
            <!-- Fanart By movieplayer-->
            <RegExp conditional="Fan" input="$$8" output="&lt;fanart url=&quot;http://images.movieplayer.it/&quot;&gt;\1&lt;/fanart&gt;" dest="13+">
                <RegExp input="$$2" output="&lt;thumb preview=&quot;\1_cropped.jpg&quot;&gt;\1.jpg&lt;/thumb&gt;" dest="8+">
                    <expression repeat="yes" noclean="1">&lt;a href="/gallery/[^"]+"&gt;&lt;img src="http://images.movieplayer.it/([^_]+)_cropped.jpg" alt="</expression>
                </RegExp>
                <RegExp input="$$3" output="&lt;thumb preview=&quot;\1_cropped.jpg&quot;&gt;\1.jpg&lt;/thumb&gt;" dest="8+">
                    <expression repeat="yes" noclean="1">&lt;a href="/gallery/[^"]+"&gt;&lt;img src="http://images.movieplayer.it/([^_]+)_cropped.jpg" alt="</expression>
                </RegExp>
                <RegExp input="$$4" output="&lt;thumb preview=&quot;\1_cropped.jpg&quot;&gt;\1.jpg&lt;/thumb&gt;" dest="8+">
                    <expression repeat="yes" noclean="1">&lt;a href="/gallery/[^"]+"&gt;&lt;img src="http://images.movieplayer.it/([^_]+)_cropped.jpg" alt="</expression>
                </RegExp>
                <expression noclean="1"></expression>
            </RegExp>

Hope this is what you are trying to achieve. ;)
#3
hello, KoTiX!

i really appreciate your help, but that's not what i needed :) the problem is that from <GetSearchResults> i don't know how many pages of fanart images and wallpapers there are, if there are any at all... i don't even know from the movie's main page how many fanart and wallpaper pages there are. i can only find that out by accessing the first fanart / wallpapers page.

i also have a separate page for the trailer, another for the tagline, and multiple pages with posters, and i managed all of those without any problems, so i can help you with that, if you want. I mean i have functions for the trailer, for the tagline, for the full movie cast and for all the poster pages; each one loads its own page and they all work just fine.

with the fanart and wallpaper pages, if my functions return only <thumb> results, there are no problems. i mean that i get all the fanart and all the wallpapers (from all pages), but XBMC treats them as posters. The problem appears if i put the results within <fanart>...</fanart> tags.

As far as i understand, <url function="myFunction"> is not parsed if it appears inside <fanart> tags. But if it appears directly inside <details>...</details>, it's parsed without any problem.
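
To make the difference concrete, here is a sketch of the result shape that does get followed, with the chained call sitting directly under <details>. The "..." placeholders stand for the real thumb and page URLs, and GetKPRUFanartPage is my own function name from above; this only illustrates the behaviour described here, it is not a statement about how xbmc is implemented.

```xml
<details>
  <!-- thumbs scraped from the first page -->
  <thumb preview="...">...</thumb>
  <!-- chained call placed directly inside <details>, so it is followed -->
  <url function="GetKPRUFanartPage">...</url>
</details>
```

Moving that same <url function="..."> element inside a <fanart> block is the case that is ignored.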

anyway, thanks again for your help! :)

p.s. if you want, i could help you with your scraper ;)
#4
samoletic Wrote: with the fanart and wallpaper pages, if my functions return only <thumb> results, there are no problems. i mean that i get all the fanart and all the wallpapers (from all pages), but XBMC treats them as posters. The problem appears if i put the results within <fanart>...</fanart> tags.

The problem is that thumbs can be appended one after another even if they come from different functions, but this cannot be done for fanart (it's an xbmc limitation). That is why i did it that way in my scraper.
I first tried it the way you did, calling a different function for each page, but it didn't work.
BTW I know for sure that it works for thumbs because it was implemented in xbmc not long ago, look here: http://forum.xbmc.org/showthread.php?tid=55491
#5
right, this is on my list of stupids, but i haven't gotten around to it yet. until then, study this example

Code:
<GetDetails dest="3" clearbuffers="no">
  ...
  <!-- push
         <url function="FanartGrabber">someurl</url>
       to $$3 as many times as you see fit. finally push
         <url function="FanartCollector">someotherurlpossiblyblank</url>
       AFTER the grabbers - this needs to be run last. -->
</GetDetails>
<FanartGrabber dest="3" clearbuffers="no">
  <RegExp input="$$1" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="5+">
    <expression repeat="yes">grabtheurls</expression>
  </RegExp>
</FanartGrabber>
<FanartCollector dest="3">
  <RegExp input="$$5" output="&lt;fanart&gt;\1&lt;/fanart&gt;" dest="3">
    <expression noclean="1"/>
  </RegExp>
</FanartCollector>

here fanartgrabber collects the thumbs we need into $$5, while fanartcollector adds the necessary <fanart> around the tags at the end. the main thing i want to point out is the usage of clearbuffers="no". this means we do NOT clear the buffers after the function is executed, which allows passing data between scraper functions.
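
a rough, untested sketch of what the "push" step inside GetDetails could look like, using only the RegExp/expression constructs already shown in this thread. the example.invalid urls, the fixed number of pages and the choice of $$6 as a scratch buffer are assumptions for illustration only; wrapping the result in <details> follows the convention mentioned earlier in the thread.

```xml
<GetDetails dest="3" clearbuffers="no">
  <!-- wrap everything collected in $$6 (assumed to be free) into <details> -->
  <RegExp input="$$6" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
    <!-- one FanartGrabber call per fanart page; the urls are placeholders -->
    <RegExp input="$$1" output="&lt;url function=&quot;FanartGrabber&quot;&gt;http://example.invalid/fanart/page1&lt;/url&gt;" dest="6+">
      <expression/>
    </RegExp>
    <RegExp input="$$1" output="&lt;url function=&quot;FanartGrabber&quot;&gt;http://example.invalid/fanart/page2&lt;/url&gt;" dest="6+">
      <expression/>
    </RegExp>
    <!-- FanartCollector is pushed last, after all the grabbers -->
    <RegExp input="$$1" output="&lt;url function=&quot;FanartCollector&quot;&gt;http://example.invalid/&lt;/url&gt;" dest="6+">
      <expression/>
    </RegExp>
    <expression noclean="1"/>
  </RegExp>
</GetDetails>
```

the point of the ordering is that each grabber appends its thumbs to $$5, so the collector should only run once all of them have finished and $$5 is complete.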
#6
Sorry, but i don't understand: do you mean that it will already work this way, or do you have to make some changes to the xbmc code too?
#7
sorry for the ambiguity.

this currently works fine. the real fix i hinted at is to allow multiple <fanart> tags.
#8
spiff Wrote:sorry for the ambiguity.

this currently works fine. the real fix i hinted at is to allow multiple <fanart> tags.

well, thank you very much, spiff! :) it's almost what i needed. but, just my opinion :) - the thing with <... clearbuffers="no"> is too ambiguous (sorry for my english). something like <SomeFunction takebuffers="3,5">, like in <expression> tags, would be much better: <SomeFunction> would keep only the contents of buffers 3 and 5 from the point it was called. That would be easier for people who want to use the contents of the buffers.

Of course, it might be a solution for my case too :) i'll try your way very soon, and i have another idea for my case which i'm thinking of trying. But i think it would be much better if multiple <fanart> tags were possible; more than that, it would be nice to be able to use something like

<fanart set="kinopoisk">
  ...
</fanart>
<fanart set="imdb">
  ...
</fanart>

and to be able to use the sets (or call them whatever you want) from within xbmc. But that would be nice for the future, i guess :) The same thing would be really nice for the trailers too, with a structure like

<trailer>
  <title>some title</title>
  <thumb>url_of_the_trailer_thumb</thumb>
  <url quality="low" format="flv" width="480">url_of_the_trailer_flv_low_quality</url>
  <url quality="medium" format="avi" width="640">url_of_the_trailer_avi_medium_quality</url>
  <url quality="high" format="mov" width="1280">url_of_the_trailer_mov_high_quality</url>
</trailer>

and for it to be possible (in the future) to select the trailer you want to see.

I was thinking about that because both sites i'm making scrapers for (cinemagia.ro and kinopoisk.ru) offer multiple trailers/teasers or just movie samples, with or without subtitles, or translated, and also in different qualities. But that would be nice for the future.

staying in the <fanart> area, i was wondering if it's possible to make some changes to xbmc which would allow it to parse functions from inside the <fanart> or, say, the <actor> tags. it would really make writing scrapers much easier, and not only for me. my point is that making scrapers should be as easy and logical as it can be ;)

P.S. thanks for your help too, spiff ;)