Variable in <expression> section?
#1
hi,

My challenge is that I have no way to end up on a single movie page. If I search for movie A, the results contain that movie and all its sequels, which would be fine for GetSearchResults. However, those URLs all link to one and the same detail page covering the original movie and all its sequels (I hope this still makes sense).

So I can populate the GetSearchResults and that gives a link to the 'detail page' for every movie found but when I want to use that URL I potentially would get more than one movie.

I think this would be easier to fix if I could use a variable in my <expression>Regex comes here</expression> bit, which would then look like <expression>Regex with Movietitle comes here</expression>.

Problem is that I have no clue on how to sneak in the Movietitle - if it is possible at all?

Jan
#2
that's easy. just stick the movie title in a buffer, then address it with $$<#buffer>. e.g. $$1 works everywhere: in output, input and expressions
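
As a minimal sketch of that idea, assuming buffer 4 was already filled with the movie title earlier in the same function (the buffer number and the <title> markup are only illustrative), the buffer can be referenced inside an expression like this:

```xml
<!-- Sketch only: assumes buffer 4 already holds the movie title -->
<RegExp input="$$1" output="\1" dest="5">
    <!-- $$4 is replaced by the buffer's content before the regex is matched -->
    <expression>&lt;title&gt;($$4)&lt;/title&gt;</expression>
</RegExp>
```

Because the buffer content becomes part of the regex, a title containing regex metacharacters would presumably be interpreted as a pattern rather than as literal text.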
#3
spiff Wrote:that's easy. just stick the movie title in a buffer, then address it with $$<#buffer>. e.g. $$1 works everywhere: in output, input and expressions

Right - that makes me feel a bit silly but I still can't figure it out. The CreateSearchUrl has the movie title in $$1 and sends the webpage to $$3 - for testing I'm simply using the exported xml movie database from XBMC (so I have more movies on one page).

Then the GetSearchResults has all the titles in it so I would need the title to restrict the overview list.

I threw in a couple of lines to send the movie title to buffer 2 (because I have it when creating the search url) but I'm not able to find it back lower down when reading the results.

If I replace the .*? inside the <title> capture group with $$2 in the GetSearchResults section, it does not work. When I hardcode a movie title there instead, I get just that title in the overview rather than the full list, so that looks promising. So where has my $$2 gone?

Code:
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="http://smart-pvr/movie.xml" dest="3">
            <RegExp input="$$1" output="\1" dest="2">
                <expression noclean="1"/>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="8">
        <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;&apos;\2&apos; van \4 (\3)&lt;/title&gt;&lt;url&gt;http://smart-pvr/movie.xml&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes">&lt;(movie)&gt;.*?&lt;title&gt;(.*?)&lt;/title&gt;.*?&lt;year&gt;(.*?)&lt;/year&gt;.*?&lt;director&gt;(.*?)&lt;/director&gt;</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetSearchResults>
#4
so you want the movie title in getsearchresults if i understand you correctly?

by default the scraper parser clears buffers between function calls. use the clearbuffers="no" parameter to override this behaviour. be warned though; getsearchresults puts the url in buffer 2 so you will have to use another one
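
A sketch of how this could look for the CreateSearchUrl posted above, with the title stored in buffer 6 (an arbitrary choice; any buffer other than 2 that the function does not otherwise use should do):

```xml
<!-- clearbuffers="no" keeps the buffers filled here available to the next function -->
<CreateSearchUrl clearbuffers="no" dest="3">
    <RegExp input="$$1" output="http://smart-pvr/movie.xml" dest="3">
        <!-- copy the search title from $$1 into buffer 6 instead of buffer 2 -->
        <RegExp input="$$1" output="\1" dest="6">
            <expression noclean="1"/>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</CreateSearchUrl>
```

$$6 can then stand in for the hardcoded title in the GetSearchResults expression.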
#5
Great - that did the trick!

One final issue though - when the title is '101 Dalmatians' it is NOT passing on '101 Dalmatians' but '101%20dalmatians' instead. I tried the noclean option but that is not working.

When I have a movie in my webpage with the '101%20dalmatians' then it all works so I'm quite happy with that so far.

Any pointers as to where I can find the untouched title? When the script fails I get a popup with the exact title so the system has to know somehow?

btw - thanks a lot for your help - I needed to be put on the right track or I would never have made it so far.

Jan
#6
not much to do about that i'm afraid. since the search is usually done using urls, we have to url-encode the title prior to passing it. if this really is a problem we can add an additional input that holds the non-encoded search title, just say jump
#7
hi,

Another cry for help - ok the scraper has the wrong name - originally I intended to integrate the SageTV library into XBMC but it seems I can't get the plot from the SageTV webserver.

Next idea was simply to throw the exported videodb.xml from XBMC on the webserver so searching for a title is not really possible as the webserver always simply returns the full .xml with all movies. That xml is quite easy to produce and why would we not consider the xbmc format of the xml as the reference format?

So when the CreateSearchUrl performs the call the webserver gives the full .xml - the regex then filters out that one title because before the <title> tag I added a <cleantitle> tag and that works nicely (we would not have to do this if we got the exact title but let's not go there yet).
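
To illustrate, an entry in the .xml on the webserver might then look roughly like the sketch below. The element names are the ones the regexes rely on; the year and director values are placeholders, and the other fields of the exported file are omitted:

```xml
<!-- Illustrative entry only: year and director are placeholder values -->
<movie>
    <!-- URL-encoded title as the scraper receives it, placed before <title> -->
    <cleantitle>101%20dalmatians</cleantitle>
    <title>101 Dalmatians</title>
    <year>1961</year>
    <director>Director Name</director>
</movie>
```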

What I absolutely don't understand is why I lose the content of $$6 in GetDetails, regardless of whether I use <GetDetails clearbuffers="no" dest="3"> or <GetDetails dest="3"> (I thought clearbuffers affected the bit that came after the module). At that point $$6 is empty, so the regex matches the first title in the .xml, just when I thought I was there :(

The output bit for the getdetails is really easy because we simply copy the content of the .xml we get from the webserver.

Any hints? Many thanks!

p.s.: I could achieve this by simply importing the videodb but I've given up on that as I could not get the thumbnails in and this is good practice when I start looking into tv-show integration with SageTV.

Code:
<scraper name="SageTV" content="movies" thumb="SageTV.gif" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <NfoUrl dest="3">
        <RegExp input="$$1" output="http://smart-pvr:8080/thumbs/movie.xml" dest="3">
            <expression noclean="1"/>
        </RegExp>
    </NfoUrl>
    <CreateSearchUrl clearbuffers="no" dest="3">
        <RegExp input="$$1" output="http://smart-pvr:8080/thumbs/movie.xml" dest="3">
            <RegExp input="$$1" output="\1" dest="6">
                <expression/>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults clearbuffers="no" dest="8">
        <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;&apos;\2&apos; van \4 (\3)&lt;/title&gt;&lt;url&gt;http://smart-pvr:8080/thumbs/movie.xml&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes">&lt;(movie)&gt;.*?&lt;cleantitle&gt;($$6)&lt;/cleantitle&gt;.*?&lt;year&gt;(.*?)&lt;/year&gt;.*?&lt;director&gt;(.*?)&lt;/director&gt;</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetSearchResults>
    <GetDetails dest="3">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$1" output="\1" dest="5">
                <expression trim="1" noclean="1">$$6&lt;/cleantitle&gt;(.*?)&lt;/movie&gt;</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetDetails>
</scraper>
#8
i do understand. r16922
#9
yep - that works nicely! (he said after downloading and installing linux, compiling xbmc and some more suffering with a drive that refused to read half of the cds I burned with the ubuntu image).

Thanks!