need help on clearbuffers="no" usage
#1
Question 
I've been struggling for quite a while with the clearbuffers="no" option, and I haven't found any documentation or worked example that would help me get it right. I'm currently helping to develop the FilmAffinity scraper for XBMC. Inside the GetDetails function I fill 3 buffers ($$11, $$12 and $$13), and I would love to read their content inside a custom function I call to get the IMDB id. If I could pass those variables along, I'd be able to parse that function's results more accurately. This is a summarized version of the code:

Code:
<GetDetails dest="3" clearbuffers="no">
    <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
        <RegExp input="$$1" output="\1" dest="11">
            <expression trim="1" noclean="1">movie.gif&quot; border=&quot;0&quot;&gt; (.*?)(\(AKA|&lt;)</expression>
        </RegExp>
        <RegExp input="$$1" output="\1" dest="12">
            <expression trim="1">T&amp;Iacute\;TULO ORIGINAL&lt;/th&gt;\s*&lt;td&gt;&lt;strong&gt;(.*?)(&lt;|\(AKA)</expression>
        </RegExp>
        <RegExp input="$$1" output="\1" dest="13">
            <expression>A&Ntilde;O&lt;/th&gt;\s*&lt;td&gt;.*?\s*([0-9]{4})\s*&lt;/td&gt;</expression>
        </RegExp>
        <RegExp conditional="!EnableFastSearch" input="$$9" output="&lt;url function=&quot;GetIMDBid&quot;&gt;\1&lt;/url&gt;" dest="5+">
            <RegExp conditional="!GoogleAdvSearch" input="" output="http://www.imdb.com/xml/find?xml=1&amp;nr=1&amp;tt=on&amp;q=" dest="9">
                <expression />
            </RegExp>
            <RegExp conditional="GoogleAdvSearch" input="" output="http://www.google.com/search?q=site:imdb.com" dest="9">
                <expression />
            </RegExp>
            <RegExp input="$$12" output="+\1" dest="9+">
                <!-- join each word with '+' -->
                <expression repeat="yes">(\w+)</expression>
            </RegExp>
            <RegExp input="$$13" output="+(\1)" dest="9+">
                <expression />
            </RegExp>
            <expression />
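        </RegExp>
        <!-- rest of the function omitted in this summary -->
        <expression noclean="1" />
    </RegExp>
</GetDetails>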

Even though I'm setting the clearbuffers="no" option in the GetDetails function definition, I'm not seeing the content of $$12 and $$13 inside the GetIMDBid function. What should I modify to get it right? Any help would be greatly appreciated.
#2
Do you have clearbuffers="no" on the GetIMDBid function? That's where you need it.
#3
(2012-12-07, 15:32)scudlee Wrote: Do you have clearbuffers="no" on the GetIMDBid function? That's where you need it.
Thanks for your quick answer, scudlee. I didn't know this for sure, but I had already desperately tried all possible combinations: using the option only in GetDetails, only in GetIMDBid, in both... Unfortunately none of them worked. I don't know whether Frodo has disabled this feature or whether I'm editing something wrongly. Let me share the "entire" working code with you in case you can see something I'm missing:
Code:
<GetDetails dest="3" clearbuffers="no">
    <!-- TITLE -->
    <RegExp input="$$1" output="\1" dest="11">
        <expression trim="1" noclean="1">movie.gif&quot; border=&quot;0&quot;&gt; (.*?)(\(AKA|&lt;)</expression>
    </RegExp>
    <!-- ORIGINAL TITLE -->
    <RegExp input="$$1" output="\1" dest="12">
        <expression trim="1">T&amp;Iacute\;TULO ORIGINAL&lt;/th&gt;\s*&lt;td&gt;&lt;strong&gt;(.*?)(&lt;|\(AKA)</expression>
    </RegExp>
    <!-- YEAR -->
    <RegExp input="$$1" output="\1" dest="13">
        <expression>A&Ntilde;O&lt;/th&gt;\s*&lt;td&gt;.*?\s*([0-9]{4})\s*&lt;/td&gt;</expression>
    </RegExp>
    <!-- GET THE IMDB ID -->
    <RegExp input="$$9" output="&lt;url function=&quot;GetIMDBid&quot;&gt;\1&lt;/url&gt;" dest="5+">
        <RegExp input="" output="http://www.imdb.com/xml/find?xml=1&amp;nr=1&amp;tt=on&amp;q=" dest="9">
            <expression />
        </RegExp>
        <!-- USE ORIGINAL TITLE WORDS -->
        <RegExp input="$$12" output="+\1" dest="9+">
            <!-- join each word with '+' -->
            <expression repeat="yes">([^ ,\(\)]+)</expression>
        </RegExp>
        <!-- USE YEAR -->
        <RegExp input="$$13" output="+(\1)" dest="9+">
            <expression />
        </RegExp>
        <expression />
    </RegExp>
    <!-- more lines of code, none of them writing to buffer $$11, which -->
    <!-- correctly contains the movie title inside the GetDetails function -->
</GetDetails>
<GetIMDBid dest="3" clearbuffers="no">
    <RegExp input="$$11" output="&lt;details&gt;&lt;TEST&gt;\1&lt;/TEST&gt;&lt;/details&gt;" dest="3">
        <expression noclean="1" />
    </RegExp>
</GetIMDBid>
As I understand it, the GetIMDBid function should produce output like this:
Code:
<details><TEST>titleofthescrapedmovie</TEST></details>
but what I actually get has an empty value:
Code:
<details><TEST></TEST></details>
#4
Start with the obvious question: have you verified that the $$11 buffer is being filled in GetDetails (i.e. the regexp works)? What happens if you add
Code:
<RegExp input="$$11" output="&lt;TEST&gt;\1&lt;/TEST&gt;" dest="5+">
        <expression noclean="1" />
</RegExp>
into GetDetails? Does it show up as expected?
#5
Sure, I understand that you have to start from the very beginning, but yes, I'm sure that $$11 contains the title, because I use it inside the GetDetails block when starting the $$5 buffer:
Code:
<RegExp input="" conditional="!EnableOriginalTitles" output="&lt;title&gt;$$11&lt;/title&gt;" dest="5">
    <expression />
</RegExp>
and this code does produce a nice "<title>titleofthescrapedmovie</title>" in the debug log.

Feel free to have a look at the entire scraper, which I can't post here because posts are limited to 10000 chars:
https://dl.dropbox.com/u/5361285/metadat....5.1b3.zip
This is the working scraper, but the idea is to use buffers 11, 12 and 13 from GetDetails inside GetIMDBid to refine the search results.
#6
I see the issue now: you're calling GetIMDBid after another function has been called (one of the HDTrailersnet functions). Since that function doesn't have clearbuffers="no" on it, it's clearing the buffers before GetIMDBid is run.

Try moving the regexp up before the trailers, and adding clearbuffers="no" to both GetDetails and GetIMDBid (my testing seems to indicate that both are actually required). (Edit: Actually, you were right all along. clearbuffers determines whether the buffers get cleared at the end of the function, not, as I thought, before the start of the function.)

My testing also suggests a flaw in how you build the IMDB url. There's an extra + at the start of the search query:
Code:
http://www.imdb.com/xml/find?xml=1&nr=1&tt=on&q=+Paul+(2011)
doesn't work, but
Code:
http://www.imdb.com/xml/find?xml=1&nr=1&tt=on&q=Paul+(2011)
does.
I think this comes from the regexp starting on line 202, but I'm not entirely sure what problem that and the preceding one are trying to solve.
#7
First of all, thanks a lot for pointing out the precise error I was having. Indeed, a previous function was clearing the buffers before I could access them! Since I have no need to call the functions in any particular order, I have now moved the GetIMDBid call to be the first one and kept clearbuffers="no" in GetDetails (yes, I was almost certain I had it right in my mind, but since it wasn't working I wasn't confident enough), and it has worked fantastically. I will now be able to refine the IMDB results (which wouldn't be necessary if they were perfect, you know), as I can select them by year or even by some words of the title.
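
In case it helps someone else, this is roughly what the layout looks like now. It's a trimmed sketch rather than the real scraper: "GetTrailer" and buffer $$8 are placeholders I made up to stand in for the trailer call, and the GetIMDBid body is still just the test output from my earlier post.

```xml
<GetDetails dest="3" clearbuffers="no">
    <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
        <!-- fill $$11 (title), $$12 (original title) and $$13 (year) as before -->

        <!-- Emit the GetIMDBid call first, so that no other chained function
             can clear the buffers before it runs. $$9 holds the search URL. -->
        <RegExp input="$$9" output="&lt;url function=&quot;GetIMDBid&quot;&gt;\1&lt;/url&gt;" dest="5+">
            <expression />
        </RegExp>

        <!-- Only afterwards emit the trailer call. "GetTrailer" and $$8 are
             placeholders, not the real names used in the scraper. -->
        <RegExp input="$$8" output="&lt;url function=&quot;GetTrailer&quot;&gt;\1&lt;/url&gt;" dest="5+">
            <expression />
        </RegExp>

        <expression noclean="1" />
    </RegExp>
</GetDetails>

<!-- Runs right after GetDetails, so $$11 should still be filled. -->
<GetIMDBid dest="3">
    <RegExp input="$$11" output="&lt;details&gt;&lt;TEST&gt;\1&lt;/TEST&gt;&lt;/details&gt;" dest="3">
        <expression noclean="1" />
    </RegExp>
</GetIMDBid>
```

If a later function also needed the buffers, I assume GetIMDBid would need clearbuffers="no" as well, since the clearing seems to happen at the end of each function.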

Regarding that extra "+" sign, it's a very long story. Depending on which title we search for, sometimes that "+" gives better results, and sometimes (as you've rightly pointed out) it doesn't. As an example, in case you're able to give some further advice, the following URL
Code:
http://www.imdb.com/xml/find?xml=1&nr=1&tt=on&q=Primos+%282011%29
does not return the appropriate results, but this one
Code:
http://www.imdb.com/xml/find?xml=1&nr=1&tt=on&q=+Primos+%282011%29
works perfectly. In fact, the first option returns TV results only, while the second one returns movie and other results but none of the first option's results; don't ask me why. I found that this flaw, although not very systematic, caused fewer problems across my entire library of ~750 items with the "+" sign at the beginning of the query string than without it. But I don't have Paul (2011), that's true. I also came across differences between uppercase and lowercase, such as
Code:
http://www.imdb.com/xml/find?xml=1&nr=1&tt=on&q=The+Road+%282009%29
which doesn't return the results I'm looking for, and
Code:
http://www.imdb.com/xml/find?xml=1&nr=1&tt=on&q=the+Road+%282009%29
which does. I've implemented a rather inexplicable piece of code that changes leading "T"s into "t"s, which helps a lot when parsing my library. But since that IMDB "kind of" API is not documented (I haven't found any information beyond the fact that this url system exists), I would love to understand why such subtle changes produce completely different sets of results that don't even share entries.

Maybe this last part deserves a completely separate thread... Anyway, thank you very much for helping me solve the clearbuffers="no" thing. It has been on my mind for a long time, and finally getting it right is a major achievement for the scraper.
#8
Perhaps it would be better not to include the year in the search query. It doesn't seem to change much, except for the oddities.

Compare
Code:
http://www.imdb.com/xml/find?xml=1&nr=1&tt=on&q=+Primos+%282011%29
with
Code:
http://www.imdb.com/xml/find?xml=1&nr=1&tt=on&q=Primos

and
Code:
http://www.imdb.com/xml/find?xml=1&nr=1&tt=on&q=the+Road+%282009%29
with
Code:
http://www.imdb.com/xml/find?xml=1&nr=1&tt=on&q=The+Road

Obviously, you'll still need the year to make the best match from the results, but I'm guessing that was your intention anyway...
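
In scraper terms, building the query without the year (and without the leading +) might look roughly like this. It's an untested sketch: buffer 10 is just a scratch buffer I picked and may already be used elsewhere in your scraper, and I'm assuming a repeat="yes" expression concatenates all of its matches into the destination buffer.

```xml
<RegExp input="$$9" output="&lt;url function=&quot;GetIMDBid&quot;&gt;\1&lt;/url&gt;" dest="5+">
    <!-- base URL, same endpoint as before -->
    <RegExp input="" output="http://www.imdb.com/xml/find?xml=1&amp;nr=1&amp;tt=on&amp;q=" dest="9">
        <expression />
    </RegExp>
    <!-- collect "+word" for every word of the original title into scratch buffer 10 -->
    <RegExp input="$$12" output="+\1" dest="10">
        <expression repeat="yes">([^ ,\(\)]+)</expression>
    </RegExp>
    <!-- drop the leading "+" and append the rest to the URL; the year ($$13) is left out -->
    <RegExp input="$$10" output="\1" dest="9+">
        <expression>^\+(.*)</expression>
    </RegExp>
    <expression />
</RegExp>
```

For an original title of "The Road" this should give a query ending in q=The+Road, like the second URL in the comparison above. $$13 stays available for matching against the results inside GetIMDBid.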
#9
Hmm... that gives me a great idea! Since I'm now able to look for the year inside the results, I can perform a more general query and dig into the results afterwards. I was going to do that anyway, so if the query behaves better without the year I'll look into it, since it's completely consistent with my initial intention.

Again, thanks a lot for the advice. The way this thread has opened my eyes is just indescribable. The possibilities I have now are far better than the ones I first expected. I'm sure all users of the FA scraper will benefit from these suggestions.

EDIT: I can confirm that the query without the year works much better! Plus, now that I'm able to search the returned results using things like the year or even the director's name, the success rate is extremely high! Thanks a lot, scudlee!