[RELEASE] FilmAffinity (Spanish) scraper

w00dst0ck Offline
Junior Member
Posts: 37
Joined: Aug 2008
Reputation: 0
Location: Germany
Post: #61
Should be %HOMEPATH%\Application Data\XBMC\xbmc.log
or C:\Program Files\XBMC\xbmc.log

BTW: The German website http://www.regex-tester.de/regex.html, translated to Spanish at http://tinyurl.com/5gfxx9, helps a lot.
HectorziN Offline
Senior Member
Posts: 107
Joined: Mar 2007
Reputation: 0
Location: Barcelona (Spain)
Post: #62
One question

I have some functions:
Code:
    <GetMoviePosterDB clearbuffers="no" dest="12">
        <RegExp input="$$1" output="&lt;thumb&gt;\1l_\2&lt;/thumb&gt;" dest="13+">
            <expression clear="yes" repeat="yes" noclean="1,2">&quot;poster&quot;.*?src=&quot;(.*?)[a-z]_(.*?)&quot;</expression>
        </RegExp>
    </GetMoviePosterDB>


    <GetIMDBPoster dest="5">
        <RegExp input="$$16$$17$$13$$15$$18" output="&lt;details&gt;&lt;thumbs&gt;\1&lt;/thumbs&gt;&lt;/details&gt;" dest="5">
            <RegExp input="$$6" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="15">
                <RegExp input="$$1" output="\1_SX$INFO[imdbscale]_SY$INFO[imdbscale]_\2" dest="6">
                    <expression noclean="1,2">&lt;a name=&quot;poster&quot;.*?src=&quot;(.*?)_S.*?(.jpg)&quot;.*?&lt;/a&gt;</expression>
                </RegExp>
                <expression clear="yes" noclean="1">(.*?_SX[0-9]+_SY[0-9]+_.jpg)</expression>
            </RegExp>
            <expression noclean="1"></expression>
        </RegExp>
    </GetIMDBPoster>

The first function has dest="13" inside.
The second function has dest="15" inside.

The inputs of the second function include $$13 and $$15.

This works, and as a result I get the list of thumbs from both pages.

The problem I have is the following:

If I insert this code in the <GetDetails> section:

Code:
          <RegExp input="$$1" output="&lt;url function=&quot;GetFilmAffinityPoster&quot;&gt;http://www.filmaffinity.com/es/film214384.html&lt;/url&gt;" dest="16">
              <expression noclean="1"></expression>
          </RegExp>

As you can see, $$16 is also an input of the GetIMDBPoster function, but it doesn't work. Why?

Thanks

HectorziN
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #63
my suspicion is that you are bitten by the fact that order matters.

remember, clearbuffers="no" tells the scraper not to clear the buffers after a function call.
so the point here is to
1) fill buffer 13
2) fill buffer 15
3) fill buffer 16
4) call GetIMDBPoster

not
1) fill buffer 13
2) fill buffer 15
3) call GetIMDBPoster
4) fill buffer 16

and i assume GetFilmAffinityPoster has clearbuffers="no"
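
One way to read the ordering advice above is as a sketch of the function calls emitted from the <GetDetails> section: the calls that fill the buffers come first and the GetIMDBPoster call comes last. This is an illustration under assumptions, not the scraper's actual code. The example.invalid URLs are placeholders, dest="5+" is assumed to be the result buffer, and it is assumed that GetFilmAffinityPoster is the function that fills buffer 16.

```xml
<!-- Sketch only: the example.invalid URLs are placeholders, not taken from the thread. -->

<!-- 1) emitted first: GetMoviePosterDB fills buffer 13 and, with clearbuffers="no", keeps it -->
<RegExp input="$$1" output="&lt;url function=&quot;GetMoviePosterDB&quot;&gt;http://example.invalid/movieposterdb-page&lt;/url&gt;" dest="5+">
    <expression noclean="1"></expression>
</RegExp>

<!-- 2) emitted second: assumed to fill buffer 16 (needs clearbuffers="no" as well) -->
<RegExp input="$$1" output="&lt;url function=&quot;GetFilmAffinityPoster&quot;&gt;http://www.filmaffinity.com/es/film214384.html&lt;/url&gt;" dest="5+">
    <expression noclean="1"></expression>
</RegExp>

<!-- 3) emitted last: GetIMDBPoster reads $$13 and $$16, so both must already be filled -->
<RegExp input="$$1" output="&lt;url function=&quot;GetIMDBPoster&quot;&gt;http://example.invalid/imdb-poster-page&lt;/url&gt;" dest="5+">
    <expression noclean="1"></expression>
</RegExp>
```

If the third block were moved above the second, GetIMDBPoster would presumably run while buffer 16 is still empty, which matches the failing order listed above.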
HectorziN Offline
Senior Member
Posts: 107
Joined: Mar 2007
Reputation: 0
Location: Barcelona (Spain)
Post: #64
Thanks, I got it. The problem was the order of the functions.

I am testing with the PC version, and searching for cariño works there.
It is on the Xbox where it doesn't work. Is it possible that SearchStringEncoding is not implemented in the Xbox version?

thanks

HectorziN
HectorziN Offline
Senior Member
Posts: 107
Joined: Mar 2007
Reputation: 0
Location: Barcelona (Spain)
Post: #65
I am searching for actor thumbs on IMDb.
I have a problem with animation movies: on the FilmAffinity site the actor in this case is "Animation".

My scraper then searches for "Animation" on IMDb and always finds this: Chuck Jones.

How can I avoid searching for actors when the actor is Animation? I should probably remove the Animation entry from the buffer so that the scraper doesn't find it.

Thanks

HectorziN
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #66
use an expression that clears a buffer IF you find animation. let this buffer hold the function call. clear if expression matches. append the buffer. problem solved.
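
A minimal sketch of that recipe, reusing the SearchCastThumb call that appears in this thread. It relies on several assumptions: buffer 9 is free to use as a scratch buffer, buffer 5 is the result buffer, a matching RegExp with a non-appending dest replaces the buffer contents, and the pattern stext=Animation is only a guess at how the page marks the "Animation" cast entry.

```xml
<!-- 1) build the function calls in scratch buffer 9 instead of the result buffer -->
<RegExp conditional="SearchCastThumb" input="$$1" output="&lt;url function=&quot;SearchCastThumb&quot;&gt;http://spanish.imdb.com/find?s=nm&amp;amp;q=\1&lt;/url&gt;" dest="9">
    <expression repeat="yes" noclean="1" trim="1">&lt;a href="search\.php.stype=cast.stext=([^&quot;]*)[^&gt;]*&gt;([^&lt;]*)</expression>
</RegExp>

<!-- 2) if the page lists Animation as cast, overwrite buffer 9 with an empty string -->
<!--    (the pattern is an assumption; adjust it to the real page markup) -->
<RegExp input="$$1" output="" dest="9">
    <expression>stext=Animation</expression>
</RegExp>

<!-- 3) append whatever is left in buffer 9 to the result; noclean keeps the url tags -->
<RegExp input="$$9" output="\1" dest="5+">
    <expression noclean="1"></expression>
</RegExp>
```

When step 2 does not match, buffer 9 is left untouched and the calls are appended as usual. When it matches, step 3 appends nothing, so no actor search is started for that movie.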
HectorziN Offline
Senior Member
Posts: 107
Joined: Mar 2007
Reputation: 0
Location: Barcelona (Spain)
Post: #67
spiff Wrote:use an expression that clears a buffer IF you find animation. let this buffer hold the function call. clear if expression matches. append the buffer. problem solved.

I have tried it but....
If I write this:
Code:
        <RegExp conditional="SearchCastThumb" input="$$1" output="&lt;url function=&quot;SearchCastThumb&quot;&gt;http://spanish.imdb.com/find?s=nm&amp;amp;q=\1&lt;/url&gt;" dest="5+">
            <expression repeat="yes" noclean="1" trim="1">&lt;a href="search\.php.stype=cast.stext=([^&quot;]*)[^&gt;]*&gt;([^&lt;]*)</expression>
        </RegExp>

it works, but if I change dest="5+" to dest="9+", the scraper doesn't call the function. I know this because with dest="5+" the log contains:

Get URL: http://spanish.imdb.com/find?s=nm&q=Mar%...E9+Baus%E1

but with dest="9+" it doesn't.

why?

thanks

HectorziN
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #68
i assume buffer 9 is never transferred to the one containing the return value from the scraper function..
HectorziN Offline
Senior Member
Posts: 107
Joined: Mar 2007
Reputation: 0
Location: Barcelona (Spain)
Post: #69
spiff Wrote:i assume buffer 9 is never transferred to the one containing the return value from the scraper function..

Yes, I use this:

Code:
      <RegExp input="$$9" output="\1" dest="5+">
         <expression></expression>
      </RegExp>

I tried this code placed after the function call, and also with the function call nested inside it:

Code:
      <RegExp input="$$9" output="\1" dest="5+">
          <RegExp conditional="SearchCastThumb" input="$$1" output="&lt;url function=&quot;SearchCastThumb&quot;&gt;http://spanish.imdb.com/find?s=nm&amp;amp;q=\1&lt;/url&gt;" dest="9+">
              <expression repeat="yes" noclean="1" trim="1">&lt;a href="search\.php.stype=cast.stext=([^&quot;]*)[^&gt;]*&gt;([^&lt;]*)</expression>
          </RegExp>
          <expression></expression>
      </RegExp>


I also tested it with buffer 20, because buffer 9 is already used in the scraper and buffer 20 is never used. But it doesn't work in either case.

Thanks

HectorziN
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #70
you need noclean on the outermost expression or all tags will be stripped off
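
Applied to the nested snippet from the previous post, the fix would look roughly like the sketch below. The only change is noclean="1" on the outermost expression; the buffer numbers and everything else are taken as posted.

```xml
<RegExp input="$$9" output="\1" dest="5+">
    <RegExp conditional="SearchCastThumb" input="$$1" output="&lt;url function=&quot;SearchCastThumb&quot;&gt;http://spanish.imdb.com/find?s=nm&amp;amp;q=\1&lt;/url&gt;" dest="9+">
        <expression repeat="yes" noclean="1" trim="1">&lt;a href="search\.php.stype=cast.stext=([^&quot;]*)[^&gt;]*&gt;([^&lt;]*)</expression>
    </RegExp>
    <!-- noclean="1" keeps the url tags in \1 when buffer 9 is copied into buffer 5 -->
    <expression noclean="1"></expression>
</RegExp>
```

Without noclean, the tags are stripped from \1 during the copy, so buffer 5 ends up holding a bare URL rather than a function call.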
HectorziN Offline
Senior Member
Posts: 107
Joined: Mar 2007
Reputation: 0
Location: Barcelona (Spain)
Post: #71
spiff Wrote:you need noclean on the outermost expression or all tags will be stripped off

Thanks! Solved!

HectorziN
HectorziN Offline
Senior Member
Posts: 107
Joined: Mar 2007
Reputation: 0
Location: Barcelona (Spain)
Post: #72
By the way, with the Atlantis version SearchStringEncoding works!

HectorziN
fidoboy Offline
Fan
Posts: 404
Joined: Oct 2008
Reputation: 0
Post: #73
It is not working on the current version (9.04 beta).
dedaluz Offline
Member
Posts: 57
Joined: Mar 2009
Reputation: 0
Post: #74
Any updates on this one?
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #75
yes, i committed a fix at r19978