[RELEASE] FilmAffinity (Spanish) scraper

spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #46
updated scraper is now in svn, r15969
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #47
oh, and the search string encoding worked fine for me. i made a directory named cariño, set content, did the lookup. got the list your url pointed to.
fidoboy Offline
Fan
Posts: 404
Joined: Oct 2008
Reputation: 0
Post: #48
And where is the SVN? Can you provide a link to download the scraper, or attach it here?

regards,

Fido
w00dst0ck Offline
Junior Member
Posts: 37
Joined: Aug 2008
Reputation: 0
Location: Germany
Post: #49
SVN: https://xbmc.svn.sourceforge.net/svnroot...ers/video/


@HectorziN:
It is possible to get the IMDb link with a Google search:
site:imdb.com +original title +year

I'm using a Google wrapper to get the IMDb ID for fanart in my moviemaze scraper.

Code:
<!--URL to Google and Fanart-->
<RegExp conditional="fanart" input="$$8" output="&lt;url function=&quot;GoogleToIMDB&quot;&gt;http://www.google.com/search?q=site:imdb.com+moviemaze\1&lt;/url&gt;" dest="5+">
    <RegExp input="$$1" output="\1" dest="7">
        <expression>&lt;h2&gt;\((.*)\)&lt;</expression>
    </RegExp>
    <RegExp input="$$7" output="+\1" dest="8+">
        <expression repeat="yes">([^ ,]+)</expression>
    </RegExp>
    <expression></expression>
</RegExp>

<!--GoogleToIMDB-->
<GoogleToIMDB dest="5">
    <RegExp input="$$2" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;&gt;&lt;details&gt;\1&lt;/details&gt;" dest="5">
        <RegExp input="$$1" output="&lt;url function=&quot;GetFanart&quot;&gt;http://api.themoviedb.org/backdrop.php?imdb=\1&lt;/url&gt;" dest="2+">
            <expression>/title/([t0-9]*)</expression>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</GoogleToIMDB>

<!-- Fanart -->
<GetFanart dest="5">
    <RegExp input="$$2" output="&lt;details&gt;&lt;fanart url=&quot;http://themoviedb.org/image/backdrops&quot;&gt;\1&lt;/fanart&gt;&lt;/details&gt;" dest="5">
        <RegExp input="$$1" output="&lt;thumb preview=&quot;/\1/\2_poster.jpg&quot;&gt;/\1/\2.jpg&lt;/thumb&gt;" dest="2">
            <expression repeat="yes">/([0-9]*)/([t0-9-]*).jpg&lt;/URL</expression>
        </RegExp>
        <expression noclean="1">(.+)</expression>
    </RegExp>
</GetFanart>
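
For anyone who wants to try the idea outside XBMC first, here is a rough Python sketch of the same two steps: build the "site:imdb.com +title +year" query, then pull the IMDb ID out of a result link. It is only an illustration, not part of the scraper. The function names and the sample HTML are made up, and the pattern is slightly stricter than the `/title/([t0-9]*)` expression above (it requires the "tt" prefix and at least one digit).

```python
import re
from urllib.parse import urlencode


def build_google_query(original_title, year):
    """Build a Google search URL restricted to imdb.com for a title and year."""
    terms = f"site:imdb.com {original_title} {year}"
    # urlencode percent-encodes the value and turns spaces into '+'
    return "http://www.google.com/search?" + urlencode({"q": terms})


def extract_imdb_id(html):
    """Return the first IMDb ID found in a /title/tt... link, or None."""
    match = re.search(r"/title/(tt[0-9]+)", html)
    return match.group(1) if match else None


# Made-up sample of a search result link, for illustration only.
sample = '<a href="http://www.imdb.com/title/tt0133093/">The Matrix (1999)</a>'

print(build_google_query("The Matrix", 1999))
print(extract_imdb_id(sample))
```

Returning None when nothing matches makes it easy to skip the fanart lookup instead of requesting a backdrop for an empty ID.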
(This post was last modified: 2008-10-22 10:15 by w00dst0ck.)
HectorziN Offline
Senior Member
Posts: 107
Joined: Mar 2007
Reputation: 0
Location: Barcelona (Spain)
Post: #50
w00dst0ck Wrote:SVN: https://xbmc.svn.sourceforge.net/svnroot...ers/video/


@HectorziN:
It is possible to get the IMDb link with a Google search:
site:imdb.com +original title +year

I'm using a Google wrapper to get the IMDb ID for fanart in my moviemaze scraper.

Thanks! It is a great idea, but does it always return the same movie? It could return a wrong one, right?

HectorziN
HectorziN Offline
Senior Member
Posts: 107
Joined: Mar 2007
Reputation: 0
Location: Barcelona (Spain)
Post: #51
spiff Wrote:oh, and the search string encoding worked fine for me. i made a directory named cariño, set content, did the lookup. got the list your url pointed to.

Not a directory: the movie itself must be called cariño, or be another movie with a title containing ñ.

If you search for a movie with the ñ character, the scraper cannot find it because of the encoding. Searching from a web browser on filmaffinity.com works.

Could you test it? And do you know the value for searchstringencoding that I need to use?

many thanks!

HectorziN
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #52
Confused

i repeat;
i made a directory named cariño, set content (including scan by dir name obviously), did the lookup. got the list your url pointed to.
fidoboy Offline
Fan
Posts: 404
Joined: Oct 2008
Reputation: 0
Post: #53
Hi,

The encoding for the ñ character is %F1 (its ISO-8859-1 byte value), but anyway, here is the complete list (accents, etc.):

http://www.jairoblanco.com/guia-rapida/h...ificacion/
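
As a quick check of the %F1 value, here is a minimal Python sketch (Python is not part of the scraper, it is only used to illustrate). It percent-encodes the same word as ISO-8859-1 and as UTF-8, which is the difference that matters for the search URL.

```python
from urllib.parse import quote

word = "cariño"

# ISO-8859-1 (Latin-1): ñ is the single byte 0xF1
latin1 = quote(word, encoding="iso-8859-1")

# UTF-8: ñ is the two bytes 0xC3 0xB1
utf8 = quote(word, encoding="utf-8")

print(latin1)  # cari%F1o
print(utf8)    # cari%C3%B1o
```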

greets,
w00dst0ck Offline
Junior Member
Posts: 37
Joined: Aug 2008
Reputation: 0
Location: Germany
Post: #54
HectorziN Wrote:Thanks! It is a great idea, but does it always return the same movie? It could return a wrong one, right?

I've included moviemaze in my search string. If it's listed in the external reviews list on imdb.com [example], I can be sure that it's the same movie.
HectorziN Offline
Senior Member
Posts: 107
Joined: Mar 2007
Reputation: 0
Location: Barcelona (Spain)
Post: #55
spiff Wrote:Confused

i repeat;
i made a directory named cariño, set content (including scan by dir name obviously), did the lookup. got the list your url pointed to.

OK, but the problem I have is this one:
I have a folder called Movies.
In this folder there are a lot of movies,
one of them called "Cariño estoy hecho un perro".
I search for information on this movie using the filmaffinity scraper
and no results are found. If I change Cariño to Carino, it works.

The problem is that the search is not done with ISO encoding, and I don't know which value to set in searchstringencoding.
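
One possible explanation, sketched in Python below and not confirmed anywhere in this thread: if the search string reaches the site percent-encoded as UTF-8 while the site decodes it as ISO-8859-1, the ñ turns into two unrelated characters and the title no longer matches. The function name is only illustrative.

```python
from urllib.parse import quote, unquote


def as_seen_by_latin1_server(title, sent_encoding):
    """Percent-encode title with sent_encoding, then decode it as ISO-8859-1."""
    encoded = quote(title, encoding=sent_encoding)
    return unquote(encoded, encoding="iso-8859-1")


title = "Cariño estoy hecho un perro"

# Sent as UTF-8: the two bytes of ñ are read back as two Latin-1 characters
print(as_seen_by_latin1_server(title, "utf-8"))

# Sent as ISO-8859-1: the title survives the round trip unchanged
print(as_seen_by_latin1_server(title, "iso-8859-1"))
```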

HectorziN
fidoboy Offline
Fan
Posts: 404
Joined: Oct 2008
Reputation: 0
Post: #56
HectorziN, have you read my answer? You must encode your string: replace "cariño" with "cari%F1o" in your URL...

regards,
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #57
that will be done by the URL encoding applied prior to passing the argument to the scraper function...
HectorziN Offline
Senior Member
Posts: 107
Joined: Mar 2007
Reputation: 0
Location: Barcelona (Spain)
Post: #58
My scraper is quite complex. Is there any application to help debug it?
I want to include impawards posters and I can't get it to work.

Thanks

HectorziN
w00dst0ck Offline
Junior Member
Posts: 37
Joined: Aug 2008
Reputation: 0
Location: Germany
Post: #59
I use xbmc for windows and watch the xbmc.log

There are also some online RegEx testers.
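
For the regex side, a small offline test can save some round trips through XBMC. The sketch below uses Python's re module, which is not the engine XBMC uses, so treat it as a rough check only. The sample input string and the helper name are made up for illustration.

```python
import re

# Expression copied from the GetFanart function earlier in the thread
# (&lt; unescaped to <, since this is no longer inside scraper XML).
FANART_EXPR = r"/([0-9]*)/([t0-9-]*).jpg</URL"

# Made-up sample of what one line of the response might look like.
sample = "<URL>http://example.invalid/backdrops/1234/tt0133093-1.jpg</URL>"


def run_expression(expr, text):
    """Return every match as a tuple of captured groups (roughly what repeat='yes' walks through)."""
    return [m.groups() for m in re.finditer(expr, text)]


matches = run_expression(FANART_EXPR, sample)
for number, name in matches:
    print(number, name)
```

Saving a real response to a file and feeding it to run_expression gives a quick way to see which groups an expression captures before touching the scraper XML.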
HectorziN Offline
Senior Member
Posts: 107
Joined: Mar 2007
Reputation: 0
Location: Barcelona (Spain)
Post: #60
w00dst0ck Wrote:I use xbmc for windows and watch the xbmc.log

There are also some online RegEx testers.

Where is the log file stored in the Windows Atlantis version?

thanks

HectorziN