Quick Scraper Question (Hope so:))
#46
spiff Wrote:any result from scrap.exe is irrelevant

I just tried it in XBMC without the line and it doesn't work; if I add the line, it works. No idea why.
#47
Here I go again, stealing your time:

You know how I get the covers from cinefacts, since you set it up.

My question now: is it possible to add covers from a different site, so that I get the covers from both sites together?

Thanks for getting me to learn :)
#48
yes, have a look at the imdb scraper for instance. or how you add the fanart for that matter....
#49
spiff Wrote:yes, have a look at the imdb scraper for instance. or how you add the fanart for that matter....


That's freaking me out. Here's what I have so far:

Code:
<GetPosterLinkURL dest="5">
    <RegExp input="$$2" output="&lt;details&gt;\1&lt;/details&gt;" dest="5+">
        <RegExp input="$$1" output="&lt;url function=&quot;GetPosterURL&quot;&gt;http://www.moviemaze.de/filme/\1/\2&lt;/url&gt;" dest="2+">
            <expression>&lt;a href=&quot;/filme/([0-9]+)/([^&quot;]*)&quot;</expression>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</GetPosterLinkURL>

<GetPosterURL dest="5">
    <RegExp conditional="poster" input="$$1" output="&lt;details&gt;&lt;url function=&quot;GetPoster&quot;&gt;http://www.moviemaze.de/media/poster/\1/\2&lt;/url&gt;&lt;/details&gt;" dest="5">
        <expression>&lt;a href=&quot;/media/poster/([0-9]+)/([^&quot;]*)&quot;</expression>
    </RegExp>
</GetPosterURL>

<GetPoster dest="5">
    <RegExp input="$$2" output="&lt;details&gt;&lt;poster url=&quot;http://www.moviemaze.de/filme/\1/poster_lg\2.jpg&lt;/poster&gt;&lt;/details&gt;" dest="5">
        <RegExp input="$$1" output="&lt;thumb&gt;http://www.moviemaze.de/filme/\1/poster_lg\2.jpg&lt;/thumb&gt;" dest="2">
            <expression repeat="yes">/([0-9]+)/poster([0-9]+)</expression>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</GetPoster>

There must be something wrong in the last section, GetPoster, but as always I can't find it.

Thanks
Schenk
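To check what the inner RegExp of GetPoster writes into buffer 2, it can help to emulate it outside XBMC. The Python sketch below assumes that repeat="yes" applies the output template once per match and concatenates the results; the sample HTML is made up, since the real moviemaze.de markup isn't shown in the thread, and `emulate_repeat` is a hypothetical helper, not part of XBMC.

```python
import re

# Hypothetical fragment of a moviemaze.de poster page (real markup not shown in the thread).
page = (
    '<a href="/filme/1234/poster1.html">'
    '<a href="/filme/1234/poster2.html">'
)

# Same pattern as the inner <expression repeat="yes"> in GetPoster.
pattern = re.compile(r"/([0-9]+)/poster([0-9]+)")


def emulate_repeat(regex, template, text):
    """Assumed repeat="yes" behaviour: expand the template once per match, concatenate."""
    return "".join(m.expand(template) for m in regex.finditer(text))


# Output template of the inner RegExp, with the XML escaping undone.
buffer2 = emulate_repeat(
    pattern,
    r"<thumb>http://www.moviemaze.de/filme/\1/poster_lg\2.jpg</thumb>",
    page,
)
print(buffer2)
```

On this sample the inner part yields one `<thumb>` element per poster link, so buffer 2 already holds thumb entries before the outer RegExp wraps them.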
#50
there is no <poster> tag, where did you get that idea from?
#51
spiff Wrote:there is no <poster> tag, where did you get that idea from?

I thought the tag name was free to choose; I just replaced fanart. :(
#52
well, all the time spent over the last few days struggling with <thumbs> and <thumb> should make it clear how you add thumbs to the result.
#53
Hey spiff,

After struggling the whole night and day, I finally made it :) It works now, and in the end it was easier than I had thought all day! But I'm not done bothering you yet: could you help me with this one?

Code:
<RegExp input="$$1" output="&lt;url function=&quot;GetPosterLinkURL&quot;&gt;http://www.moviemaze.de/suche/result.phtml?searchword=\1&lt;/url&gt;" dest="5+">

The problem here is that it only searches for the first word.
For example, for Das Hundehotel it only searches for Das, and for The Last... it only searches for The. Is there any way to change this?

Thanks again

Schenk
#54
you need to run a replacement regexp, replacing ' ' with %20. something along these lines:

(grab the relevant title into a buffer, e.g. $$5)
Code:
<RegExp input="$$5" output="\1%20\2" dest="7">
  <expression repeat="yes">(.*?) (.*)</expression>
</RegExp>
#55
spiff Wrote:you need to run a replacement regexp, replacing ' ' with %20. something along this;

(grab the relevant title in, e.g. $$5)
Code:
<RegExp input="$$5" output="\1%20\2" dest="7">
  <expression repeat="yes">(.*?) (.*)</expression>
</RegExp>

As usual, I don't understand where to put this. Is it an inner, outer, or separate regexp?

Code:
<!--Moviemaze Poster URL-->
<RegExp input="$$1" output="&lt;url function=&quot;GetPosterLinkURL&quot;&gt;http://www.moviemaze.de/suche/result.phtml?searchword=\1&lt;/url&gt;" dest="5+">
    <expression noclean="1">&lt;h1&gt;([^&lt;]*)</expression>
</RegExp>
#56
1. grab whatever you want to search for into a buffer (as i already stated).
Code:
<RegExp input="$$1" output="\1" dest="6">
  <expression noclean="1">&lt;h1&gt;([^&lt;]*)</expression>
</RegExp>

2. run the replacement regexp
Code:
<RegExp input="$$6" output="\1%20\2" dest="7">
  <expression repeat="yes">(.*?) (.*)</expression>
</RegExp>

3. finally construct the url based on your new and shiny space-replaced title
Code:
<RegExp input="$$7" output="&lt;url function=&quot;GetPosterLinkURL&quot;&gt;http://www.moviemaze.de/suche/result.phtml?searchword=\1&lt;/url&gt;" dest="5+">
  <expression noclean="1"/>
</RegExp>
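The three steps pass the title through buffers 6, 7 and 5. Here is a rough Python emulation of that flow, assuming a non-repeating RegExp outputs only its first match, repeat="yes" concatenates one output per match, and an empty `<expression/>` hands the whole input over as \1. The page content and the helper names `run_once`/`run_repeat` are hypothetical.

```python
import re


def run_once(pattern: str, template: str, text: str) -> str:
    """Assumed non-repeating behaviour: output for the first match, or "" if none."""
    m = re.search(pattern, text)
    return m.expand(template) if m else ""


def run_repeat(pattern: str, template: str, text: str) -> str:
    """Assumed repeat="yes" behaviour: one output per match, concatenated."""
    return "".join(m.expand(template) for m in re.finditer(pattern, text))


# Hypothetical page content standing in for $$1.
buf1 = "<html><body><h1>Der letzte Zug</h1></body></html>"

# Step 1: title into "buffer 6" (escaped in the XML as &lt;h1&gt;([^&lt;]*)).
buf6 = run_once(r"<h1>([^<]*)", r"\1", buf1)

# Step 2: the replacement regexp exactly as posted in step 2.
buf7 = run_repeat(r"(.*?) (.*)", r"\1%20\2", buf6)

# Step 3: empty expression, so the URL is built around the whole of buffer 7.
url = (
    '<url function="GetPosterLinkURL">'
    "http://www.moviemaze.de/suche/result.phtml?searchword="
    + buf7
    + "</url>"
)

print(buf6)  # Der letzte Zug
print(buf7)  # Der%20letzte Zug
print(url)
```

Note that with this step 2 only the first space in the title ends up as %20.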
#57
spiff Wrote:
Code:
<RegExp input="$$1" output="\1" dest="6">
  <expression noclean="1">&lt;h1&gt;([^&lt;]*)</expression>
</RegExp>

Code:
<RegExp input="$$6" output="\1%20\2" dest="7">
  <expression repeat="yes">(.*?) (.*)</expression>
</RegExp>

Code:
<RegExp input="$$7" output="&lt;url function=&quot;GetPosterLinkURL&quot;&gt;http://www.moviemaze.de/suche/result.phtml?searchword=\1&lt;/url&gt;" dest="5+">
  <expression noclean="1"/>
</RegExp>

Thanks again, spiff. I understand it now and got it working, but I think it only searches for e.g. Der letzte and not Der letzte Zug. Maybe I have to change the expression, but I don't know yet what the old expression (.*?) (.*) is doing.

-Schenk
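What the old expression does can be checked in Python with the title from the post above. Python's `re` module stands in for XBMC's regex engine here, which is an assumption, though for a pattern this simple the two should agree.

```python
import re

title = "Der letzte Zug"  # title from the post above
old = re.compile(r"(.*?) (.*)")

matches = list(old.finditer(title))
# (.*?) is lazy, so it stops at the first space;
# (.*) is greedy, so it takes the rest of the string, later spaces included.
print(len(matches))                   # 1: a single match consumes the whole title
print(matches[0].groups())            # ('Der', 'letzte Zug')
print(matches[0].expand(r"\1%20\2"))  # Der%20letzte Zug
```

Because the first match already swallows the entire string, repeat="yes" never finds a second match, so every space after the first is left untouched.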
#58
well, you should know what that does; it is just a regular expression.

that being said, my bad. you want:
Code:
<RegExp input="$$6" output="\1%20" dest="7">
  <expression repeat="yes">([^ ]+)</expression>
</RegExp>
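A quick Python emulation of the corrected expression, again assuming repeat="yes" appends the output once per match. Each run of non-space characters is one match, so every word gets %20 appended, including the last one. Whether moviemaze.de's search minds the trailing %20 is not something the thread tests. The `urllib.parse.quote` line is only a comparison and not something the scraper does; note also that the regexp handles spaces only, nothing else.

```python
import re
from urllib.parse import quote

title = "Der letzte Zug"

# Corrected expression: ([^ ]+) matches each word; output "\1%20" per match.
fixed = "".join(m.expand(r"\1%20") for m in re.finditer(r"([^ ]+)", title))
print(fixed)         # Der%20letzte%20Zug%20

# For comparison only: Python's percent-encoding, without the trailing %20.
print(quote(title))  # Der%20letzte%20Zug
```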
#59
@Schenk2302:
When I wrote the moviemaze.de scraper and had to get to grips with RegEx for the first time, this page helped me out.
http://www.regex-tester.de/regex.html
#60
w00dst0ck Wrote:@Schenk2302:
When I wrote the moviemaze.de scraper and had to get to grips with RegEx for the first time, this page helped me out.
http://www.regex-tester.de/regex.html

Hi woodstock,

Yes, thank you, I've already looked there too; sometimes the penny just doesn't drop.

Regards

Schenk