Scraper not returning results
#1
I was finally able to get the scraper to create a search URL, but now I'm stuck at the next spot: getting results.

Here is a section of the log:
Code:
16:29:21 T:140634451166976   DEBUG: GetMovieId (/home/gagarin/Videos/Grindhouse/Emmanuelle.avi), query = select idMovie from movie where idFile=13
16:29:21 T:140634451166976   DEBUG: VideoInfoScanner: No NFO file found. Using title search for '/home/gagarin/Videos/Grindhouse/Emmanuelle.avi'
16:29:21 T:140634451166976   DEBUG: FindMovie: Searching for 'Emmanuelle' using Grindhouse Database scraper (path: '/home/gagarin/.xbmc/addons/metadata.grindhousedatabase.com', content: 'movies', version: '0.0.2')
16:29:21 T:140634451166976   DEBUG: scraper: CreateSearchUrl returned <url>http://www.grindhousedatabase.com/index.php/Special:Search?search=emmanuelle&amp;fulltext=Search</url>
16:29:21 T:140634451166976   DEBUG: CurlFile::Open(0x7fe7f4027e50) http://www.grindhousedatabase.com/index.php/Special:Search?search=emmanuelle&fulltext=Search
16:29:22 T:140635504355200    INFO: LIRC Initialize: using: /dev/lircd
16:29:22 T:140635504355200   DEBUG: Failed to connect to LIRC. Giving up.
16:29:22 T:140634451166976   DEBUG: scraper: GetSearchResults returned <results></results>
16:29:22 T:140634451166976   DEBUG: FindMovie: Searching for 'Emmanuelle' using Grindhouse Database scraper (path: '/home/gagarin/.xbmc/addons/metadata.grindhousedatabase.com', content: 'movies', version: '0.0.2')
16:29:22 T:140634451166976   DEBUG: scraper: CreateSearchUrl returned <url>http://www.grindhousedatabase.com/index.php/Special:Search?search=emmanuelle&amp;fulltext=Search</url>
16:29:22 T:140634451166976   DEBUG: CurlFile::Open(0x7fe7f4027e50) http://www.grindhousedatabase.com/index.php/Special:Search?search=emmanuelle&fulltext=Search
16:29:23 T:140635504355200   DEBUG: ------ Window Deinit (Pointer.xml) ------
16:29:24 T:140634451166976   DEBUG: scraper: GetSearchResults returned <results></results>
16:29:24 T:140634451166976 WARNING: No information found for item '/home/gagarin/Videos/Grindhouse/Emmanuelle.avi', it won't be added to the library.

And here is the scraper:
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<scraper date="2013-06-17" framework="1.1">

    <NfoUrl dest="3">
        <RegExp input="$$1" output="&lt;url&gt;http://www.grindhousedatabase.com/index.php/\1&lt;/url&gt;" dest="3">
            <expression noclean="1">grindhousedatabase.com/index.php/([a-zA-Z0-9\s\p{P}]*)</expression>
        </RegExp>
    </NfoUrl>

    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="&lt;url&gt;http://www.grindhousedatabase.com/index.php/Special:Search?search=\1&amp;amp;fulltext=Search&lt;/url&gt;" dest="3">
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>

    <GetSearchResults dest="8">
        <RegExp input="$$3" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.grindhousedatabase.com/index.php/\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes">&lt;div class=&apos;mw-search-result-heading&apos;&gt;&lt;a href=&quot;/index.php/([^&quot;]*)&quot; title=&quot;([^&quot;]*)&quot;</expression>
            </RegExp>
            <expression clear="yes" noclean="1" />
        </RegExp>
    </GetSearchResults>

    <GetDetails dest="3">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">

        <!-- TITLE -->
            <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="5">
                <expression>&lt;h1 class=&quot;firstHeading&quot;&gt;([^&lt;]*)</expression>
            </RegExp>

        <!-- YEAR -->
            <RegExp input="$$1" output="&lt;year&gt;\1&lt;/year&gt;" dest="5+">
                <expression>&lt;a href=&quot;/index.php/Category:([0-9]{4})</expression>
            </RegExp>

        <!-- DIRECTOR -->
            <RegExp input="$$1" output="&lt;director&gt;\1&lt;/director&gt;" dest="5+">
                <expression>Directed by ([a-z,A-Z, ]*$)</expression>
            </RegExp>

        <!-- TOP250 -->
            <!-- Grindhouse Database Top 20 -->

        <!-- MPAA -->
            <!-- GHDB doesn't really do this, since most will have multiple ratings -->

        <!-- TAGLINE -->

        <!-- RUNTIME -->
            <RegExp input="$$1" output="&lt;runtime&gt;\1&lt;/runtime&gt;" dest="5+">
                <expression>Running Time: ([0-9]{2,3}) min</expression>
            </RegExp>

        <!-- THUMB-->
            <RegExp input="$$1" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="5+">
                <expression>src=&quot;/images/thumb/([a-zA-Z0-9\s\p{P}]*)&quot; width</expression>
            </RegExp>

        <!-- CREDITS -->
            <!-- GHDB doesn't do full credits -->

        <!-- RATING -->
            <!-- GHDB doesn't do this -->

        <!-- VOTES -->
            <!-- GHDB doesn't do this -->

        <!-- GENRE -->
            <!-- Use GHDB categories, excluding year released -->
            <RegExp input="$$1" output="&lt;genre&gt;\1&lt;/genre&gt;" dest="5+">
                <expression>&lt;li&gt;&lt;a href=&quot;/index.php/Category:([a-zA-Z0-9\s\p{P}]*)&quot; title</expression>
            </RegExp>

        <!-- ACTOR -->
    
            <!-- NAME -->

            <!-- ROLE -->
                <!-- GHDB doesn't do this -->

        <!-- OUTLINE -->
            <!-- GHDB doesn't do this -->

        <!-- PLOT -->
            <!-- GHDB doesn't do this -->

            <expression clear="yes" noclean="1" />
        </RegExp>
    </GetDetails>
</scraper>

I've been messing with the regex in GetSearchResults, but keep getting the same results: none! I've been assuming it's the regex, but I'm not positive. The .xml file opens in Firefox with no problems, so it shouldn't be an XML error. The film is listed at the website, so it's not a matter of having nothing to return.

Any suggestions?
#2
Try replacing the &apos;s in the expression with actual apostrophes.
#3
There is a small mistake.
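The output of one function doesn't match the input of the next within GetSearchResults: the inner RegExp writes to buffer 5 (dest="5"), but the outer RegExp reads from buffer 3 (input="$$3").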
I changed:

Code:
<GetSearchResults dest="8">
        <RegExp input="$$3" ...

to:

Code:
<GetSearchResults dest="8">
        <RegExp input="$$5" ...


Then the function worked fine.
#4
flobbes - I had tried using $$5 originally, but wasn't getting any results, so I changed it to $$3 based on something I saw in a different scraper. However, using that in conjunction with scudlee's suggestion on &apos; worked, and the scraper is now getting results. Now I'll have to work out the GetDetails section, but that's for another day.

Here is the current GetSearchResults section:
Code:
    <GetSearchResults dest="8">
        <RegExp input="$$5" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.grindhousedatabase.com/index.php/\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes">&lt;div class='mw-search-result-heading'&gt;&lt;a href=&quot;/index.php/([^&quot;]*)&quot; title=&quot;([^&quot;]*)&quot;</expression>
            </RegExp>
            <expression clear="yes" noclean="1" />
        </RegExp>
    </GetSearchResults>

Now that I look at the code more closely, I see (I think) why $$3 wasn't working. If I had changed the third line to end with dest="3", it might have worked. That's what I get for not using text wrapping in my text editor: since it was at the end of the line, I missed that part.
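
For reference, the $$3 variant would look roughly like the sketch below. It is untested and only an assumption about what might have worked: the inner RegExp's dest is changed from "5" to "3" so that the outer RegExp's input="$$3" has something to read. Everything else is copied unchanged from the working section above.

```xml
<GetSearchResults dest="8">
    <!-- Outer RegExp: reads buffer 3 and wraps its contents in a results element -->
    <RegExp input="$$3" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
        <!-- Inner RegExp: dest changed to "3" (assumption, untested) so it fills the buffer the outer RegExp reads -->
        <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.grindhousedatabase.com/index.php/\1&lt;/url&gt;&lt;/entity&gt;" dest="3">
            <expression repeat="yes">&lt;div class='mw-search-result-heading'&gt;&lt;a href=&quot;/index.php/([^&quot;]*)&quot; title=&quot;([^&quot;]*)&quot;</expression>
        </RegExp>
        <expression clear="yes" noclean="1" />
    </RegExp>
</GetSearchResults>
```

Either way, the point is the same: the inner dest and the outer input have to name the same buffer.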
