Writing own movie scraper / Unable to connect to remote Server
#1
Hey there,

I'm completely new to writing movie scrapers, so I first had a look at the Writing media info scrapers guide from the wiki. I created my own scraper to collect information from dokujunkies.org. I could not find scrap.exe (and it seems to be out of date anyway), so I tested my scraper with ScraperXML instead, and everything seems to work correctly.

After that I tried several times to include my scraper in XBMC. I succeeded by creating a new folder in .../Program Files/XBMC/addons/, which I named "metadata.dokujunkies.org". I pasted the addon.xml from "metadata.themoviedb.org" in there and changed it so it refers to my scraper (also placed in the newly created folder). After that I can select my scraper in XBMC while setting the content of a (movie) source. But when I try to fetch movie/documentary information from that source it says "Unable to connect to remote Server". In the debug log I get "ERROR: failed to load scraper XML".

All the original scrapers work fine and I also have the YouTube add-on installed. My version of XBMC is Eden Beta 2. I could not find any further information that might help me, neither in the XBMC wiki nor in the forums or on Google.

I hope one of you can help me. If further information like debug logs or the scraper code is needed, please just say so.

Thanks

m0nk3y
Reply
#2
your scraper xml file is unparseable. open it in a web browser and it should tell you why.
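a typical cause is a raw < or & inside an <expression>; those have to be written as entities or the xml parser bails out. made-up pattern, just to show the difference:

Code:
<!-- will not parse: the raw "<" starts what the parser thinks is a tag -->
<expression><a href="([^"]*)"></expression>

<!-- parses fine: markup characters escaped as entities -->
<expression>&lt;a href=&quot;([^&quot;]*)&quot;&gt;</expression>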
Reply
#3
Hey spiff,

I opened myScraper.xml with Firefox and IE and it is displayed correctly in both; I get no errors. Perhaps the scraper code can help you to help me:

Code:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<scraper framework="1.1" date="2012-02-07" name="dokujunkies.org" content="movies" thumb="icon.png" language="de" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <NfoUrl dest="3">
        <RegExp input="$$1" output="\1" dest="3">
         <expression noclean="1">(http://www\.culturalianet\.com/art/ver\.php\?art=[0-9]*)</expression>
       </RegExp>
    </NfoUrl>
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="http://dokujunkies.org/search/\1" dest="3">
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="8">
        <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://dokujunkies.org/dokus/\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes">&lt;a href=&quot;http://dokujunkies.org/dokus/([^&quot;]*)&quot; rel=&quot;bookmark&quot; title=&quot;([^&quot;]*)&quot;&gt;</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </GetSearchResults>
    <GetDetails dest="3">
        <RegExp input="$$8" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="8">
                <expression trim="1" noclean="1">rel=&quot;bookmark&quot; title=&quot;([^&amp;]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;year&gt;&lt;/year&gt;&lt;director&gt;&lt;/director&gt;&lt;runtime&gt;\1&lt;/runtime&gt;" dest="8+">
                <expression trim="1" noclean="1">&lt;strong&gt;Dauer:&lt;/strong&gt;([\s0-9:]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;thumb&gt;&lt;url spoof=&quot;http://dokujunkies.org&quot;&gt;\1&lt;/url&gt;&lt;/thumb&gt;" dest="8+">
                <expression noclean="1">&lt;img src=&quot;([^&quot;]*)&quot; alt=&quot;Cover&quot; /&gt;</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;credits&gt;&lt;/credits&gt;&lt;genre&gt;&lt;/genre&gt;&lt;actor&gt;&lt;name&gt;&lt;/name&gt;&lt;role&gt;&lt;/role&gt;&lt;/actor&gt;&lt;plot&gt;\1&lt;/plot&gt;" dest="8+">
                <expression noclean="1">alt=&quot;Cover&quot; /&gt;&lt;/p&gt;\n&lt;p&gt;([^&lt;]*)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </GetDetails>
</scraper>

This scraper will never use the NfoUrl function because I don't have any .nfo files in the video directories.
Reply
#4
okay, scraper looks fine. make sure your addon.xml is as well, in particular the library=".." entry.
Reply
#5
Sorry it took so long to reply...birthdays^^

So here we go with the addon.xml (in the same folder as the scraper XML):
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<addon id="metadata.dokujunkies.org"
       name="Dokujunkies"
       version="1.0.0"
       provider-name="m0nk3y">
  <requires>
    <import addon="xbmc.metadata" version="1.0"/>
  </requires>
  <extension point="xbmc.metadata.scraper.movies"
             language="de"
             library="dokujunkies.xml"/>
  <extension point="xbmc.addon.metadata">
    <summary lang="en">Dokujunkies.org Movie Scraper</summary>
    <description lang="en">Download Movie information from www.dokujunkies.org</description>
    <platform>all</platform>
  </extension>
</addon>
Just like the scraper, this is an adapted copy of the addon.xml from metadata.themoviedb.org. I removed some of the requirements because my scraper doesn't use "external" functions. I'm not sure what the requirement
Code:
<import addon="xbmc.metadata" version="1.0"/>
is used for, so I thought it better not to delete it. I don't know...
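For reference, the layout I'm aiming for looks like this (as far as I understand it, the file named in library="..." has to be the scraper XML inside this folder; the icon line is only there because the scraper header sets thumb="icon.png"):

Code:
C:\Program Files\XBMC\addons\metadata.dokujunkies.org\
    addon.xml          (the file above)
    dokujunkies.xml    (the scraper XML named by library="dokujunkies.xml")
    icon.png           (thumbnail referenced by the scraper header)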
Reply
#6
OK, I think the addon.xml shouldn't be the problem. I created a folder named metadata.test.org in the XBMC/addons directory and pasted a copy of the metadata.themoviedb.org folder's content in there. I replaced the addon.xml (the copy of the themoviedb.org addon.xml) with my addon.xml pasted above. I renamed tmdb.xml to test.xml (to avoid conflicts with the original tmdb scraper) and edited my addon.xml to match (changed the scraper name to Test and the library path to test.xml).

So I have a copy of the original tmdb.xml connected to my addon.xml, and in XBMC it works just fine. Of course it won't find anything because of the documentary content, but I don't get the "Unable to connect to remote Server" message. That's why I think there must be a problem with my scraper XML, but even after hours of staring at it and making small changes I couldn't find it.
Reply
#7
From a brief look at it, the url part (<url>) is missing in <CreateSearchUrl>.

EDIT: GetSearchResults is lacking the <results> node. Please double check all your output and compare it to a scraper that is known to work (such as themoviedb).
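Something along these lines (untested sketch based on your existing CreateSearchUrl, just to show where the <url> element goes):

Code:
<CreateSearchUrl dest="3">
    <RegExp input="$$1" output="&lt;url&gt;http://dokujunkies.org/search/\1&lt;/url&gt;" dest="3">
        <expression noclean="1" />
    </RegExp>
</CreateSearchUrl>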
Reply
#8
I added the <url> tag to <CreateSearchUrl> before... it still doesn't work. I think <GetSearchResults> is OK:
Code:
<RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
The <results> tag follows right after the <?xml ...?> declaration.
It is filled with an <entity><title></title><url></url></entity> block for each search result. I will check the output again tomorrow, but I have done this before.
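Tracing the buffers by hand, the content of buffer 8 should end up looking something like this (made-up search result, whitespace added for readability):

Code:
<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<results>
    <entity>
        <title>Some Documentary Title</title>
        <url>http://dokujunkies.org/dokus/some-documentary-title/</url>
    </entity>
</results>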
Reply
#9
Please pastebin the entire xml file so I can have a look at it.
Reply
#10
Here you go. I checked everything again but still can't find the problem.
I tried to fill the empty tags in <GetDetails> (like <year></year>, ...) with some "dummy" data, but that didn't change anything.
Reply
#11
Looks good to me.. and it actually "kinda" works.

Currently you have ..

Code:
<RegExp input="$$1" output="&lt;year&gt;&lt;/year&gt;&lt;director&gt;&lt;/director&gt;&lt;runtime&gt;\1&lt;/runtime&gt;" dest="8+">
  <expression trim="1" noclean="1">&lt;strong&gt;Dauer:&lt;/strong&gt;([\s0-9:]*)</expression>
</RegExp>

.. which only populates the runtime. In that case just use ..

Code:
<RegExp input="$$1" output="&lt;runtime&gt;\1&lt;/runtime&gt;" dest="8+">
  <expression trim="1" noclean="1">&lt;strong&gt;Dauer:&lt;/strong&gt;([\s0-9:]*)</expression>
</RegExp>

.. and do the same for all video info tags (like year, director) you're able to fetch from the site.
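For example, if the page lists a year somewhere, a corresponding block could look like this (the expression is only a placeholder, you'd have to match whatever the site actually prints):

Code:
<RegExp input="$$1" output="&lt;year&gt;\1&lt;/year&gt;" dest="8+">
  <expression trim="1" noclean="1">&lt;strong&gt;Jahr:&lt;/strong&gt;([0-9][0-9][0-9][0-9])</expression>
</RegExp>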
Reply
#12
Oh, OK. I thought I had read something like "...some tags must not be missing..." while collecting information on how to write my own media scraper, but I wasn't able to find that page again with its list of required tags.
So I decided to copy all the tags used by the XBMC example scraper (from the wiki), whether I use them or not. I will remove the tags I cannot fetch information for.

Quote:Looks good to me.. and it actually "kinda" works.

Could you explain that further, please? Does it "kinda" work with XBMC, or with some scraper tester, or something else?
Or do you mean something like "it should probably work once the 'unable to connect' problem is solved"?

EDIT: Removing all the unused tags unfortunately did not solve the problem.
Reply
#13
"Kinda works" means I just replaced the scraper .xml file of an installed add-on with yours and it "kinda" worked. Is your build actually using the add-on system?
Reply
#14
I think so. It's a clean XBMC Eden Beta3 installation.

  1. Remove old XBMC (also the private data)
  2. Download and install XBMC Eden Beta 3
  3. Copy the metadata.dokujunkies.org folder to the addons directory
  4. Start XBMC
  5. Install YouTube plugin (combined with Common Plugin Cache)
  6. Add a new source that contains "movies"
  7. Select the Dokujunkies.org scraper to be used with this source
  8. Scan for new content

After that I get the "Unable to connect..." error. As I mentioned before, I placed my scraper in the "C:\Program Files\XBMC\addons\" directory. I didn't change anything in "C:\Users\...\AppData\Roaming\XBMC\", which also contains a folder named addons.
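For reference, these are the two addons locations on my machine (I only touched the first one):

Code:
C:\Program Files\XBMC\addons\                   (where I placed metadata.dokujunkies.org)
C:\Users\<name>\AppData\Roaming\XBMC\addons\    (profile folder, left untouched)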
Reply
#15
Please upload the entire addon and/or pastebin the entire xbmc debug log.
Reply
