Some newbie questions about scraper development
#1
Hi,

I'm brand new to scraper development, but I've managed to make a "plugin" for the universal scraper that fetches a JSON response with the movie plot in Swedish from an IMDb ID, and it seems to work pretty well. But I have some questions...

1. Sometimes the movie plots contain line breaks ("\r"), and I'm not really sure how to remove those from the plot text. (My code is in the code block below.)
2. Is there some kind of best practice for making changes to metadata.universal so they don't get removed when that scraper is updated?
3. Is there any way to set a movie's watch status to watched from the scraper?
4. Is there any way, inside my scraper, to select a different scraper as a fallback? Say, if my scraper can't find a plot, to use IMDb's instead of just leaving it blank?


Code:
<scraperfunctions>
    <GetFilmtipsetPlotByIMDbID clearbuffers="no" dest="4">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="4">
            <RegExp input="$$1" output="\1" dest="8">
                <expression clear="yes" noclean="1">tt([0-9]{6,8})</expression>
            </RegExp>
            <RegExp input="$$8" output="&lt;url function=&quot;ParseFilmtipsetPlot&quot; cache=&quot;filmtipset-\1.json&quot;&gt;http://nyheter24.se/filmtipset/api/api.cgi?accesskey=APIKEY&amp;action=imdb&amp;returntype=json&amp;nocomments=1&amp;id=\1&lt;/url&gt;" dest="5">
                <expression />
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </GetFilmtipsetPlotByIMDbID>
    
    <ParseFilmtipsetPlot dest="5">
        <RegExp input="$$2" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
            <RegExp input="$$1" output="\1" dest="9">
                <expression clear="yes" fixchars="0">&quot;description&quot;:&quot;(.*?)&quot;,&quot;</expression>
            </RegExp>
            <RegExp input="$$9" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="2">
                <expression>(.+)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </ParseFilmtipsetPlot>
</scraperfunctions>
#2
1. Assuming you mean the literal characters "\r", something like:
Code:
<RegExp input="$$9" output="\1 " dest="9">
    <expression repeat="yes" noclean="1">((?:(?!\\r).)+)(?:\\r)*</expression>
</RegExp>
Might work (not tested). I've also probably over-thought the regexp. A simpler one might do the job just as well.

2. The simplest way is to make a copy of the scraper folder, and in addon.xml change the id attribute (so XBMC sees it as a different addon) and change the name (so you can distinguish it in the list of scrapers). Name the copied folder after the new id and you're all set. Then just change the content settings on your source folders to use the modified scraper.

Of course, if there are updates to the original, you'll have to merge them in manually.
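
As a rough illustration of that copy, the top of the copied addon.xml might be changed along these lines. The id and name shown here are made-up examples, and the version and provider-name placeholders stand for whatever the original file already contains:

Code:
<!-- The copied folder is renamed to match the new id, e.g. metadata.universal.filmtipset (hypothetical id) -->
<addon id="metadata.universal.filmtipset"
       name="Universal Movie Scraper (Filmtipset)"
       version="..."
       provider-name="...">
    <!-- Leave the requires and extension elements, and everything else, exactly as in the original addon.xml -->
</addon>

Only the id and name need to differ from the original for the copy to show up as a separate scraper.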

3. Not sure, offhand. Try getting the scraper to output <playcount>1</playcount> (or maybe <watched>1</watched>, not sure?) Might work...

4. Simplest way would be to just run the fallback first. As long as your function returns an empty details if no plot is found, the earlier found plot would get used.

A more efficient method would be to check for an empty string, ^$, and then output the necessary IMDb chain function instead of nothing. Something like that.

Your function currently overwrites $$2 with the plot. If you use another buffer for that instead, you'll find that the URL is actually in $$2, so you can easily recapture the IMDb id from it, although obviously it may also be available from the JSON response anyway.
#3
Thanks for your help, I've been able to solve some of the stuff.

1. Seems to work, even though I don't really understand your regex.
2. Sounds like a good way to do it.
3. I tried adding it to the title output, like: '<playcount>1</playcount><details><title>Citizen Kane</title></details>' or '<watched>1</watched><details><title>Citizen Kane</title></details>', but that didn't seem to work. Is that the wrong way to do it, or are those functions not there?
4. Not really sure how to do that. In universal I added:
Code:
<RegExp input="$INFO[titlesource]" output="&lt;chain function=&quot;GetFilmtipsetTitleByIMDbID&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
    <expression>Filmtipset.se</expression>
</RegExp>
Should the change be made to that, or to the GetFilmtipsetTitleByIMDbID function? Is there some kind of if-statement to use? Or do I do it with the conditional parameter somehow?
#4
1. It's a little convoluted, I agree. The first part uses negative lookahead to basically capture as much as possible that doesn't contain a \r and the second part just matches against zero-or-more \r's so that the repeat starts after them (otherwise the next match would begin at the "r").
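
To make that concrete, here is the same RegExp again with comments tracing a made-up input. The sample plot text is purely illustrative, and like the original suggestion this hasn't been run through XBMC:

Code:
<!-- Hypothetical $$9 before: First line.\rSecond line.\r\rThird line.   (each \r is a literal backslash followed by r) -->
<RegExp input="$$9" output="\1 " dest="9">
    <!-- ((?:(?!\\r).)+) : group 1 takes one character at a time, as long as a literal \r does not start at that position -->
    <!-- (?:\\r)*        : then swallows any run of \r, so the next repeat starts on real text -->
    <expression repeat="yes" noclean="1">((?:(?!\\r).)+)(?:\\r)*</expression>
</RegExp>
<!-- Match 1: \1 = "First line."   Match 2: \1 = "Second line."   Match 3: \1 = "Third line." -->
<!-- Each match emits "\1 ", so $$9 should end up as "First line. Second line. Third line. " (note the trailing space) -->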

3. You'd need to add it within the details tag for it to have any chance. (Although it might not work anyway.)

4. In the simple method, you could just have the RegExp in the universal output both the IMDB chain function and the Filmtipset one in one go (or possibly do it in two separate RegExps).

In the better method, in your ParseFilmtipsetPlot function, you'd first add a RegExp to recapture the IMDB id and put into a buffer (let's say $$10), and then after you've attempted to output the plot into $$2, have:

Code:
<RegExp input="$$9" output="&lt;chain function=&quot;GetIMDBPlotById&quot;&gt;$$10&lt;/chain&gt;" dest="2">
    <expression>^$</expression>
</RegExp>

Since only one of (.+) and ^$ can be true, the function will either output the plot directly, or make the IMDB plot function run.
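
Putting those pieces together, the whole function might look something like the sketch below. It's untested and rests on a few assumptions: $$7 is just an arbitrary spare buffer used for the plot so that $$2 keeps the URL, the id= pattern is written against the URL built in GetFilmtipsetPlotByIMDbID, and the "tt" prefix is put back on the guess that GetIMDBPlotById expects a full IMDb id.

Code:
<ParseFilmtipsetPlot dest="5">
    <!-- The plot (or the fallback chain) is collected in $$7 instead of $$2 -->
    <RegExp input="$$7" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
        <!-- Recapture the numeric id from the URL in $$2; re-adding "tt" is an assumption -->
        <RegExp input="$$2" output="tt\1" dest="10">
            <expression clear="yes" noclean="1">[?&amp;]id=([0-9]+)</expression>
        </RegExp>
        <!-- Same description capture as before; $$9 is cleared if nothing matches -->
        <RegExp input="$$1" output="\1" dest="9">
            <expression clear="yes" fixchars="0">&quot;description&quot;:&quot;(.*?)&quot;,&quot;</expression>
        </RegExp>
        <!-- Non-empty description: output it as the plot -->
        <RegExp input="$$9" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="7">
            <expression>(.+)</expression>
        </RegExp>
        <!-- Empty description: hand over to the IMDb plot function instead -->
        <RegExp input="$$9" output="&lt;chain function=&quot;GetIMDBPlotById&quot;&gt;$$10&lt;/chain&gt;" dest="7">
            <expression>^$</expression>
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</ParseFilmtipsetPlot>

If the IMDb function turns out to want the bare number, change the output of the first inner RegExp from tt\1 to \1.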
#5
Running both functions in universal.xml worked great. :)

Setting it as watched didn't work though. Perhaps it's not possible to do from a scraper?
#6
I've been looking for a Filmtipset scraper for some time and finally found this thread. I understand the part about copying an existing addon and then copying the scraper text above into it. But before I try: kebarvid, do you have a package made? Maybe you've already put it into a repository? That would be great!