Is it possible for the themoviedb scraper to ignore a prefix?

Is it possible for the themoviedb scraper to ignore a prefix? - Mastakilla - 2014-02-26

As not all movie series come in the correct order when sorted alphabetically (an extreme example is the 26-movie Zatoichi collection), I would like to have a prefix in the filename in Windows Explorer.
However, I'm having trouble finding a way to prefix so that themoviedb scraper ignores the prefix and finds my movies.
For some series, my prefix works fine (e.g.: James Bond), but for others it does not find most movies anymore (Zatoichi, Mad Max and many more).

I have tried many different prefixes, but none work well:

1-moviename
2-moviename
3-moviename
...

or

[1] moviename
[2] moviename
[3] moviename
...

or

1979 moviename
1981 moviename
1985 moviename

So my questions:
* does anyone know a prefix that might work?
* does anyone know a hack / patch so that themoviedb scraper can ignore the prefix? (e.g.: like some regex in advancedsettings.xml)

Thanks!

RE: Is it possible for the themoviedb scraper to ignore a prefix? - Prof Yaffle - 2014-02-26

I've done variations of (1) and (2) with no problems... I've simply numbered the films, made sure the date is there, and off it went, e.g. 1. moviename [year].

You can also specify the imdb reference on a manual search, which solves a multitude of lookup problems.

RE: Is it possible for the themoviedb scraper to ignore a prefix? - Mastakilla - 2014-02-26

1. Mad.Max.1979.1080p.DTS.HDMA --> not found
2. Mad.Max.2.1981.1080p.AC3.5.1.HQ --> not found
3. Mad.Max.Beyond.Thunderdome.1985.1080p.BluRay.x264-CiNEFiLE --> found

while

Mad.Max.1979.1080p.DTS.HDMA --> found
Mad.Max.2.1981.1080p.AC3.5.1.HQ --> found
Mad.Max.Beyond.Thunderdome.1985.1080p.BluRay.x264-CiNEFiLE --> found

RE: Is it possible for the themoviedb scraper to ignore a prefix? - Prof Yaffle - 2014-02-26

What about "1. Mad Max [1979] - DTS HDMA" or variations? I wonder if the dots are confusing things as delimiters. Or "1 - Mad.Max.....". Or "1 - Mad.Max [1979] ....".

RE: Is it possible for the themoviedb scraper to ignore a prefix? - Mastakilla - 2014-02-26

Thanks for the suggestion!

But that kinda would mess up my entire naming convention.

I prefer keeping the movie names as they are... only the prefix is changeable...

I don't really feel like renaming 1500 movies today.

The dots work fine in all other situations (without prefix) though

RE: Is it possible for the themoviedb scraper to ignore a prefix? - scudlee - 2014-02-26

This can't be done without editing the scraper.

Have a look at this thread for the basic idea.

RE: Is it possible for the themoviedb scraper to ignore a prefix? - Mastakilla - 2014-02-28

Thanks for the tip!!

After many hours of messing around, I'm finally getting somewhere, but I'm still having issues getting it right...

I have modified my <CreateSearchUrl>, but I'm having trouble getting the regex right.
Here is a working one:

Code:
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="\1" dest="1">
            <expression noclean="1">\[[0-9]\]_(.*)</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;url&gt;http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&amp;amp;language=$INFO[language]&lt;/url&gt;" dest="3">
            <RegExp input="$$2" output="\1" dest="4">
                <expression clear="yes">(.+)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>

It works for a folder like "[1]_Mad.Max.1979.1080p.DTS.HDMA"

However, I would like the following to be possible:
"[1] Mad.Max.1979.1080p.DTS.HDMA"
and
"[10] Mad.Max.1979.1080p.DTS.HDMA"

The following regexes do not work for allowing the space:
<expression noclean="1">\[[0-9]\] (.*)</expression>
<expression noclean="1">\[[0-9]\]\s(.*)</expression>
<expression noclean="1">\[[0-9]\]%20(.*)</expression>
<expression noclean="1">\[[0-9]\]+(.*)</expression>

The following regexes do not work for allowing 2 digits:
<expression noclean="1">\[[0-9]+\]_(.*)</expression>
<expression noclean="1">\[[0-9]{1,2}\]_(.*)</expression>
<expression noclean="1">\[[0-9][0-9]*\]_(.*)</expression>

I also can't view what is being fed to buffer 1 ($$1), so it is very hard to debug...
The link to scrap on this page does not work anymore:
http://wiki.xbmc.org/index.php?title=HOW-TO:Write_media_scrapers

Can anyone help me out?

RE: Is it possible for the themoviedb scraper to ignore a prefix? - Mastakilla - 2014-02-28

Seems like I'm not that far yet.

The incomplete regex that I thought was working actually isn't working very well yet.

<expression noclean="1">\[[0-9]\]_(.*)</expression>
recognizes
[1]_Mad.Max.1979.1080p.DTS.HDMA
but does NOT recognize
[5]_Mad.Max.1979.1080p.DTS.HDMA

I don't understand it....

Does anyone know how to display or log the input and the output of the <createsearchurl>?

RE: Is it possible for the themoviedb scraper to ignore a prefix? - scudlee - 2014-02-28

If you have debug logging turned on then you should be able to see what is in $$1 buffer, as it gets passed directly as the query parameter of the URL (assuming the added clean-up regex doesn't match).

The third space regex is the one that makes sense (spaces get percent-encoded). All of the 2-number regexes look valid.

RE: Is it possible for the themoviedb scraper to ignore a prefix? - Mastakilla - 2014-02-28

ah yes, debug does log these kinds of things... thanks!

eg:
with Regex <expression noclean="1">\[[0-9]\]_(.*)</expression>
and movie [5]_Mad.Max.1979.1080p.DTS.HDMA

Code:
10:08 T:8700   DEBUG: VideoInfoScanner: Scanning dir 'D:\Videos\test\Mad Max Series (NL Subbed)\[5]_Mad.Max.1979.1080p.DTS.HDMA\' as not in the database

10:08 T:8700   DEBUG: VideoInfoScanner: No NFO file found. Using title search for 'D:\Videos\test\Mad Max Series (NL Subbed)\[5]_Mad.Max.1979.1080p.DTS.HDMA\Mad.Max.1979.1080p.DTS.HDMA.mkv'

10:08 T:8700   DEBUG: ADDON::CScraper::FindMovie: Searching for '[5] Mad Max' using The Movie Database scraper (path: 'C:\Users\Mastakilla\AppData\Roaming\XBMC\addons\metadata.themoviedb.org', content: 'movies', version: '3.7.6')

10:08 T:8700   DEBUG: scraper: CreateSearchUrl returned <url>http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;query=%5b5%5d%20mad%20max&amp;year=1979&amp;language=en</url>

10:08 T:8700   DEBUG: CurlFile::Open(09DFFAA8) http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=%5b5%5d%20mad%20max&year=1979&language=en

10:08 T:8700   DEBUG: scraper: GetSearchResults returned <results></results>

10:08 T:8700   DEBUG: ADDON::CScraper::FindMovie: Searching for '[5]_Mad.Max' using The Movie Database scraper (path: 'C:\Users\Mastakilla\AppData\Roaming\XBMC\addons\metadata.themoviedb.org', content: 'movies', version: '3.7.6')

10:08 T:8700   DEBUG: scraper: CreateSearchUrl returned <url>http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;query=%5b5%5d_mad.max&amp;year=1979&amp;language=en</url>

10:08 T:8700   DEBUG: CurlFile::Open(09DFFAA8) http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=%5b5%5d_mad.max&year=1979&language=en

10:08 T:8700   DEBUG: scraper: GetSearchResults returned <results></results>

10:08 T:8700 WARNING: No information found for item 'D:\Videos\test\Mad Max Series (NL Subbed)\[5]_Mad.Max.1979.1080p.DTS.HDMA\Mad.Max.1979.1080p.DTS.HDMA.mkv', it won't be added to the library.

10:08 T:8700   DEBUG: VideoInfoScanner: No (new) information was found in dir D:\Videos\test\Mad Max Series (NL Subbed)\[5]_Mad.Max.1979.1080p.DTS.HDMA\

With the same regex and movie
[1]_Mad.Max.1979.1080p.DTS.HDMA

Code:
18:05 T:5784   DEBUG: VideoInfoScanner: Scanning dir 'D:\Videos\test\Mad Max Series (NL Subbed)\[1]_Mad.Max.1979.1080p.DTS.HDMA\' as not in the database

18:05 T:5784   DEBUG: CVideoDatabase::GetMovieId (D:\Videos\test\Mad Max Series (NL Subbed)\[1]_Mad.Max.1979.1080p.DTS.HDMA\Mad.Max.1979.1080p.DTS.HDMA.mkv), query = select idMovie from movie where idFile=7584

18:05 T:5784   DEBUG: VideoInfoScanner: No NFO file found. Using title search for 'D:\Videos\test\Mad Max Series (NL Subbed)\[1]_Mad.Max.1979.1080p.DTS.HDMA\Mad.Max.1979.1080p.DTS.HDMA.mkv'

18:05 T:5784   DEBUG: ADDON::CScraper::FindMovie: Searching for '[1] Mad Max' using The Movie Database scraper (path: 'C:\Users\Mastakilla\AppData\Roaming\XBMC\addons\metadata.themoviedb.org', content: 'movies', version: '3.7.6')

18:05 T:5784   DEBUG: scraper: CreateSearchUrl returned <url>http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;query=%5b1%5d%20mad%20max&amp;year=1979&amp;language=en</url>

18:05 T:5784   DEBUG: CurlFile::Open(0B2B06F0) http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=%5b1%5d%20mad%20max&year=1979&language=en

18:05 T:5784    INFO: XCURL::DllLibCurlGlobal::easy_aquire - Created session to http://api.tmdb.org

18:05 T:5784   DEBUG: scraper: GetSearchResults returned <results><entity><title>Mad Max</title><id>9659</id><year>1979</year>

From this it becomes clear that my regex doesn't do ANYTHING.

(you can see that the url still contains the prefix in both cases, even when it finds the movie)
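
For anyone who would rather not read the percent-encoding by eye, here is a minimal Python sketch (not part of Kodi or the scraper; the helper name `search_terms` is made up for illustration) that decodes the `query` and `year` parameters from a CurlFile::Open line copied from the log above:

```python
from urllib.parse import urlsplit, parse_qs

# CurlFile::Open URL copied from the debug log above
logged_url = (
    "http://api.tmdb.org/3/search/movie"
    "?api_key=57983e31fb435df4df77afb854740ea9"
    "&query=%5b5%5d%20mad%20max&year=1979&language=en"
)

def search_terms(url):
    """Return the decoded (query, year) pair from a logged search URL."""
    params = parse_qs(urlsplit(url).query)
    return params["query"][0], params["year"][0]

query, year = search_terms(logged_url)
print(query, year)  # prints: [5] mad max 1979
```

If the decoded query still starts with the bracketed number, the prefix regex did not match.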

anyone have an idea what I'm doing wrong?

RE: Is it possible for the themoviedb scraper to ignore a prefix? - Mastakilla - 2014-02-28

I also just tried with the regexp within the main regexp, but it still doesn't work.

Code:
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="&lt;url&gt;http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&amp;amp;language=$INFO[language]&lt;/url&gt;" dest="3">
            <RegExp input="$$2" output="\1" dest="4">
                <expression clear="yes">(.+)</expression>
            </RegExp>
            <RegExp input="$$1" output="\1" dest="1">
                <expression noclean="1">\[[0-9]+\]%20(.*)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>

RE: Is it possible for the themoviedb scraper to ignore a prefix? - scudlee - 2014-02-28

Looking at the output, it looks like the square brackets are also being percent-encoded, so you'd want a regex like:

Code:
<expression noclean="1">%5b[0-9]+%5d%20(.*)</expression>
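
To illustrate the difference, here is a small Python sketch. It uses Python's `re` module rather than the regex engine Kodi uses, so treat it as an approximation, and the helper `strip_prefix` is a hypothetical name. It runs both patterns against the encoded string seen in the log:

```python
import re

# Buffer content as it appears in the logged URL (already percent-encoded)
encoded = "%5b5%5d%20mad%20max"

# Pattern with literal brackets: never matches, the brackets are encoded
literal_brackets = re.compile(r"\[[0-9]+\]%20(.*)")
# Pattern with percent-encoded brackets
encoded_brackets = re.compile(r"%5b[0-9]+%5d%20(.*)")

def strip_prefix(pattern, text):
    """Return capture group 1 if the pattern matches, else the text unchanged."""
    m = pattern.search(text)
    return m.group(1) if m else text

print(strip_prefix(literal_brackets, encoded))
print(strip_prefix(encoded_brackets, encoded))
```

The first call leaves the string untouched, and the second returns only the part after the prefix.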

RE: Is it possible for the themoviedb scraper to ignore a prefix? - Mastakilla - 2014-03-01

good point! thanks!

but unfortunately still not working

Code:
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="\1" dest="1">
            <expression noclean="1">%5b[0-9]+%5d%20(.*)</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;url&gt;http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&amp;amp;language=$INFO[language]&lt;/url&gt;" dest="3">
            <RegExp input="$$2" output="\1" dest="4">
                <expression clear="yes">(.+)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>

RE: Is it possible for the themoviedb scraper to ignore a prefix? - scudlee - 2014-03-01

Aww crap. I just tested it... I forgot about an inescapable bit of core code - underscores are always converted to spaces, but the periods are only converted to spaces if there are no actual spaces in the name, otherwise they are left as-is.

So, "[1]_Mad.Max.1979.1080p.DTS.HDMA" will get cleaned up to "[1] Mad Max" and then get percent-encoded to "%5b1%5d%20Mad%20Max" for the scraper.

Whereas "[1] Mad.Max.1979.1080p.DTS.HDMA" will get cleaned up to "[1] Mad.Max" and then get percent-encoded to "%5b1%5d%20Mad.Max".

Using the underscore, you can clean to "Mad%20Max" and get a match, but with the space you'd be left with "Mad.Max", which doesn't.

No easy way around that.
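
As a rough model of the rule described above (a sketch only, not Kodi's actual code; `clean_name` and `search_query` are hypothetical helpers, and the year-splitting regex is my own simplification of how the title gets cut off), this Python snippet shows what ends up in the query for each naming style:

```python
import re
from urllib.parse import quote

# Same prefix pattern as in the scraper edit; IGNORECASE because
# urllib emits upper-case hex (%5B) while the log shows lower-case (%5b)
PREFIX = re.compile(r"%5b[0-9]+%5d%20(.*)", re.IGNORECASE)

def clean_name(folder):
    """Approximate the clean-up: cut at the year, then replace separators."""
    # Keep only the part before a 19xx/20xx year (assumed simplification)
    title = re.split(r"[._ ](?:19|20)\d{2}(?:[._ ]|$)", folder, maxsplit=1)[0]
    had_space = " " in title              # checked before underscores change
    title = title.replace("_", " ")       # underscores always become spaces
    if not had_space:                     # periods only if no real spaces
        title = title.replace(".", " ")
    return title

def search_query(folder):
    """Percent-encode the cleaned name and strip the [n] prefix if present."""
    encoded = quote(clean_name(folder), safe="")
    m = PREFIX.match(encoded)
    return m.group(1) if m else encoded

for name in ("[1]_Mad.Max.1979.1080p.DTS.HDMA",
             "[1] Mad.Max.1979.1080p.DTS.HDMA",
             "[11].Mad.Max.1979.1080p.DTS.HDMA"):
    print(name, "->", search_query(name))
```

Under these assumptions the underscore and period prefixes both yield `Mad%20Max`, while the space prefix leaves `Mad.Max`.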

The code you posted worked for me using underscores.

Relevant lines from the debug log:

Code:
43:20 T:7140   DEBUG: VideoInfoScanner: No NFO file found. Using title search for 'E:\Videos\Test\[1]_Mad.Max.1979.1080p.DTS.HDMA\movie.disc'

...

43:20 T:8124   DEBUG: ADDON::CScraper::FindMovie: Searching for '[1] Mad Max' using The Movie Database scraper (path: 'C:\Users\ScudLee\AppData\Roaming\XBMC\addons\metadata.themoviedb.org', content: 'movies', version: '3.7.6')

43:20 T:8124   DEBUG: scraper: CreateSearchUrl returned <url>http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;query=Mad%20Max&amp;year=1979&amp;language=en</url>

43:20 T:8124   DEBUG: CurlFile::Open(03776660) http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=Mad%20Max&year=1979&language=en

43:20 T:8124   DEBUG: CScraperUrl::Get: Using "UTF-8" charset for "http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=Mad%20Max&year=1979&language=en"

43:20 T:8124   DEBUG: scraper: GetSearchResults returned <results><entity><title>Mad Max</title><id>9659</id><year>1979</year><url cache="tmdb-en-9659.json">http://api.tmdb.org/3/movie/9659?api_key=57983e31fb435df4df77afb854740ea9&amp;language=en</url></entity><entity><title>Mad Max</title><id>9659</id><year>1979</year><url cache="tmdb-en-9659.json">http://api.tmdb.org/3/movie/9659?api_key=57983e31fb435df4df77afb854740ea9&amp;language=en</url></entity></results>

RE: Is it possible for the themoviedb scraper to ignore a prefix? - Mastakilla - 2014-03-03

Thanks for that extremely crucial bit of information.
That explains a lot...

I'm now using the following (and it works!) :

Code:
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="\1" dest="1">
            <expression noclean="1">%5b[0-9]+%5d%20(.*)</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;url&gt;http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&amp;amp;language=$INFO[language]&lt;/url&gt;" dest="3">
            <RegExp input="$$2" output="\1" dest="4">
                <expression clear="yes">(.+)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>

I'm using the following prefixes now:
[1].Mad.Max.1979.1080p.DTS.HDMA
[2].Mad.Max.2.1981.1080p.AC3.5.1.HQ
etc.

This also works for multi-digit numbers like [11].

Thanks again for the support!