TMDB scraper fix for question mark and dash

axmhari Offline
Junior Member
Posts: 2
Joined: Nov 2011
Reputation: 0
Post: #1
Hi,

I have run into problems with the TMDB scraper when the title used for searching contains certain characters.

1. The TMDB API does not return valid search-result XML when the title contains the URL-encoded '?' character (%3F). As a result, XBMC reports that it cannot connect to the remote server and does not even offer the option to add the item manually. This also interrupts the automatic library update.

2. The '-' character causes a different problem. Here TMDB at least responds with valid result XML saying "nothing found", so you can add the item manually. Doubling the '-' character or omitting it from the search string fixes the problem.

I have prepared a patch for the current version 1.4.5 of the scraper that removes these two URL-encoded characters (%3f and %2d) from the title.

Code:
--- a/metadata.themoviedb.org/tmdb.xml    2011-11-21 20:32:50.366929036 +0100
+++ b/metadata.themoviedb.org/tmdb.xml    2011-11-21 21:08:40.000000000 +0100
@@ -1,10 +1,13 @@
<?xml version="1.0" encoding="UTF-8"?>
<scraper framework="1.1" date="2011-04-25">
        <CreateSearchUrl dest="3">
-               <RegExp input="$$1" output="&lt;url&gt;http://api.themoviedb.org/2.1/Movie.search/$INFO[language]/xml/57983e31fb435df4df77afb854740ea9/\1$$4&lt;/url&gt;" dest="3">
+               <RegExp input="$$5" output="&lt;url&gt;http://api.themoviedb.org/2.1/Movie.search/$INFO[language]/xml/57983e31fb435df4df77afb854740ea9/\1$$4&lt;/url&gt;" dest="3">
                        <RegExp input="$$2" output="+\1" dest="4">
                                <expression clear="yes">(.+)</expression>
                        </RegExp>
+                       <RegExp input="$$1" output="\1\2" dest="5">
+                               <expression noclean="1" repeat="yes">%3f|%2d|(%..)|([a-zA-Z0-9]*)</expression>
+                       </RegExp>
                        <expression noclean="1"/>
                </RegExp>
        </CreateSearchUrl>
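
For anyone who wants to see the intended effect outside of XBMC, here is a minimal Python sketch of what the patch is meant to do: drop the URL-encoded '?' and '-' from a title that has already been URL-encoded. The function name strip_problem_chars and the sample titles are made up for illustration, and this only mirrors the intent of the patch, not how XBMC's scraper engine evaluates the expression.

```python
import re

# Matches the URL-encoded '?' (%3f) and '-' (%2d), in either hex case.
_PROBLEM_CHARS = re.compile(r"%3f|%2d", re.IGNORECASE)


def strip_problem_chars(encoded_title: str) -> str:
    """Drop URL-encoded '?' and '-' from an already URL-encoded title.

    Hypothetical helper for illustration only; it is not part of the scraper.
    """
    return _PROBLEM_CHARS.sub("", encoded_title)


if __name__ == "__main__":
    # Sample inputs are assumptions; '+' stands in for an encoded space.
    for title in ("X%2dMen", "Who+Framed+Roger+Rabbit%3f"):
        print(title, "->", strip_problem_chars(title))
```

Other encoded characters (for example %3a) are left untouched, which matches the purpose of the (%..) branch in the patch.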

I am relatively new to regular expressions, so there may be a better solution. Anyway, it works well for me, so I wanted to share it.

Kind regards!
(This post was last modified: 2011-11-21 23:52 by axmhari.)
bill0199 Offline
Junior Member
Posts: 7
Joined: Jun 2011
Reputation: 0
Post: #2
Or... remove those characters from the file name?
axmhari Offline
Junior Member
Posts: 2
Joined: Nov 2011
Reputation: 0
Post: #3
Of course, but IMHO it's a questionable workaround.

The problem occurred for me when using Opdenkamp's PVR extension, where the title is delivered by the PVR server. At first I fixed it in the virtual filesystem created by the PVR code, by removing the question mark there. But I realized that this is not a clean solution at all: it is not a filesystem issue but a scraper issue, and it has nothing to do with PVR.

Even apart from PVR, renaming a large number of video files would be annoying at the very least.

Furthermore, other scrapers or a future TMDB API might support (or, less likely, need) those characters...
raymod2 Offline
Junior Member
Posts: 39
Joined: Dec 2012
Reputation: 0
Post: #4
I see this thread is over a year old and this improvement hasn't been integrated into the XBMC builds yet. Is there a technical reason for this or did this thread just get lost in the noise? I did a scrape of my media library and noticed that the #1 reason for an unidentified media file was the presence of a dash (-) character in the title. For example, "X-Men".