IMDB not accepting certain useragent strings?

yee379 Offline
Junior Member
Posts: 47
Joined: Feb 2004
Reputation: 0
Post: #1
Just noticed today that I couldn't scrape any movies from my Ubuntu system. Looking at the logs, I get:

Code:
16:51:57 T:140189498325328 M:835964928   DEBUG: FileCurl::Open(0x7fffd6390d08) http://akas.imdb.com/find?s=tt;q=the%20warlords%20(2007)
16:51:57 T:140189498325328 M:835964928    INFO: easy_aquire - Created session to http://akas.imdb.com
16:51:57 T:140189498325328 M:835727360   DEBUG: FillBuffer: curl failed with code 22

attempting a wget:

Code:
$ wget 'http://akas.imdb.com/find?s=tt;q=the%20warlords%20(2007)'
--2010-01-02 16:52:32--  http://akas.imdb.com/find?s=tt;q=the%20warlords%20(2007)
Resolving akas.imdb.com... 72.21.206.70
Connecting to akas.imdb.com|72.21.206.70|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2010-01-02 16:52:33 ERROR 403: Forbidden.

but doing a wget with -U:

Code:
$ wget -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14' 'http://akas.imdb.com/find?s=tt;q=the%20warlords%20(2007)'
--2010-01-02 16:54:33--  http://akas.imdb.com/find?s=tt;q=the%20warlords%20(2007)
Resolving akas.imdb.com... 207.171.166.140
Connecting to akas.imdb.com|207.171.166.140|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `find?s=tt;q=the warlords (2007)'

    [   <=>                                                        ] 42,356      69.8K/s   in 0.6s    

2010-01-02 16:54:34 (69.8 KB/s) - `find?s=tt;q=the warlords (2007)' saved [42356]

also wget works fine on my mac...
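
For anyone without wget handy, the same comparison can be roughed out in Python. This is only a sketch: `build_request` and `http_status` are made-up helper names, and the URL and Firefox string are just the ones from the wget runs above. Whether a given user agent gets a 403 depends on what the server is blocking at the time.

```python
import urllib.error
import urllib.request

# URL and user-agent string copied from the wget commands in this post.
SEARCH_URL = "http://akas.imdb.com/find?s=tt;q=the%20warlords%20(2007)"
FIREFOX_UA = ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) "
              "Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14")


def build_request(url, user_agent=None):
    """Build a GET request, optionally overriding the User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def http_status(request, timeout=10):
    """Send the request and return the HTTP status code (200, 403, ...)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is what we want to compare.
        return err.code
```

Calling `http_status(build_request(SEARCH_URL))` and then `http_status(build_request(SEARCH_URL, FIREFOX_UA))` gives the two status codes to compare, the same way the two wget runs do.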

Is there an option somewhere to override the user-agent string that the XBMC IMDb scraper uses?
jgora Offline
Junior Member
Posts: 21
Joined: Jan 2010
Reputation: 0
Post: #2
I am really new to XBMC and all, but I think I may have a similar problem. When I add a movie source to XBMC and use IMDb as the scraper, nothing gets added to the library after scanning for about 60 seconds. However, when I change the scraper to tvdb.com, it seems to be able to download all the information.

Is there a limit as to how much you can use/download info from IMDB?!

Is there a fix for this?

Thanks
delirial Offline
Junior Member
Posts: 29
Joined: Oct 2008
Reputation: 0
Post: #3
That is correct. I've been getting the following for every movie it tries to scrape since around 9:00 PM yesterday:

00:15:06 T:2619337584 M:2954203136 ERROR: CFileCurl::CReadState::Open, didn't get any data from stream.

Changing the scraper to themoviedb.org seems to work (though I like IMDb better).

regards,
del
nick8539 Offline
Junior Member
Posts: 24
Joined: Sep 2009
Reputation: 0
Post: #4
I'm having the same problem with IMDb on AppleTV. Is IMDb scraping down? I can scrape with TMDb, but it's not as big as IMDb.
plankton88 Offline
Senior Member
Posts: 106
Joined: Jul 2008
Reputation: 0
Post: #5
Having trouble too. Tried to search for help in other areas, but found nothing so far. Maybe I should update? I'm using a build from November, I think.
Nuka1195 Offline
Skilled Python Coder
Posts: 3,910
Joined: Dec 2004
Reputation: 18
Post: #6
try adding |User-Agent={your valid user agent} to end of the urls in imdb.xml

urlencoded of course
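
A quick way to get the urlencoded string is a few lines of Python. Treat this as a sketch, not as what XBMC does internally: `append_user_agent` is a made-up helper name, and keeping the parentheses unescaped (`safe="()"`) is just one encoding choice.

```python
from urllib.parse import quote


def append_user_agent(url, user_agent):
    """Return url with a literal '|User-Agent=' option and the encoded string."""
    # The pipe and the '=' stay literal; only the user-agent value is encoded.
    # safe="()" leaves parentheses alone and percent-encodes '/', ';', ':' and spaces.
    return url + "|User-Agent=" + quote(user_agent, safe="()")


ua = ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) "
      "Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14")
encoded_url = append_user_agent("http://akas.imdb.com/find?s=tt;q=", ua)
```

Note that only the user-agent value gets encoded. The `|User-Agent=` part goes in as-is, and there are no braces around the value.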

For python coding questions first see http://mirrors.xbmc.org/docs/python-docs/
rebaker501 Offline
Junior Member
Posts: 18
Joined: Sep 2008
Reputation: 0
Post: #7
stupid question.....what is a valid user agent?
delirial Offline
Junior Member
Posts: 29
Joined: Oct 2008
Reputation: 0
Post: #8
rebaker501,

'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14' is an example of a valid one. Basically, the user-agent is how your browser identifies itself when requesting a website.

For example, when you go to download Firefox, you automatically get the link for your platform (Mac, Win, Linux). That's because the website uses the user-agent to work out which download is the right one.

If anyone gets this to work, please let us know.

del
aquariumdrinker Offline
Junior Member
Posts: 1
Joined: Jan 2010
Reputation: 0
Post: #9
No luck after adding my Firefox user agent string to the end of the url. My situation is somewhat similar to that described above (some time this afternoon, IMDB stopped yielding any results).
Code:
03:31:59 T:4316 M:4294967295   DEBUG: CVideoDatabase::GetMovieId (F:\Movies\12 Angry Men.m4v), query = select idMovie from movie where idFile=1
03:31:59 T:4316 M:4294967295   DEBUG: No NFO file found. Using title search for 'F:\Movies\12 Angry Men.m4v'
03:31:59 T:4316 M:4294967295   DEBUG: CIMDB::InternalFindMovie: Searching for '12 angry men' using IMDb.com scraper (file: 'imdb.xml', content: 'movies', language: 'en', date: '2009-08-10', framework: '1.1')
03:31:59 T:4316 M:4294967295   DEBUG: FileCurl::Open(06D9E628) http://akas.imdb.com/find?s=tt;q=12%20angry%20men%7cUser-Agent%3d%7bMozilla%2f5.0+(Windows%3b+U%3b+Windows+NT+6.0%3b+en-US%3b+rv%3a1.9.1.6)+Gecko%2f20091201+Firefox%2f3.5.6+GTB6+(.NET+CLR+3.5.30729)%7d
03:31:59 T:4316 M:4294967295   DEBUG: XFILE::CFileCurl::CReadState::FillBuffer: curl failed with code 22
03:31:59 T:4316 M:4294967295   ERROR: CFileCurl::CReadState::Open, didn't get any data from stream.
03:31:59 T:4316 M:4294967295   DEBUG: FileCurl::Close(06D9E628) http://akas.imdb.com/find?s=tt;q=12%20angry%20men%7cUser-Agent%3d%7bMozilla%2f5.0+(Windows%3b+U%3b+Windows+NT+6.0%3b+en-US%3b+rv%3a1.9.1.6)+Gecko%2f20091201+Firefox%2f3.5.6+GTB6+(.NET+CLR+3.5.30729)%7d
(This post was last modified: 2010-01-03 10:35 by aquariumdrinker.)
yee379 Offline
Junior Member
Posts: 47
Joined: Feb 2004
Reputation: 0
Post: #10
Nuka1195 Wrote: try adding |User-Agent={your valid user agent} to end of the urls in imdb.xml

urlencoded of course

Could you clarify what you mean, please? I tried putting in:

Code:
<RegExp input="$$1" output="&lt;url&gt;http://akas.imdb.com/find?s=tt;q=\1$$4&lt;/url&gt;%7CUser-Agent=Mozilla%2F5.0%20(X11%3B%20U%3B%20Linux%20i686%3B%20en-US%3B%20rv%3A1.8.1.14)%20Gecko%2F20080418%20Ubuntu%2F7.10%20(gutsy)%20Firefox%2F2.0.0.14" dest="3">

but it still doesn't work :( Better still, could someone update the imdb.xml file on svn?

cheers,
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #11
Code:
<RegExp input="$$1" output="&lt;url&gt;http://akas.imdb.com/find?s=tt;q=\1$$4|User-Agent=Mozilla%2F5.0%20(X11%3B%20U%3B%20Linux%20i686%3B%20en-US%3B%20rv%3A1.8.1.14)%20Gecko%2F20080418%20Ubuntu%2F7.10%20(gutsy)%20Firefox%2F2.0.0.14&lt;/url&gt;" dest="3">
delirial Offline
Junior Member
Posts: 29
Joined: Oct 2008
Reputation: 0
Post: #12
thanks spiff!

Seems to be working now.
CloudDweller Offline
Senior Member
Posts: 133
Joined: Mar 2009
Reputation: 0
Post: #13
I'm totally new to all this XML editing stuff, so can someone please either upload their edited XML or give me a noob's guide to exactly what I need to do? I can't get IMDb scraping to work.

Thanks
rufus210 Offline
Junior Member
Posts: 13
Joined: Jan 2010
Reputation: 0
Post: #14
Spiff: Thanks, this works.

ChrisWad: find system/scrapers/video/imdb.xml where you installed XBMC to. Open up the file with a text editor. Line 46 should contain "http://akas.imdb.com/find" (it's the only instance of "find" in the file). Replace the original line with the one Spiff posted.
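
If you'd rather not edit by hand, the same replacement could be scripted roughly like this. It's a sketch under assumptions: `add_user_agent` and `patch_file` are made-up names, and it assumes the untouched search line ends its URL with `&lt;/url&gt;`, as the lines posted in this thread suggest. Check your own copy before running anything like it.

```python
from pathlib import Path

SEARCH_URL = "http://akas.imdb.com/find"  # marker for the search line
CLOSE_TAG = "&lt;/url&gt;"                # escaped </url> as it appears in imdb.xml


def add_user_agent(xml_text, encoded_ua):
    """Insert '|User-Agent=<encoded_ua>' just before the escaped </url> on the search line."""
    option = "|User-Agent=" + encoded_ua
    patched = []
    for line in xml_text.splitlines(keepends=True):
        already_done = "|User-Agent=" in line
        if SEARCH_URL in line and CLOSE_TAG in line and not already_done:
            line = line.replace(CLOSE_TAG, option + CLOSE_TAG, 1)
        patched.append(line)
    return "".join(patched)


def patch_file(path, encoded_ua):
    """Patch the scraper file in place, keeping a .bak copy of the original."""
    target = Path(path)
    raw = target.read_bytes()
    target.with_name(target.name + ".bak").write_bytes(raw)
    # latin-1 maps every byte to one character, so the rest of the file
    # round-trips unchanged whatever its real encoding is.
    text = raw.decode("latin-1")
    target.write_bytes(add_user_agent(text, encoded_ua).encode("latin-1"))
```

`patch_file("system/scrapers/video/imdb.xml", "<your urlencoded user agent>")` would be the call, with the path adjusted to wherever XBMC is installed. Running it twice leaves the line alone the second time.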
jgora Offline
Junior Member
Posts: 21
Joined: Jan 2010
Reputation: 0
Post: #15
Does that mean you need to have Mozilla Firefox installed? Apologies if that is an obvious question.