IMDB not accepting certain useragent strings?

CloudDweller Offline
Senior Member
Posts: 133
Joined: Mar 2009
Reputation: 0
Post: #16
Thanks rufus210. I did exactly what you said and now IMDB scraping is working again. Thanks a million!!!
lord_plankton Offline
Junior Member
Posts: 17
Joined: Jul 2009
Reputation: 0
Post: #17
jgora Wrote:does that mean you will need to have mozilla firefox installed? - apologies if that is an obvious question

No, you don't need Firefox installed.
^FrEaK^ Offline
Junior Member
Posts: 35
Joined: Jul 2009
Reputation: 0
Post: #18
I've made a small fix for those who don't want to mess around with XML files themselves:

http://rapidshare.com/files/329736910/IM...er_Fix.rar

Just unpack it and copy it to your XBMC main folder.

Please let me know if the link doesn't work. Somewhere on the page it said the link would only work 10 times, so if someone has a better place to host it, please do.
redstorm Offline
Senior Member
Posts: 193
Joined: Sep 2009
Reputation: 2
Post: #19
Edited the XML and it works for me now.

lol IMDB blocked xbmc for all of 2 seconds.
RckStr Offline
Senior Member
Posts: 172
Joined: Dec 2009
Reputation: 0
Post: #20
Anyone know where this file is located in XBMC for Mac? I can't seem to find it.
zoxzox Offline
Junior Member
Posts: 11
Joined: Mar 2009
Reputation: 0
Post: #21
spiff Wrote:
Code:
<RegExp input="$$1" output="&lt;url&gt;http://akas.imdb.com/find?s=tt;q=\1$$4|User-Agent=Mozilla%2F5.0%20(X11%3B%20U%3B%20Linux%20i686%3B%20en-US%3B%20rv%3A1.8.1.14)%20Gecko%2F20080418%20Ubuntu%2F7.10%20(gutsy)%20Firefox%2F2.0.0.14&lt;/url&gt;" dest="3">

rufus210 Wrote:Spiff: Thanks, this works.

ChrisWad: find system/scrapers/video/imdb.xml where you installed XBMC to. Open up the file with a text editor. Line 46 should contain "http://akas.imdb.com/find" (it's the only instance of "find" in the file). Replace the original line with the one Spiff posted.

This works only if one is NOT using mixed-mode movie.nfo files (the ones with an IMDb link for the movie)...

Otherwise it fails miserably...
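
For anyone who would rather script the edit rufus210 describes than do it by hand, here is a rough Python sketch. It assumes the search line in imdb.xml closes its URL with the escaped `&lt;/url&gt;` tag, as the line Spiff posted does. The helper name `add_header_to_find_line` is made up for illustration, and the user agent must already be percent-encoded when passed in.

```python
FIND_URL = "http://akas.imdb.com/find"   # per rufus210, the only "find" in the file
URL_CLOSE = "&lt;/url&gt;"               # escaped </url>, as in Spiff's line


def add_header_to_find_line(xml_text: str, encoded_agent: str) -> str:
    """Append |User-Agent=... to the URL on the line that contains FIND_URL.

    Hypothetical helper: it only does a text substitution and assumes the
    line layout shown in this thread.
    """
    patched = []
    for line in xml_text.splitlines(keepends=True):
        # Skip lines that were already patched so the edit is repeatable.
        if FIND_URL in line and "|User-Agent=" not in line:
            # Insert the header just before the escaped closing tag.
            line = line.replace(
                URL_CLOSE, "|User-Agent=" + encoded_agent + URL_CLOSE, 1
            )
        patched.append(line)
    return "".join(patched)
```

Back up imdb.xml before writing the result over it; this does nothing for the mixed-mode .nfo case mentioned above.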
greed Offline
Donor
Posts: 12
Joined: Dec 2009
Reputation: 0
Post: #22
RckStr Wrote:Anyone know where this file is located in XBMC for Mac? I can't seem to find it.

It's at XBMC.app/Contents/Resources/XBMC/system/scrapers/video.

Usually that'll be under /Applications.

However, I notice that the error page being returned (using tcpdump -A -s1500 port http) says the User-Agent may be blocked for misuse, such as automated access happening too quickly. Is there a way to put a rate-limiter in?

Or, best would be to just write an interface that uses the free-to-download database that IMDb provides.
delirial Offline
Junior Member
Posts: 29
Joined: Oct 2008
Reputation: 0
Post: #23
greed,

No need to set a rate limiter; it might also be useless. The problem here is the number of people using XBMC, not the number of queries a single user makes.

Also, they won't ban Firefox's user agent. It's a browser, and millions use it in a legit manner. They want people to visit their site, which is what Firefox is for.

Using the database is not a great idea; it would be constantly outdated.

Finally, you can set your own user agent. For example, I tried using 'crap' and it worked. That potentially leaves every user with a different one and reduces the probability of getting banned.

regards,
del
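
To make the custom user agent idea concrete, here is a small Python sketch of how a `|User-Agent=` suffix like the one in Spiff's line can be built from a plain string. The function name `with_user_agent` is hypothetical. Leaving parentheses unencoded is an assumption chosen only so the output matches the string posted earlier in the thread.

```python
from urllib.parse import quote


def with_user_agent(url: str, user_agent: str) -> str:
    """Return url with a percent-encoded |User-Agent= suffix appended.

    Hypothetical helper. Everything except letters, digits, "_.-~" and
    parentheses is percent-encoded, which reproduces the encoding seen in
    the scraper line quoted above.
    """
    return url + "|User-Agent=" + quote(user_agent, safe="()")


# A one-word agent such as 'crap' needs no encoding at all.
print(with_user_agent("http://akas.imdb.com/find?s=tt;q=test", "crap"))
```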
Gooner14 Offline
Junior Member
Posts: 6
Joined: Dec 2009
Reputation: 0
Post: #24
I'm not sure if it's pulling down all the plot details, cast, etc. Most of my descriptions end with '.Full summary'; I'm not sure they did before. Can anyone confirm?
delirial Offline
Junior Member
Posts: 29
Joined: Oct 2008
Reputation: 0
Post: #25
Gooner14 Wrote:I'm not sure if it's pulling down all the plot details, cast, etc. Most of my descriptions end with '.Full summary'; I'm not sure they did before. Can anyone confirm?

Mine are also ending like that. It wasn't so before.
greed Offline
Donor
Posts: 12
Joined: Dec 2009
Reputation: 0
Post: #26
delirial Wrote:Mine are also ending like that. It wasn't so before.

You now need to visit a URL ending in "/plotsummary" to get an un-truncated summary. I don't know enough about the scraper structure to suggest an easy fix, or if there is an easy fix.

So, for example, http://www.imdb.com/title/tt0226379/plotsummary contains the full plot summary and http://www.imdb.com/title/tt0226379/ has the other details and the truncated plot.

I can't find a link or preference setting on IMDb to always display the full summary instead of the "more" or "full summary" link. It looks like a second URL fetch will be needed.

That's not going to stop me trying to figure it out, though someone who knows more might have a faster solution.
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #27
zoxzox, if you can't sherlock your way on how to do it from nfo files, i won't confuse you with more information.
greed Offline
Donor
Posts: 12
Joined: Dec 2009
Reputation: 0
Post: #28
greed Wrote:That's not going to stop me trying to figure it out.

OK, I think I've got it. No warranty expressed or implied blah blah blah....

What I think is happening is, with the |User-Agent= added to the URLs, sections later in imdb.xml are now appending to the User-Agent= part, not the actual URL itself. So the rule that fetches the plotsummary URL needs to be able to recognize the | in a URL and break it accordingly.

Code:
<RegExp input="$$3" output="&lt;url function=&quot;GetIMDBPlot&quot;&gt;\1plotsummary\2&lt;/url&gt;" dest="5+">
<expression>^([^|]*)(|.*)?$</expression>
</RegExp>

I also had to change the input= on that rule to $$3, which makes me think every rule which references $$2 is also broken: I can't find a rule which uses dest="2". Oh, I also removed the block immediately prior to that one, which sets "plot" and "outline" tags from the main results page. We know that will be useless now.

All rules which compose a URL from $$3 probably need their expression tag changed as above.

Anyway. These changes work for me.
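
For anyone who wants to check the splitting outside XBMC, here is a Python sketch of the same idea: cut the string at the first `|`, add the path to the URL part, then put the header part back. It uses Python's `re` module, not XBMC's regexp engine, and `insert_path` is a made-up name. I escaped the pipe in the second group here. The unescaped `(|.*)` in the rule above reads as an alternation between nothing and `.*`, which as far as I can tell still ends up capturing the same text.

```python
import re

# Group 1: everything before the first "|"; group 2: the "|..." tail, if any.
SPLIT_AT_PIPE = re.compile(r"^([^|]*)(\|.*)?$", re.DOTALL)


def insert_path(url_with_headers: str, path: str) -> str:
    """Append path to the URL part, keeping any |Header=value tail intact."""
    match = SPLIT_AT_PIPE.match(url_with_headers)
    base = match.group(1)
    headers = match.group(2) or ""  # group 2 is None when there is no "|"
    return base + path + headers


print(insert_path("http://www.imdb.com/title/tt0226379/|User-Agent=crap",
                  "plotsummary"))
```

The same call with "fullcredits" or "posters" shows what the other URL-building rules would need to produce.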
greed Offline
Donor
Posts: 12
Joined: Dec 2009
Reputation: 0
Post: #29
OK, I needed a similar treatment on blocks containing '$$3fullcredits' and '$$3posters'. I also added cache= on a few rules which should be able to use the same cached data as the !fullcredits rules.

Also, deleting the lines I described earlier was unnecessary; a botched edit elsewhere had made that seem necessary.

I've got a patch with these changes, minus the |User-Agent= additions I made, available: http://pastebin.com/m5135d8d6. It should be good until Feb 4, 2010.
zoxzox Offline
Junior Member
Posts: 11
Joined: Mar 2009
Reputation: 0
Post: #30
spiff Wrote:zoxzox, if you can't sherlock your way on how to do it from nfo files, i won't confuse you with more information.

Thanks, it works if appended in movie.nfo, but it's not my cup of tea...

I was hoping the user agent could be set in as.xml for scraping.