
CloudDweller · 2010-01-03, 14:27

Thanks rufus210. I did exactly what you said and now IMDB scraping is working again. Thanks a million!!!

lord_plankton · 2010-01-03, 15:10

jgora Wrote: Does that mean you will need to have Mozilla Firefox installed? Apologies if that is an obvious question.

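No, you don't need Firefox installed. The User-Agent string is just text that XBMC sends along with its requests; the browser itself is never used.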

^FrEaK^ · 2010-01-03, 16:46

I've made a small fix for those who don't want to mess around with XML files themselves.

http://rapidshare.com/files/329736910/IM...er_Fix.rar

Just unpack it and copy it to your XBMC main folder.

Please let me know if the link doesn't work. The page said somewhere that the link would only work 10 times, so if someone has a better place to store it, please do.

redstorm · 2010-01-03, 17:31

Edited the xml and works for me now.

lol IMDB blocked xbmc for all of 2 seconds.

RckStr · 2010-01-03, 17:35

Does anyone know where this file is located on the Mac version of XBMC? I can't seem to find it.

zoxzox · 2010-01-03, 19:36

spiff Wrote:
Code:
<RegExp input="$$1" output="&lt;url&gt;http://akas.imdb.com/find?s=tt;q=\1$$4|User-Agent=Mozilla%2F5.0%20(X11%3B%20U%3B%20Linux%20i686%3B%20en-US%3B%20rv%3A1.8.1.14)%20Gecko%2F20080418%20Ubuntu%2F7.10%20(gutsy)%20Firefox%2F2.0.0.14&lt;/url&gt;" dest="3">

rufus210 Wrote: Spiff: Thanks, this works.

ChrisWad: find system/scrapers/video/imdb.xml in the folder where you installed XBMC and open the file with a text editor. Line 46 should contain "http://akas.imdb.com/find" (it's the only instance of "find" in the file). Replace that line with the one Spiff posted.

This only works if you are NOT using mixed-mode movie.nfo files (the ones containing an IMDb link for the movie).

Otherwise, it fails miserably.
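The manual edit rufus210 describes (find the one line in imdb.xml containing http://akas.imdb.com/find and swap in spiff's line) can also be scripted. This is a minimal Python sketch, assuming the file is UTF-8 and that you paste spiff's line in as `new_line` yourself; `replace_find_line` is a hypothetical helper, not part of XBMC. It keeps a .bak copy and refuses to touch the file unless exactly one line matches. Note that it does nothing for zoxzox's movie.nfo case.

```python
from pathlib import Path

# The line rufus210 says to look for: the only occurrence of this URL in imdb.xml.
MARKER = "http://akas.imdb.com/find"


def replace_find_line(xml_path: Path, new_line: str) -> int:
    """Swap the single line containing MARKER for new_line.

    Saves the untouched file as <name>.bak first and returns the 1-based line
    number that was replaced. Raises ValueError unless exactly one line matches.
    """
    text = xml_path.read_text(encoding="utf-8")
    lines = text.splitlines(keepends=True)
    hits = [i for i, line in enumerate(lines) if MARKER in line]
    if len(hits) != 1:
        raise ValueError(f"expected one line containing {MARKER!r}, found {len(hits)}")

    i = hits[0]
    old = lines[i]
    indent = old[: len(old) - len(old.lstrip())]  # keep the original indentation
    ending = "\n" if old.endswith("\n") else ""

    # Back up the original before writing anything.
    xml_path.with_name(xml_path.name + ".bak").write_text(text, encoding="utf-8")
    lines[i] = indent + new_line.strip() + ending
    xml_path.write_text("".join(lines), encoding="utf-8")
    return i + 1
```

Called as `replace_find_line(Path(".../system/scrapers/video/imdb.xml"), new_line)` with the install path filled in, it should report line 46 if your copy matches rufus210's.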

**greed** · 2010-01-03, 19:48

RckStr Wrote: Does anyone know where this file is located on the Mac version of XBMC? I can't seem to find it.

It's at XBMC.app/Contents/Resources/XBMC/system/scrapers/video.

Usually that'll be under /Applications.

However, I notice that the error page being returned (using tcpdump -A -s1500 port http) says the User-Agent may be blocked for misuse, such as automated access happening too quickly. Is there a way to put a rate-limiter in?

Or, best of all, just write an interface that uses the free-to-download database that IMDb provides.
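On the rate-limiter question: as delirial argues below, limiting one client may not help if the block targets XBMC's traffic as a whole, but a minimum-interval limiter is simple. This is a minimal Python sketch; `MinIntervalLimiter` is hypothetical and nothing like it is claimed to exist in XBMC. The clock and sleep functions are injectable so it can be tested without real waiting.

```python
import threading
import time


class MinIntervalLimiter:
    """Make successive wait() calls return at least `interval` seconds apart.

    Hypothetical helper for illustration; not part of XBMC's scraper code.
    """

    def __init__(self, interval: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")
        self._lock = threading.Lock()  # serializes callers from several threads

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return the send time."""
        with self._lock:
            now = self._clock()
            if now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.interval
            return now
```

Each scraper fetch would call `limiter.wait()` first; with `interval=1.0`, a burst of lookups goes out at most once per second.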

delirial · 2010-01-03, 19:55

greed,

There's no need for a rate-limiter, and it might be useless anyway. The problem here is the number of people using XBMC, not the number of queries a single user makes.

Also, they won't ban Firefox's user agent. It's a browser, and millions of people use it legitimately. They want people to visit their site, which is what Firefox is for.

Using the database is not a great idea; it would constantly be out of date.

Finally, you can set your own user agent. For example, I tried using 'crap' and it worked. That potentially leaves every user with a different one and reduces the probability of getting banned.

regards,
del
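To set your own user agent as delirial suggests, the string has to be percent-encoded the same way as in spiff's line before it goes after `|User-Agent=`. This is a minimal Python sketch; `with_user_agent` is a hypothetical helper, and the `url|User-Agent=...` form is taken directly from spiff's line. Using `safe="()"` with `urllib.parse.quote` leaves the parentheses alone and turns `/`, `;`, `:` and spaces into `%2F`, `%3B`, `%3A` and `%20`, which reproduces spiff's encoding exactly.

```python
from urllib.parse import quote


def with_user_agent(url: str, user_agent: str) -> str:
    """Append a percent-encoded User-Agent option in the url|User-Agent=... form."""
    # Parentheses stay literal (as in spiff's line); letters, digits, "." and "-"
    # are never escaped by quote().
    return f"{url}|User-Agent={quote(user_agent, safe='()')}"
```

A plain string like `crap` needs no escaping at all, so `with_user_agent(url, "crap")` simply yields `url|User-Agent=crap`.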

Gooner14 · 2010-01-03, 21:15

I'm not sure it's pulling down all the plot details, cast, etc. Most of my descriptions end with '.Full summary', and I don't think they did before. Can anyone confirm?

delirial · 2010-01-03, 21:43

Gooner14 Wrote: I'm not sure it's pulling down all the plot details, cast, etc. Most of my descriptions end with '.Full summary', and I don't think they did before. Can anyone confirm?

Mine are also ending like that. It wasn't so before.

**greed** · 2010-01-03, 22:54

delirial Wrote: Mine are also ending like that. It wasn't so before.

You now need to visit a URL ending in "/plotsummary" to get an un-truncated summary. I don't know enough about the scraper structure to suggest an easy fix, or if there is an easy fix.

So, for example, http://www.imdb.com/title/tt0226379/plotsummary contains the full plot summary and http://www.imdb.com/title/tt0226379/ has the other details and the truncated plot.

I can't find a link or preference setting on IMDb to always display the full summary instead of the "more" or "full summary" link. It looks like a second URL fetch will be needed.

That's not going to stop me trying to figure it out, though someone who knows more might have a faster solution.

**spiff** · 2010-01-03, 23:50

zoxzox, if you can't sherlock your way on how to do it from nfo files, i won't confuse you with more information.

**greed** · 2010-01-04, 01:02

greed Wrote: That's not going to stop me trying to figure it out.

OK, I think I've got it. No warranty, express or implied, blah blah blah...

What I think is happening: with |User-Agent= added to the URLs, later sections of imdb.xml now append their paths to the end of the User-Agent= part rather than to the URL itself. So the rule that fetches the plotsummary URL needs to recognize the | in a URL and split the URL there.

Code:
<RegExp input="$$3" output="&lt;url function=&quot;GetIMDBPlot&quot;&gt;\1plotsummary\2&lt;/url&gt;" dest="5+">

<expression>^([^|]*)(|.*)?$</expression>

</RegExp>
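The split that greed's expression performs can be checked outside XBMC. In it, `[^|]*` takes everything before the first `|`, and the optional second group takes the `|User-Agent=...` tail, so the output can put `plotsummary` between the two. This is a minimal Python sketch using Python's `re`, which may differ in small ways from XBMC's regex engine; `append_path` is a hypothetical helper, and the sample URL assumes `$$3` holds a title URL with a `|User-Agent=` suffix. In most regex dialects, including Python's, an unescaped `|` inside a group is alternation, so the original `(|.*)` means "empty or `.*`"; it happens to capture the same tail, but the sketch escapes it as `\|` to make the literal pipe explicit.

```python
import re

# Same shape as greed's <expression>: group 1 is everything before the first "|",
# group 2 (optional) is the "|User-Agent=..." tail. DOTALL lets ".*" take any
# trailing characters, so the pattern matches every string.
SPLIT_OPTIONS = re.compile(r"^([^|]*)(\|.*)?$", re.DOTALL)


def append_path(url: str, suffix: str) -> str:
    """Add suffix to the URL proper, keeping any |options after it."""
    m = SPLIT_OPTIONS.match(url)
    base, options = m.group(1), m.group(2) or ""
    return base + suffix + options
```

Plain concatenation turns `http://www.imdb.com/title/tt0226379/|User-Agent=crap` into `...|User-Agent=crapplotsummary`, which is the breakage greed describes; `append_path(url, "plotsummary")` gives `http://www.imdb.com/title/tt0226379/plotsummary|User-Agent=crap` instead.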

I also had to change the input= on that rule to $$3, which makes me think every rule which references $$2 is also broken: I can't find a rule which uses dest="2". Oh, I also removed the block immediately prior to that one, which sets "plot" and "outline" tags from the main results page. We know that will be useless now.

All rules which compose a URL from $$3 probably need their expression tag changed as above.

Anyway. These changes work for me.

**greed** · 2010-01-04, 04:57

OK, I needed the same treatment for blocks containing '$$3fullcredits' and '$$3posters'. I also added cache= to a few rules that should be able to reuse the same cached data as the !fullcredits rules.

Also, deleting the lines I described earlier was unnecessary; a botched edit elsewhere had made that seem necessary.

I've got a patch with these changes (minus the |User-Agent= additions I made) available at http://pastebin.com/m5135d8d6. It should be good until Feb 4, 2010.

zoxzox · 2010-01-04, 08:15

spiff Wrote: zoxzox, if you can't sherlock your way on how to do it from nfo files, i won't confuse you with more information.

Thanks, it works if I append it to movie.nfo, but it's not my cup of tea...

I was hoping the user agent could be set in as.xml for scraping.