Local 'Scraping'

AnalogKid Offline
Fan
Posts: 648
Joined: Feb 2009
Reputation: 141
Post: #1
Not sure of the best way to express this, and it's as much a query as a potential feature... but:

Why not make the 'scraping' of media information from local NFO / TBN files etc. a scraping component in the same manner as traditional net scraping (IMDB, TMDB, etc.)?

Would this not help to create a more flexible architecture for local info files?... in theory, allowing for different schemes to be used if folks wish, and someone's willing to write a scraper?


I appreciate the 'export' functionality might be an issue.
Martijn Offline
Team Kodi
Posts: 11,441
Joined: Jul 2011
Reputation: 165
Location: Dawn of time
Post: #2
If you have NFO files these will already be preferred over net scraping.

And why would you want to change the standard layout of an XML file? Changing that would only add more confusion than there already is.

Always read the XBMC online-manual, FAQ and search the forums before posting.
Do NOT e-mail Team-XBMC members asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting, make sure you read this first

jjd-uk Offline
Team-Kodi Member
Posts: 3,175
Joined: Oct 2011
Reputation: 61
Post: #3
I would like to see a Local Scraper where you can specify in the settings:

Enable/Disable NFO support
Use Folder Name or Use File Name

So for example:

Enable NFO support
Use Folder Name

so it would use NFO if found then if not use Folder Name

or :

Disable NFO support
Use Folder Name

so it would ignore any NFO file and just use the Folder/Filename setting.

Being able to scrape stuff to the Library where there is no online info sources, for example Home Videos or Music Videos, would be great to have.
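
A minimal sketch of the two settings described above, written in Python purely for illustration. The function name `pick_title` and its parameters are hypothetical and are not part of XBMC; the NFO title is assumed to have been read already by some other code.

```python
from pathlib import PurePosixPath
from typing import Optional


def pick_title(media_path: str,
               use_nfo: bool,
               use_folder_name: bool,
               nfo_title: Optional[str] = None) -> str:
    """Hypothetical helper: choose a library title for one media file."""
    path = PurePosixPath(media_path)
    # The NFO title wins only when NFO support is enabled and a title was found.
    if use_nfo and nfo_title:
        return nfo_title
    # Otherwise fall back to the folder name or the file name (without extension).
    return path.parent.name if use_folder_name else path.stem


print(pick_title("/videos/Holiday 2011/clip01.avi", True, True, "Our Holiday"))
print(pick_title("/videos/Holiday 2011/clip01.avi", False, True, "Our Holiday"))
```

With NFO support disabled, the second call ignores the NFO title and falls back to the folder name, which is the behaviour that would suit home videos with no online source.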
AnalogKid Offline
Fan
Posts: 648
Joined: Feb 2009
Reputation: 141
Post: #4
(2012-04-05 23:21)Martijn Wrote:  If you have NFO files these will already be preferred over net scraping.

And why would you want to change the standard layout of an XML file? Changing that would only add more confusion than there already is.

That wasn't really the point, and it's not entirely accurate either.

Firstly - The point was that the architecture for scraping the local info isn't in keeping with that of net scrapers (when in theory, the two perform the same function, only from a different resource) - it's not about the priority of one over another.

Secondly - You've assumed info file = NFO, which is not something I said. I merely referred to a potentially different scheme (the more abstract term 'info files / metadata'), which might be ANY scheme, not just an XML file, i.e. wherever images, fanart and other meta info might be found. That's really an entirely different discussion about metadata schemes, and nobody's saying the current (slightly inconsistent) scheme shouldn't remain as is.
The current scheme affords a fair amount of flexibility for thumbs and fanart, so it's hard to claim a 'strict standard' approach to things, and it's hard then to also say 'we don't like flexibility', because the current scheme tries to offer precisely that!

Also, the NFO doesn't always behave as you might expect, particularly when updating library / scanning for new content. In those instances new data in the NFO will NOT be read unless you choose a 'reload' on individual media (not ideal for masses of media), or you remove it from the library and reload.

How can it be a consistent approach that it's OK to have multiple net scrapers fetching data from different online databases, but 'wrong' to have the same approach if the data (behind the scenes) happens to be local?

Still, the fundamental question remains - how come local scraping isn't presented in the same manner as net scraping, simply as a plugin?
(This post was last modified: 2012-04-06 18:49 by AnalogKid.)
jmarshall Offline
Team-XBMC Developer
Posts: 26,230
Joined: Oct 2003
Reputation: 177
Post: #5
Because it's designed to operate in tandem. An nfo file can either be a full nfo, a mixed nfo or a URL nfo. The latter two MUST be followed by an online scrape, and indeed, the very point of them being there is so that the online scrape is successful. The former does not need to be. They can't be completely independent because of this.
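
The three kinds of nfo can be told apart roughly as follows. This is only a sketch under stated assumptions, not XBMC's actual parser: it assumes a full nfo is a well-formed XML document, a URL nfo contains only a link, and a mixed nfo contains XML plus a link outside the XML. The names `classify_nfo` and `must_scrape_online` are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

URL_RE = re.compile(r"https?://\S+")


def classify_nfo(text: str) -> str:
    """Return 'full', 'mixed', 'url' or 'none' for the text of one nfo file."""
    stripped = text.strip()
    has_xml = False
    rest = stripped
    # Assumption: any XML document spans from the first '<' to the last '>'.
    start, end = stripped.find("<"), stripped.rfind(">")
    if start != -1 and end > start:
        try:
            ET.fromstring(stripped[start:end + 1])
            has_xml = True
            # Only look for a URL outside the XML, so thumb links are ignored.
            rest = stripped[:start] + " " + stripped[end + 1:]
        except ET.ParseError:
            pass
    has_url = URL_RE.search(rest) is not None
    if has_xml and has_url:
        return "mixed"
    if has_xml:
        return "full"
    if has_url:
        return "url"
    return "none"


def must_scrape_online(kind: str) -> bool:
    """Mixed and URL nfo files must be followed by an online scrape."""
    return kind in ("mixed", "url")
```

Only the 'full' case can be handled without any online scraper, which is the dependency described above.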

AnalogKid Offline
Fan
Posts: 648
Joined: Feb 2009
Reputation: 141
Post: #6
(2012-04-07 00:23)jmarshall Wrote:  Because it's designed to operate in tandem. An nfo file can either be a full nfo, a mixed nfo or a URL nfo. The latter two MUST be followed by an online scrape, and indeed, the very point of them being there is so that the online scrape is successful. The former does not need to be. They can't be completely independent because of this.

Hmmm

Scenario 1:
Media found (no NFO exists), so XBMC 'deduces' what it can from the filename / folder name and passes it to an online scraper, which then attempts to retrieve additional meta info from an online database.

Scenario 2:
Media found with full NFO, so XBMC retrieves as much info from the NFO as possible. No online lookup required.

Scenario 3:
Media found with mixed or URL NFO, so XBMC retrieves what it can from the NFO and passes to an online scraper which attempts to retrieve additional meta info from an online database.

So why can't XBMC just hand over to a chain of abstract scrapers (XBMC shouldn't care how or where the information is retrieved)?
i.e.
Scraper 1 ---> searches for local NFO, fanarts, thumbs

Scraper 2 ---> supplements the info collated by Scraper 1 by searching (say) TMDB. XBMC must already be capable of providing online scrapers with enough info to perform a rudimentary search since it's entirely possible that Scraper 1 found nothing and deduced nothing other than 'the movie APPEARS to be called <folder name>'

Scraper 3 and so forth?

I don't understand why there is any distinction between an online scraper and a 'local' scraper. Clearly, I'm not in the heart of the code like you guys... I must be missing something.

You must be 'doing' what Scraper 1 does (albeit not under the guise of a scraper, but 'core XBMC logic') right?
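
To make the chain idea concrete, here is a rough sketch in Python. Everything in it is hypothetical: `run_chain`, the `Info` dictionary and the two stub scrapers are illustrations only and do not reflect how XBMC is actually structured. It assumes that earlier scrapers take priority and later ones only supplement missing fields.

```python
from typing import Callable, Dict, List

Info = Dict[str, str]
Scraper = Callable[[Info], Info]


def run_chain(scrapers: List[Scraper], seed: Info) -> Info:
    """Run each scraper in order, letting later ones fill in missing fields."""
    info = dict(seed)
    for scraper in scrapers:
        found = scraper(info)
        # Earlier scrapers win: only add fields that are still missing.
        for key, value in found.items():
            info.setdefault(key, value)
    return info


def local_stub(info: Info) -> Info:
    # Stand-in for "Scraper 1": pretend a local NFO supplied title and plot.
    return {"title": info["search_term"], "plot": "Plot from local NFO"}


def online_stub(info: Info) -> Info:
    # Stand-in for "Scraper 2": pretend an online database was queried.
    return {"title": info["search_term"] + " (online)", "year": "1999"}


result = run_chain([local_stub, online_stub], {"search_term": "Foo"})
print(result)
```

Here the online stub adds the year but cannot overwrite the locally supplied title, which matches the 'supplements' behaviour described for Scraper 2.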
(This post was last modified: 2012-04-07 01:13 by AnalogKid.)
jmarshall Offline
Team-XBMC Developer
Posts: 26,230
Joined: Oct 2003
Reputation: 177
Post: #7
As long as scraper 1 can alter the "this is what to search for/on" list, sure, that's essentially what we're moving towards. The trick is to do it both without breaking things, as well as making it meaningful to the user - in particular, if we have a stack of scrapers to run through for a particular item, allowing that to be configurable by the user without it being overwhelming is difficult.

Further, as scrapers are essentially a combination of an XML parser and regexp evaluator, they don't quite have enough logic to necessarily do everything we'd like them to be able to do. Moving them into python is the next step along the chain.
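
As a rough illustration of what "an XML parser and regexp evaluator" amounts to, the sketch below applies one expression to an input string and substitutes the captured groups into an output template. It is written in Python for readability and is not XBMC's scraper engine; `apply_regexp` is a hypothetical name, and treating an empty expression as "match everything" is an assumption made for the example.

```python
import re


def apply_regexp(expression: str, output_template: str, input_text: str) -> str:
    """Match input_text and substitute \\1, \\2, ... in output_template."""
    # Assumption: an empty expression captures the whole input.
    pattern = expression or "(.*)"
    match = re.search(pattern, input_text, re.S)
    if match is None:
        return ""
    # Replace each backslash-digit placeholder with the matching group.
    return re.sub(r"\\(\d)",
                  lambda ref: match.group(int(ref.group(1))) or "",
                  output_template)


print(apply_regexp("", r"<details><title>\1</title></details>", "Foo"))
```

Pattern matching and substitution is about all such an evaluator can do; it has no way to list a directory or test whether a file exists, which is the limitation being pointed out.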

There's some GSoC project proposals that are somewhat related to this - perhaps you could chime in on those?

AnalogKid Offline
Fan
Posts: 648
Joined: Feb 2009
Reputation: 141
Post: #8
(2012-04-07 00:23)jmarshall Wrote:  Because it's designed to operate in tandem. An nfo file can either be a full nfo, a mixed nfo or a URL nfo. The latter two MUST be followed by an online scrape, and indeed, the very point of them being there is so that the online scrape is successful. The former does not need to be. They can't be completely independent because of this.

Second read of this:

We know that online scraping works without the existence of an NFO, and we know that an NFO can exist without the need for online scraping (although in some cases, it will only be fruitful if an online scraper can retrieve the content online). XBMC won't 'fail' if no online scraper exists, or fails to retrieve the data.

So neither has a hard dependency on the existence of the other, and only in some cases does the online scraper 'supplement' that of the local scraper. So logically, they can, and do function without each other.

Is this hypothesis wrong?
(2012-04-07 01:19)jmarshall Wrote:  As long as scraper 1 can alter the "this is what to search for/on" list, sure, that's essentially what we're moving towards. The trick is to do it both without breaking things, as well as making it meaningful to the user - in particular, if we have a stack of scrapers to run through for a particular item, allowing that to be configurable by the user without it being overwhelming is difficult.

Further, as scrapers are essentially a combination of an XML parser and regexp evaluator, they don't quite have enough logic to necessarily do everything we'd like them to be able to do. Moving them into python is the next step along the chain.

There's some GSoC project proposals that are somewhat related to this - perhaps you could chime in on those?

Sorry about that, I posted as you were posting too... time lag etc...

I do take your point in making the 'chaining' logical to the user (if you even wanted to go down that route).

Please forgive me, it's easy for me to describe a highly abstract, 'theoretically neat' architecture... but I hope it doesn't cause offence. I just hope someone can pick up on the idea (if it's worth picking up) and say "hey... maybe that's right... why aren't local and net scrapers the same thing?"

You've moved a lot of 'core' stuff into a plug-in architecture, and I just wondered if this might be a future candidate :-)
(This post was last modified: 2012-04-07 01:28 by AnalogKid.)
jmarshall Offline
Team-XBMC Developer
Posts: 26,230
Joined: Oct 2003
Reputation: 177
Post: #9
Heh - you have to try pretty hard to offend me, so no worries there! All good ideas are considered (no matter how much effort it means for us devs). The trick with local scraping is that it requires good access to the filesystem, and of course a regexp/XML parser isn't really the right tool for that job. A python script might be, however (assuming vfs access was available), thus the move in that direction.

Please do comment on the GSoC proposals related to this (in the GSoC dev forum) - it helps us evaluate the student to see how they interact with the community.

Cheers,
Jonathan

newbie007 Offline
Junior Member
Posts: 16
Joined: May 2012
Reputation: 0
Post: #10
Hi,

I wanted to try writing a local scraper, which should do a search based on the given term and populate the title and original title fields with it.

Please find below the script I've tried... but it's not working and I don't know how to make it work, as I'm trying this out for the first time...

<scraper name="localdb" content="movies" thumb="imdb.gif" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<include>metadata.common.localdb.com/localdb.xml</include>

<CreateSearchUrl dest="3">
<RegExp input="$$1" output="\1" dest="3">
<expression noclean="1"></expression>
</RegExp>
</CreateSearchUrl>

<!-- Markup inside an output attribute must be escaped to keep the file well-formed -->
<GetSearchResults dest="8">
<RegExp input="$$1" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;&lt;entity&gt;&lt;title&gt;\1&lt;/title&gt;&lt;/entity&gt;&lt;/results&gt;" dest="8">
<expression noclean="1"></expression>
</RegExp>
</GetSearchResults>

<GetDetails dest="3">
<RegExp input="$$1" output="&lt;details&gt;&lt;title&gt;\1&lt;/title&gt;&lt;originaltitle&gt;\1&lt;/originaltitle&gt;&lt;plot&gt;Some dummies doing dumb things&lt;/plot&gt;&lt;/details&gt;" dest="3">
<expression></expression>
</RegExp>
</GetDetails>

</scraper>


I need some suggestions from you guys to make this work...

Thanks in advance...
Koying Offline
Team-Kodi Member
Posts: 2,114
Joined: Sep 2008
Reputation: 44
Location: Brussels, Belgium
Post: #11
I found this thread while looking for comments on the same idea as the original poster.

I'd also like to have (or code) a way for XBMC to only use local nfo files, without a backing online provider. As XBMC supports nfo, it feels natural to me to be able to use only those.
That would assume that only "full" nfo files (e.g. the ones created by an export) are used; the mixed or url ones would not be supported.

I see at least 2 advantages to this:
1) You have external control of what info is included in the library (whether by hand-editing the nfo or using 3rd party tools)
2) You have external control of what is actually included in the library (no nfo -> no import in library)

I'm ready to code this.
My approach would be to add a "Use Local info only" option in the category selection dialog. Enabling it would disable the scraper selection list (and actual online scraping in the backend, of course).
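
The "no nfo -> no import" rule could look something like the sketch below. It is an illustration in Python rather than a patch against XBMC; the function name `importable_files`, the extension list, and the assumption that an nfo shares its base name with the video file are all mine.

```python
import os
from typing import Iterable, List

# Hypothetical list of video extensions, for the example only.
VIDEO_EXTS = (".avi", ".mkv", ".mp4")


def importable_files(directory_listing: Iterable[str]) -> List[str]:
    """Keep only the videos that have a same-named .nfo file beside them."""
    names = set(directory_listing)
    keep = []
    for name in sorted(names):
        base, ext = os.path.splitext(name)
        # A video without a matching nfo is skipped, so no blank entry is created.
        if ext.lower() in VIDEO_EXTS and base + ".nfo" in names:
            keep.append(name)
    return keep


print(importable_files(["a.avi", "a.nfo", "b.mkv", "c.nfo"]))
```

In this sketch `b.mkv` is left out because it has no nfo, which is the external control over library contents listed as advantage 2.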

Would this patch be accepted?
mkortstiege Offline
Team-XBMC Developer
Posts: 2,907
Joined: Jan 2008
Reputation: 8
Location: Germany
Post: #12
Not sure if I like the idea of yet another GUI option, but I think there's no way around it if we want to disable fetching metadata from the net and rely only on local exports. So yeah, I think it would be accepted.

Koying Offline
Team-Kodi Member
Posts: 2,114
Joined: Sep 2008
Reputation: 44
Location: Brussels, Belgium
Post: #13
Another option would be to add a static, not addon-backed, "Local info only" entry in the scraper list.

That would probably be nicer and more logical from a UI point of view, but dirtier in the code...
mkortstiege Offline
Team-XBMC Developer
Posts: 2,907
Joined: Jan 2008
Reputation: 8
Location: Germany
Post: #14
How would it be dirtier code-wise? From what I had in mind, the user chooses a "local importer" scraper. As long as there's a full nfo, the scraper code should not try to fetch anything from the web. Note, this is only theory; I will have to try it later today.

Koying Offline
Team-Kodi Member
Posts: 2,114
Joined: Sep 2008
Reputation: 44
Location: Brussels, Belgium
Post: #15
You mean an actual scraper addon doing nothing, right?
The problem I see is that it would still try to import files without an nfo, leaving "blank" entries in the library, which I'd like to avoid.

What I meant is adding an entry to the scraper list which would be backed by code, not by an addon.