can we please please provide an alternate way to write scrapers

pathw Offline
Junior Member
Posts: 28
Joined: Feb 2011
Reputation: 0
Post: #1
I've spent hours trying to tweak the AniDB scraper, jumping through hoops with the crazy XML-and-regex DSL just to do some of the complex stuff that's required.

I constantly find myself faced with a problem and having to attempt a dozen different solutions before finding one that works, simply because so many of the usual ways of solving a problem are not viable.

I understand that scrapers were probably designed this way to be easy to write or to sandbox them. But it doesn't serve the purpose of easiness anymore.

Firstly, manipulating XML streams with regex is really hard, and it's sad that we don't have DOM methods. Secondly, there is so much logic in some of our scrapers that would be much better expressed in a general-purpose language.
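
To illustrate what DOM-style access would buy us, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The XML layout, the element and attribute names, and the official_titles helper are all made up for the example; they are not the real AniDB response format or any existing scraper API.

```python
import xml.etree.ElementTree as ET

# Hypothetical search response, invented for illustration only.
# The real AniDB format may look quite different.
SAMPLE = """<results>
  <anime id="1">
    <title lang="en" type="official">Example Show</title>
    <title lang="ja" type="official">Rei no Bangumi</title>
  </anime>
  <anime id="2">
    <title lang="en" type="synonym">Another Show</title>
  </anime>
</results>"""


def official_titles(xml_text, lang="en"):
    """Map anime id -> official title in the requested language."""
    root = ET.fromstring(xml_text)
    found = {}
    for anime in root.findall("anime"):
        for title in anime.findall("title"):
            # Attribute checks replace what would otherwise be a
            # fragile regex over the raw XML text.
            if title.get("lang") == lang and title.get("type") == "official":
                found[anime.get("id")] = title.text
    return found


print(official_titles(SAMPLE))
```

The filtering rule ("official title in a given language") is an ordinary if statement here, so changing it means editing one condition rather than reworking a chain of regular expressions.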

Right now it's almost torture to fix bugs in scrapers. I'm not suggesting dropping the current scraper technology, but how about adding the ability to write scrapers in a more general-purpose scripting language?

If sandboxing is an issue, we could bundle a JavaScript or Lua runtime into the system. Either way, I think this would bring great sanity to the scraper scripts. So many of the subtle bugs that scrapers have are actually artifacts of the technology choice. I have a lot of false negatives and false positives I just cannot fix because I'm unable to make my scraper smarter.

thanks
(This post was last modified: 2011-03-12 22:15 by pathw.)
jmarshall Offline
Team-XBMC Developer
Posts: 26,221
Joined: Oct 2003
Reputation: 178
Post: #2
We have Python available already, so using that instead is a reasonable option. A patch would be welcome.

Cheers,
Jonathan


