Clean scraping API

garbear
Team-XBMC Developer
Posts: 425
Joined: Dec 2010
Reputation: 16
Location: gangsta's paradise
Post: #81
;) kickass links

Always read the XBMC online-manual, FAQ and search the forums before posting.
Do NOT e-mail Team-XBMC members asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting, make sure you read this first
topfs2
Team-XBMC Developer
Posts: 3,825
Joined: Dec 2007
Reputation: 8
Post: #82
Since this thread has gotten so much attention lately, I want to start a discussion on something I really need some input on :)

The discussion concerns issues #7 and #9, and is semi-related to #8.

The problem is not really the scheduling algorithms (they need some love, but in essence they should work); it is more how to reorganize the supplies and demands API.

Basically, what we arrive at IMO is a subgraph find-and-alter problem. In essence we had that before too, but restricted to a single node (the subject) and its edges.

So what I envision is something along the lines of:
demands: find A where edge(A, owl.sameAs, B) and (B is URL or edge(B, dc.identifier))

This would allow for an owl.sameAs of this form:
Code:
{
  owl.sameAs: [
    "http://themoviedb.org/movie/544",
    {
       dc.identifier: [ "http://www.imdb.com/title/tt0372784" ],
       foaf.thumbnail: [ "http://www.imdb.com/media/rm955554048/tt0372784?ref_=tt_ov_i" ]
    }
  ]
}

But I can't find a nice way to express the above query in Python, at least not in a Pythonic way.
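For what it's worth, here is one possible shape for it: a few composable matchers with `|` and `&` overloaded so the query reads close to the pseudo-syntax. This is only a sketch under assumptions: `Pattern`, `edge`, `is_url` and `find` are hypothetical names (not heimdall's API), and the subject is assumed to be a nested dict keyed by full predicate URIs standing in for `owl.sameAs`, `dc.identifier` and `foaf.thumbnail`.

```python
from typing import Any, Callable, Dict, List, Optional, Union

# Hypothetical stand-ins for owl.sameAs, dc.identifier and foaf.thumbnail.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DC_IDENTIFIER = "http://purl.org/dc/elements/1.1/identifier"
FOAF_THUMBNAIL = "http://xmlns.com/foaf/0.1/thumbnail"

# A node is either a leaf string or a dict mapping predicates to lists of objects.
Node = Union[str, Dict[str, List[Any]]]


class Pattern:
    """Hypothetical composable node test: combine with | (or) and & (and)."""

    def __init__(self, test: Callable[[Node], bool]) -> None:
        self._test = test

    def __call__(self, node: Node) -> bool:
        return self._test(node)

    def __or__(self, other: "Pattern") -> "Pattern":
        return Pattern(lambda node: self(node) or other(node))

    def __and__(self, other: "Pattern") -> "Pattern":
        return Pattern(lambda node: self(node) and other(node))


def is_url() -> Pattern:
    # Leaf strings with an http(s) scheme count as URLs.
    return Pattern(lambda node: isinstance(node, str)
                   and node.startswith(("http://", "https://")))


def edge(predicate: str, target: Optional[Pattern] = None) -> Pattern:
    # Matches a node with at least one `predicate` edge whose object
    # satisfies `target` (any object at all when target is None).
    def test(node: Node) -> bool:
        if not isinstance(node, dict):
            return False
        return any(target is None or target(obj)
                   for obj in node.get(predicate, []))
    return Pattern(test)


def find(node: Node, predicate: str, target: Pattern) -> List[Any]:
    # Returns every B such that edge(node, predicate, B) holds and B matches target.
    if not isinstance(node, dict):
        return []
    return [obj for obj in node.get(predicate, []) if target(obj)]


# demands: find A where edge(A, owl.sameAs, B) and (B is URL or edge(B, dc.identifier))
same_as_target = is_url() | edge(DC_IDENTIFIER)
demand = edge(OWL_SAME_AS, same_as_target)

movie = {
    OWL_SAME_AS: [
        "http://themoviedb.org/movie/544",
        {
            DC_IDENTIFIER: ["http://www.imdb.com/title/tt0372784"],
            FOAF_THUMBNAIL: ["http://www.imdb.com/media/rm955554048/tt0372784?ref_=tt_ov_i"],
        },
    ]
}

print(demand(movie))                                  # True
print(len(find(movie, OWL_SAME_AS, same_as_target)))  # 2
```

The demand itself stays a plain value, so a scheduler could compare or reuse demands without running them; `find` then hands the task the matching B nodes.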

I'd also love it if the demand and supply APIs were similar, and if they provided some validation of the output as well.

At the moment a task can state that it outputs a certain edge and nothing else, but when run it can output anything :) This could potentially break scheduling. So if a task misbehaves, I'd love it if heimdall were able to detect that and just throw away the result :)
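A minimal sketch of the "throw away misbehaving output" idea, assuming a task declares its supplies as a set of predicate URIs and returns a dict of predicate -> objects for the subject. `Task` and `run_checked` are hypothetical names, not heimdall's actual API, and the check only compares top-level predicates; nested nodes would need the same treatment recursively.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Optional

# A subject's edges: predicate URI -> list of objects.
Graph = Dict[str, List[Any]]

FOAF_THUMBNAIL = "http://xmlns.com/foaf/0.1/thumbnail"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

log = logging.getLogger("scheduler-sketch")


@dataclass(frozen=True)
class Task:
    """Hypothetical task description: the predicates it promises to supply
    and the function that actually produces its output."""
    name: str
    supplies: FrozenSet[str]
    run: Callable[[Graph], Graph]


def run_checked(task: Task, subject: Graph) -> Optional[Graph]:
    """Run a task and discard its result if it emits undeclared predicates."""
    output = task.run(subject)
    undeclared = set(output) - task.supplies
    if undeclared:
        log.warning("task %s emitted undeclared edges %s; discarding result",
                    task.name, sorted(undeclared))
        return None
    return output


# Declares foaf.thumbnail and only emits foaf.thumbnail: result is kept.
poster = Task("poster", frozenset({FOAF_THUMBNAIL}),
              lambda s: {FOAF_THUMBNAIL: ["http://example.com/poster.jpg"]})

# Declares foaf.thumbnail but also emits dc.title: result is dropped.
chatty = Task("chatty-poster", frozenset({FOAF_THUMBNAIL}),
              lambda s: {FOAF_THUMBNAIL: ["http://example.com/poster.jpg"],
                         DC_TITLE: ["Some title"]})

print(run_checked(poster, {}))
print(run_checked(chatty, {}))  # None
```

Rejecting the whole result, rather than just stripping the undeclared edges, keeps the scheduler's view of what each task can supply honest; stripping would be the more lenient alternative.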

Cheers,
Tobias

If you have problems please read this before posting

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.



"Well Im gonna download the code and look at it a bit but I'm certainly not a really good C/C++ programer but I'd help as much as I can, I mostly write in C#."
garbear
Team-XBMC Developer
Posts: 425
Joined: Dec 2010
Reputation: 16
Location: gangsta's paradise
Post: #83
(2013-05-09 19:49) The Movie Database wrote:
Searching is an important tool for a project like TMDb. Without a good search we end up with duplicates, frustrated users and quite frankly a less than stellar experience. Over the past few years we've had a lot of things change, especially with the amount of non-English content that has been added to our database. We've also grown a lot and our old search infrastructure simply wasn't up for the task.

Starting yesterday, we rolled out a completely brand new, built from scratch search that we feel very proud of. We're not saying it's going to be perfect but it's a foundation we can feel confident growing into.

Along with these improvements behind the scenes, we also added two new options to search with. 'primary_release_year' and 'search_type' are new. You can read about how these work by visiting our search documentation.

http://docs.themoviedb.apiary.io/#search

As always, if you notice any specific issues make sure to head over to our support area and let us know.

One last thing, we also released more than just a new search, as we have brought the idea behind our 2.1 "Movie.browse" method into v3 but made it considerably better. We've renamed it "discover" and it's pretty awesome. You can read more about it by visiting our API documentation.

http://docs.themoviedb.apiary.io/#discover
From their facebook page: https://www.facebook.com/themoviedb

It looks like they've been working heavily on the search problem as well. With a search engine on their end so heavily optimized for the movie domain, I wonder how much thought we'll need to put in to contribute anything statistically significant on top of their results.
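If we want to compare our matching against theirs, a query using the new `primary_release_year` option could be built roughly like this. It's a sketch only: it assumes the v3 `search/movie` endpoint with `api_key` and `query` parameters, and it leaves out `search_type` because the announcement doesn't say which values it accepts, so check the linked docs before relying on it.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# TMDb API v3 movie search endpoint (assumed; see the linked search docs).
TMDB_SEARCH_MOVIE = "https://api.themoviedb.org/3/search/movie"


def tmdb_search_url(api_key: str, query: str,
                    primary_release_year: Optional[int] = None) -> str:
    """Build (but do not send) a TMDb v3 movie search URL."""
    params: Dict[str, str] = {"api_key": api_key, "query": query}
    if primary_release_year is not None:
        # One of the two new search options TMDb announced.
        params["primary_release_year"] = str(primary_release_year)
    return TMDB_SEARCH_MOVIE + "?" + urlencode(params)


print(tmdb_search_url("YOUR_API_KEY", "Batman Begins", 2005))
# https://api.themoviedb.org/3/search/movie?api_key=YOUR_API_KEY&query=Batman+Begins&primary_release_year=2005
```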
