Help to rewrite the whole SQL Database?
#16
(2012-04-09, 21:19)topfs2 Wrote:
(2012-04-09, 15:03)da-anda Wrote: sorry - I haven't read the 4 huge posts of you guys yet, I just wanted to mention that I'd really love to work on a new DB layout. Developing domain driven (DDD) for some years now, I have a pretty good sense for abstracting and generalizing data. I also know about the requirements of initially simple queries for basic media listing + on demand queries. The php framework I'm coding with has a nice concept for this - it's called "lazy loading". Basically you have a "repository" (a class that returns objects of a certain type, abstracting the sql stuff from the application layer) that fetches raw DB data from the primary object table and returns it as ready to use objects. Object properties that are relations to other tables can either be loaded eager (so instantly on creating the object by triggering subqueries) or lazy, which means that the object property itself only holds a proxy class that will load the according data/object on demand/first use. This is pretty nice to work with and I think that's also how it should work in XBMC on the long run, because it would easily allow the desired generalized DB schema.

Must say that having a way to handle linking to other databases is a great thing, that would allow us to for example link a fileitem to a movie on themoviedb and thus just have a form of cache which acts themoviedb, that way we could schedule updates from themoviedb as needed but also if a fileitem is wrongly classed as a movie we can just remove that link.

da-anda is talking about an Object-relational mapper (I'm guessing doctrine, which is a php implementation of the java orm), I'm pretty sure there must be good c++ implementations around. However, I don't really see how that helps with linking to other databases...
Reply
#17
Obviously, MatsF is a great enterprise application designer. Lot of good ideas, for example: more meaningful column names, DateCreated and DateChanged for every table.

But I have one concern about delete flag. If this design is for a separate XBMC database management project for large/huge amount of content, and historical data is required, then delete flag (Delete date maybe better for debug purpose) is useful. But most XBMC user use it for home media center, unless XBMC team can provide good database maintain tools/scripts, a database full with out-of-date data could be pretty mess. Many of user is looking for a tool to clean up Thumbnails already. I can imaging, soon, user will start to looking for database clean up tool.

Just my 2 cents.
Reply
#18
(2012-04-09, 11:21)jmarshall Wrote: That was quick Smile
Well this answer was a little bit slower. I had some other things needed to be done....

(2012-04-09, 11:21)jmarshall Wrote: 2. Do you mean dropping idCollection from the content table and instead just using the collectionlinkcontent link table? I'm not sure under which circumstances a piece of content can be in more than one collection (assuming collection here means something like "TV Show") - I guess technically some episodes can be in 2 "different" seasons (the season they air during, and "specials").
No, i didn't mean that. What I meant was that in the proposal it is 2 links from Content.idCollection. One directly to table Collection and one to the table CollectionLinkContent and the later is the actual N:N link between Content and Collection. So what I just am trying to say is that you don't need the first link, the one between Content and directly to Collection that is according to the text primary collection, If you look at the diagram you can see that there is only the (what I think) wrong one showed there, the 1:N between Content and Collection. What I think should be right is to move that to just be between Content and CollectionLinkContent and then it will be a N:N.

(2012-04-09, 11:21)jmarshall Wrote: 3. Agreed - the only reason for more than just Online/Offline is that in some cases we'll know that not only the item is offline, but it's actually most likely deleted (we'd mark this differently in the UI). Either way, it's the same basic idea.
It is nice when we agree Smile

(2012-04-09, 11:21)jmarshall Wrote: 4. The current db (i.e. in Eden) has playCount in the files table. Personally I think this needs to be a profile-level setting. Possibly also we may wish to consider storing play history (i.e. a list of dates when the file was played - atm we just have playcount and lastPlayed), though I suspect this is getting a little fine-grained for most users. We definitely need the profiles in the database and most things associated with the profile (user rating, playcount, settings possibly tied to that rather than just to the file/content).
5. What I meant is we don't store the history - we do store lastPlayed in the Eden database (in the files table).
My bad, both in thinking and writing.i will rewrite:
Yes,and in 4.0a you store lastPlayed in Settings. And yes, I do agree that it should be profile level wherever you decide to put it, I dont know where I said otherwise, but I have never ment anything other than that.

[playCount] You have to create a new table for playCount to work with count(). I still think it is a good idea to have it. (My own example: I have just started to look at West Wing again and I would have loved to be able to see if it was 2006 or 2007 I watched it the first time). But that is maybe because I am older so I don´t remember exactly when I saw it and I dont remember a lot of things in the show either ;-) ).
[Watched] Instead, the question of where to put Watched is just a question about what function you want to have in XBMC (my previous answer). Although if you want it to be that the Content is watched and not the file (which I still think makes the best function in XBMC), then you have to put it in a new table holding idProfile, idContent and Watched (T/F), I didnt think of that, but it was not my meaning to make the Watched field without Profile-dependence. And the same actually apply if you put it in File.

(2012-04-09, 11:21)jmarshall Wrote: 6. Hmm, still not sure I get this one. Documentaries would be a sub-collection of TV Shows, but would there be links between the collection "TV Shows" and the content? eg. if we have 2 tv shows "Planet Earth" and "Mad Men" then one is a documentary and one is just a TV Show (no subcategory). How do you see the link tables being setup between the "TV Shows" and "Documentaries" items, the shows, and the episodes in the show?

No, sometimes I am explaining really bad, sorry for that. And my suggestion is probably not even needed, it is just for help in making a query a little bit easier. What I meant was that the field TopLevel just was to be used for the content that has child-information (episodes or songs). It has nothing to do with subcollections (like Documentaries) more than it is easier to get all the Documentaries if the field TopLevel is there also.

The Show-episode links are the easy one, There you already have the table ContentLinkContent.
And then when you need the episodes you just query the Content and ContentLinkContent where ContentLinkContent.idParent = (the idContent of the show) AND Content.TopLevel = False (and probably some season-filter also).

It is just a little bit easier, nothing more than that.

(2012-04-09, 11:21)jmarshall Wrote: 17. Actually, this is referring to playable items pretty much (episodes, movies, music videos, clips etc) - often movies for instance are actually a folder on disk (or a virtual folder such as a rar or zip) or could be a folder of folders (such as a dvd or bluray folder layout). When the user is browsing their files, they're not really interested in the layout of the files for that movie beyond the fact that the folder item represents it. They don't want to have to click into the folder then click on the movie, they just want it to play when the click on it immediately. Thus, we need something similar to the files table representing these virtual items on the disk - it would have idPath and idContent, possibly idFile and other information (eg hash info so we can detect when things inside that folder changed).

To me you already have that information in the File so obviously there is my turn to not understand exactly what you mean.
Aha, maybe I understand it and this is for file browsing only. Then you either put that information also in the table File or you create a new table for Content and put the information there for those Contents that needs it.

(2012-04-09, 11:21)jmarshall Wrote: As for where to next, this is currently in limbo due to no-one pushing it forward, so if you'd like to be involved, then great - as I say, it seems to me that the first step is refining the schema to the point we think it has everything we need (we'll likely not know for sure until it's been implemented) and then get a test framework up and running.

Yes I do want to be involved, if you see that I can be of any use.

Can I push it forward, and if so how?

Obviously I can´t do the refining until you tell me everything you need (or if 4.0a is mostly everything you need, then it is fine).
Also, when we fill the test framework with data then some things will change, I guarantee Smile

The test framework is that going to be in SQLite or something else? If I will be involved then I probably will create a local mirror with Microsoft SQL Server because that is the one I know inside out....

So how will we get there?

/Mats
Reply
#19
(2012-04-10, 13:50)sebak Wrote:
(2012-04-09, 21:19)topfs2 Wrote:
(2012-04-09, 15:03)da-anda Wrote: sorry - I haven't read the 4 huge posts of you guys yet, I just wanted to mention that I'd really love to work on a new DB layout. Developing domain driven (DDD) for some years now, I have a pretty good sense for abstracting and generalizing data. I also know about the requirements of initially simple queries for basic media listing + on demand queries. The php framework I'm coding with has a nice concept for this - it's called "lazy loading". Basically you have a "repository" (a class that returns objects of a certain type, abstracting the sql stuff from the application layer) that fetches raw DB data from the primary object table and returns it as ready to use objects. Object properties that are relations to other tables can either be loaded eager (so instantly on creating the object by triggering subqueries) or lazy, which means that the object property itself only holds a proxy class that will load the according data/object on demand/first use. This is pretty nice to work with and I think that's also how it should work in XBMC on the long run, because it would easily allow the desired generalized DB schema.

Must say that having a way to handle linking to other databases is a great thing, that would allow us to for example link a fileitem to a movie on themoviedb and thus just have a form of cache which acts themoviedb, that way we could schedule updates from themoviedb as needed but also if a fileitem is wrongly classed as a movie we can just remove that link.

da-anda is talking about an Object-relational mapper (I'm guessing doctrine, which is a php implementation of the java orm), I'm pretty sure there must be good c++ implementations around. However, I don't really see how that helps with linking to other databases...

Not saying his does but the idea to link to other databases is a good thing to consider. i.e. follow the semantical web ideals. So a file on your harddrive is imdb/tt0096754, much as its done in rdf etc.
If you have problems please read this before posting

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.

Image

"Well Im gonna download the code and look at it a bit but I'm certainly not a really good C/C++ programer but I'd help as much as I can, I mostly write in C#."
Reply
#20
(2012-04-11, 15:38)topfs2 Wrote:
(2012-04-10, 13:50)sebak Wrote: da-anda is talking about an Object-relational mapper (I'm guessing doctrine, which is a php implementation of the java orm), I'm pretty sure there must be good c++ implementations around. However, I don't really see how that helps with linking to other databases...

Not saying his does but the idea to link to other databases is a good thing to consider. i.e. follow the semantical web ideals. So a file on your harddrive is imdb/tt0096754, much as its done in rdf etc.

yes, I was talking about a ORM, but not doctrine - it's a custom implementation for a CMS I'm using (Extbase-Framework for TYPO3).

@relation to other DBs
using proxy objects as properties would allow that easily I guess.
Reply
#21
It's great to see some new discussion on this front!

IMHO one thing that needs to be done, is to make the db schema less video-centric.
Specifically the content table has fields (plot,tagline,season,episode,etc) that do not apply to non-video content. Also it is not very clear what is defined as content record. Are the movie actor or music artist data stored there as well? I guess not, but then an artist table is missing.
Perhaps the first step would be to define in detail, all the entities (movie,tvepisode,music video clip,song,picture,artist,file,etc) and their collections (tvshow,season,album,disc,photo album) that need to get stored in the db, together with their attributes. After that the assignments to tables and columns can be done with precision.

Also, is this db going to replace all the currently separate dbs? It'd be great to have only one db. It also helps to see the big picture. Currently I have: commoncache.db, Addons15.db, Epg7.db, MyMusic18.db, MyPictures.db, MyPrograms6.db, MyVideos61.db, Textures6.db, TV20.db, ViewModes4.db
Reply
#22
(2012-04-10, 08:23)jmarshall Wrote: Another idea is just dropping SQL and using a document store (eg Kyoto Cabinet with a JSON document for value). Though we'd lose the experience of those that deal with SQL, we might gain with a c++ style interface (map:reduce implementation, cursors etc.)

i'd probably suggest that this suggestion would be a bad thing in terms of running a single central database feeding multiple client machines. also, it's trivial to put a json interface in front of the database itself for retrieving the content thereby allowing you to entirely abstract away the database itself from the code - all communications would be through the json layer, which means that you could switch multiple database back-end code in and out based on usage scenarios or user preference, and in theory enable further support of more back-end data stores as users decide to code them up.

that, however, would require a database api to be created, which i'd suggest might be a better place to start - the api would likely inform the database structure to a large extent. abstracting out the database api would also have the added benefit of allowing other mechanisms (e.g. a server-side database updater, a library editor etc) to be built independently of xmbc itself. you could expose the database access layer via an api or a library and have it running as a back-end service independently of the rest of the xmbc code. i'd also argue that would have the added benefit of compartmentalisation, too, in that developers could work on that without impacting the rest of the system, which isn't really possible at the moment.

as to getting involved with this, i'd be happy to join in - but i'll have to spend a bit of time to familiarize myself with the work that's already gone on here Smile
Reply
#23
Why not consider using a NoSQL database like CouchDB?
  • Everything is stored as an JSON document, no need to maintain a strict schema
  • This allows new add-ons to leverage the database without making structural changes
  • is fully accessible with web-service calls
  • is built to be used in offline mode
  • is multi-master with replication so every end-point can have a copy of the database eliminating single points of failure
-
Reply
#24
Because it's not embeddable.
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#25
(2012-05-31, 06:23)jmarshall Wrote: Because it's not embeddable.
Forgive the ignorance - but what do you mean by 'embeddable'?
Reply
#26
Able to be used from within an executable, without a separate server installation. Like sqlite3 is an embeddable SQL database. Similarly, the closest embeddable thing to a nosql database is something like Kyoto Cabinet (it's an embeddable key/value store)

CouchDB for example relies on erlang being installed, which is not a workable solution. Most other NoSQL databases are similar.

IMO if we go that route, we it might be best to just use an SQL table as the document store (eg store JSON or similar) with index tables for the relationships. This would require quite a bit of thought to get completely right, however.

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#27
There are solutions out there for accessing SQL databases as file systems (ie, they are exposed as File Like Objects with open(), read(), and the whole 9). They're all uselessly slow, but it's been done =)

MongoDB claims to fit the bill, but I've never used it myself =(
Reply
#28
id suggest to use Neo4j which can be embedden in python. Built in java thou but its really fast and extremeley flexible.

-rog
-gob

Image
Reply
#29
How about just (documented) interface? And by default use embedded sqlite "interface".
Then it should not be problem to build non-sql/sql or even ldap interface to interact with data.
Though interface need to be much higher level stuff, so is this already something which might be more similar with http RPC API or even extend RPC API for this?
Reply

Logout Mark Read Team Forum Stats Members Help
Help to rewrite the whole SQL Database?2