[RFC] Database Schema 4.0 - Request for Comments (Developers and DB Admins only!)
#1
Lightbulb 
Revision: 2040h - 29 May 10

As mentioned some weeks ago we have been busy working on a completely new DB schema to replace the current one (which has served as well to date). The new schema is being designed to be both scalable and adaptable to whatever the future can throw at it whilst remaining as stable and portable as possible (ie it has to play nice with mysql too Laugh).

The current state of the schema (which we are currently most happiest with) is being posted for a community review to ensure any potential "gotchas" are caught and addressed early.

I'll keep the first few posts up to date with any changes we identify along the way, with the latest revision on top. Please keep all posts focussed and on topic, remember this is the DEV forum.

To easy reading I've moved the actual description to the following Wiki page http://wiki.xbmc.org/index.php?title=Dat...Schema_4.0
Reply
#2
1. All movies
Code:
SELECT content.idContent FROM content
  INNER JOIN classificationlinkcontent ON classificationlinkcontent.idContent=content.idContent
  INNER JOIN classification ON classification.idClassification=classificationlinkcontent.idClassification
  WHERE classification.name='Movies';

2. All movies grouped by genre (i.e. all movie genres)
Code:
SELECT DISTINCT(attributeslinkcontent.value) FROM content
  INNER JOIN classificationlinkcontent ON classificationlinkcontent.idContent=content.idContent
  INNER JOIN classification ON classification.idClassification=classificationlinkcontent.idClassification
  INNER JOIN attributeslinkcontent ON attributeslinkcontent.idContent=content.idContent
  INNER JOIN attributes ON attributes.idAttribute=attributeslinkcontent.idAttribute
  WHERE (classification.name='Movies' AND attributes.name='Genres');

3. All movies from a specific genre (e.g. Action)
Code:
SELECT DISTINCT(content.idContent) FROM content
  INNER JOIN classificationlinkcontent ON classificationlinkcontent.idContent=content.idContent
  INNER JOIN classification ON classification.idClassification=classificationlinkcontent.idClassification
  INNER JOIN attributeslinkcontent ON attributeslinkcontent.idContent=content.idContent
  INNER JOIN attributes ON attributes.idAttribute=attributeslinkcontent.idAttribute
  WHERE (classification.name='Movies' AND attributes.name='Genres' AND attributeslinkcontent.value='Action');

4. All shows
Code:
SELECT content.idContent FROM content
  INNER JOIN classificationlinkcontent ON classificationlinkcontent.idContent=content.idContent
  INNER JOIN classification ON classification.idClassification=classificationlinkcontent.idClassification
  WHERE classification.name='Tv Shows';

5. All unwatched shows for the user "masterprofile" (i.e. idProfile==1)
NOTE: This would likely be made more efficient by updating the show and season records per episode viewing.
Code:
SELECT DISTINCT(tvshow.idContent) FROM content AS tvshow
  INNER JOIN classificationlinkcontent ON classificationlinkcontent.idContent=tvshow.idContent
  INNER JOIN classification ON classification.idClassification=classificationlinkcontent.idClassification
  WHERE (classification.name='Tv Shows' AND tvshow.idContent NOT IN (
    SELECT DISTINCT(tvshow.idContent) FROM content AS tvshow
      INNER JOIN classificationlinkcontent ON classificationlinkcontent.idContent=tvshow.idContent
      INNER JOIN classification ON classification.idClassification=classificationlinkcontent.idClassification
      LEFT JOIN contentlinkcontent AS tvshowlinkseason ON tvshowlinkseason.idParent=tvshow.idContent
      INNER JOIN content AS season ON season.idContent=tvshowlinkseason.idChild
      LEFT JOIN contentlinkcontent AS seasonlinkepisode ON seasonlinkepisode.idParent=season.idContent
      INNER JOIN content AS episode ON episode.idContent=seasonlinkepisode.idChild
      LEFT JOIN settings ON settings.idContent=episode.idContent
      WHERE (classification.name='Tv Shows' AND (settings.idProfile=1 AND settings.playCount>0))));

6. Seasons for a particular show (e.g. Futurama)
Code:
SELECT season.season FROM content AS tvshow
  INNER JOIN classificationlinkcontent ON classificationlinkcontent.idContent=tvshow.idContent
  INNER JOIN classification ON classification.idClassification=classificationlinkcontent.idClassification
  LEFT JOIN contentlinkcontent ON contentlinkcontent.idParent=tvshow.idContent
  INNER JOIN content AS season ON season.idContent=contentlinkcontent.idChild
  WHERE (classification.name='Tv Shows' AND tvshow.title="Futurama");

7. All watched episodes from season 2 and 5 from a particular show.
Code:
SELECT episode.idContent FROM content AS tvshow
  INNER JOIN classificationlinkcontent ON classificationlinkcontent.idContent=tvshow.idContent
  INNER JOIN classification ON classification.idClassification=classificationlinkcontent.idClassification
  LEFT JOIN contentlinkcontent AS tvshowlinkseason ON tvshowlinkseason.idParent=tvshow.idContent
  INNER JOIN content AS season ON season.idContent=tvshowlinkseason.idChild
  LEFT JOIN contentlinkcontent AS seasonlinkepisode ON seasonlinkepisode.idParent=season.idContent
  INNER JOIN content AS episode ON episode.idContent=seasonlinkepisode.idChild
  LEFT JOIN settings ON settings.idContent=episode.idContent
  WHERE (classification.name='Tv Shows' AND (settings.idProfile=1 AND settings.playCount>0) AND (episode.season=2 OR episode.season=5));
Reply
#3
Reserved. Enjoy.
Reply
#4
Looks really good, however:


Quote:9. Content-content linking enables connections such as:
Tv Show > Season/Episode
Movie > Soundtrack
Music > Music Video
Home Movies > Photos
9.1. The mapping is done by the parent/child link table "contentlinkcontent".
9.2. The largest linkage depth will typically be 3 (e.g. item 9.1. above).


9.2: Don't agree. ex a Move can have TV Show that have a spinnof TV Show that have have a Movie wich have a soundtrack and that Movie can have a Remake that have a TV-Show and so on (ok, kind of extreem example but just to point out)

So why stop there? Make it a recursive-table and let the user/system decide of how deep the linking can get. Not the Database.

Notes:
Quote:Date/Time strings are stored as VARCHAR(24) (eg. "YYYY-MM-DD HH:MM:SS.SSSS")
Why? this will be a mess when you add locales, why not store it as UNIXTIME/UNIXMICROTIME (ie INT/BIGINT) and remove all future trouble with reparsing datestamps


EDIT:
2. Arbitrary classification "nodes" of media.
Would be great with some kind age rating, ex R- X-.
Reply
#5
ztripez Wrote:Why? this will be a mess when you add locales, why not store it as UNIXTIME/UNIXMICROTIME (ie INT/BIGINT) and remove all future trouble with reparsing datestamps

I agree with this point. Why store the date/time as a formatted string? A Unix Date/Time stamp just seems so much easier to work with when converting to different locales. Unless keeping it as a formatted string will make searching the database easier in some way, I've never worked much in searching time/date strings in a database so this may be the case.
Reply
#6
Why not use SQLite's DATE data type? Is it because you need more precision (i.e. micro-seconds)?
kryptonyte Wrote:I agree with this point. Why store the date/time as a formatted string? A Unix Date/Time stamp just seems so much easier to work with when converting to different locales. Unless keeping it as a formatted string will make searching the database easier in some way, I've never worked much in searching time/date strings in a database so this may be the case.
Reply
#7
deathinator Wrote:Why not use SQLite's DATE data type? Is it because you need more precision (i.e. micro-seconds)?

I'd say that there shouldn't be any use of database system specific types to be able to use it on as many different RDBMS's as possible.
Always read the online manual (wiki), FAQ (wiki) and search the forum before posting.
Do not e-mail Team Kodi members directly asking for support. Read/follow the forum rules (wiki).
Please read the pages on troubleshooting (wiki) and bug reporting (wiki) before reporting issues.
Reply
#8
@firnsy: Nice work. Agreed with 9.2 - no reason to limit it necessarily, though in some senses a "parent/child" relationship may not necessarily be the right way to go for the case that ztripez suggests. eg it makes sense for the movie or soundtrack associated with a show to be listed when you click on the show (i.e. it would appear at the same level as seasons) as the show in this case is effectively a container object rather than a media item in and of itself. But it doesn't really make sense to list the tvshow alongside a movies listing - instead, you'd perhaps have a "related items" section in the movie information dialog. The concept of "related items" however is less strict than the child/parent relationship between shows, seasons and episodes. eg it might make sense to have movies by the same director listed as related items to a particular movie, but this is a looser connection of movies in the same set. I suspect there's 2 levels of association - one being association via a category value (director/artist/genre etc.) and one being a more direct association (sequels, sets, soundtrack, tvshow links etc.) Perhaps the contentlinkcontent would be to handle the direct association, and the "indirect" association would be something that is handled in a more general way by searching for items that meet some similarity conditions?

Is your db test harness something that others could find useful do you think?

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#9
ztripez Wrote:9.2: Don't agree. ex a Move can have TV Show that have a spinnof TV Show that have have a Movie wich have a soundtrack and that Movie can have a Remake that have a TV-Show and so on (ok, kind of extreem example but just to point out)

So why stop there? Make it a recursive-table and let the user/system decide of how deep the linking can get. Not the Database.

Valid point. Not yet sure of the best way to implement this recursive lookup though. I don't believe the DB backends can handle recursive lookups, so it will have to be implemented in code which would mean repetitive DB calls (and consequently wasted time).

I agree this relationship needs improvement, however how to represent the data struct in a relational database is another question.

jmarshall Wrote:Perhaps the contentlinkcontent would be to handle the direct association, and the "indirect" association would be something that is handled in a more general way by searching for items that meet some similarity conditions?

This is a nice alternative as well. I think this one can be brainstormed a bit more to get the most elegant (read efficient) solution.

ztripez Wrote:Why? this will be a mess when you add locales, why not store it as UNIXTIME/UNIXMICROTIME (ie INT/BIGINT) and remove all future trouble with reparsing datestamp

Noted. A 64 int representing UNIX time is probably better here. Although locales appears to be a bit of a moot point unless you're home network spans timezones.


ztripez Wrote:2. Arbitrary classification "nodes" of media.
Would be great with some kind age rating, ex R- X-.

You will be able to not only your define own classification nodes but also your key/value attributes to all content as well. That's our Genres, Directors, Studio's, etc will be implemented out of the box.


Montellese Wrote:I'd say that there shouldn't be any use of database system specific types to be able to use it on as many different RDBMS's as possible.
Exactly.


jmarshall Wrote:Is your db test harness something that others could find useful do you think?

Possibly yes. If people want to generate alternative (ie optimised/faster) queries than those displayed then it would be good to be working of a similar base. I'll clean up the python script before publishing.
Reply
#10
cool stuff,
As your in the schema planning stages I thought I would mention... Music DVDs
I'm not sure if your schema is flexible to handle these already
Some relate directly to an music album, some not. They arn't really music videos per se
Hopefully these can be incorporated into the new schema and library in the future

I know this isnt directly schema related, sorry for cluttering the thread
Reply
#11
Just a question, how this schema relates to fanarts, poster/banner ecc?
To be clear is it possible to use the same db with different views(on different boxes)?
like:
box 1: tv show with posters
box 2: same tv show with banners

box 1 & 2 take data from the same remote db.

ZIOLele
Reply
#12
Yes, different aspect thumbnails can potentially be stored in the same database. This doesn't have much to do with assessing the schema at hand.
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#13
Hi,
some remarks if I may:
  • content table:

    Why not move some of the fields like plot etc into the attribute table?
    Is it so that single-value fields are placed directly in the content table (plot, onlineRating etc) and multi-value fields (actors, directors) are in the attributes table? Because logically they are all "attributes".

    Also an "onlineSource" field would be useful, so that the "onlineId" has some context (for example onlineSource="IMDB" onlineId="tt123456" and onlineSource="TMDb" onlineId="987654").


  • attribute table:
    A "detail" field might be useful for adding context specific info where is applicable, like the character name of an actor.

    Also the url's for any content could be here as well (and remove the epgUrl from content table)


  • contentlinkcontent table:
    Perhaps this can serve better:
    Code:
    CREATE  TABLE IF NOT EXISTS contentlinkcontent (
      idSource  INTEGER  NOT NULL  DEFAULT 0,
      idTarget   INTEGER  NOT NULL,
      idRelation   INTEGER  NOT NULL,
    );
    Where "idRelation" can be picked from a fixed or expandable list like "Parent/Child", "Soundtrack", etc
    This way you can add any type of relationship, for example you could encapsulate the set context there too - just store the set in the "content" table, add the set contents as rows in this table and discard the "sets" & "setlinkcontent" tables.



If I understand correctly general info for a tvshow will be stored in the content table. Is that correct? Shouldn't there be an "artist" table as well?

Is this proposal strictly for video files or it'll be expanded for music & photos as well? I'm thinking that by generalizing a bit the content table and adding the proper attributes in attribute table, any type of media could fit here.
Reply
#14
Thanks ph77. Agreed that "set" makes some sense to have in the contentlinkcontent table. Perhaps not simply because there's a relationship there (after all, one could think of "genre" in the same light) but rather because the set may itself have some metadata information in a similar way that the movies within the set has. eg there may be an overall plot or description, some images associated with it, or some other details such as history and so on. In the same way that a tvshow is content even though it's a collection of episodes, I think sets also fit in here.

I think we have to be careful about making things too general, however. For instance, there's no reason why we can't just have a "values" table and a "link" table and be done with it:
Code:
"Values"

1 | Movie | Die Hard
2 | Character | John McClane
3 | Actor | Bruce Willis
4 | Quote | Yippee-ki-yay Mother F**ker

"Link"
1 | Character | 2
2 | Actor | 3
3 | Quote | 4
However, I think it's clear that there's large enough differences between a character and an actor and an actor and a movie/show to allow some specialisation in the database scheme. Also, the above generalisation automatically means recursive lookups through the link table, which is likely nasty performance-wise.

So perhaps we could constrain things somewhat into broader categories such as:

1. Media (movies, episodes, music videos, songs, pictures)
2. Collections (sets, seasons, shows, albums, picture albums, album sets)
3. People (artists, actors, directors, writers, Mum, Dad etc.)
4. Characters (a Media/Collections<->People link table)
5. Groups (bands, families) ?

All of the above have attributes, and the attributes may be specific enough that we could use a flat table or a key/value attributes table or a combination thereof.

I'm not sure about "Groups" - this may be a waste of time to attempt, given that there's no mechanism for identifying groups versus people in the main place where this would be useful (music).

Thoughts?

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#15
ph77 Wrote:Why not move some of the fields like plot etc into the attribute table?
Is it so that single-value fields are placed directly in the content table (plot, onlineRating etc) and multi-value fields (actors, directors) are in the attributes table? Because logically they are all "attributes".

Yes logically everything is an attribute, but there has to be a trade off between everything being an attribute (lot's of recursive SQL) and common values being part of the table whilst others attributes.

The aim here is to identify the best trade-off between table size and query size.

ph77 Wrote:Also an "onlineSource" field would be useful, so that the "onlineId" has some context (for example onlineSource="IMDB" onlineId="tt123456" and onlineSource="TMDb" onlineId="987654").

Nice. Context is definitely important here, particularly when if we want to rescrape/rebuild the information at a later date.

ph77 Wrote:*attribute table:
A "detail" field might be useful for adding context specific info where is applicable, like the character name of an actor.

Good idea. Context is nice when you start adding your own custom attributes.

ph77 Wrote:*contentlinkcontent table:
Perhaps this can serve better:
Code:
CREATE  TABLE IF NOT EXISTS contentlinkcontent (
  idSource  INTEGER  NOT NULL  DEFAULT 0,
  idTarget   INTEGER  NOT NULL,
  idRelation   INTEGER  NOT NULL,
);
Where "idRelation" can be picked from a fixed or expandable list like "Parent/Child", "Soundtrack", etc
This way you can add any type of relationship, for example you could encapsulate the set context there too - just store the set in the "content" table, add the set contents as rows in this table and discard the "sets" & "setlinkcontent" tables.

I like this as well, in additional to jmarshall following comments. The sets could be rolled into this relationship as long as other common (ie. "TvShow/Episode" or "Parent/Child") linkages.


ph77 Wrote:If I understand correctly general info for a tvshow will be stored in the content table. Is that correct? Shouldn't there be an "artist" table as well?

Is this proposal strictly for video files or it'll be expanded for music & photos as well? I'm thinking that by generalizing a bit the content table and adding the proper attributes in attribute table, any type of media could fit here.

The schema is initially focusing on video. If we get this right then ideally we can add their requirements with minimal effort and negligible cost of query times. I'll update the schema over the next few day or two in line with the aforementioned comments.

Thanks for the constructive feedback.
Reply

Logout Mark Read Team Forum Stats Members Help
[RFC] Database Schema 4.0 - Request for Comments (Developers and DB Admins only!)1