[RFC] Database Schema 4.0 - Request for Comments (Developers and DB Admins only!)
#16
jmarshall Wrote:So perhaps we could constrain things somewhat into broader categories such as:

1. Media (movies, episodes, music videos, songs, pictures)
2. Collections (sets, seasons, shows, albums, picture albums, album sets)
3. People (artists, actors, directors, writers, Mum, Dad etc.)
4. Characters (a Media/Collections<->People link table)
5. Groups (bands, families) ?

All of the above have attributes, and the attributes may be specific enough that we could use a flat table or a key/value attributes table or a combination thereof.
firnsy Wrote:The schema is initially focusing on video. If we get this right then ideally we can add their requirements with minimal effort and negligible cost of query times.

Regarding expanding the schema for all media types, I can see two options:
  • Put all content in content table and add a "mediaType" field, which can be specific enough like "movie, episode, movie set, series, album, track, etc". Games could be added easily here too.
    Common fields between media types can be identified and have different meaning according to the "mediaType".
    For example "episode" and "season" fields can be changed to something like "partNumber" and "setNumber". For episodes these will be the episode & season numbers and for audio tracks the track and disc number (which I miss badly).


  • Create separate videocontent, audiocontent, picturecontent, gamecontent tables.
    This breaks the uniqueness of idContent, but it could be corrected by having a global sequence for obtaining a unique idContent to use in the various content tables.
    Perhaps this is more straightforward solution. Specific media attributes can be stored directly into these tables and minimizing the recursive lookups.
Reply
#17
Also regarding a "People" table and its connection to content, perhaps the "attributeslinkcontent" table is not enough, since the "value" field is TEXT.
So the "attributeslinkcontent" table could be split in two: One (as it is now) for generic attributes like genre, studio, etc and one for "people" attributes, where the value is the numeric id from a "people" table.
Reply
#18
Looking at this monster:
Code:
SELECT DISTINCT(tvshow.idContent) FROM content AS tvshow
  INNER JOIN classificationlinkcontent ON classificationlinkcontent.idContent=tvshow.idContent
  INNER JOIN classification ON classification.idClassification=classificationlinkcontent.idClassification
  WHERE (classification.name='Tv Shows' AND tvshow.idContent NOT IN (
    SELECT DISTINCT(tvshow.idContent) FROM content AS tvshow
      INNER JOIN classificationlinkcontent ON classificationlinkcontent.idContent=tvshow.idContent
      INNER JOIN classification ON classification.idClassification=classificationlinkcontent.idClassification
      LEFT JOIN contentlinkcontent AS tvshowlinkseason ON tvshowlinkseason.idParent=tvshow.idContent
      INNER JOIN content AS season ON season.idContent=tvshowlinkseason.idChild
      LEFT JOIN contentlinkcontent AS seasonlinkepisode ON seasonlinkepisode.idParent=season.idContent
      INNER JOIN content AS episode ON episode.idContent=seasonlinkepisode.idChild
      LEFT JOIN settings ON settings.idContent=episode.idContent
      WHERE (classification.name='Tv Shows' AND (settings.idProfile=1 AND settings.playCount>0))));

You will basically execute the largest of the SELECT DISTINCT queries for every single entry in the content table that has the classification name 'Tv Shows'. In most cases this should be fine as you only have what 20 or so TV shows but now try and do this with music and a rather large library... you could easily be staring at the system for seconds before anything happens.

I would say do away with the sub query I would love to remove the select distinct as well but without a database and some data I cannot test that properly. The sub query seems to only grab all the TV shows that have been played, simply changing that to have not been played should do the trick already. The query would then look like this:

Code:
SELECT DISTINCT(tvshow.idContent) FROM content AS tvshow
      INNER JOIN classificationlinkcontent ON classificationlinkcontent.idContent=tvshow.idContent
      INNER JOIN classification ON classification.idClassification=classificationlinkcontent.idClassification
      LEFT JOIN contentlinkcontent AS tvshowlinkseason ON tvshowlinkseason.idParent=tvshow.idContent
      INNER JOIN content AS season ON season.idContent=tvshowlinkseason.idChild
      LEFT JOIN contentlinkcontent AS seasonlinkepisode ON seasonlinkepisode.idParent=season.idContent
      INNER JOIN content AS episode ON episode.idContent=seasonlinkepisode.idChild
      LEFT JOIN settings ON settings.idContent=episode.idContent
      WHERE (classification.name='Tv Shows' AND (settings.idProfile=1 AND settings.playCount<1))));

Or am I completely crazy? Confused

Personally I don't like the INNER/LEFT JOIN but thats just me growing up with databases that are able to optimize them selfs a lot better then humans can in 99% of the cases. But thats just me... Certainly in larger statements I feel it reduces the readability of the query an awful lot, though that can be just a force of habit where I have been writing code without explicit JOINS for such a long time that I have developed an aversion to the JOIN statement Oo
Reply
#19
Just want to add that it would be nice to be able to have "single media" and "series" together in one category.

So in Documentaries, you could have a single documentary like March of the Penguins next to a Discovery Channel nature series that would be on TVDB...

Or in Concerts, you could have DVD ISO's of individual concerts as their own entries, as well as the Live from Abbey Road TV series...

Smile
Kodi: Kodi 17.4, with Transparency!
50 TB Unraid Server: Docker Apps: SABnzbd, Sickrage, mariaDB
HTPC: Win10 (cause Steam), i7, GTX 1080
Watching on: Panasonic TC65-PS64 with lowend Sony 5.1 HTIB
Other devices: rMBP 15", MBA 13", nvidia shield
Reply
#20
If we do separate out "content" from "collections" (as IMO it's good to do - they're fundamentally different things) then in the content table we could have the parentID field which would make the queries a little smaller (no need to hit the attributes table) though ofcourse we'd still have recursion inside the collections table - nothing can be done about that unless we restrict the maximum recursion level.

create table content(idContent, idType, idParent, title, ...)
create table collections(idCollection, idType, idParent, title, ...)

so to get all content under a collection "rootID" recursively, you'd append rootID to a parent_list, then recursively hit the collections table to get all collections where parentID in parent_list. Given that collections is much smaller than content, this should be reasonably fast. You'd then have 1 hit to the content table with parentID in parent_list.

It might actually make sense when we hit collections to generate the tree under them instead - not sure. The main reason we need to do it is to check whether the collection item needs displaying at all - eg if all content is watched then we don't display it, or if the user doesn't have permission to any of the content etc. The other reason is for summary information (number of episodes, tracks, or albums for instance)

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#21
rcoops Wrote:Looking at this monster:

----8< snip >8----

You will basically execute the largest of the SELECT DISTINCT queries for every single entry in the content table that has the classification name 'Tv Shows'. In most cases this should be fine as you only have what 20 or so TV shows but now try and do this with music and a rather large library... you could easily be staring at the system for seconds before anything happens.

This is good stuff, optimising for the benefit of users everywhere. It brings a tear to my eye.

rcoops Wrote:I would say do away with the sub query I would love to remove the select distinct as well but without a database and some data I cannot test that properly.

After updating the schema as previously mentioned, I'll work on tarballing a test dataset using majority of the new schema. However, it doesn't have any watched type information in it as yet. I might have to randomise that data first and then produce the dataset. Either way I'll get it out there somehow and soon, so people can really get funky quick with the queries.
Reply
#22
firnsy Wrote:However, it doesn't have any watched type information in it as yet.

Would it be possible with the new database overhaul to change the watched status from being a simple number, to an array of numbers and usernames?

That way multiple people sharing a database can have their own watched status...
Kodi: Kodi 17.4, with Transparency!
50 TB Unraid Server: Docker Apps: SABnzbd, Sickrage, mariaDB
HTPC: Win10 (cause Steam), i7, GTX 1080
Watching on: Panasonic TC65-PS64 with lowend Sony 5.1 HTIB
Other devices: rMBP 15", MBA 13", nvidia shield
Reply
#23
clock2113 Wrote:Would it be possible with the new database overhaul to change the watched status from being a simple number, to an array of numbers and usernames?

That way multiple people sharing a database can have their own watched status...

Would it be possible that you read the proposed schema and comment on its technical merit instead of posting short feature requests (which are already considered and catered for) in a dev forum?
Reply
#24
Firnsy, I'm writing down a few things discussed earlier in chat, for reference.

First of all, great job on the redesign, it is very flexible and fits every situation I can think of.

The 'path' table is a straight copy from the current db and contains denormalized scraper data ('scraper', 'scanRecursive', 'useFolderNames', 'settings', 'noUpdate'). I think it is a good time to reevaluate the pros and cons of the denormalization and the usage patterns, maybe store paths relative to a source root path so that they work both as local and network paths, ...
Always read the Kodi online-manual, the FAQ and search the forum before posting.
Do not e-mail Kodi Team members directly asking for support. Read/follow the forum rules (wiki).
For troubleshooting and bug reporting please make sure you read this first.
Reply
#25
Agreed that the scraper settings should be pushed to a separate table, and linked from the path - allows update of everything under the current path in 1 query, without breaking any changes half way down the heirarchy (eg shows/show3 has a different scraper than shows, then update the scraper setting on shows).

Having an explicit parentID in the path table might be useful (and/or sourceID if that's also useful).

That way we can usefully go up and down the heirarchy without assuming a heirarchical structure.

Have we come to a decision regarding the breakdown of the content (eg the media/collections/people split)?

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#26
jmarshall Wrote:Agreed that the scraper settings should be pushed to a separate table, and linked from the path - allows update of everything under the current path in 1 query, without breaking any changes half way down the heirarchy (eg shows/show3 has a different scraper than shows, then update the scraper setting on shows).

I'll amend the current schema to reflect this shortly.


Having an explicit parentID in the path table might be useful (and/or sourceID if that's also useful).

That way we can usefully go up and down the heirarchy without assuming a heirarchical structure.

jmarshall Wrote:Have we come to a decision regarding the breakdown of the content (eg the media/collections/people split)?

The current implementation for handling custom collections is similar to that provided by ph77 whereby the contentlinkcontent table is generalised for identifying custom relationships (eg Sets, TvShow/Episode, Parent/Child).
Reply
#27
Quote:The current implementation for handling custom collections is similar to that provided by ph77 whereby the contentlinkcontent table is generalised for identifying custom relationships (eg Sets, TvShow/Episode, Parent/Child).

What's the advantage in having shows/sets/seasons/albums/other_collections as content when they don't describe playable items? They're different entities, and likely have different metadata that's important, so it may make better sense to have a separate "collections" table to describe them?

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#28
Jmarshall, Firnsy, I didn't mention a 'source' table and sourceid xref from 'path' earlier, as it didn't seem required to port current functionnality.

It would be helpful to easily and safely implement certain improvements though, such as ticket 8556 and a smarter 'Clean Library' wrt NAS (test the top level path of the source - if not available, don't clean paths of the source). That's just off the top of my head. I don't mean to request features in this thread, so I'll stop here Smile
Always read the Kodi online-manual, the FAQ and search the forum before posting.
Do not e-mail Kodi Team members directly asking for support. Read/follow the forum rules (wiki).
For troubleshooting and bug reporting please make sure you read this first.
Reply
#29
jmarshall Wrote:What's the advantage in having shows/sets/seasons/albums/other_collections as content when they don't describe playable items? They're different entities, and likely have different metadata that's important, so it may make better sense to have a separate "collections" table to describe them?

Acknowledged. There is no significant advantage other than generalising the form. On second glance it does start to appear both complex and messy implementing via the contentlinkcontent table.

The only issue I see is what standardised metadata, if any, is required for "collections"? Particularly for the broad use case of video content to music content to event picture content.
Reply
#30
firnsy Wrote:The only issue I see is what standardised metadata, if any, is required for "collections"? Particularly for the broad use case of video content to music content to event picture content.

I too think that perhaps the collections should be separated in a different table, if only for database performance issues.
The metadata varies a lot depending on the type of collection. In music content, the "Album" is a collection but it can have a quite rich set of metadata. So perhaps as you might have videocontent, musiccontent, etc you can have videocollections, musiccollections, etc so that content specific metadata can be specified.

One different issue: Is there is a bit of overlap between "classifications" and "collections", as you can have nested collections? Might be a good idea to merge these concepts.
Reply

Logout Mark Read Team Forum Stats Members Help
[RFC] Database Schema 4.0 - Request for Comments (Developers and DB Admins only!)1