[RFC] Database Abstraction Proposal - Request for Comments (Devs and DB Admins only!)

  Thread Rating:
  • 0 Votes - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Post Reply
firnsy Offline
Team-XBMC Developer
Posts: 104
Joined: Jul 2009
Reputation: 0
Location: Australia
Post: #11
wstewart Wrote:As I mentioned in the other thread (http://forum.xbmc.org/showthread.php?tid=37449), there is already an abstraction layer in XBMC, so I am not sure that OpenDBX is really necessary. The existing code has hooks to pass the ip address and user id/password to the database wrapper layer (currently sqlite).
You are right there are two ways this could be approached, either add the backend support to the existing dbiplus implementation, or use a new one which in this case is OpenDBX.

I'm still leaning towards OpenDBX because of its large backend support and that it is still in active development. If we find the cost of replacing dbiplus is greater than adding MySQL and PostgreSQL support then of course this will be reconsidered.

I see the final implementation will be laid down in phases to allow a smooth-ish transition from what we have now in multiple local databases for media, to a single local database for media, to a single database that can be stored locally or remotely. So the library replacement is actually in the final phase (albeit the larger one).

wstewart Wrote:My plans are/were to expand this current layer to include mysql, however, conversion of my system from mythtv to XBMC has taken much longer than expected, so I haven't had time to work further on it. So if you beat me to it, that would be great.

Hurry up then ... This won't happen overnight and the more people working on it the more robust it will be Laugh
find quote
firnsy Offline
Team-XBMC Developer
Posts: 104
Joined: Jul 2009
Reputation: 0
Location: Australia
Post: #12
ZIOLele Wrote:I haven't looked at the schemas yet, so i can be wrong, but this duplication is really necessary(if it is duplication)? can't this be solved whit an intertable (if necessary)?

This was not meant to be alarming as there is only two table names that conflicts at the moment being "path" and "version", with the version table already addressed in the proposal.

skunkm0nkee Wrote:Or a master media table that contains all the similarities and then separate child tables (music_*, video_* etc) for the media specific columns?

This is a pretty good suggestion. Have a consolidated "path" table which essentially contains:
  • idPath
  • sPath
  • sHash

This will satisfy things on the music side already and in addition have a video_path table with the additional fields. The "video_path" table can than be added to the views as appropriate.
find quote
jmarshall Offline
Team-XBMC Developer
Posts: 24,523
Joined: Oct 2003
Reputation: 138
Post: #13
Some thoughts. Apologies - it's a bit of a braindump Tongue

Short form for those that can't be arsed reading it all: How paths are defined per client is the key piece of information that needs to be determined.

Long form for those with time on their hands:

1. Consolidation is really just for the benefit of not having multiple databases on the server. There is very little need to cross-link the 3 databases we have at the moment, and the small amount of cross linking that might be wanted can easily be done externally. Music videos naturally belong in the music database rather than the video database, though there are ofcourse technical issues in getting them in there (lack of tagging a major one, and also having to make sure the player system supports them well.)

2. The big problem is dealing with paths and profiles as already discussed above. This is a non-trivial problem. Ideally, we want to be able to just point XBMC at a database and we're done. We don't want to have to care that the paths may be different and so on. Thus, translation of paths needs to be either done in the database or in the client. It seems to me that doing this in the client makes the most sense, though I currently have no solid ideas on how to handle this. It's not as simple as a copy and paste for several reasons - rar/zip/stacks/multipaths etc. all need handling, and these are not naturally hierarchical. Multipaths are easy (just store multiple rows to define a 'node'), and stacks are probably doable as well. Rars and Zips are tricky, as theoretically you can have arbitrary numbers of movies within rars which may be within a zip or whatever. This is something that needs solving prior to finalizing how paths and urls are stored. Perhaps the solution is to ignore it and force everything to be on a network drive that allows the same URL format for everything, or perhaps we just tag those paths that will be the same for all machines (i.e. network) - I don't know the best way to handle it.

3. Metadata has to be completely separate from file data. The link has to be very carefully setup. My thinking for the video library is as follows:

a. A "video" table that basically contains a filtered version of the filesystem. Each entry has a URL to the item on the filesystem that this entry covers (could be a folder or a file, a stack of files, or part of a file), along with the video URL that should be played when the item is clicked on, along with some other pertinent info, such as pathid (path that this item resides in), nodeid ("node" or source that this item resides in), a hash to enable rescraping and so on. Each entry represents a playable item.

b. A "filedetails" table would have information on the underlying file(s) for that item, such as the streamdetails and other file-based information (size, date, etc.) This is stuff that can be removed when the "video" item is removed, as it's easily generated from the file.

c. A "metadata" table handles all the metadata for this item. This is scraped info and local info. It'll be fixed field and will also have key/value pairs. This is information that may be scraped from online sources or local sources, and thus should not be removed if possible. It may be split into two, as I see a use for a single-row of information that gets key details needed for sorting (eg title, sorttitle, originaltitle, rating, runtime - one off stuff) and additional information used for grouping (genre, group, actor, director, studio, year, <insert_user_definable_grouping> etc.)

d. Multiple "video" items may link to a "metadata" item, so a video->metadata link can be made. It's a n:1 relationship though I think, so a foreign key in the video table to the metadata table will suffice.

The idea is that the video table allows for fast listing based on path or fast listing based on node or source (eg "all movies"). Once we have the listing, we then background load the rest of the information (one query per entry).

Cheers,
Jonathan

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


[Image: badge.gif]
find quote
wstewart Offline
Senior Member
Posts: 213
Joined: Jan 2009
Reputation: 1
Location: Canada
Post: #14
firnsy Wrote:Hurry up then ... This won't happen overnight and the more people working on it the more robust it will be Laugh

Smile I think you are much further head at this point and I have much to do before I can look at this more. I had hoped to do the coding last month, but getting XBMC up&running and my video collection moved over has taken so much longer than anticipated.

I have too many jobs on the go, I have to complete the transition to XBMC for the waf, install a new intake on my mustang again for waf since the current one is cracked, work on finishing the basement before year end and spend time with my daughter before she grows up too fast.....

So please don't wait for me, I will jump in and help when I can. I will certainly be on board to test the new code, I have a mysql backend running previously used by mythtv that I plan to re-use for XBMC.

Thank you so much for jumping in on this one...
find quote
ZIOLele Offline
Senior Member
Posts: 130
Joined: Oct 2008
Reputation: 0
Post: #15
jmarshall Wrote:Some thoughts. Apologies - it's a bit of a braindump Tongue

Short form for those that can't be arsed reading it all: How paths are defined per client is the key piece of information that needs to be determined.

Long form for those with time on their hands:

1. Consolidation is really just for the benefit of not having multiple databases on the server. There is very little need to cross-link the 3 databases we have at the moment, and the small amount of cross linking that might be wanted can easily be done externally. Music videos naturally belong in the music database rather than the video database, though there are ofcourse technical issues in getting them in there (lack of tagging a major one, and also having to make sure the player system supports them well.)

2. The big problem is dealing with paths and profiles as already discussed above. This is a non-trivial problem. Ideally, we want to be able to just point XBMC at a database and we're done. We don't want to have to care that the paths may be different and so on. Thus, translation of paths needs to be either done in the database or in the client. It seems to me that doing this in the client makes the most sense, though I currently have no solid ideas on how to handle this. It's not as simple as a copy and paste for several reasons - rar/zip/stacks/multipaths etc. all need handling, and these are not naturally hierarchical. Multipaths are easy (just store multiple rows to define a 'node'), and stacks are probably doable as well. Rars and Zips are tricky, as theoretically you can have arbitrary numbers of movies within rars which may be within a zip or whatever. This is something that needs solving prior to finalizing how paths and urls are stored. Perhaps the solution is to ignore it and force everything to be on a network drive that allows the same URL format for everything, or perhaps we just tag those paths that will be the same for all machines (i.e. network) - I don't know the best way to handle it.

3. Metadata has to be completely separate from file data. The link has to be very carefully setup. My thinking for the video library is as follows:

a. A "video" table that basically contains a filtered version of the filesystem. Each entry has a URL to the item on the filesystem that this entry covers (could be a folder or a file, a stack of files, or part of a file), along with the video URL that should be played when the item is clicked on, along with some other pertinent info, such as pathid (path that this item resides in), nodeid ("node" or source that this item resides in), a hash to enable rescraping and so on. Each entry represents a playable item.

b. A "filedetails" table would have information on the underlying file(s) for that item, such as the streamdetails and other file-based information (size, date, etc.) This is stuff that can be removed when the "video" item is removed, as it's easily generated from the file.

c. A "metadata" table handles all the metadata for this item. This is scraped info and local info. It'll be fixed field and will also have key/value pairs. This is information that may be scraped from online sources or local sources, and thus should not be removed if possible. It may be split into two, as I see a use for a single-row of information that gets key details needed for sorting (eg title, sorttitle, originaltitle, rating, runtime - one off stuff) and additional information used for grouping (genre, group, actor, director, studio, year, <insert_user_definable_grouping> etc.)

d. Multiple "video" items may link to a "metadata" item, so a video->metadata link can be made. It's a n:1 relationship though I think, so a foreign key in the video table to the metadata table will suffice.

The idea is that the video table allows for fast listing based on path or fast listing based on node or source (eg "all movies"). Once we have the listing, we then background load the rest of the information (one query per entry).

Cheers,
Jonathan

1. Having to deal with only one db makes things easier, i think. I was just thinking ( maybe i'm going a little too far or OT on this, but it seems to me a good idea so i want to share it Big Grin) Why don't implement a mechanism similar to the one implemented for strings handling, but for queryes? this way we can modify the db schemas without touching the source code (which would use an id to identify the query to use) and also we can easily mantain the queryes (they would be in a separate xml file and not in the source code). plus, maybe in a distant future, we can extend this to skinners so they can implement user defined queryes all across the db and show the result wherever they want ( no more dirty hacks to retrieve information).

2. I think we should go down the uri road and define (if not already defined) an uri to identify the local resources (eg: file://<source_name>) if i'm not wrong we already have something similar for smb:// and upnp.
For rar and zip handling maybe a function that recursively check if the resource is a media playable and if so start playing, otherwise relaunch itself on the new object collection.

3. i think we can extend the same concept also for music and program. I was thinking to try and merge (and normalize) the three dbs. As soon as i've something presentable i will post it (just give me some time... lots of things to do :-D).
find quote
jmarshall Offline
Team-XBMC Developer
Posts: 24,523
Joined: Oct 2003
Reputation: 138
Post: #16
1. I was merely pointing out that the data in the 3 current libraries is primarily independent, thus the only reason to join them is convenience rather than to cross-query.

2. We already support file://, and similarly already have URL constructs for rar:// and zip:// and the like. I'm not sure how this helps anything. At a broader scale, the whole idea of which client updates the database for scans etc. is something to think about. It doesn't really make a lot of sense for each client connected to a central db to run checks for new content - this is a role the central server should take, right?

3. I see little point in working on multiple libraries until one has been proven well-defined and working, but don't let that stop you Smile

Cheers,
Jonathan

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


[Image: badge.gif]
find quote
ZIOLele Offline
Senior Member
Posts: 130
Joined: Oct 2008
Reputation: 0
Post: #17
jmarshall Wrote:1. I was merely pointing out that the data in the 3 current libraries is primarily independent, thus the only reason to join them is convenience rather than to cross-query.

2. We already support file://, and similarly already have URL constructs for rar:// and zip:// and the like. I'm not sure how this helps anything. At a broader scale, the whole idea of which client updates the database for scans etc. is something to think about. It doesn't really make a lot of sense for each client connected to a central db to run checks for new content - this is a role the central server should take, right?

3. I see little point in working on multiple libraries until one has been proven well-defined and working, but don't let that stop you Smile

Cheers,
Jonathan

1. I know that was only an idea.

2. yes it should be the central server or rather a server process that checks periodically for update and update the db (which may be located in another server). We can also use the client to update the db directly, but in this case there is concurrency to take in account.

3.that's the "problem" i think we should refer to the three libraries as "media" and then define a well formed and normalized db to accomodate all type of media, rather than approaching them separatedly.
find quote
ZIOLele Offline
Senior Member
Posts: 130
Joined: Oct 2008
Reputation: 0
Post: #18
Uhm, is the video db schema complete? Where are stored the scraped information?
find quote
firnsy Offline
Team-XBMC Developer
Posts: 104
Joined: Jul 2009
Reputation: 0
Location: Australia
Post: #19
jmarshall Wrote:1. I was merely pointing out that the data in the 3 current libraries is primarily independent, thus the only reason to join them is convenience rather than to cross-query.

I think we all concur on this point, it is a convenience rather than necessity. A convenience which can yield some nice side effects.

jmarshall Wrote:Perhaps the solution is to ignore it and force everything to be on a network drive that allows the same URL format for everything, or perhaps we just tag those paths that will be the same for all machines (i.e. network) - I don't know the best way to handle it.

Building on this idea a little further, would giving each XBMC instance in the network a unique name (resolvable by some means) of which becomes part of the URI.

For example -----
I have 3 XBMCs in my network, named XBMC-A, XBMC-B and XBMC-C respectively. I have also have a 'nix server, named MEDIA-SRV, which hosts a samba share and is serving a common mysql database.

XBMC-A is the only instance with local media on "E" drive, which consists of a couple of movies wrapped up in a rar archive as well as some individual music files. All XBMCs are configured to use the common media repository at smb://MEDIA-SRV/media/

Now, let's say that XBMC-B browses the database and wants to play a movie that's contained in the rar archive. The database indicates the URI for the path is "rar://XBMC-A/E/movies.rar/movie1.avi"
End example -----

This would indicate an absolute URI with protocol/type (eg. file, smb, rar), host (eg. hostname, devicename, or even static IP) and the path.

jmarshall Wrote:At a broader scale, the whole idea of which client updates the database for scans etc. is something to think about. It doesn't really make a lot of sense for each client connected to a central db to run checks for new content - this is a role the central server should take, right?

Yes multiple clients do not want be updating a single database at once. I see two reasonable ways of dealing with this.

1. The ability to configure a master database updater of which only they can update the database and scraping is simply disabled on other instances. This could be a little limiting.

2. Have a pseudo lock table in the database that indicates when the database is being updated and by whom, other fields such as last updated could also be warranted. When an XBMC instance wishes to update the database they would have to obtain a lock before proceeding. This would allow all instances in a network to potentially update the database provided they have the lock.

ZIOLele Wrote:Uhm, is the video db schema complete? Where are stored the scraped information?

If you are referring to the picture the "cXX" fields in "movie", "tvshow", "episode" and "musicvideo" tables actually represents the fields "c00" through to about "c20" in the real tables. It was merely compressed to aid layout.

Hopefully you're not too confused.
find quote
ZIOLele Offline
Senior Member
Posts: 130
Joined: Oct 2008
Reputation: 0
Post: #20
firnsy Wrote:If you are referring to the picture the "cXX" fields in "movie", "tvshow", "episode" and "musicvideo" tables actually represents the fields "c00" through to about "c20" in the real tables. It was merely compressed to aid layout.

Hopefully you're not too confused.

No, i'm not too confused tanks :-D
So, if i understand it clearly all the scraped infos are in this c00..c20 columns right? (i'm using your image to understand the db schema since I don't have a tool to explore the actual dbfiles.)
find quote
Post Reply