• 1
  • 15
  • 16
  • 17(current)
  • 18
  • 19
  • 20
New DB Structure for all libraries + DBs [update 2015 06 13 / Released SQL 1.3]
Another problem (at least for me) would be the database code file for that one database.
Our musicdatabase.cpp or videodatabase.cpp are already way longer than any file should be in a code base.
Reply
If I might jump into this discussion again. At first I was all in for the one DB for everything approach, but once I talked to Razze and he mentioned micro services, I realized that I actually use this approach myself already and that it IMO would have some benefits. Like, who says that a micro service has to be a core part and couldn't be replaced by an add-on? Just imagine a person micro service that directly grabs up2date actor info from IMDB. Or a service that directly connects to the Emby database instead of the local one?
Splitting things up into services IMO has the benefit, that there HAS TO be a well defined object oriented interface for all DB communication and thus you are free to replace the lower level part with whatever suits your needs (a service could also do remote queries to a central Kodi server instead of local queries). Even if we in the end will store everything in one DB, I still propose to go the (micro) service approach to have small encapsulated interfaces to code against. Nothing in core should ever have to create a query, but rather use some query abstraction language (like smart playlist rules) that the services translate into whatever they use internally.
Code:
std::string actorName = "Matt Damon"; //Some actor name from a user defined search or whatever
Kodi\Model\Person\Actor actor = personRepository->findByName(actorName);
Kodi\Persistence\QueryResult movieList = movieRepository->findByActor(actor);

or if you would like to search the other way around:
Code:
Kodi\Model\Asset\Movie movie movieRepository->findByName("Star Wars");
Kodi\Persistence\QueryResult relatedPersons = personRepository->findByMovie(movie);
// now iterate over the persons, group them by their role and pass it to the view

This is all just pseudo code, but IMO easy to read and follow, even for people that never developed anything. If the related repositories now have their data in one big DB, seperated ones or even some remote service doesn't matter.

The only tricky part to think about is how to generate really complex queries, like with smart playlists, that span accross services/repositories. I don't see any issues with queries that f.e. list all related media by a certain person (like for Jared Leto) as we simply query every asset repository one after another and list the results grouped by type (which makes sense IMO). What we need a solution for are queries against play counts and watched states etc, but I guess we could also prefetch related UUIDs from remote services and use these to filter in the according repo. Might be good if we could do some performance tests with our existing DB (first fetch IDs matchinig play count etc, then do a second query using all aggregated info).
Reply
You can have a well defined object oriented interface even with one database..The main problem is that (if I'm not wrong) 90% of the times kodi loads the views because it needs that infos for sorting and quick filtering (like watched/unwatched). If you split userdata in a different database you'll end up doing the database work via code...
Let's say you load the movie list. You also need watched state and resume bookmarks so you'll have to get the list from movieRepository and then loop in the list, get for every movie the watched count and the resume bookmark from userRepository and attach it to the movie via code manually every single time.
Well that's the DBMS work and I'm pretty sure it will do it faster than we can via code...
But I still agree that an interface like the one you propose should be the final goal and it's more important than the database model. The problem is to find a good idea to encapsulate the complex filtering behind smart playlists with it's flexibility (query only for the data we need for filtering and not for everything) and write a lot of code.. I can help but I doubt it's possible to do this for one person (I can't for sure)
Reply
I performed a very basic test to get an idea of the grade of performance loss.
I used my laptop (with an i7-4710mq cpu) and my database (in sqlite) to simulate a separate behavior with a specific code and I run both cases several time. I don't have a raspberry pi or any low power device so I can't test how it performs with weaker cpus.
In the "A" scenario I got all the fields from movie_view and I parsed them in CVideoInfoTag. In the "B" scenario I got all the fields in movie. Then I looped through the rows and for every row I got files, path and bookmark fields with a second query. In the library there were 493 movies. I used this because I think is the most common query for kodi.
The results are:
The "A" scenario took 149 ms to get 493 movies without details on average.
The "B" scenario took 205 ms to get 493 movies without details on average.
It's 38% worse... And with additional details, more complex queries or weaker cpus it might get worse
Reply
Nice benchmarking!

Certainly something to bear in mind.
Reply
(2015-12-02, 16:25)phate89 Wrote: I performed a very basic test to get an idea of the grade of performance loss.
I used my laptop (with an i7-4710mq cpu) and my database (in sqlite) to simulate a separate behavior with a specific code and I run both cases several time. I don't have a raspberry pi or any low power device so I can't test how it performs with weaker cpus.
In the "A" scenario I got all the fields from movie_view and I parsed them in CVideoInfoTag. In the "B" scenario I got all the fields in movie. Then I looped through the rows and for every row I got files, path and bookmark fields with a second query. In the library there were 493 movies. I used this because I think is the most common query for kodi.
The results are:
The "A" scenario took 149 ms to get 493 movies without details on average.
The "B" scenario took 205 ms to get 493 movies without details on average.
It's 38% worse... And with additional details, more complex queries or weaker cpus it might get worse

That test would be way more interesting in mysql, as I said somewhere above, I'm not sure how sqlite does it, but I think it may be closing/opening the database a bunch of times.
I really don't know, so that would be nice.
Also a scenario "C" would be nice, fire query 1 like in senario "B" and concatenate the ids and do a "WHERE id in (YOUR-IDS-HERE). Then match the result via ids in c++.

Btw, I thought more about it. And I agree, we should merge the audio and video usecases. (or databases)
And if it fits for more, even them. I guess pictures might be possible. (call it media db) Not so sure about tv and epg.
So we don't end up with ONE database, but we combine them where it is smart.

And also wrap them like services Wink
And I would still go for a person service unless we can be very sure that it will only be used by the media db.
Reply
(2015-12-03, 01:00)Razze Wrote: That test would be way more interesting in mysql, as I said somewhere above, I'm not sure how sqlite does it, but I think it may be closing/opening the database a bunch of times.
I really don't know, so that would be nice.
Also a scenario "C" would be nice, fire query 1 like in senario "B" and concatenate the ids and do a "WHERE id in (YOUR-IDS-HERE). Then match the result via ids in c++.

Btw, I thought more about it. And I agree, we should merge the audio and video usecases. (or databases)
And if it fits for more, even them. I guess pictures might be possible. (call it media db) Not so sure about tv and epg.
So we don't end up with ONE database, but we combine them where it is smart.

And also wrap them like services Wink
And I would still go for a person service unless we can be very sure that it will only be used by the media db.
I tried with mysql and it's actually worse..

The "A" scenario took 362 ms to get 493 movies without details on average.
The "B" scenario took 1489 ms to get 493 movies without details on average.

I didn't understand well the "C" scenario. If you want I can test it too.

I agree with you about database separation as service. It doesn't make any sense that queries are created outside the database code. Also all this database opening and closing should be avoided...
Reply
as for the C scenario - IMO there is no need to iterate over the movie list in first place only to gather the related IDs. Why not simply fetch the userdata for all movies in one go, then iterate over that and merge them to the movie list? Should save one iteration step or not?

Also, we likely don't need all that info right away, unless the list is sorted by it, so we could just as well lazy load it on first usage. In OO terms that would mean that the object holding the user related data will only be a proxy class that grabs the info from DB on first access of one of it's getter methods.

Another thing to think about is that we likely could prefetch userdata when we initialize the profile and store it in a memcache on platforms with enough memory. Just some random thoughts.
Reply
Just do not forget that data is not only GUI consumed but also Addons / JSON and other means that may need all data when they request it Smile

Also remember that userdata is potentially not only movies but episodes or songs, meaning thousands of values to fetch. (I have report of users with 150 000 songs or more in Kodi (of course not on rpi :p))
Reply
just to say something couse I have a large music libary like that (150k songs) and I have made my little own kodi web remote for it, and instead of using json I use sql directly, I dont use the views but I use joins where needed instead to get tracks, art etc, etc. and some ordering. Everything is lightning fast, with fulltext search I can even search for "love" in such a big libary and get over 1000 songs listed and ordered under 0.1s, the server isnt that fast either.

On the other hand I dont deal too much with the result (php/js) and I dont know what kodi does to it really.
Reply
(2015-12-03, 19:33)marantz Wrote: just to say something couse I have a large music libary like that (150k songs) and I have made my little own kodi web remote for it, and instead of using json I use sql directly, I dont use the views but I use joins where needed instead to get tracks, art etc, etc. and some ordering. Everything is lightning fast, with fulltext search I can even search for "love" in such a big libary and get over 1000 songs listed and ordered under 0.1s, the server isnt that fast either.

On the other hand I dont deal too much with the result (php/js) and I dont know what kodi does to it really.

Too much, the databases aren't our "problem" when it comes to runtime.
Reply
(2015-12-03, 01:00)Razze Wrote: Also a scenario "C" would be nice, fire query 1 like in senario "B" and concatenate the ids and do a "WHERE id in (YOUR-IDS-HERE). Then match the result via ids in c++.


I tried and in sqlite (on my pc at least) is actually faster than the view..
This time instead of simulating with the current database I actually split the filestate from the files to get a more realistic result. Here's the result:

SQLite
A method: 242 ms
B method: 310 ms
C method: 210 ms

MySQL
A method: 363 ms
B method: 2746 ms
C method: 380 ms

[quote='da-anda' pid='2177190' dateline='1449132482']
as for the C scenario - IMO there is no need to iterate over the movie list in first place only to gather the related IDs. Why not simply fetch the userdata for all movies in one go, then iterate over that and merge them to the movie list? Should save one iteration step or not?
You don't iterate in movie list only to get the related IDs but also to get the actual video info...

(2015-12-03, 10:48)da-anda Wrote: Also, we likely don't need all that info right away, unless the list is sorted by it, so we could just as well lazy load it on first usage. In OO terms that would mean that the object holding the user related data will only be a proxy class that grabs the info from DB on first access of one of it's getter methods.

This is a good point..But also makes the change even bigger..Starting from a db change this is a major library/database rework..
Reply
Team,
I think there are two points here:
1) the current DB structure (considered as a whole) have issues and the library management (that is super core to Kodi) does not work well, does create duplicates, does not allow to be centralized on a nas, etc...
2) which is the best way to implement a new schema?

Now for point 1) the new schema does provide quicker queries, better data, lesser duplication, etc. It creates the correct separation between the data groups while sharing core and common data

Regarding 2 on separate or joint databases that's a different story. The idea that having a single db will create lock downs is not completely correct as anyone can develop their own part without messing up the others.
For example we added all the games part after the initial design was done with no impact.
The fact is that there are common parts that cannot be separated and must be kept together.
For example the users and their visibility of objects is something cross the DB (or all the DBs).
The files are cross all the DBs
If you separate them you'll end up (again) with a mess where files are handled differently between games, audio, video, series...
You'll not be able to manage correctly the visibility of objects between the database or to do it you'll need a massive amount of code while on the DB is just a where clause

I've not studied all the KODI code, as is massive, but the first analysis I did before starding to redesign the DB had identified that there were very specific points where the queries are executed.
What will chage are only those points as the new structure is able to return (at a first implementation) exactly the same data.

After that it will be possible to enchance and take advantages of the new structure Smile

Just my 2c but as Kodi is advancing very well the drawbacks on the library management, lack of a centralized db etc.. are starting to became a real haeavy issue...
Also note that as the light witght devices (raspberry, low power HTPC, etc..) are continuously growing Kodi really needs to move out of the core all computational workload. All NAS I know do have an on board DB that can offload kodi from all of that and become much more responsive.
If you think I'm useful please use the +/- button to raise my reputation
Reply
(2015-12-01, 23:45)Razze Wrote: Another problem (at least for me) would be the database code file for that one database.
Our musicdatabase.cpp or videodatabase.cpp are already way longer than any file should be in a code base.

Razze,
If you want having one DB will allow you to create one DB class to handle all the queries and all db management thus removing all the db code from both files!

If you like you can use even MVVM model to have a library for the db access and one library to model the data. In this way all the other code will be abstracted from the db.

Also you'll avoid to have custom and different code to do the same stuff (select, update, etc...) in different parts of kodi like in games, video, music, etc...
This makes stuff very hard to manage and bugfix. While if you have 1 db access library it's much easier to maintain, debug and test.
If you think I'm useful please use the +/- button to raise my reputation
Reply
(2015-12-03, 03:15)phate89 Wrote:
(2015-12-03, 01:00)Razze Wrote: That test would be way more interesting in mysql, as I said somewhere above, I'm not sure how sqlite does it, but I think it may be closing/opening the database a bunch of times.
I really don't know, so that would be nice.
Also a scenario "C" would be nice, fire query 1 like in senario "B" and concatenate the ids and do a "WHERE id in (YOUR-IDS-HERE). Then match the result via ids in c++.

Btw, I thought more about it. And I agree, we should merge the audio and video usecases. (or databases)
And if it fits for more, even them. I guess pictures might be possible. (call it media db) Not so sure about tv and epg.
So we don't end up with ONE database, but we combine them where it is smart.

And also wrap them like services Wink
And I would still go for a person service unless we can be very sure that it will only be used by the media db.
I tried with mysql and it's actually worse..

The "A" scenario took 362 ms to get 493 movies without details on average.
The "B" scenario took 1489 ms to get 493 movies without details on average.

I didn't understand well the "C" scenario. If you want I can test it too.

I agree with you about database separation as service. It doesn't make any sense that queries are created outside the database code. Also all this database opening and closing should be avoided...

As I said originally I think we need to support both SQLite and MySQL because they have two different uses:
SQLite is ok for single, on Kodi machine, installation
MySQL is for NAS centralized DB

If you have a raspberry is much better to use the nas to issue the queries as is will be faster Smile also you'll not use the local disk to manage the db.
If you think I'm useful please use the +/- button to raise my reputation
Reply
  • 1
  • 15
  • 16
  • 17(current)
  • 18
  • 19
  • 20

Logout Mark Read Team Forum Stats Members Help
New DB Structure for all libraries + DBs [update 2015 06 13 / Released SQL 1.3]3