Improving movie db update performance.
#16
(2014-02-18, 02:26)cg110 Wrote: While I'm comfortable that the code does work for the simple cases, it's not fully tested, so be aware of that before pushing it into your elec/rpi build.

Certainly I'm wondering whether the list manipulations (e.g. writers) will work correctly, as the transaction starts by deleting everything and then adds the missing elements.

Yes, hopefully I've made it clear that it's not for everyone and that taking a backup before testing would be a good idea. Additional testing might help reveal a few more "gotchas" that can be nailed early on before this gets wider exposure.

(2014-02-18, 02:26)cg110 Wrote: Also, your post made me spot a missing shortcut: if the list of updatedDetails is empty, don't transact anything to the SQL db at all. This is possible in the case of a refresh that doesn't change anything. You did hint in your testing that you had to force a change in the rating/votes count for it to do anything.

Anyway, I've done an update that makes the update code just return if it ends up with an empty updatedDetails set.

Yes, I saw it performing a start/commit transaction even when nothing was being updated; I didn't think it was worth mentioning, but I'm glad you spotted it. Your update made it into the test build and is working as expected.
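
For anyone following along, the shortcut amounts to something like this (a rough Python sketch of the idea only; the real code is C++ in the video database layer, and the table/column names here are simplified, not xbmc's):
Code:
# Skip the transaction entirely when nothing changed.
import sqlite3

def update_movie_details(conn, movie_id, updated_details):
    if not updated_details:
        return  # e.g. a refresh that found identical data: no BEGIN/COMMIT
    with conn:  # one transaction around all the changed fields
        for field, value in updated_details.items():
            conn.execute(f"UPDATE movie SET {field} = ? WHERE id = ?",
                         (value, movie_id))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, rating REAL, votes INTEGER)")
conn.execute("INSERT INTO movie VALUES (744, 7.0, 100)")
update_movie_details(conn, 744, {})                            # no-op, no transaction
update_movie_details(conn, 744, {"rating": 7.5, "votes": 101}) # one transaction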

I've tested setting/clearing artwork and that seems to be working fine:

Setting a clearart artwork on a movie with three existing artwork types (logo, fanart and poster):
Code:
01:06:47 2487.320801 T:2836395088   DEBUG: JSONRPC: Incoming request: {"jsonrpc": "2.0", "params": {"art": {"clearart": "http://assets.fanart.tv/fanart/movies/19908/movieart/zombieland-4fd8c70fd673c.png"}, "movieid": 744}, "method": "VideoLibrary.SetMovieDetails", "id": "libSetDetails"}
01:06:47 2487.439697 T:2836395088   DEBUG: Mysql Start transaction
01:06:47 2487.444092 T:2836395088   DEBUG: Mysql execute: INSERT INTO art(media_id, media_type, type, url) VALUES (744, 'movie', 'clearart', 'http://assets.fanart.tv/fanart/movies/19908/movieart/zombieland-4fd8c70fd673c.png')
01:06:47 2487.483643 T:2836395088   DEBUG: Mysql execute: UPDATE art SET url='http://assets.fanart.tv/fanart/movies/19908/hdmovielogo/zombieland-5145e97ed73a4.png' where art_id=25818
01:06:47 2487.975586 T:2836395088   DEBUG: Mysql execute: UPDATE art SET url='nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/Zombieland (2009)[BDRip]-fanart.jpg' where art_id=25807
01:06:47 2487.997070 T:2836395088   DEBUG: Mysql execute: UPDATE art SET url='nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/Zombieland (2009)[BDRip]-poster.jpg' where art_id=25808
01:06:47 2488.027344 T:2836395088   DEBUG: Mysql commit transaction
01:06:47 2488.036621 T:2836395088    INFO: JSONRPC Server: Disconnection detected

However, there is an oddity when clearing/removing an artwork item: the delete doesn't appear to be in the same transaction as the updates:
Code:
01:03:33 2293.269775 T:2836395088   DEBUG: JSONRPC: Incoming request: {"jsonrpc": "2.0", "params": {"art": {"clearart": null}, "movieid": 744}, "method": "VideoLibrary.SetMovieDetails", "id": "libSetDetails"}
01:03:33 2293.385742 T:2836395088   DEBUG: Mysql Start transaction
01:03:33 2293.392334 T:2836395088   DEBUG: Mysql execute: UPDATE art SET url='http://assets.fanart.tv/fanart/movies/19908/hdmovielogo/zombieland-5145e97ed73a4.png' where art_id=25818
01:03:33 2293.957520 T:2836395088   DEBUG: Mysql execute: UPDATE art SET url='nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/Zombieland (2009)[BDRip]-fanart.jpg' where art_id=25807
01:03:33 2294.042725 T:2836395088   DEBUG: Mysql execute: UPDATE art SET url='nfs://192.168.0.3/mnt/share/media/Video/MoviesHD/Zombieland (2009)[BDRip]-poster.jpg' where art_id=25808
01:03:34 2294.231934 T:2836395088   DEBUG: Mysql commit transaction
01:03:34 2294.236328 T:2836395088   DEBUG: Mysql execute: DELETE FROM art WHERE media_id=744 AND media_type='movie' AND type='clearart'
01:03:34 2294.501709 T:2836395088    INFO: JSONRPC Server: Disconnection detected

Given the above, I have to ask whether it is necessary to update all the other list items that are not referenced in the JSON call. I haven't looked at the code, but maybe the list processing isn't easy to optimise; if that's the case, fortunately the overhead doesn't appear to be particularly significant (although that could change if there were a hundred or so list items, i.e. cast members).
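
For reference, requests like the ones in the logs above can be fired at the JSON-RPC API with a few lines of Python (a sketch only; it assumes the xbmc web server is enabled, and the host/port here are examples to be replaced with your own settings):
Code:
import json
import urllib.request

def jsonrpc(method, params, host="192.168.0.8", port=8080):
    payload = json.dumps({"jsonrpc": "2.0", "id": "libSetDetails",
                          "method": method, "params": params}).encode()
    req = urllib.request.Request(f"http://{host}:{port}/jsonrpc", payload,
                                 {"Content-Type": "application/json"})
    return json.loads(urllib.request.urlopen(req).read())

# Set a clearart image, then remove it again (null/None clears the type):
art_url = "http://assets.fanart.tv/fanart/movies/19908/movieart/zombieland-4fd8c70fd673c.png"
jsonrpc("VideoLibrary.SetMovieDetails", {"movieid": 744, "art": {"clearart": art_url}})
jsonrpc("VideoLibrary.SetMovieDetails", {"movieid": 744, "art": {"clearart": None}})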
#17
What you're seeing is an existing quirk of how artwork updates are done, which I've kept in place while working on the other code.

I'm not sure why, but the existing db layer call setmoviedetails doesn't appear to remove artwork, so the caller (in this case the JSON layer) does the removal. By copying that split I've ended up just knowing that art has changed, and the removal still happens in the JSON layer (hence the separate transaction).

The new code could probably split that work in two and have separate update types for "art.updated" and "art.removed", which would reduce the work. I could then pull the removal of artwork into the db layer.

Even with this inefficiency you should still see the gain of not having to update everything else when removing one piece of artwork, but you're right, it could be further optimised.

Making finer-grained list updates is likely to be trickier: getting the current list and working out what's new, what's unchanged and what's been deleted is harder, which is why most of the list updates simply delete all the items and re-add them.

This doesn't mean I won't look at doing it, but it may start to push the boundaries of what can be gained in performance (i.e. do we burn more CPU/time working out what changed in xbmc vs letting the SQL server do the work? A cast member update might be a case where it could be a winner.)
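
To make that concrete, the finer-grained approach would be something along these lines (a rough Python sketch of the idea, not the actual C++; today's behaviour is effectively "to_remove = everything, to_add = everything"):
Code:
def diff_list(current, wanted):
    current, wanted = set(current), set(wanted)
    to_add = wanted - current      # rows to INSERT
    to_remove = current - wanted   # rows to DELETE
    unchanged = current & wanted   # rows to leave untouched
    return to_add, to_remove, unchanged

# e.g. a writers list where one (made-up) name was added by a refresh:
to_add, to_remove, unchanged = diff_list(
    ["Rhett Reese", "Paul Wernick"],
    ["Rhett Reese", "Paul Wernick", "Some New Writer"])
# to_add == {"Some New Writer"}, to_remove == set(), two rows untouched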
#18
Regarding the separate transaction, as I say it's just an oddity. I'm aware of the changes made to remove artwork (added only recently), and maybe they haven't been done entirely in harmony with the existing db work (most likely out of necessity, or to minimise impact), but it's probably not something you need to fix here, as the gain is likely negligible and fixing it would add risk to your own changes.

As for the lists, I'm just making observations, and it's good to know they could be done more efficiently. However, the big win with your changes is with the simple fields, so leaving list optimisation for a second bite of the cherry would be better than over-complicating this first attempt.
#19
(2014-02-18, 02:26)cg110 Wrote: I expect it to work, but the "old code" transacted the deletes, then did the updates as another transaction; I'm trying to do it all in one. My experience of SQL is that just because it should work doesn't mean it does :)

cg110, if this were a production system then I'd say your approach is the better one. Logically, from the perspective of a database reader, you want to retain consistency in the database, so the deletes and inserts are better off being grouped into one transaction. From a db engine perspective, reducing the number of transactions is also a good thing: consider a scenario where you're updating 30k rows with a commit after every row. The db has overhead associated with each commit operation, which slows things down; batching the updates into chunks reduces that overhead and improves overall performance.
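
To illustrate the point with a rough sketch (Python/sqlite3 here purely for brevity; the same principle applies to MySQL):
Code:
import sqlite3, time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE art (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany("INSERT INTO art (url) VALUES (?)", [("u",)] * 30000)
conn.commit()

def commit_per_row():
    for i in range(30000):
        conn.execute("UPDATE art SET url = ? WHERE id = ?", ("x", i + 1))
        conn.commit()            # 30k commits, each paying the overhead

def commit_in_chunks(chunk=1000):
    for i in range(30000):
        conn.execute("UPDATE art SET url = ? WHERE id = ?", ("y", i + 1))
        if (i + 1) % chunk == 0:
            conn.commit()        # 30 commits instead of 30k
    conn.commit()

for fn in (commit_per_row, commit_in_chunks):
    start = time.time()
    fn()
    print(fn.__name__, "took", round(time.time() - start, 2), "seconds")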

The worst-case scenario of the "old" approach is that an error occurs after the deletes are committed but before the inserts are, thereby leaving your data in an inconsistent state.

Disclaimer: The above are the ramblings of an old-time C/C++ developer who now helps support commercial systems based on SQL Server and Oracle.
#20
So I've done a few more adjustments.

Artwork is no longer updated if it's the same (note that this can still result in an empty transaction).

The playcount/last-played times aren't changed in the videolibrary update code, as it doesn't make sense to overwrite those file attributes for the same movie during an update; if it's a new file, it probably does make sense.

Certainly there are more things that could be made efficient, but I'm wondering if this is getting into diminishing-returns territory.

I've also sprinkled more log messages in, mainly so I could see where time was being spent. I expect to remove them before I make a pull request.

Note that one change that may not work on Linux is the one that makes log messages include milliseconds...

In my debug build, if you set one item of artwork to the same value it had before, it now takes ~60-70ms, ~40-50ms of which is spent before it even calls into the videolibrary update code, so somewhere in the layers of JSON processing.
#21
(2014-02-22, 20:51)cg110 Wrote: Note that one change that may not work on Linux is the one that makes log messages include milliseconds...

Correct, the milliseconds patch doesn't apply on Linux, but the Linux log includes milliseconds by default anyway (it might even be nanoseconds on some systems).

If I don't apply the "Add milliseconds to logged time." patch, the build still fails on Linux (OpenELEC) with the following error: pastebin.
#22
(2014-02-22, 22:39)MilhouseVH Wrote: the build still fails on Linux (OpenELEC) with the following error: pastebin.

Fixed. It seems that Windows isn't as picky about some of these things (or it's been extended to support passing objects to params).

I might have to dust off one of my old boxes and put Linux on it, so I can catch this kind of thing.
#23
Thanks, this seems to be working really well!

Here's a log updating the artwork (same artwork each time) on the Harry Potter movies, with before/after timings: pastebin.

The improvement is pretty huge, going from 15 seconds (worst case) to under 0.15 seconds in most cases. I've also confirmed that setting new values on artwork works well, incurring only slightly more overhead (a few extra milliseconds per item) due to the actual SQL updates that now take place.

I've published an updated OpenELEC/R-Pi build here for anyone who is interested in testing.
#24
Might I suggest making a separate branch off xbmc/master, because you are now working directly in the master branch.
This will create issues when you do a pull request on GitHub.
#25
Glad to see it's down to around 110ms. That should be quite a saving for 100 items (from at least 500s down to around 11s), and should move the bottleneck into the addon talking to the web servers rather than xbmc.

I suspect the next place to examine for efficiency is what happens before the update starts; in your case that appears to be 100ms of the 110ms...

I still need to pull the deletes into the transaction (just for the sake of consistency).

It might also be worth making sure the delete is actually needed, as the code currently always runs it.

I actually suspect that the art should really be done as a group: read the current state once (rather than once per art item) and then process the adds, updates and deletes as needed (or come up with something generic that can cope with the various different list types).

Certainly the current code is faster for doing updates. I just need to come up with something to test all the corners. (I'm fairly comfortable it does work, though; it's just hard to test every corner case manually. My original plan was a tool that would update the movie sets without having to refresh each item or do it manually.)
#26
Yeah, JSON serialization and deserialization is pretty slow. Not sure if it's yajl or if most JSON libraries have similar timings. And the bigger the data, the slower it is, obviously.
#27
I've done some more digging into where the time was being burned. It isn't actually the JSON layer; it's opening the DB connection and fetching the object details, with the detail fetching being the painful part.

So I've done some initial work on improving the fetching of the movie details. Specifically, if the JSON layer doesn't ask for cast, don't fetch it (as it won't be returned anyway).

I've also removed an O(n^2) algorithm in the get-cast code: it now uses a set to look up whether an actor is already in the results vector (previously it looped over the vector).
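
The change is essentially the difference between these two (a Python sketch of the idea; the real code is C++ using a std::set alongside the results vector):
Code:
def get_cast_old(rows):
    cast = []
    for actor in rows:
        if actor not in cast:   # linear scan of the vector => O(n^2) overall
            cast.append(actor)
    return cast

def get_cast_new(rows):
    cast, seen = [], set()
    for actor in rows:
        if actor not in seen:   # O(1) membership test against the set
            seen.add(actor)
            cast.append(actor)  # order of appearance is preserved
    return cast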

Note that I've also compiled the current head on Ubuntu 12.04 and it does compile.

I suspect more fine-grained control over which details should be fetched could be useful (as not everything is needed all the time).
#28
When retrieving all movies or a specific movie through JSON-RPC, there is already the necessary logic in place to only fetch additional (expensive) details like cast, tags etc. when they are asked for.
#29
There is, but it's either all the extra stuff or none of it.

From what I could see when timing this, the cast isn't cheap to query, and it also isn't cheap to convert to an object (as the thumbnail URL is stored as an XML blob that needs parsing, once it's been determined whether it's UTF-8).
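
For example, the stored thumb ends up being a small XML fragment that has to be parsed for every cast row, along these lines (illustrative only; the URL is made up and the exact stored form may differ):
Code:
import xml.etree.ElementTree as ET

stored = "<thumb>http://example.com/actors/some-actor.jpg</thumb>"
thumb_url = ET.fromstring(stored).text   # the parse cost paid per cast entry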

Doing some basic timings, I've got aeon nox running, and it uses Skin Widgets:
http://wiki.xbmc.org/index.php?title=Add...in_Widgets

Skin Widgets uses the JSON API and doesn't ask for cast. It does a bunch of queries at start-up against the various dbs; the three that relate to movies look like this:
Code:
{"jsonrpc": "2.0", "id": 1, "method": "VideoLibrary.GetMovies", "params": {"properties": ["title", "originaltitle", "playcount", "year", "genre", "studio", "country", "tagline", "plot", "runtime", "file", "plotoutline", "lastplayed", "trailer", "rating", "resume", "art", "streamdetails", "mpaa", "director"], "limits": {"end": 20},

with different sorts and filters:
Code:
"sort": {"method": "random" }, "filter": {"field": "playcount", "operator": "lessthan", "value": "1"}}}
"sort": {"order": "descending", "method": "lastplayed"}, "filter": {"field": "inprogress", "operator": "true", "value": ""}}}
"sort": {"order": "descending", "method": "dateadded"}, "filter": {"field": "playcount", "operator": "is", "value": "0"}}}

I did 3 runs, with win32 Release (DirectX), with and without fetching the cast, and it puts out the following times:
Fetching Cast:
DEBUG: Skin Widgets: Total time needed for all queries: 0:00:12.797000
DEBUG: Skin Widgets: Total time needed for all queries: 0:00:14.643000
DEBUG: Skin Widgets: Total time needed for all queries: 0:00:13.145000

Not fetching cast:
DEBUG: Skin Widgets: Total time needed for all queries: 0:00:10.885000
DEBUG: Skin Widgets: Total time needed for all queries: 0:00:10.607000
DEBUG: Skin Widgets: Total time needed for all queries: 0:00:08.887000

So there's some improvement with the change applied only to movies. The TV shows/episodes could have the same treatment and may see further savings.

(Hardware is an i7-920 with a Synology 1511+ as the MySQL server, connected via a 1Gbit switch.)
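
For anyone wanting to reproduce the comparison, a rough client-side harness would look like this (a sketch that assumes the JSON-RPC web server is enabled; host/port and the property list are examples, not what Skin Widgets actually sends):
Code:
import json, time, urllib.request

def get_movies(properties, host="127.0.0.1", port=8080):
    payload = json.dumps({"jsonrpc": "2.0", "id": 1,
                          "method": "VideoLibrary.GetMovies",
                          "params": {"properties": properties,
                                     "limits": {"end": 20},
                                     "sort": {"method": "random"}}}).encode()
    req = urllib.request.Request(f"http://{host}:{port}/jsonrpc", payload,
                                 {"Content-Type": "application/json"})
    return json.loads(urllib.request.urlopen(req).read())

props = ["title", "year", "rating", "art", "streamdetails", "director"]
for plist in (props + ["cast"], props):
    start = time.time()
    get_movies(plist)
    label = "fetching cast" if "cast" in plist else "not fetching cast"
    print(label, round(time.time() - start, 3), "seconds")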
#30
It would be great if these changes could be split and wrapped up into separate PRs.

Regarding the efficiency of cast processing, it seems that when cast is not present on the JSON request there are no cast-related SQL queries - see the following JSON request and MySQL query logs: without cast, with cast. The test system did not have the three most recent cast-related commits.

It doesn't appear as though any cast optimisation is necessary, at least not as far as SQL processing is concerned.
