Attention: Thumbnail cache rewrite's unintended consequences
#1
Bug 
Edit: Bug (well, rather, an unintended side effect of jmarshall's rewrite) has been found; read post 4.

Building XBMC master from about two months ago, the entire library of ~150 movies properly shows the fan art and thumbs.

Builds from today or from a week ago (I did not try further back) load fan art and thumbs slowly, and about 10% of movies and TV shows no longer find their fan art, even though the images exist on disk.

Going back to an older version works.

Seems something weird is going on with the hashing function or the treatment of paths now.

Perhaps some paths are being read in a different way now, causing them to hash differently?
Reply
#2
jmarshall has re-written the whole image caching functionality used by the video database (music database will follow). Therefore a complete re-scan of all video-related artwork is done when browsing video items for the first time.
Always read the online manual (wiki), FAQ (wiki) and search the forum before posting.
Do not e-mail Team Kodi members directly asking for support. Read/follow the forum rules (wiki).
Please read the pages on troubleshooting (wiki) and bug reporting (wiki) before reporting issues.
Reply
#3
Update: I have now gone all the way back to 9ea952884163fd88a90cebd5b2b01ca930af689a from May 10th (just picked one at random), and it happens there too.
(2012-06-05, 11:40)Montellese Wrote: jmarshall has re-written the whole image caching functionality used by the video database (music database will follow). Therefore a complete re-scan of all video-related artwork is done when browsing video items for the first time.

I thought something like that might have been happening but I waited many minutes and the situation did not improve; the fan art and/or thumbs were still missing for certain movies and tv shows, despite existing on disk. I can't remember how long I gave it but I kept re-checking the same movie, which was the #1 movie (first row) in the sqlite database, and it just stayed blank.

Ideas? The videos are in the same paths, and the fan art etc exists on disk in the old hashed form in the proper userdata/Thumbnails location. I am waiting for it right now; it's definitely broken in some way... Wondering if he changed how the path is interpreted, leading to a different hash for some items.

The odd thing about that theory though is that every movie I have is under the exact same /storage/media/movies top level path, and 80-90% of the movies show their thumbnails/fanart properly.

Edit: 5 minutes later and it's still not loading. There is definitely some issue with it not reading the cached images on disk. But why? Ideas?
Reply
#4
Found the cause.

The new code re-reads the thumbnail/fan art from its ORIGINAL web URL instead of using the old cached version in the userdata/Thumbnails folder. This will cause DISASTER when users upgrade later, since loads of images disappear from the web all the time.

Example of a movie that does show all data properly:
NO debug log output

Example of a movie that does NOT show thumbnail and/or fan art:
11:52:25 T:140672996701952 DEBUG: GetImageHash - unable to stat url http://cf1.imgobject.com/backdrops/62e/4...iginal.jpg
11:52:25 T:140673005094656 DEBUG: GetImageHash - unable to stat url http://cf1.imgobject.com/posters/785/4d5...iginal.jpg

Now why on earth wouldn't it read the existing thumbnails and fan art on disk? Surely he didn't change image compression format?

Unless he's changed format, a patch is in order that says "if resource no longer exists online, look for and use the hashed local version instead, and if that fails too, give up".
Reply
#5
I had a look into the Textures13.db file and it looks to me like it's set up as follows:

* Videos in the MyVideos database have internet URL-based references to fan art (such as http://cf2.imgobject.com/t/p/original/bl...duhaud.jpg)
* The textures table of Textures.db maps internet URLs to local jpegs such as "http://cf2.imgobject.com/t/p/original/blablapadadawdjaidaduhaud.jpg -> d/dd08b5c9.jpg"
* That's nice, as it eliminates redundancy (with the old path-hashing method, you'd have multiple copies of an image if several episodes or movies used the same image).
* The local filename hash (such as dd08b5c9.jpg) might be based on the internet URL now rather than the video path, since images are now universal.
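
A minimal sketch of the URL-to-local-file mapping described in the list above. The table and column names are hypothetical and are not taken from the real textures database; only the example URL and cached filename come from this post.

```python
import sqlite3

# Hypothetical stand-in for the textures database: one row per image URL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE texture_map (url TEXT PRIMARY KEY, cached_file TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO texture_map (url, cached_file) VALUES (?, ?)",
    (
        "http://cf2.imgobject.com/t/p/original/blablapadadawdjaidaduhaud.jpg",
        "d/dd08b5c9.jpg",
    ),
)


def cached_file_for(db, url):
    """Return the cached filename for an image URL, or None if it is not cached."""
    row = db.execute(
        "SELECT cached_file FROM texture_map WHERE url = ?", (url,)
    ).fetchone()
    return row[0] if row else None
```

Because the key is the image URL rather than the video path, any number of movies or episodes that reference the same URL resolve to the same single file on disk.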

The issue is that it doesn't provide for the fact that loads of old fan art / thumbs no longer exist online. It needs some sort of importer for those: one that takes the old (now invalid) URL and the old locally cached fan art, and writes them into the Textures db as if a successful URL grab had taken place.

Otherwise people will lose big portions of their art next time they upgrade XBMC.
Reply
#6
No. We have no idea whether the old cached image is the image we think it is - the image could have come from anywhere - this is part of the reason for the changes, after all. Given this, we take a best guess, which most of the time works perfectly.

As you've identified, this doesn't work when the URLs that are stored in the database are not valid - this is already documented here on the forums. Note that these particular URLs have not been valid for some time - themoviedb moved servers quite some time ago. You would have discovered this had you wished to switch artwork for the items in question, thus this is nothing new - you just haven't hit it as you had no need to change images for those items.

There is no easy fix here, other than restoring the old hashing code and writing a bunch of stuff to copy that across - I suspect it would be messy. Regardless, should you care enough to investigate, I would be happy to review a patch to support this, and/or give pointers in the general direction required.

Otherwise, to remedy your issue, simply refresh the items in question.

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Reply
#7
On reflection (it didn't take long) I won't actually accept any such patch - reason is it's just wallpapering over the hole that is the inaccuracy of the data in the database - it's likely you'll hit the problem sooner or later as long as the URLs are obviously wrong to begin with.

Thus, the only real solution here is rescanning. This, of course, could be done quite effectively using the information we have in the db to match rather than relying on file lookups, but our scanner isn't designed to do it - a patch would be considered in this regard for sure - perhaps topfs2's GSoC changes might eventually bring something in this regard.

EDIT: And Martijn to the rescue. Given that we know the invalid URLs, we could perhaps add a prompt to get the script to run. Music will need a rescan anyway (from local info only), so it makes sense that a similar prompt for update could be handled in video when we detect one of the known bad URL formats perhaps?
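
A rough sketch of what detecting a known bad URL format could look like. The single pattern below is only a guess drawn from the two failing URLs in the debug log in post 4; the real list of invalid formats is not given in this thread, and the function name is hypothetical.

```python
import re

# Assumption: old-style imgobject URLs like the two in the debug log are "known bad".
KNOWN_BAD_URL_PATTERNS = [
    re.compile(r"^http://cf1\.imgobject\.com/(backdrops|posters)/"),
]


def needs_art_rescan(art_urls):
    """Return True if any stored art URL matches a known bad format."""
    return any(
        pattern.match(url)
        for url in art_urls
        for pattern in KNOWN_BAD_URL_PATTERNS
    )
```

A library view could call something like this once per item and only show the update prompt when it returns True.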

Cheers,
Jonathan
Reply
#8
(2012-06-05, 12:33)jmarshall Wrote: No. We have no idea whether the old cached image is the image we think it is - the image could have come from anywhere - this is part of the reason for the changes, after all. Given this, we take a best guess, which most of the time works perfectly.

As you've identified, this doesn't work when the URLs that are stored in the database are not valid - this is already documented here on the forums. Note that these particular URLs have not been valid for some time - themoviedb moved servers quite some time ago. You would have discovered this had you wished to switch artwork for the items in question, thus this is nothing new - you just haven't hit it as you had no need to change images for those items.

There is no easy fix here, other than restoring the old hashing code and writing a bunch of stuff to copy that across - I suspect it would be messy. Regardless, should you care enough to investigate, I would be happy to review a patch to support this, and/or give pointers in the general direction required.

Otherwise, to remedy your issue, simply refresh the items in question.

Cheers,
Jonathan

I'm more concerned about users that upgrade later and all their reports of missing art. You're going to see it.

So why not do this:

if the URL returns 404:
    crc32 the video path
    look for the old thumbnail under that hash
    if found: store an "invalid URL -> old thumbnail" mapping in the textures database

Wouldn't that be enough to solve it?
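
A minimal sketch of that fallback, under stated assumptions: zlib's CRC32 of the lowercased video path stands in for the old path hash (the hash XBMC really used may differ), a plain dict stands in for the textures database, and both function names are hypothetical.

```python
import os
import zlib


def legacy_thumb_name(video_path):
    # Assumption: stand-in for the old path-based hash, not XBMC's real one.
    crc = zlib.crc32(video_path.lower().encode("utf-8")) & 0xFFFFFFFF
    return f"{crc:08x}.jpg"


def adopt_legacy_thumb(dead_url, video_path, thumbs_dir, texture_map):
    """Call only after fetching dead_url has failed (for example with a 404).

    Returns the path of the old cached thumbnail, or None if there is none.
    """
    candidate = os.path.join(thumbs_dir, legacy_thumb_name(video_path))
    if not os.path.isfile(candidate):
        return None  # no old thumbnail either: give up
    # Record the dead URL as if it had been fetched successfully.
    texture_map[dead_url] = candidate
    return candidate
```

The sketch does not address the objection raised in post 6, namely that there is no way to verify the old cached file really is the image the URL once pointed to.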

As for myself, it's not very difficult to re-add the art, and I'll do it. On that note: Since all art is re-grabbed from the web now, what can I do if I want to start fresh? Do I delete userdata/Thumbnails and all Database/Textures*.db files? Would that force a full refresh using the URLs stored in the MyVideos database, or would I lose my existing [valid] art choices by doing this?

I thought I'd get rid of the Thumbnails folder bloat with this opportunity.



Edit: I see you replied with more info. I don't see the reason to be afraid of an outdated URL? So what if the user has an old image from a now-invalid URL? All URLs are bound to go 404 some day.

I don't understand what the fear is? Why not bring those in as if they had been valid? The image file on disk from the previous cache is bound to be valid since it's the image the person chose and has been using for a long time.

I think the people that created their own fan art and installed it and didn't keep their original files will be the most annoyed, although they'll be happy to hear they can look through the hundreds or thousands of files in the old thumbnails folder to locate them again ;-)
Reply
#9
I have made a library info update script that can remedy this by simply querying themoviedb.org and putting in a new link.

Simple and effective
Read/follow the forum rules.
For troubleshooting and bug reporting, read this first
Interested in seeing some YouTube videos about Kodi? Go here and subscribe
Reply
#10
(2012-06-05, 12:49)Martijn Wrote: I have made a library info update script that can remedy this by simply querying themoviedb.org and putting in a new link.

Simple and effective

Good work!

What does it do? Does it get the exact same artwork as last time or just a random piece?

If it gets the same art with a new URL, then that's brilliant.

Have you shared it anywhere yet?
Reply
#11
(2012-06-05, 12:52)john.doe Wrote:
(2012-06-05, 12:49)Martijn Wrote: I have made a library info update script that can remedy this by simply querying themoviedb.org and putting in a new link.

Simple and effective

What does it do? Does it get the exact same artwork as last time or just a random piece?

It can check the latest data from themoviedb.org and use the poster they show by default (which is the highest rated one).

So there is no guarantee that it will be the same one.
I can add an option so that it first scans for local files and only checks the internet if they are missing.
If Jonathan agrees, of course. It still needs some work; at the moment it just updates votes/ratings.
Reply
#12
Ahh I understand now: The list of remote thumbs is statically stored at grab-time and never updated unless the user refreshes the library item, so having outdated URLs (as I do) means you cannot grab different thumbs or fan art.

That shows the bad design of storing a static list of image URLs. Imagine having to refresh all your videos and re-select all art every time the site changes URL structure? Oh wait, that's how it is implemented. ;-)

Sarcasm aside, why not change to a model where the art is stored on disk named after the hash of the JPG data itself, and do away with the URL <-> local data tie. Let art URLs be flexible and able to change. The file content hashing takes care of only storing the file data once. No need to involve URLs.

Why am I suggesting this? Because URLs:
* Die completely
* Change drastically
* Have multiple sources which might actually be storing the same data, still leading to redundancy on disk

And the URL-centric approach does not take into account user-made fan-art that is installed via browsing for local files.
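
A small sketch of the content-hash naming proposed here. SHA-1 is an arbitrary choice for illustration (the post does not name a hash), and `store_by_content` is a hypothetical helper.

```python
import hashlib
import os


def store_by_content(data, thumbs_dir):
    """Write image bytes to a file named after their own hash; return the filename."""
    # Assumption: SHA-1 is used only as an example content hash.
    name = hashlib.sha1(data).hexdigest() + ".jpg"
    path = os.path.join(thumbs_dir, name)
    if not os.path.exists(path):  # identical content is stored only once
        with open(path, "wb") as f:
            f.write(data)
    return name
```

The same bytes always produce the same filename, whether they came from one URL, a different URL, or a local file picked by the user.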

It's like the crc32(/path/to/movie.avi) approach was the hobo on the street, the crc32(url)-centric approach is the businessman, and the content hash-centric approach is... wait where am I going with this analogy?
Reply
#13
(2012-06-05, 13:04)john.doe Wrote: That shows the bad design of storing a static list of image URLs. Imagine having to refresh all your videos and re-select all art every time the site changes URL structure? Oh wait, that's how it is implemented. ;-)
Imagine what would happen if it was constantly monitoring the source sites for new images.
They would buckle under the API load.

Quote: Sarcasm aside, why not change to a model where the art is stored on disk named after the hash of the JPG data itself, and do away with the URL <-> local data tie. Let art URLs be flexible and able to change. The file content hashing takes care of only storing the file data once. No need to involve URLs.

Why am I suggesting this? Because URLs:
* Die completely
* Change drastically
* Have multiple sources which might actually be storing the same data, still leading to redundancy on disk

And the URL-centric approach does not take into account user-made fan-art that is installed via browsing for local files.

It's like the crc32(/path/to/movie.avi) approach was the hobo on the street, the crc32(url)-centric approach is the businessman, and the content hash-centric approach is... wait where am I going with this analogy?

Local art is still preferred over remote (internet) art when scraping (or perhaps during the info update).
Reply
#14
(2012-06-05, 13:11)Martijn Wrote: Imagine what would happen if it was constantly monitoring the source sites for new images.
They would buckle under the API load.

No no, you wouldn't constantly query (that'd be horrible) but you'd do something like adding a Refresh button to the art picker (that looks for new art), or perhaps even automatically refreshing the art list if the urls are no longer valid. This wasn't the main point though - I was speaking out against the crc32(image URL) idea. Keeping a record of the URL for quick lookup to say "okay we have that, no need to redownload it" is fine, but basing the hash around the URL is not good. Better to do it by file content so that local images and identical images from multiple sites will all have the same local file.

The main reason? URLs change; static content hashes do not. Why base local filenames on an attribute that is in flux?

(2012-06-05, 13:11)Martijn Wrote: Local art is still preferred over remote (internet) art when scraping (or perhaps during the info update).

Yep. I was still only speaking about the hashing; basing the local filename on a content hash is smarter than basing it on a URL hash. A URL hash doesn't even make sense when the actual source is a local JPG file.

The MyVideos database would be:
* Movie #1; image 8n8c3dhdd.jpg
* Movie #2; image iuuhduaduad.jpg

The Textures database would be:
* Texture 8n8c3dhdd.jpg, source: http://somesite/bleh.jpg
* Texture iuuhduaduad.jpg, source: /storage/myart/something.jpg

This means local filenames are never going to change, because they're based on the actual content hash. Much more stable than basing them on something that can die or change (URLs). It also doesn't care what your source was; it can be a URL, an add-on, a local file, it doesn't care. It only cares about the unique data.

This approach would also allow you to easily query the "source" column for a particular URL to immediately map a URL to a local file without having to re-download it.
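
The layout above could be sketched roughly like this. The table name, column names, and the use of SHA-1 are assumptions for illustration only; the two example sources are the ones from the lists above.

```python
import hashlib
import sqlite3


def open_textures_db():
    # Hypothetical schema: each known source points at a content-hashed file.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE texture_source (source TEXT PRIMARY KEY, filename TEXT NOT NULL)"
    )
    return db


def register(db, source, data):
    """Record that `source` (a URL or a local path) yielded these image bytes."""
    filename = hashlib.sha1(data).hexdigest() + ".jpg"
    db.execute(
        "INSERT OR REPLACE INTO texture_source (source, filename) VALUES (?, ?)",
        (source, filename),
    )
    return filename


def lookup(db, source):
    """Map a source straight to its local file without re-downloading it."""
    row = db.execute(
        "SELECT filename FROM texture_source WHERE source = ?", (source,)
    ).fetchone()
    return row[0] if row else None
```

Registering `http://somesite/bleh.jpg` and `/storage/myart/something.jpg` with identical bytes would give both sources the same filename, which is the deduplication the post is after.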

Now, why is this approach a better idea than the current? Well, imagine if the database had been hash/content-centric instead of URL-centric?

It would mean that we could ALWAYS SAFELY keep all old images and not have a care in the world about what the latest official fan art URLs are, because we don't care, we only care about the content hash and the fact that we have this file locally.

Had this been the case, nobody would lose art in the coming release of XBMC.

It's not too late to do it right now ;-) People will still lose art, but they'll only lose it now, and not next time URLs change.

In short: crc32(url) is very bad.

PS: I understand that you might have done the crc32(url) to be able to locally hash all the downloaded art previews and avoid re-downloading them, and to avoid having to store a filename <-> url mapping in any database (i.e. hash(url) tells us whether the local version is cached on disk). Sure, that's fine, but there are better approaches. For instance, you could finally get rid of the Thumbnails-folder data bloat by moving all temporarily downloaded previews to a special subfolder called "Thumbnails/temporary" or something, where you DO store crc32(url)-named images. Then, once an image is actually chosen, it gets properly file-content hashed and stored in the main "Thumbnails/" folder, where the purpose is to have a long-term, static, uniform filename.

Once art has been downloaded to the disk, the URL should be irrelevant; metadata fluff only useful to make sure we don't re-request it from the server to avoid straining the server for nothing.
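
A sketch of the two-tier idea from the PS, assuming zlib's CRC32 for the preview names and SHA-1 for the content hash; neither is confirmed as what XBMC uses, and the function names are hypothetical.

```python
import hashlib
import os
import zlib


def preview_name(url):
    # Assumption: plain CRC32 of the URL, as in the crc32(url) idea above.
    return f"{zlib.crc32(url.encode('utf-8')) & 0xFFFFFFFF:08x}.jpg"


def cache_preview(url, data, thumbs_dir):
    """Store a downloaded preview under Thumbnails/temporary, named by URL hash."""
    temp_dir = os.path.join(thumbs_dir, "temporary")
    os.makedirs(temp_dir, exist_ok=True)
    path = os.path.join(temp_dir, preview_name(url))
    with open(path, "wb") as f:
        f.write(data)
    return path


def promote(url, thumbs_dir):
    """Move a chosen preview into the main folder, renamed to its content hash."""
    src = os.path.join(thumbs_dir, "temporary", preview_name(url))
    with open(src, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    final = os.path.join(thumbs_dir, digest + ".jpg")
    os.replace(src, final)  # overwrites an identical file if one already exists
    return final
```

With this split, the temporary folder can be wiped at any time without touching art the user has actually chosen.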
Reply
#15
@Martijn
For whatever it's worth, with your script, I like the idea of having it user selectable when doing an update/refresh to use local artwork.
Reply
