(2012-06-05, 13:11)Martijn Wrote: Imagine what would happen if it was constantly monitoring the source sites for new images.
They would buckle under the API load.
No no, you wouldn't constantly query (that'd be horrible), but you'd do something like adding a Refresh button to the art picker (that looks for new art), or perhaps even automatically refreshing the art list if the URLs are no longer valid. This wasn't the main point though - I was speaking out against the crc32(image URL) idea. Keeping a record of the URL for quick lookup to say "okay we have that, no need to redownload it" is fine, but basing the hash on the URL is not good. Better to do it by file content so that local images and identical images from multiple sites will all have the same local file.
The main reason? URLs change; static content hashes do not. Why base local filenames on an attribute that is in flux?
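To make the difference concrete, here is a minimal sketch. It assumes zlib.crc32 and SHA-1 as stand-ins (the hash functions XBMC actually uses may differ), and the second URL is made up to represent a site moving its files around:

```python
import hashlib
import zlib


def url_name(url: str) -> str:
    """Filename derived from crc32 of the URL (the scheme I'm arguing against)."""
    return f"{zlib.crc32(url.encode('utf-8')):08x}.jpg"


def content_name(data: bytes) -> str:
    """Filename derived from a hash of the image bytes themselves."""
    return hashlib.sha1(data).hexdigest() + ".jpg"


image = b"\xff\xd8\xff stand-in for real jpeg bytes"
old_url = "http://somesite/bleh.jpg"
new_url = "http://cdn.somesite/art/bleh.jpg"  # hypothetical: same image, moved

# The URL-based name follows the URL, so a moved image looks like a new image.
print(url_name(old_url), url_name(new_url))

# The content-based name depends only on the bytes, wherever they came from.
print(content_name(image))
```

The same bytes always give the same content-based name, no matter which URL or local path delivered them.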
(2012-06-05, 13:11)Martijn Wrote: Local art is still preferred above remote (internet) when scraping (or perhaps the info update)
Yep. I was still only speaking about the hashing; basing the local filename on a content hash is smarter than basing it on a URL hash. A URL hash doesn't even make sense when the actual source is a local jpg file.
The MyVideos database would be:
* Movie #1; image 8n8c3dhdd.jpg
* Movie #2; image iuuhduaduad.jpg
The Textures database would be:
* Texture 8n8c3dhdd.jpg, source: http://somesite/bleh.jpg
* Texture iuuhduaduad.jpg, source: /storage/myart/something.jpg
This means local filenames are never going to change, because they're based on the actual content hash. That is much more stable than basing them on something that can die or change (URLs). The scheme also doesn't care what your source was; it can be a URL, an add-on, a local file, it doesn't care. It only cares about the unique data.
This approach would also allow you to easily query the "source" column for a particular URL to immediately map a URL to a local file without having to re-download it.
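Roughly, the Textures side could look like the sketch below. This is only an illustration: the table name, column names, and the register/lookup helpers are all made up here (not the real Textures schema), SHA-1 is an arbitrary choice of content hash, and an in-memory SQLite database stands in for the real one.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: one row per known source, pointing at a content-named file.
db.execute("CREATE TABLE textures (source TEXT PRIMARY KEY, filename TEXT NOT NULL)")


def content_name(data: bytes, ext: str = ".jpg") -> str:
    """Local filename derived purely from the image bytes."""
    return hashlib.sha1(data).hexdigest() + ext


def register(source: str, data: bytes) -> str:
    """Record that `source` (URL, add-on path, local file) yielded `data`."""
    name = content_name(data)
    db.execute(
        "INSERT OR REPLACE INTO textures (source, filename) VALUES (?, ?)",
        (source, name),
    )
    return name


def lookup(source: str):
    """Return the local filename for a source, or None if we never fetched it."""
    row = db.execute(
        "SELECT filename FROM textures WHERE source = ?", (source,)
    ).fetchone()
    return row[0] if row else None


art = b"stand-in for identical image bytes"
from_web = register("http://somesite/bleh.jpg", art)
from_disk = register("/storage/myart/something.jpg", art)

print(from_web == from_disk)               # both sources share one local file
print(lookup("http://somesite/bleh.jpg"))  # known URL: no re-download needed
print(lookup("http://somesite/new.jpg"))   # unknown URL: would need a download
```

If a site changes its URLs, only the "source" rows go stale; the files on disk and the filenames stored against each movie stay valid.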
Now, why is this approach a better idea than the current? Well, imagine if the database had been hash/content-centric instead of URL-centric?
It would mean that we could ALWAYS SAFELY keep all old images and not have a care in the world about what the latest official fan art URLs are, because we don't care, we only care about the content hash and the fact that we have this file locally.
Had this been the case, nobody would lose art in the coming release of XBMC.
It's not too late to do it right now ;-) People will still lose art, but they'll only lose it now, and not next time URLs change.
In short: crc32(url) is very bad.
PS: I understand that you might have done the crc32(url) to be able to locally hash all the downloaded art previews and avoid re-downloading them, and to avoid having to store a filename <-> URL mapping in any database (i.e. hash(url) -> we know if we have the local version cached on disk). Sure, that's fine, but there are better approaches. For instance, you could finally get rid of the Thumbnails-folder data bloat by moving all temporarily downloaded previews to a special subfolder called "Thumbnails/temporary" or something, where you DO store crc32(url)-named images. Then, once an image is actually chosen, it gets properly file-content hashed and stored in the main "Thumbnails/" folder, where the purpose is to have a long-term, static, uniform filename.
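A rough sketch of that two-stage idea, with several assumptions: the helper names are invented, zlib.crc32 and SHA-1 stand in for whatever hashes XBMC would really use, everything is treated as a .jpg, and the chosen image is assumed to be the same bytes as the cached preview (in reality the full-size image might be fetched separately).

```python
import hashlib
import shutil
import tempfile
import zlib
from pathlib import Path


def preview_path(root: Path, url: str) -> Path:
    """Throwaway previews live in <root>/temporary, named by crc32(url)."""
    return root / "temporary" / f"{zlib.crc32(url.encode('utf-8')):08x}.jpg"


def cache_preview(root: Path, url: str, data: bytes) -> Path:
    """Store a downloaded preview so the same URL is not fetched twice."""
    path = preview_path(root, url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def promote(root: Path, url: str) -> Path:
    """Move a chosen preview into <root> under its content hash."""
    src = preview_path(root, url)
    digest = hashlib.sha1(src.read_bytes()).hexdigest()
    dest = root / f"{digest}.jpg"
    if dest.exists():
        src.unlink()  # identical content is already stored long-term
    else:
        shutil.move(str(src), str(dest))
    return dest


def clear_previews(root: Path) -> None:
    """Drop every unchosen preview in one go."""
    shutil.rmtree(root / "temporary", ignore_errors=True)


with tempfile.TemporaryDirectory() as tmp:
    thumbs = Path(tmp) / "Thumbnails"
    cache_preview(thumbs, "http://somesite/bleh.jpg", b"stand-in image bytes")
    final = promote(thumbs, "http://somesite/bleh.jpg")
    clear_previews(thumbs)
    print(final.name)  # long-term, content-based filename
```

The temporary folder can be wiped at any time without touching art that is actually in use, which is where the bloat reduction would come from.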
Once art has been downloaded to the disk, the URL should be irrelevant: metadata fluff that is only useful for making sure we don't re-request the image and strain the server for nothing.