Release TheAudioDb.com Music Video Scraper
(2020-12-14, 01:47) alfasud wrote: I'm mostly getting good results when I spell things *exactly* as they are on TADB, but there are some strange anomalies. Like this one:

I've set the file name to:
Enur - Calabria 2008.vob

And it should match:
Artist: Enur
Album: Raggatronic
Track Name: Calabria 2008

But it doesn't. I've checked the spelling more than 10 times thinking I've made a mistake, but I haven't.
I just checked and it looks fine from a TADB point of view. A debug log would be useful showing the api lookup URL.
Reply
It has issues with a few items.

Anything with numbers is iffy. I also have that song and it hates it. It scans Britney Spears - 3 just fine, but anything that looks like a year is weird. For Calabria and similar songs I had to add the album in brackets, which sometimes helps with the confusion. So Enur - Calabria 2008 [Raggatronic].mp4 scrapes. I suspect the fact that an album is called "Calabria 2007" is what breaks it?

Other songs refuse to scrape no matter what I do. For example I have given up on Gorillaz - 19-2000. It just won't. Website nails it but the scraper won't. I tried adding the album Gorillaz - 19-2000 [Gorillaz] and still nothing. Some just won't.

Others I caught why they don't work.

Some are file system limitations. For example, "Weird Al" Yankovic can't be scraped on Windows (which is expected) because Windows disallows quotes in file names, and a few others contain question marks and the like. But some are only semi-file-system limitations. For example, "Orbital - Halcyon + On + On" scrapes if you type exactly that at the manual prompt, but it won't auto-scrape from the file name, whereas "C+C Music Factory" DOES scrape. I assume that C+C simply has an alias like "C ? C Music Factory" to help with finding it. Don't get me started on *NSYNC.

Another limitation I noticed is that inserting a C-style \ never helps. When the prompt to enter text to scrape comes up, the weird characters have been converted to escape sequences, but when I type the escapes myself they are not interpreted: \" never inserts the quote. I don't think HTML codes work either. I haven't tried Unicode escapes yet to see if they help.

A lot of this would be mitigated if the manual scrape would return a few results to choose from, like some video scrapers do, if that is at all possible. I noticed the website seems to match substrings well. So Gorillaz - 19-2000 split as artist-album should pop the right song just fine. It should also help with the (original mix) plague on scraping.
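To make the "return a few results to choose from" idea concrete, here is a minimal sketch in Python. It is not how the scraper works; `candidates` and the suffix list are hypothetical names I made up, and the track list is typed in by hand rather than fetched from TADB. It strips an "(original mix)"-style suffix and then ranks the known track names by similarity using the standard library's `difflib`.

```python
import difflib
import re

# Hypothetical list of suffixes to ignore when comparing titles.
_SUFFIX = re.compile(r"\s*\((?:original mix|radio edit)\)\s*$", re.IGNORECASE)


def candidates(query, track_names, limit=5):
    """Return up to `limit` track names, most similar to `query` first."""
    cleaned = _SUFFIX.sub("", query).casefold()
    # Compare case-insensitively but hand back the original spelling.
    by_key = {name.casefold(): name for name in track_names}
    keys = difflib.get_close_matches(cleaned, list(by_key), n=limit, cutoff=0.4)
    return [by_key[key] for key in keys]


# Hand-typed example list, not real API output.
tracks = ["Clint Eastwood", "19-2000", "Rock the House", "Tomorrow Comes Today"]
print(candidates("19-2000 (original mix)", tracks))
```

The 0.4 cutoff is an arbitrary choice; a lower value shows more (and worse) suggestions.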

Oh, another thing I noticed, unrelated to this scraper, is that some people use weird characters in the filename. Sometimes it's an m-dash (that is easy to spot), but sometimes they use a different character that is almost identical to a dash. Renaming the file and manually typing "[space][dash][space]" makes it scrape. Another thing I noticed is that sometimes a restart of Kodi scrapes that one stubborn file.
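For the look-alike dash problem, a small Python sketch of the rename step. This is only an illustration for cleaning up file names before scraping, not part of the scraper; `normalize_dashes` is a made-up name and the list of look-alikes is my own selection, so it may be incomplete.

```python
# Dash-like characters that are easy to mistake for the ASCII
# hyphen-minus (U+002D): hyphen, non-breaking hyphen, figure dash,
# en dash, em dash, horizontal bar, minus sign.
DASH_LOOKALIKES = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212"
_TO_ASCII_DASH = str.maketrans({ch: "-" for ch in DASH_LOOKALIKES})


def normalize_dashes(name):
    """Replace dash look-alikes with a plain ASCII dash."""
    return name.translate(_TO_ASCII_DASH)


# The separator here is an en dash (U+2013), not an ASCII dash.
print(normalize_dashes("Enur \u2013 Calabria 2008"))
```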
Reply
Clarification: "Orbital - Halcyon + On + On" won't scrape from the filename but will if typed. However, when the prompt to enter a manual search string appears, it shows Halcyon[space][space?][space]On ... so the 3 characters are there, but the plus gets converted into a space or a nondisplayable character. This might be relevant for the C+C example.
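One possible explanation for the plus turning into a space (a guess on my part, I have not seen the scraper's code): in URL query strings a raw "+" is decoded as a space, so a literal plus has to be sent as %2B. The Python standard library shows the effect:

```python
from urllib.parse import quote, quote_plus, unquote_plus

title = "Halcyon + On + On"

# Decoding a string that contains raw "+" characters: each one
# becomes a space, which matches the three-spaces symptom.
decoded_raw = unquote_plus(title)

# Encoding first keeps the plus alive as %2B.
encoded_plus = quote_plus(title)   # spaces become "+", "+" becomes %2B
encoded_pct = quote(title)         # spaces become %20, "+" becomes %2B

print(repr(decoded_raw))
print(encoded_plus)
print(encoded_pct)
```

If the title is encoded before it goes into the lookup URL, decoding it on the other side gives back the original string with the plus signs intact.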

Addendum: The reason I mentioned the format thing is that it might be a good idea for the scraper to prompt you with [badly formatted string] instead of the filename if it finds that the name is not [string][space][dash][space][string]. That could be a valuable clue.
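A rough Python sketch of that format check, assuming the "Artist - Title [Album]" convention used in this thread with the album part optional. `parse_name` is a hypothetical helper, not the scraper's own parser, and it expects the file name without its extension.

```python
import re

# [string][space][dash][space][string], optionally followed by " [Album]".
_PATTERN = re.compile(
    r"^(?P<artist>.+?) - (?P<title>.+?)(?: \[(?P<album>[^\]]+)\])?$"
)


def parse_name(stem):
    """Return (artist, title, album) or None if the name is badly formatted."""
    match = _PATTERN.match(stem)
    if match is None:
        return None
    return match.group("artist"), match.group("title"), match.group("album")


for stem in ["Enur - Calabria 2008 [Raggatronic]",
             "Gorillaz - 19-2000",
             "Orbital \u2013 Halcyon + On + On"]:   # en dash, not ASCII
    print(stem, "->", parse_name(stem) or "[badly formatted string]")
```

The split happens at the first " - " with spaces on both sides, so a dash inside the artist (Sophie Ellis-Bextor) or the title (19-2000) does not confuse it.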
Reply
Excuse me while I talk to myself a bit.

Tests:

[Y] Sophie Ellis-Bextor - Murder On The Dancefloor
Does scrape. This means that dashes in the artist are not the issue.

[Y] Sophie Ellis-Bextor - Get Over You
Does scrape.

[N] KRS-one - Sound of Da Police
Does not scrape. Dash is fine.
Note: there is a SONG called KRS-One by Sublime. [1]

[N] Gorillaz - 19-2000
Dash in the name.
Name contains token that could be a year
Album is called Gorillaz
Maybe it makes it think that it is an album? Gorillaz (2000) could be valid album request.


[Y] T-Spoon - S*x on the beach
Scrapes fine (I added the asterisk, I didn't read the rules)
Dash in the name, has capitalized name, capitalized tokens left and right, has a single letter before dash. All fine.

[N] The Smashing Pumpkins - 1979
Does not scrape.
There is an artist called 1979
There are 5 albums called 1979

[N] Three Drives - Greece 2000 (radio edit) [Greece 2000]
Will not scrape. Not the title, not the correct, verbatim title, not even the full Artist - Title [Album]
The track and album have the same name.

[N] Will Smith - Men In Black
Will not scrape.
Album is the same name as the track (Will Smith - Men In Black [Men in Black]). Maybe it sees it as an album scrape?
There are multiple tracks called Men In Black

Neither will "Gettin' Jiggy Wit It" - same issue, there is an album called "Gettin' Jiggy Wit It"
Same goes for Enur - Calabria 2008. There's a track called Calabria 2007 and the album is also Calabria 2007.
Reply
Most likely the dashes are weird characters from musicbrainz that are not normal dash. I can take a look soon and maybe fix manually.
Reply
I tried copy/paste from TheAudioDB and it made no difference, whereas it did for "Weird Al" Yankovic because they use all kinds of quotes for him. After copy-pasting, it came back as "found" using escapes: \"Weird Al\" Yankovic.

This didn't happen for KRS-One nor for Gorillaz. And I still have no idea about Will Smith.

So I grabbed the API by the ... vast documentation they provide.

What endpoint(s) are you using?

I tried mvid.php?i=111393 (Gorillaz) and it lists tracks, including 19-2000.
I also tried 114276 (Will Smith) and the track lists just fine ("Men in Black") and I looked at the hex dump of the json reply and I can't find any special characters.
The Smashing Pumpkins (111999) scrapes and 1979 is in the list.
Enur (119767) lists strTrack "Calabria 2008"

However

123993 (KRS-One) applied to the mvid endpoint brings me a valid but empty JSON.
mvid:null

So, not a scraper bug. Not even a dash thing, it just hates 123993.
(ETA: also hates 119121 artist C-Block - I checked the dash and it's 2Dh - aka ascii dash.)
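For anyone handling these replies in code, a minimal Python sketch of coping with the "mvid":null case. The only field names taken from the replies above are `mvid` and `strTrack`; `find_track` is a made-up helper and the payloads are hand-written stand-ins, not real API output.

```python
import json


def find_track(payload, title):
    """Return the entry whose strTrack matches `title`, or None.

    Treats a valid-but-empty reply ({"mvid": null}) as "no videos".
    """
    data = json.loads(payload)
    videos = data.get("mvid") or []          # null becomes an empty list
    wanted = title.casefold()
    for entry in videos:
        if (entry.get("strTrack") or "").casefold() == wanted:
            return entry
    return None


# Hand-written sample payloads for illustration only.
with_tracks = '{"mvid": [{"strTrack": "Clint Eastwood"}, {"strTrack": "19-2000"}]}'
empty_reply = '{"mvid": null}'

print(find_track(with_tracks, "19-2000"))
print(find_track(empty_reply, "Sound of Da Police"))
```

An empty reply and a missing track both come back as None here; a real client would probably want to log which of the two it was, since they point at different problems.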

ETA:
Small update
Bomb the Bass (111664) lists a song called "Beat Dis (U.S. 7" Mix)" and nothing I do seems to make it scrape. I assumed it was a quote parsing thing, but not even «Beat Dis (U.S. 7» works.
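The character checks in this post (looking through a hex dump, confirming a dash is 2Dh) can also be done with a few lines of Python. This is just a sketch; `unusual_chars` is a name I made up. It lists every character outside printable ASCII with its position, code point and Unicode name.

```python
import unicodedata


def unusual_chars(text):
    """List (index, code point, name) for characters outside printable ASCII."""
    found = []
    for index, ch in enumerate(text):
        if not 0x20 <= ord(ch) <= 0x7E:
            name = unicodedata.name(ch, "UNKNOWN")
            found.append((index, f"U+{ord(ch):04X}", name))
    return found


print(unusual_chars("KRS-One"))        # plain ASCII dash (2Dh)
print(unusual_chars("KRS\u2013One"))   # en dash that looks like a dash
```

An empty list means the string is plain ASCII, so curly quotes or a look-alike dash can be ruled out as the cause.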
Reply