[Release] TheAudioDb.com Music Video Scraper

olympia Offline
Team-Kodi Member
Posts: 2,499
Joined: May 2008
Reputation: 32
Post: #1
Music Video Scraper

Source: http://www.theaudiodb.com ([Image: mvid.png] links)

Status: Working

Notes: The scraper ONLY supports music video files named with the convention 'artist - track'.
-> Note the dash, with a space before and after it!
As said, it will not work perfectly, but I hope it handles most cases.
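
To make the naming rule concrete, here is a minimal Python sketch of the check it implies. The function name `split_artist_track` is made up for illustration and is not part of the scraper; it only shows what "a dash with a space on both sides" means for a file name, and it assumes the file name has an extension.

```python
import os

def split_artist_track(filename):
    """Split 'artist - track.ext' into (artist, track).

    Returns None when the name has no ' - ' separator or when either
    side of the separator is empty. Assumes the name ends in an extension.
    """
    stem, _ext = os.path.splitext(os.path.basename(filename))
    artist, sep, track = stem.partition(" - ")
    if not sep or not artist.strip() or not track.strip():
        return None
    return artist.strip(), track.strip()
```

A name such as `Placebo - Infra-red.mkv` splits on the first ' - ' only, so the hyphen inside the track name is kept, while `Placebo-Infra-red.mkv` is rejected.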

See more about theaudiodb.com in this thread: http://forum.xbmc.org/showthread.php?tid=134260

Adding Music Videos
Sign up and log in at http://www.theaudiodb.com, then search for the track or navigate through artist name >> album >> track.
Then edit the track details and enter a YouTube link (preferably Vevo) into the correct text box.
You can also add some music video screenshots if you like.
(This post was last modified: 2013-03-27 13:15 by zag.)
Domin Offline
Junior Member
Posts: 10
Joined: Nov 2010
Reputation: 0
Location: Denmark
Post: #2
I get the following error trying to scrape for videos:

Code:
11:51:36 T:139888797792064   DEBUG: ------ Window Init (DialogVideoScan.xml) ------
11:51:36 T:139888797792064    INFO: Loading skin file: DialogVideoScan.xml
11:51:36 T:139888057890560  NOTICE: Thread CVideoInfoScanner start, auto delete: false
11:51:36 T:139888057890560  NOTICE: VideoInfoScanner: Starting scan ..
11:51:36 T:139888797792064   DEBUG: LIRC: Update - NEW at 6250789:160 0 KEY_OK_UP devinput (KEY_OK_UP)
11:51:56 T:139888797792064   DEBUG: SECTION:UnloadDelayed(DLL: special://xbmcbin/system/ImageLib-x86_64-linux.so)
11:51:56 T:139888797792064   DEBUG: Unloading: ImageLib-x86_64-linux.so
11:51:59 T:139888440289024   DEBUG: Thread Jobworker 139888440289024 terminating (autodelete)
11:52:14 T:139888057890560   DEBUG: VideoInfoScanner: Scanning dir '/storage/MVID/MVID1/' as not in the database
11:52:18 T:139888057890560   DEBUG: VideoInfoScanner: No (new) information was found in dir /storage/MVID/MVID1/
11:52:18 T:139888057890560   DEBUG: VideoInfoScanner: Scanning dir '/storage/MVID/MVID1/(sensation_2003_black_edition)-megamix_2003_black_edition_(dvdrip_svcd_2003)__kazan-mv/' as not in the database
11:52:18 T:139888057890560   DEBUG: ExcludeFileOrFolder: File '/storage/MVID/MVID1/(sensation_2003_black_edition)-megamix_2003_black_edition_(dvdrip_svcd_2003)__kazan-mv/Sample/' excluded. (Matches exclude rule RegExp:'[!-._ \\/]sample[-._ \\/]')
11:52:18 T:139888057890560   DEBUG: VideoInfoScanner: No (new) information was found in dir /storage/MVID/MVID1/(sensation_2003_black_edition)-megamix_2003_black_edition_(dvdrip_svcd_2003)__kazan-mv/
11:52:18 T:139888057890560   DEBUG: VideoInfoScanner: Scanning dir 'rar://%2fstorage%2fMVID%2fMVID1%2f%28sensation%5f2003%5fblack%5fedition%29%2dmegamix%5f2003%5fblack%5fedition%5f%28dvdrip%5fsvcd%5f2003%29%5f%5fkazan%2dmv%2f%28sensation%5f2003%5fblack%5fedition%29%2dmegamix%5f2003%5fblack%5fedition%5f%28dvdrip%5fsvcd%5f2003%29%5f%5fkazan%2dmv%2erar/' as not in the database
11:52:18 T:139888057890560   ERROR: Parse: Could not find scraper function NfoUrl
11:52:18 T:139888057890560   DEBUG: FindMovie: Searching for '(sensation 2003 black edition) megamix 2003 black edition' using TheAudioDb.com for Music Videos scraper (path: '/storage/.xbmc/addons/metadata.musicvideos.theaudiodb.com', content: 'musicvideos', version: '1.0.0')
11:52:18 T:139888057890560   ERROR: Run: Unable to parse web site
11:52:18 T:139888057890560    INFO: Loading skin file: DialogYesNo.xml
11:52:18 T:139888797792064   DEBUG: ------ Window Init (DialogYesNo.xml) ------

The scan then stops and says the server is unavailable. I can press Yes to continue scanning, but then the same thing happens again; if I press No, it just quits.

Hope this can be fixed, and thanks for working on an mvid plugin ;-)

Regards
Domin


ASRock ION 330HT / Raspberry PI
(This post was last modified: 2012-08-04 12:30 by Domin.)
RiotGrrl Offline
Junior Member
Posts: 4
Joined: Sep 2012
Reputation: 0
Post: #3
The scraper doesn't match any songs with a hyphen in the title.
e.g. Placebo - Infra-red, Gorillaz - 19-2000, Nirvana - Heart-shaped Box, etc.

I've tried just about everything (omitting the hyphen, a double hyphen "--", a space, an underscore, etc.) but nothing works. Is there a wildcard character that I could use to search?

Would appreciate any help, I can't find any threads that solve this.
clackerdacker Offline
Donor
Posts: 114
Joined: Jul 2008
Reputation: 0
Location: Sydney, Australia
Post: #4
(2012-08-04 12:14)Domin Wrote:  I get the following error trying to scrape for videos:

Code:
11:52:18 T:139888057890560   ERROR: Run: Unable to parse web site
11:52:18 T:139888057890560    INFO: Loading skin file: DialogYesNo.xml
11:52:18 T:139888797792064   DEBUG: ------ Window Init (DialogYesNo.xml) ------

The scan then stops and says the server is unavailable. I can press Yes to continue scanning, but then the same thing happens again; if I press No, it just quits.

Hope this can be fixed, and thanks for working on an mvid plugin ;-)

Regards
Domin

Same here. Any clues?
zag Offline
Team-Kodi Member
Posts: 1,681
Joined: Oct 2007
Reputation: 20
Location: UK
Post: #5
Quote:Correctly matched 1 of 24 music videos to its artist. Doesn't like dashes, searching "B.o.B." results in an "unable to connect to remote server" error.

Olympia, could you take a look at these issues? The dash issue is a big problem, as it affects lots of searches.

Code:
http://www.theaudiodb.com/api/v1/json/1/searchtrack.php?s=placebo&t=Infra-red

That works, so it must be a scraper problem.

The issue with B.o.B. is that there is an extra full stop at the end of the user's search string. It shouldn't cause a server error, though.

Code:
http://www.theaudiodb.com/api/v1/json/1/searchtrack.php?s=b.o.b.&t=arena
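
For anyone who wants to try other artist/track pairs against the API directly, the two URLs above can be rebuilt with a short Python sketch. The helper name `search_track_url` is hypothetical; the endpoint and the `s`/`t` parameters are taken from the URLs above. Stripping the trailing full stop is only a guess at a workaround for the B.o.B. case, not something the scraper is known to do.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the example URLs in this post.
BASE = "http://www.theaudiodb.com/api/v1/json/1/searchtrack.php"

def search_track_url(artist, track, strip_trailing_dot=False):
    """Build a searchtrack URL for the given artist and track."""
    if strip_trailing_dot:
        # Assumption: a trailing '.' on the artist is unwanted.
        artist = artist.rstrip(".")
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return BASE + "?" + urlencode({"s": artist, "t": track}, quote_via=quote)
```

Hyphens and full stops are left as they are by `quote`, so "Infra-red" reaches the API unchanged.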

HTPC - XBMC Gotham, OpenELEC, Harmony Smart Remote, Intel Haswell NUC, 40gb intel SSD, Core i3, 4gb RAM
Storage - 2 x qnap 8tb 419p+ NAS
Display LG 46" LCD + Casio Bulbless projector [PICS]
(This post was last modified: 2013-01-16 18:58 by zag.)
skypichat Offline
Junior Member
Posts: 29
Joined: Dec 2008
Reputation: 0
Location: France
Post: #6
Hello,
Is it possible to read ID3 tags from MP4 files?
The scraper disconnects if the name contains "__" without "-".

Meedios Media Center
zag Offline
Team-Kodi Member
Posts: 1,681
Joined: Oct 2007
Reputation: 20
Location: UK
Post: #7
(2013-01-20 13:16)skypichat Wrote:  Hello,
Is it possible to read ID3 tags from MP4 files?
The scraper disconnects if the name contains "__" without "-".

This is a metadata scraper; it doesn't read ID3 tags.

Try checking the database with sqlite to see how it's been added to XBMC.
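
A rough Python sketch of that check, using the standard `sqlite3` module. The database path you pass in and the `musicvideo` table name are assumptions about a typical XBMC video database and may differ between versions, so the function first lists the tables that actually exist and only reads rows if the table is there.

```python
import sqlite3

def dump_table(db_path, table="musicvideo", limit=10):
    """Return (table_names, rows) for a SQLite database.

    `table` defaults to 'musicvideo', which is an assumption about the
    XBMC video database layout. rows is empty if the table is absent.
    """
    con = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        rows = []
        if table in tables:
            # The name was found in sqlite_master above, so it is safe
            # to interpolate it into the query here.
            rows = con.execute(
                'SELECT * FROM "{}" LIMIT ?'.format(table), (limit,)).fetchall()
        return tables, rows
    finally:
        con.close()
```

Calling `dump_table()` with the path to your own video database file shows which tables exist and the first few rows that were stored for your music videos. Work on a copy of the file rather than the live database.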

HTPC - XBMC Gotham, OpenELEC, Harmony Smart Remote, Intel Haswell NUC, 40gb intel SSD, Core i3, 4gb RAM
Storage - 2 x qnap 8tb 419p+ NAS
Display LG 46" LCD + Casio Bulbless projector [PICS]
n1md4 Offline
Fan
Posts: 408
Joined: Nov 2012
Reputation: 8
Post: #8
Hi. I've added this today, and get the same error:

Code:
13:07:53 T:774605632  NOTICE: Previous line repeats 2 times.
13:07:53 T:774605632  NOTICE: Thread CVideoInfoScanner start, auto delete: false
13:07:53 T:774605632  NOTICE: VideoInfoScanner: Starting scan ..
13:07:53 T:774605632   ERROR: Run: Unable to parse web site
13:07:55 T:774605632 WARNING: No information found for item '/home/xbmc/videos/music/Charlene Soraia/Ghost.mkv', it won't be added to the library.
13:07:55 T:774605632   ERROR: Run: Unable to parse web site
13:07:57 T:774605632 WARNING: No information found for item '/home/xbmc/videos/music/Linkin Park/Meteora.mkv', it won't be added to the library.
13:07:57 T:774605632   ERROR: Run: Unable to parse web site
13:07:59 T:774605632 WARNING: No information found for item '/home/xbmc/videos/music/The Chemical Brothers/Block Rockin Beats.mkv', it won't be added to the library.
13:07:59 T:774605632  NOTICE: VideoInfoScanner: Finished scan. Scanning for video info took 00:05

HTPC XBMC Gotham OpenELEC, NYXBoard, OCZ SSD
Storage 4x 2TB Green HDD BTRFS RAID1
Display Sony Bravia 32"
olympia Offline
Team-Kodi Member
Posts: 2,499
Joined: May 2008
Reputation: 32
Post: #9
You need ' - ' between the artist and the track.
n1md4 Offline
Fan
Posts: 408
Joined: Nov 2012
Reputation: 8
Post: #10
Wizard! You're right, working now.

HTPC XBMC Gotham OpenELEC, NYXBoard, OCZ SSD
Storage 4x 2TB Green HDD BTRFS RAID1
Display Sony Bravia 32"
zag Offline
Team-Kodi Member
Posts: 1,681
Joined: Oct 2007
Reputation: 20
Location: UK
Post: #11
I'll be looking into some heuristic search methods soon for http://www.theaudiodb.com API in the hope of improving this scraper.

Can people list some examples of how their music videos are named please?

HTPC - XBMC Gotham, OpenELEC, Harmony Smart Remote, Intel Haswell NUC, 40gb intel SSD, Core i3, 4gb RAM
Storage - 2 x qnap 8tb 419p+ NAS
Display LG 46" LCD + Casio Bulbless projector [PICS]
bradford9999 Offline
Junior Member
Posts: 5
Joined: Mar 2013
Reputation: 1
Post: #12
When I tried to import my music videos, I noticed that the current scraper has an exact way of handling filenames.

It MUST be ARTIST - SONGNAME.ext.

Note the "space dash space" in between artist and songname. If there aren't spaces surrounding the dash, it won't work.

Beastie Boys - Sabotage.mpg -- works
Beastie Boys-Sabotage.mpg -- fails

Note: you must have the exact artist and song, spelled correctly, including punctuation and spacing.

panic at the disco - i write sins not tragedies.mpg -- fails
panic! at the disco - i write sins not tragedies.mpg -- works

Taking back sunday - This photograph is proof.mpg -- fails
Taking back sunday - This photograph is proof(I know you know).mpg -- fails
Taking back sunday - This photograph is proof (I know you know).mpg -- works

Problems with the scraper:
  • Can't search when artist or song name has a dash
  • Can't search if songname is all numbers

Example:
the all-american rejects - move along.mpg -- fails
Bowling for Soup - 1985.mpg -- fails.

I haven't figured out a way around these two errors. I suspect you'll also get an error if the artist name is all numeric, but I can't be certain.

It is NOT an issue with TheAudioDB's search API. Searching for my examples above returns JSON:
http://www.theaudiodb.com/api/v1/json/1/...oup&t=1985
http://www.theaudiodb.com/api/v1/json/1/...ve%20along

XBMC's addon is not parsing the values correctly when it finds all-numeric song names or dashes in the artist name or song.

Hopefully this is helpful to someone. I'm going to continue finding out if there is a way around this.
Maybe change the separator to a pipe (|)? I don't imagine many songs/artists have that in their name.

Finally, if the scraper fails to find information, the error displayed will be "Cannot connect to server." This is misleading: it could connect, it just couldn't find your files based on what it parsed.
(This post was last modified: 2013-03-27 14:43 by bradford9999.)
scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #13
The issue with dashes is a fairly easy (two-fold) fix.

The first issue is a small piece of code in XBMC that handles how the "title" of a video is passed to the scraper's CreateSearchUrl (for music videos, the title should just be the filename). For TV shows and movies it's done in two passes: the first pass leaves dashes intact, and if that fails to return any results, a second pass is done with the dashes replaced by spaces. For music videos, it basically jumps straight to the second pass, replacing the dashes straight away. (See this line in Scraper.cpp).

Removing that restriction in the code will only get you halfway there, though. Because that restriction exists, the scraper doesn't actually split the file name into artist and song by looking for the "space dash space"; it splits on "space space space", because the dash has been replaced before the scraper sees it (you can test this yourself by renaming a file to have three spaces between artist and song, and it will still work). Without also updating the scraper, all that will happen is that the scraper will look for the triple space on the first pass, fail to find it, and fall back to the second pass, resulting in identical behaviour.

Basically, the "Or is a music video" code needs removing from Scraper.cpp and the scraper needs updating to split on "space dash space" (%20-%20) (and "space space space" for backwards compatibility).
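
The behaviour described above can be simulated in a few lines of Python. This is only a model of the description, not XBMC's or the scraper's actual code: the function names are invented, and the real scraper works on a URL-encoded string using regular expressions in its XML.

```python
def title_as_passed_today(filename_stem):
    # Model of the current music video path: every dash becomes a
    # space before the scraper sees the title.
    return filename_stem.replace("-", " ")

def split_current(title):
    # Model of the current scraper: the separator is three spaces,
    # which is what ' - ' turns into after the dash is replaced.
    artist, sep, track = title.partition("   ")
    return (artist, track) if sep else None

def split_proposed(title):
    # Model of the proposed scraper: try ' - ' first, then fall back
    # to three spaces for backwards compatibility.
    for separator in (" - ", "   "):
        artist, sep, track = title.partition(separator)
        if sep:
            return artist, track
    return None
```

Feeding "the all-american rejects - move along" through `title_as_passed_today` and `split_current` gives the artist "the all american rejects", with the hyphen already lost before any search is made. `split_proposed` on the untouched name keeps "the all-american rejects" intact.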
bradford9999 Offline
Junior Member
Posts: 5
Joined: Mar 2013
Reputation: 1
Post: #14
This is great! Thanks for the tips!

Is this something that we as users can contribute to, or is the repo locked to admins? I'm new to the XBMC community, so I'm not sure how to fix this for everyone, or if we are just supposed to change this locally.
scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #15
If you can compile your own copy of XBMC, you can edit the code (just remove the " || Content() == CONTENT_MUSICVIDEOS") and then also replace the %20%20%20 with %20-%20 in your local copy of the scraper xml.

Note: I haven't actually tested this yet, but it seems solid. When I do get the chance to test it, (if it works) I'll do a pull request, and when/if that gets pulled, I will update the scraper accordingly for everyone (but obviously you'd need to be running a nightly build from after the pull request gets pulled for it to work properly).
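
For the scraper half of the change, a small Python sketch of the search-and-replace. It is untested against the real addon: the path you pass in is whichever XML file in your local copy of the scraper contains the `%20%20%20` separator, and no specific file name is assumed here. It keeps a `.bak` copy and returns how many replacements were made.

```python
import shutil

def patch_separator(xml_path, old="%20%20%20", new="%20-%20"):
    """Replace the triple-space separator in a scraper XML file.

    Returns the number of occurrences replaced. A backup is written to
    xml_path + '.bak' before the file is changed. Assumes UTF-8 text.
    """
    # newline="" keeps the file's original line endings untouched.
    with open(xml_path, encoding="utf-8", newline="") as f:
        text = f.read()
    count = text.count(old)
    if count:
        shutil.copyfile(xml_path, xml_path + ".bak")
        with open(xml_path, "w", encoding="utf-8", newline="") as f:
            f.write(text.replace(old, new))
    return count
```

Note that this plain replacement drops the old triple-space form entirely; keeping both separators for backwards compatibility would need an edit to the scraper's expression by hand.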