Some improvements for lyricsmode

chninkel Offline
Junior Member
Posts: 2
Joined: Mar 2012
Reputation: 0
Post: #1
Hi,

I am currently using the cu.lyrics script with the lyricsmode scraper and I am very happy with it. However, it sometimes fails to find lyrics that I can find myself on the website.

I slightly modified the scraper code so that it falls back to the lyricsmode search box when the direct URL guessing doesn't work.
This often happens when the song or artist is not written exactly the way lyricsmode stores it in its database (the cranberries vs cranberries) or because the name contains special characters (k's choice).
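
To illustrate why the guessing is fragile, here is a minimal sketch of the two URL forms that appear in the patch below. It is written for Python 3 (the add-on itself uses Python 2's urllib.quote_plus); the helper names guess_lyrics_url and search_url and the title "Some Song" are placeholders made up for this example, not part of the script.

```python
from urllib.parse import quote_plus  # Python 3 location of quote_plus

BASE = "http://www.lyricsmode.com"


def guess_lyrics_url(artist, title):
    """Build the direct song URL with the same formula as the scraper."""
    artist = artist.lower()
    title = title.lower()
    return "%s/lyrics/%s/%s/%s.html" % (
        BASE, artist[:1], artist.replace(" ", "_"), title.replace(" ", "_"))


def search_url(title):
    """Build the search-box URL that the patch uses as a fallback."""
    return BASE + "/search.php?what=songs&s=" + quote_plus(title.lower())


# "Some Song" is a placeholder title, used only to show the URL shapes.
for artist in ("The Cranberries", "Cranberries", "K's Choice"):
    print(guess_lyrics_url(artist, "Some Song"))
print(search_url("Some Song"))
```

The two spellings of the band name give two different paths (/t/the_cranberries/ and /c/cranberries/), and the apostrophe is passed through untouched, so the guess only works when the tag happens to match the site's own spelling. The search URL depends on the title alone.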

The modifications I applied are shown at the end of this post (I didn't find a way to attach a file). Would it be possible to commit them to the cu.lyrics code?

Thanks in advance,

Yann


Code:
diff -ur script.cu.lyrics.orig/resources/lib/scrapers/lyricsmode/lyricsScraper.py script.cu.lyrics/resources/lib/scrapers/lyricsmode/lyricsScraper.py
--- script.cu.lyrics.orig/resources/lib/scrapers/lyricsmode/lyricsScraper.py    2012-04-01 20:13:54.106691515 +0200
+++ script.cu.lyrics/resources/lib/scrapers/lyricsmode/lyricsScraper.py    2012-04-08 23:33:00.699950122 +0200
@@ -139,6 +139,8 @@
         self.clean_lyrics_regex = re.compile( "<.+?>" )
         self.normalize_lyrics_regex = re.compile( "&#[x]*(?P<name>[0-9]+);*" )
         self.clean_br_regex = re.compile( "<br[ /]*>[\s]*", re.IGNORECASE )
+        self.search_results_regex = re.compile("<a href=\"[^\"]+\">([^<]+)</a></td>[^<]+<td><a href=\"([^\"]+)\" class=\"b\">[^<]+</a></td>", re.IGNORECASE)
+        self.next_results_regex = re.compile("<A href=\"([^\"]+)\" class=\"pages\">next .</A>", re.IGNORECASE)
    
     def get_lyrics_start(self, *args):
         lyricThread = threading.Thread(target=self.get_lyrics_thread, args=args)
@@ -151,8 +154,36 @@
         l.song = song
         try: # below is borowed from XBMC Lyrics
             url = "http://www.lyricsmode.com/lyrics/%s/%s/%s.html" % (song.artist.lower()[:1],song.artist.lower().replace(" ","_"), song.title.lower().replace(" ","_"), )
-            print "Search url: %s" % (url)
-            song_search = urllib.urlopen(url).read()
+
+            while True:
+                print "Search url: %s" % (url)
+                song_search = urllib.urlopen(url).read()
+                if song_search.find("<div id='songlyrics_h' class='dn'>") >= 0:
+                        break
+
+                # Let's try to use the research box if we didn't yet
+                if not 'search' in url:
+                    url = "http://www.lyricsmode.com/search.php?what=songs&s=" + urllib.quote_plus(song.title.lower())
+                else:
+                    # the search gave several results, let's try to find our song
+                    url = ""
+                    start = song_search.find('<!--output-->')
+                    end = song_search.find('<!--/output-->', start)
+                    results = self.search_results_regex.findall(song_search, start, end)
+
+                    for result in results:
+                        if result[0].lower() in song.artist.lower():
+                            url = "http://www.lyricsmode.com" + result[1]
+                            break
+
+                    if not url:
+                        # Is there a next page of results ?
+                        match = self.next_results_regex.search(song_search[end:])
+                        if match:
+                            url = "http://www.lyricsmode.com/search.php" + match.group(1)
+                        else:
+                            return None, "No lyrics found"
+
             lyr = song_search.split("<div id='songlyrics_h' class='dn'>")[1].split('<!-- /SONG LYRICS -->')[0]
             lyr = self.clean_br_regex.sub( "\n", lyr ).strip()
             lyr = self.clean_lyrics_regex.sub( "", lyr ).strip()
diff -ur script.cu.lyrics.orig/resources/lib/song.py script.cu.lyrics/resources/lib/song.py
--- script.cu.lyrics.orig/resources/lib/song.py    2012-04-01 20:13:54.158691515 +0200
+++ script.cu.lyrics/resources/lib/song.py    2012-04-08 16:56:32.617536591 +0200
@@ -30,7 +30,9 @@
     def current():
         song = Song()
         song.title = xbmc.getInfoLabel( "MusicPlayer.Title" )
+        song.title = utilities.deAccent(song.title)
         song.artist = xbmc.getInfoLabel( "MusicPlayer.Artist")
+        song.artist = utilities.deAccent(song.artist)
        
         print "Current Song: %s:%s" % (song.artist, song.title)
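
The song.py change above calls utilities.deAccent, whose implementation is not shown in this thread. As a rough idea of what such a helper could do, here is a sketch for Python 3 built on the standard unicodedata module. The function name de_accent and its behaviour are assumptions for illustration only and may differ from the real utilities.deAccent.

```python
import unicodedata


def de_accent(text):
    """Strip combining accents, e.g. turn 'é' into 'e' (hypothetical helper)."""
    # NFD splits an accented character into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(de_accent("Beyoncé"))
print(de_accent("Sigur Rós"))
```

Removing accents before the URL is built helps when the site uses plain ASCII names while the music tags carry the accented spelling.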
DDDamian Offline
Team-XBMC Developer
Posts: 3,023
Joined: Sep 2011
Reputation: 252
Location: Canada
Post: #2
I'll make sure the right guys see it - thx for helping out :)

System: XBMC HTPC with HDMI WASAPI & AudioEngine - Denon AVR-3808CI - Denon DVD-5900 Universal Player - Denon DCM-27 CD-Changer
- Sony BDP-S580 Blu-Ray - X-Box 360 - Android tablet wireless remote - 7.1 Streem/Axiom/Velodyne Surround System
If I have been able to help feel free to add to my reputation +/- below - thanks!
amet Offline
I wave my private parts at your aunties!
Posts: 3,485
Joined: Jul 2009
Reputation: 18
Location: Novi Sad / Dubai
Post: #3
Please check that https://github.com/amet/script.cu.lyrics...d9b70f81e0 is fine before it goes out; I had to apply the patch manually.


@chninkel
If you want to, next time just submit a pull request on GitHub and I'll get it in.


Always read the XBMC_Online_Manual,Frequently_Asked_Questions and search the forum before posting.
For troubleshooting and bug reporting use -> Log_file.
DDDamian Offline
Team-XBMC Developer
Posts: 3,023
Joined: Sep 2011
Reputation: 252
Location: Canada
Post: #4
Amet's the right guy :)

I'll test and ping you in IRC for a blessing and update the thread here for chninkel.

chninkel Offline
Junior Member
Posts: 2
Joined: Mar 2012
Reputation: 0
Post: #5
Yes, it needs a little testing to be sure there is no problem.
I already have a small modification that avoids a problem when the song page exists but, for some reason, doesn't contain the lyrics.
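
The control flow with this guard can be sketched on its own as follows (Python 3). This is a simplified illustration, not the scraper itself: fetch and pick_result are hypothetical stand-ins for urllib.urlopen(url).read() and for the search-result matching, and the "next page of results" handling is left out.

```python
LYRICS_MARKER = "<div id='songlyrics_h' class='dn'>"


def find_lyrics_page(direct_url, search_url, fetch, pick_result):
    """Return the HTML of a page that contains lyrics, or None.

    fetch(url) -> str and pick_result(html) -> url or None are
    hypothetical callables injected so the flow can run offline.
    """
    url = direct_url
    lyrics_found = False  # set once a search result has been selected
    while True:
        page = fetch(url)
        if LYRICS_MARKER in page:
            return page
        if lyrics_found:
            # The selected song page exists but has no lyrics block:
            # stop here instead of going back to the search.
            return None
        if "search" not in url:
            # Direct guess failed, fall back to the search box once.
            url = search_url
            continue
        url = pick_result(page)
        if not url:
            return None
        lyrics_found = True


# Offline demo: the song page is reachable but carries no lyrics.
pages = {"direct": "404", "search?q=x": "results", "song": "no lyrics here"}
print(find_lyrics_page("direct", "search?q=x", pages.__getitem__,
                       lambda html: "song"))
```

As far as I can tell from the diff, a song URL does not contain "search", so without the flag a lyrics-less song page would send the loop back to the search box and to the same result again.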

Code:
--- lyricsScraper.py.orig    2012-04-09 11:18:15.668261004 +0200
+++ lyricsScraper.py    2012-04-09 11:18:59.944261331 +0200
@@ -154,12 +154,18 @@
         l.song = song
         try: # below is borowed from XBMC Lyrics
             url = "http://www.lyricsmode.com/lyrics/%s/%s/%s.html" % (song.artist.lower()[:1],song.artist.lower().replace(" ","_"), song.title.lower().replace(" ","_"), )
+            lyrics_found = False
             while True:
                 print "Search url: %s" % (url)
                 song_search = urllib.urlopen(url).read()
                 if song_search.find("<div id='songlyrics_h' class='dn'>") >= 0:
-                        break
+                    break

+                if lyrics_found:
+                    # if we're here, we found the lyrics page but it didn't
+                    # contains the lyrics part (licensing issue or some bug)
+                    return None, "No lyrics found"
+                    
                 # Let's try to use the research box if we didn't yet
                 if not 'search' in url:
                     url = "http://www.lyricsmode.com/search.php?what=songs&s=" + urllib.quote_plus(song.title.lower())
@@ -173,6 +179,7 @@
                     for result in results:
                         if result[0].lower() in song.artist.lower():
                             url = "http://www.lyricsmode.com" + result[1]
+                            lyrics_found = True
                             break

                     if not url:

@amet: I will try to use git if I have more modifications to submit. I would like to propose some changes so that the lyrics code works with radio streams, where the song title field often actually contains both the artist and the title.
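
One possible shape for the radio case, as a sketch only (Python 3): it assumes the stream sends the title as "Artist - Title" with " - " as the separator, which is a guess and not something any station is guaranteed to follow. The function name split_radio_title is made up for this example.

```python
def split_radio_title(title, separator=" - "):
    """Split 'Artist - Title' into (artist, title), or return None.

    The separator is an assumption; real streams may use another format.
    """
    artist, sep, song = title.partition(separator)
    if not sep or not artist.strip() or not song.strip():
        return None  # no usable separator, keep the original tags
    return artist.strip(), song.strip()


print(split_radio_title("The Cranberries - Zombie"))
print(split_radio_title("Zombie"))
```

Returning None when no separator is found lets the caller keep the regular MusicPlayer.Artist and MusicPlayer.Title values.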
amet Offline
I wave my private parts at your aunties!
Posts: 3,485
Joined: Jul 2009
Reputation: 18
Location: Novi Sad / Dubai
Post: #6
Yeah, please use GitHub and submit the pull request there; it makes it much easier to review and incorporate. This one doesn't apply cleanly on my side and has to be done manually.

thx for the fixes and help :)


(This post was last modified: 2012-04-09 11:56 by amet.)
DDDamian Offline
Team-XBMC Developer
Posts: 3,023
Joined: Sep 2011
Reputation: 252
Location: Canada
Post: #7
@chninkel - amet will push to git - thx for your work! As he mentions, go through the pull-request system on GitHub, and be sure to test all you can before submitting a PR.

You've now officially added to XBMC :)
