XBMC Community Forum
How to get unicode from python to $INFO label - Printable Version


How to get unicode from python to $INFO label - bossanova808 - 2012-02-23 07:17

I have some code that uses this string:

Code:
u'Sigur R\xc3\xb3s'

(python repr())

...which would appear to be a utf-8 encoded unicode string (although I am very weak in this area!)

and I am setting that to a window property via:

Code:
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTARTIST", artist)

(in a WindowXML)

...however, this results in gobbledygook on screen.

I suspect I am going wrong somewhere basic, but an arvo of researching various encoding things has got me no closer...

Anyone have ideas??


- VictorV - 2012-02-23 21:30

Try to convert it to a bytestring

s = u'Sigur R\xc3\xb3s'.encode('utf-8')
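
A quick way to see what that call produces is to compare it with the bytes that were presumably intended. This is only a sketch, assuming the u'...' value really holds the two code points U+00C3 and U+00B3 (as the repr suggests) where a single U+00F3 was meant; the variable names are made up for illustration.

```python
s = u'Sigur R\xc3\xb3s'

# The unicode string is 10 characters long: U+00C3 and U+00B3 stand
# where a single U+00F3 (o with acute accent) was presumably intended.
code_points = [ord(ch) for ch in s]

# Encoding to UTF-8 turns each of those two code points into two bytes,
# giving 12 bytes in total.
as_utf8 = s.encode('utf-8')

# The intended 9-character name encodes to only 10 bytes of UTF-8.
intended = u'Sigur R\xf3s'.encode('utf-8')

# The two byte strings differ, so the display would still be wrong.
same = (as_utf8 == intended)
```

If those assumptions hold, .encode('utf-8') encodes the already-mangled characters a second time, so it would not be expected to change what appears on screen.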


- bossanova808 - 2012-02-25 02:19

Unfortunately that doesn't work...same result.

Any other ideas? I think the info IS unicode utf-8, but I think maybe XBMC isn't interpreting it as such.


- bossanova808 - 2012-02-25 02:24

Hmmm ok, passing it just artist = 'Sigur R\xc3\xb3s' WITHOUT making it a unicode string works!

That's odd...must be a double translation thing I guess?

Now, how to get the unicode strings into a basic string in python - i.e. cast them I guess. I find this area a bit confusing....
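
One possible way to do that "cast" is a latin-1 round trip, since latin-1 maps code points 0-255 straight back to bytes 0-255. The sketch below assumes the unicode object holds one code point per original UTF-8 byte and that XBMC displays UTF-8 byte strings correctly (as the plain-string test above suggests). The helper name unmangle is hypothetical, not part of any library.

```python
def unmangle(text):
    """Return UTF-8 bytes for a unicode string.

    Heuristic: if the string can be squeezed back into single bytes and
    those bytes are valid UTF-8, assume it was mis-decoded and return
    the recovered bytes. Otherwise encode it as UTF-8 in the normal way.
    """
    try:
        raw = text.encode('latin-1')   # fails for code points above 255
        raw.decode('utf-8')            # fails if the bytes are not UTF-8
        return raw
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text.encode('utf-8')


artist = unmangle(u'Sigur R\xc3\xb3s')    # the mangled value from the library
correct = unmangle(u'Sigur R\xf3s')       # an already-correct unicode value
```

Both calls should give the same byte string, which could then be passed to setProperty. Being a heuristic, it will guess wrong for a correct string that happens to look like mis-decoded UTF-8.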


- bossanova808 - 2012-02-25 02:36

The problem is I am using a downstream library that is returning strings with these characters in them, so 'Sigur R\xc3\xb3s' - and these are typed as unicode.

If I then pass them as this type, they come out in xbmc wonky. I need to just cast them or get the literal value of the string...but I can't seem to just get the literal value from a unicode string in a variable...

I think I am missing something obvious but have been missing it for two days now and it's driving me nuts!

Any python experts know how to do this??


- giftie - 2012-02-25 04:48

I thought it looked like a unicoded utf-8 string...

I use the following python code to ensure that the string is in utf-8 encoding.
Code:
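def get_unicode( to_decode ):
    """Return to_decode as unicode: bytes are decoded as utf8, with a
    latin1 fallback for any single byte that is not valid utf8."""
    final = []
    try:
        # unicode (or plain ascii) input encodes cleanly: return it untouched
        to_decode.encode('utf8')
        return to_decode
    except UnicodeError:
        # a byte string with non-ascii bytes ends up here (python 2)
        while True:
            try:
                final.append(to_decode.decode('utf8'))
                break
            except UnicodeDecodeError as exc:
                # everything up to crazy character should be good
                final.append(to_decode[:exc.start].decode('utf8'))
                # crazy character is probably latin1
                final.append(to_decode[exc.start].decode('latin1'))
                # remove already decoded stuff
                to_decode = to_decode[exc.start+1:]
        return u"".join(final)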

Then I send the string to XBMC with a '.decode("utf-8")'. This shows the artist in the proper format (usually...).


- bossanova808 - 2012-02-25 05:09

mmm, that seemed to give me the same results. This might make it clearer (perhaps)!

Code:
title, artist, album = self.player.getCurrentTrack()
print "artist (raises exception about ordinal out of range if printed as is) "
print repr(artist)
artist2 = 'Sigur R\xc3\xb3s'
print "artist2 is " + artist2
print type(artist2)

#newa =self.get_unicode(artist)
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTTITLE", title)
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTARTIST", artist)

and output:

Code:
14:06:58 T:756  NOTICE: artist (raises exception about ordinal out of range if printed as is)
14:06:58 T:756  NOTICE: u'Sigur R\xc3\xb3s'
14:06:58 T:756  NOTICE: artist2 is Sigur Rós
14:06:58 T:756  NOTICE: <type 'str'>

If I pass artist2 - correct onscreen display

If I pass artist - gobbledygook
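
The "ordinal" exception mentioned in the print statement can probably be reproduced without XBMC. This sketch assumes the print falls back to the ascii codec when handed a unicode object, which I have not verified for XBMC's embedded Python.

```python
artist = u'Sigur R\xc3\xb3s'

try:
    # Roughly what a print to an ascii-only output has to do
    artist.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Index of the first character the ascii codec could not handle
first_bad_index = error.start
message = str(error)
```

If that is what is happening, the exception only says that the value contains non-ascii characters. It says nothing about whether those characters are the right ones, which is why repr() is the more useful thing to log here.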


- giftie - 2012-02-25 06:12

What's the code in self.player.getCurrentTrack()? I think the problem is there. Without the u' prefix it works properly, as you say, but nothing seems to be able to strip it out.


- bossanova808 - 2012-02-25 06:18

Code:
artist = self.playlist[currentIndex]['artist']

...which is looking at the result of getplaylist:

    self.playlist = self.sb.playlist_get_info()

...

    def playlist_get_info(self):
        """Get info about the tracks in the current playlist"""
        amount = self.playlist_track_count()
        response = self.request('status 0 %i' % amount, True)
        encoded_list = response.split('playlist%20index')[1:]
        playlist = []
        for encoded in encoded_list:
            data = [self.__unquote(x) for x in ('position' + encoded).split(' ')]
            item = {}
            for info in data:
                info = info.split(':')
                key = info.pop(0)
                if key:
                    item[key] = ':'.join(info)
            item['position'] = int(item['position'])
            item['id'] = int(item['id'])
            item['duration'] = float(item['duration'])
            playlist.append(item)
        return playlist

and __unquote is:

    def __unquote(self, text):
        try:
            import urllib.parse
            return urllib.parse.unquote(text, encoding=self.charset)
        except ImportError:
            import urllib
            return urllib.unquote(text)

(it does raise the exception and fall through to just urllib.unquote(text) rather than the .parse version).

I wrote basically none of those functions, they are from pysqueezecenter and I use this in lots of places, so ideally I want to fix it externally if I can...as if I change the output it will likely break other things.

I even tried using repr() on it and then stripping off the u' and the final ' in a gross hack but that didn't work...which surprised me.
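
A possible explanation for where the u'...\xc3\xb3...' value comes from: as far as I understand it, Python 2's urllib.unquote, when given a unicode object, turns each %XX escape into one code point instead of treating the escapes as UTF-8 bytes. The sketch below shows the other order of operations: unquote to bytes first, then decode as UTF-8. The sample string is made up, and it is an assumption that the server sends percent-encoded UTF-8.

```python
try:
    # Python 3: decode %XX escapes straight to bytes
    from urllib.parse import unquote_to_bytes
except ImportError:
    # Python 2: urllib.unquote returns a byte str when given a byte str
    from urllib import unquote as unquote_to_bytes

quoted = 'Sigur%20R%C3%B3s'   # hypothetical sample of a quoted response field

raw = unquote_to_bytes(quoted)   # UTF-8 bytes, still undecoded
artist = raw.decode('utf-8')     # proper unicode, 9 characters long
```

Here raw is the byte string that displayed correctly in the earlier test, and artist is the unicode value the library would ideally return.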


- giftie - 2012-02-25 07:33

I know you really don't want to change the coding, but can you change the response line to the following:
Code:
response = self.request('status 0 %i' % amount, False)
