How to get unicode from python to $INFO label - bossanova808 - 2012-02-23
I have some code that uses this string:
u'Sigur R\xc3\xb3s' (python repr())
...which would appear to be a UTF-8 encoded unicode string (although I am very weak in this area!)
and I am setting that to a window property via:
Code: xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTARTIST", artist)
(in a WindowXML)
...however, this results in gobbledygook on screen.
I suspect I am going wrong somewhere basic, but an arvo of researching various encoding things has got me no closer...
anyone have ideas??
- VictorV - 2012-02-23
Try converting it to a bytestring:
Code: s = u'Sigur R\xc3\xb3s'.encode('utf-8')
- bossanova808 - 2012-02-25
Unfortunately that doesn't work...same result.
Any other ideas? I think the info IS UTF-8 unicode, but maybe XBMC isn't interpreting it as such.
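A quick way to see why the `.encode('utf-8')` suggestion gives the same garbled result, sketched in Python 3 (where `str` plays the role of Python 2's `unicode`; the value is the one from this thread). The characters `\xc3` and `\xb3` are already the two UTF-8 bytes of "ó", stored as two separate code points, so encoding them to UTF-8 again double-encodes them instead of restoring the original bytes:

```python
# Python 3's str stands in for the Python 2 unicode object the library returned.
artist = 'Sigur R\xc3\xb3s'  # the two UTF-8 bytes of "ó" held as two separate code points

# Encoding to UTF-8 treats U+00C3 and U+00B3 as real characters and encodes each
# of them again, so the bytes get more mangled rather than restored.
double_encoded = artist.encode('utf-8')
print(double_encoded)  # b'Sigur R\xc3\x83\xc2\xb3s'
```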
- bossanova808 - 2012-02-25
Hmmm, ok: passing it just as artist = 'Sigur R\xc3\xb3s', WITHOUT making it a unicode string, works!
That's odd...must be a double translation thing, I guess?
Now, how do I get the unicode strings into a basic string in python, i.e. cast them, I guess? I find this area a bit confusing....
- bossanova808 - 2012-02-25
The problem is I am using a downstream library and it is returning strings with these characters in them, so 'Sigur R\xc3\xb3s' - and these are typed as unicode.
If I then pass them as this type, they come out wonky in XBMC. I need to just cast them or get the literal value of the string...but I can't seem to get the literal value from a unicode string held in a variable...
I think I am missing something obvious, but I have been missing it for two days now and it's driving me nuts!
Any python experts know how to do this??
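The "literal value" being asked for here is the original byte string. A minimal Python 3 sketch, assuming the damage is exactly UTF-8 bytes that were turned into text one code point per byte (which is what the `u'Sigur R\xc3\xb3s'` repr suggests): encoding with Latin-1 maps each code point U+0000..U+00FF back to the byte with the same value, and those bytes can then be decoded as UTF-8. `fix_mojibake` is a hypothetical helper name, not something from the thread. In the Python 2 code discussed here, `artist.encode('latin-1')` should likewise yield the byte string `'Sigur R\xc3\xb3s'` that displayed correctly, although nobody in the thread tried that.

```python
def fix_mojibake(text):
    """Undo UTF-8 bytes that were mis-decoded as one Latin-1 code point per byte.

    Returns the repaired str, or the input unchanged if it is not that kind of damage.
    """
    try:
        # Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
        # so this recovers the original byte string.
        raw = text.encode('latin-1')
    except UnicodeEncodeError:
        return text  # contains code points above U+00FF: not this kind of damage
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return text  # the recovered bytes are not valid UTF-8, so leave the text alone
```

For example, `fix_mojibake('Sigur R\xc3\xb3s')` gives `'Sigur Rós'`, while an already-correct `'Sigur Rós'` comes back unchanged because its Latin-1 bytes are not valid UTF-8. It remains a heuristic: genuine Latin-1 text that happens to form valid UTF-8 sequences would be converted wrongly.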
- giftie - 2012-02-25
I thought it looked like a UTF-8 encoded unicode string...
I use the following python code to ensure that the string is in UTF-8 encoding.
Code: def get_unicode( to_decode ):
    final = []
    try:
        temp_string = to_decode.encode('utf8')
        return to_decode
    except:
        while True:
            try:
                final.append(to_decode.decode('utf8'))
                break
            except UnicodeDecodeError, exc:
                # everything up to crazy character should be good
                final.append(to_decode[:exc.start].decode('utf8'))
                # crazy character is probably latin1
                final.append(to_decode[exc.start].decode('latin1'))
                # remove already encoded stuff
                to_decode = to_decode[exc.start+1:]
        return "".join(final)
Then I send the string to XBMC with a '.decode("utf-8")'. This shows the artist in the proper format (usually..)
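A Python 3 sketch of the same helper (`bytes` standing in for Python 2's `str`, and `str` for Python 2's `unicode`) makes one limitation visible: a value that is already text is returned untouched, even when it holds mojibake. Since the library hands back `unicode` objects here, that would explain why the helper leaves `u'Sigur R\xc3\xb3s'` unchanged. The bare `except` is replaced by an explicit type check, and the Latin-1 fallback slices one byte (`exc.start:exc.start + 1`) because indexing `bytes` in Python 3 returns an `int`.

```python
def get_unicode(to_decode):
    """Python 3 sketch: bytes -> str, trying UTF-8 first and Latin-1 for each bad byte."""
    if isinstance(to_decode, str):
        # Already text: returned untouched, even if it holds mojibake like 'Sigur RÃ³s'.
        return to_decode
    final = []
    while True:
        try:
            final.append(to_decode.decode('utf-8'))
            break
        except UnicodeDecodeError as exc:
            # Everything before the offending byte is valid UTF-8.
            final.append(to_decode[:exc.start].decode('utf-8'))
            # Assume the single offending byte is Latin-1.
            final.append(to_decode[exc.start:exc.start + 1].decode('latin-1'))
            # Carry on with the rest of the input.
            to_decode = to_decode[exc.start + 1:]
    return ''.join(final)
```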
- bossanova808 - 2012-02-25
mmm, that seemed to give me the same results. This might make it clearer (perhaps)!
Code: title, artist, album = self.player.getCurrentTrack()
print "artist (raises exception about ordinal out of range if printed as is) "
print repr(artist)
artist2 = 'Sigur R\xc3\xb3s'
print "artist2 is " + artist2
print type(artist2)
#newa =self.get_unicode(artist)
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTTITLE", title)
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTARTIST", artist)
and output:
Code: 14:06:58 T:756 NOTICE: artist (raises exception about ordinal out of range if printed as is)
14:06:58 T:756 NOTICE: u'Sigur R\xc3\xb3s'
14:06:58 T:756 NOTICE: artist2 is Sigur Rós
14:06:58 T:756 NOTICE: <type 'str'>
If I pass artist2 - correct onscreen display
If I pass artist - gobbledygook
- giftie - 2012-02-25
What's the code in self.player.getCurrentTrack()? I think the problem is there. Without the u' prefix it works properly, as you say, but nothing seems to be able to strip it out.
- bossanova808 - 2012-02-25
Code: artist = self.playlist[currentIndex]['artist']
...which is looking at the result of getplaylist:
Code: self.playlist = self.sb.playlist_get_info()
...
def playlist_get_info(self):
    """Get info about the tracks in the current playlist"""
    amount = self.playlist_track_count()
    response = self.request('status 0 %i' % amount, True)
    encoded_list = response.split('playlist%20index')[1:]
    playlist = []
    for encoded in encoded_list:
        data = [self.__unquote(x) for x in ('position' + encoded).split(' ')]
        item = {}
        for info in data:
            info = info.split(':')
            key = info.pop(0)
            if key:
                item[key] = ':'.join(info)
        item['position'] = int(item['position'])
        item['id'] = int(item['id'])
        item['duration'] = float(item['duration'])
        playlist.append(item)
    return playlist
and __unquote is:
Code: def __unquote(self, text):
    try:
        import urllib.parse
        return urllib.parse.unquote(text, encoding=self.charset)
    except ImportError:
        import urllib
        return urllib.unquote(text)
(It does raise the exception and falls through to plain urllib.unquote(text) rather than the .parse version.)
I wrote basically none of those functions; they are from pysqueezecenter and I use it in lots of places, so ideally I want to fix this externally if I can, since changing the output will likely break other things.
I even tried using repr() on it and then stripping off the u' and the final ' in a gross hack, but that didn't work...which surprised me.
- giftie - 2012-02-25
I know you really don't want to change the code, but can you change the response line to the following:
Code: response = self.request('status 0 %i' % amount, False)
- bossanova808 - 2012-02-25
Unfortunately that breaks the entire function...the data that comes back from the server looks like this:
Code: response = self.request('status 0 %i' % amount, True)
print "response" + str(response)
encoded_list = response.split('playlist%20index')[1:]
playlist = []
for encoded in encoded_list:
    print "encoded" + encoded
    data = [self.__unquote(x) for x in ('position' + encoded).split(' ')]
    print "data" + str(data)
20:08:06 T:5232 NOTICE: response1 player_name%3ASqueezeslave player_connected%3A1 player_ip%3A192.168.1.9%3A49712 power%3A1 signalstrength%3A0 mode%3Astop time%3A0 rate%3A1 duration%3A603.826 can_seek%3A1 mixer%20volume%3A50 playlist%20repeat%3A0 playlist%20shuffle%3A0 playlist%20mode%3Aoff seq_no%3A0 playlist_cur_index%3A1 playlist_timestamp%3A1330160627.81035 playlist_tracks%3A11 playlist%20index%3A0 id%3A11144 title%3AIntro genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A100.493 playlist%20index%3A1 id%3A11145 title%3ASvefn-g-englar genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A603.826 playlist%20index%3A2 id%3A11146 title%3AStar%C3%A1lfur genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A406.933 playlist%20index%3A3 id%3A11147 title%3AFlugufrelsarinn genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A467.84 playlist%20index%3A4 id%3A11148 title%3AN%C3%BD%20batter%C3%AD genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A489.533 playlist%20index%3A5 id%3A11149 title%3AHjarta%C3%B0%20hamast%20(bamm%20bamm%20bamm) genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A430.546 playlist%20index%3A6 id%3A11150 title%3AVi%C3%B0ar%20vel%20tl%20loft%C3%A1rasa genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A617.013 playlist%20index%3A7 id%3A11151 title%3AOlsen%20Olsen genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A484.24 playlist%20index%3A8 id%3A11152 title%3A%C3%81g%C3%A6tis%20byrjun genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A474.653 playlist%20index%3A9 id%3A11153 title%3AAvalon genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A246.146 playlist%20index%3A10 id%3A19959 title%3ASvefn-G-Englar genre%3APop artist%3ASigur%20R%C3%B3s album%3AThe%20Pitchfork%20500 duration%3A604.081
20:08:06 T:5232 NOTICE: encoded%3A0 id%3A11144 title%3AIntro genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A100.493
20:08:06 T:5232 NOTICE: data[u'position:0', u'id:11144', u'title:Intro', u'genre:Pop', u'artist:Sigur R\xc3\xb3s', u'album:\xc3\x81g\xc3\xa6tis byrjun', u'duration:100.493', u'']
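The log shows the server sends percent-encoded UTF-8 (`R%C3%B3s`), so decoding those escapes as UTF-8 produces correct text directly. Below is a minimal Python 3 sketch of the same parsing loop as `playlist_get_info()`, with the standard `urllib.parse.unquote` (UTF-8 by default) standing in for `__unquote()`. `parse_playlist` and `sample_response` are hypothetical names, and the sample is an abbreviated excerpt of the response logged above.

```python
from urllib.parse import unquote

# Abbreviated excerpt of the server response logged above.
sample_response = (
    'playlist_tracks%3A11 '
    'playlist%20index%3A0 id%3A11144 title%3AIntro genre%3APop '
    'artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A100.493 '
    'playlist%20index%3A1 id%3A11145 title%3ASvefn-g-englar genre%3APop '
    'artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A603.826'
)


def parse_playlist(response):
    """Split a status response into one dict per track, decoding %XX escapes as UTF-8."""
    playlist = []
    for encoded in response.split('playlist%20index')[1:]:
        item = {}
        for token in ('position' + encoded).split(' '):
            # Unquote first, then split on the first ':' only (key:value).
            key, _, value = unquote(token).partition(':')
            if key:  # skips the empty token left by a trailing space
                item[key] = value
        item['position'] = int(item['position'])
        item['id'] = int(item['id'])
        item['duration'] = float(item['duration'])
        playlist.append(item)
    return playlist
```

Using `str.partition(':')` splits on the first colon only, which matches the original `split(':')` followed by `':'.join(...)`.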
- giftie - 2012-02-25
Found the problem... It's a bug in Python's urllib.unquote() function -> http://bugs.python.org/issue8136.
Now to find the way to correct it...
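The effect described in that issue can be reproduced in Python 3, where `urllib.parse.unquote` accepts an `encoding` argument: decoding each `%XX` escape as a separate Latin-1 character yields exactly the mojibake logged above, while decoding the bytes together as UTF-8 gives the intended name. This is only an illustration; the Python 2 fallback in `__unquote()` has no such argument.

```python
from urllib.parse import unquote

quoted = 'Sigur%20R%C3%B3s'  # the artist field as it arrives from the server

# One code point per %XX byte (Latin-1): reproduces the u'Sigur R\xc3\xb3s' seen above.
broken = unquote(quoted, encoding='latin-1')
# The %XX bytes decoded together as UTF-8: the intended text.
fixed = unquote(quoted, encoding='utf-8')

print(broken == 'Sigur R\xc3\xb3s')  # True
print(fixed == 'Sigur R\xf3s')       # True
```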
The easiest fix is to modify __unquote() in server.py from:
Code: def __quote(self, text):
    try:
        import urllib.parse
        return urllib.parse.quote(text, encoding=self.charset)
    except ImportError:
        import urllib
        return urllib.quote(text)
to:
Code: def __quote(self, text):
    try:
        import urllib.parse
        return urllib.parse.quote(text, encoding=self.charset)
    except ImportError:
        #import urllib
        #return urllib.quote(text)
        if isinstance(text, unicode):
            text = text.encode('utf-8')
        res = text.split('%')
        for i in xrange(1, len(res)):
            item = res[i]
            try:
                res[i] = _hextochr[item[:2]] + item[2:]
            except KeyError:
                res[i] = '%' + item
            except UnicodeDecodeError:
                res[i] = unichr(int(item[:2], 16)) + item[2:]
        return "".join(res)
This puts the patched code that fixes urllib.quote() in place of the call to urllib.quote().
- bossanova808 - 2012-02-26
That looks like some amazing searching, and it does look like this issue...
However, you seem to have modified __quote instead of __unquote - is that right?
I tried it as __unquote (changing the name and the call to __unquote), but I am currently stuck on _hextochr not being recognised....
- giftie - 2012-02-26
Yep, my bad... It should be in the __unquote() section.
Here's the real code (found the missing _hextochr):
Code: def __unquote(self, text):
    try:
        import urllib.parse
        return urllib.parse.unquote(text, encoding=self.charset)
    except ImportError:
        #import urllib
        #return urllib.unquote(text)
        _hexdig = '0123456789ABCDEFabcdef'
        _hextochr = dict((a+b, chr(int(a+b,16))) for a in _hexdig for b in _hexdig)
        if isinstance(text, unicode):
            text = text.encode('utf-8')
        res = text.split('%')
        for i in xrange(1, len(res)):
            item = res[i]
            try:
                res[i] = _hextochr[item[:2]] + item[2:]
            except KeyError:
                res[i] = '%' + item
            except UnicodeDecodeError:
                res[i] = unichr(int(item[:2], 16)) + item[2:]
        return "".join(res)
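For reference, a Python 3 sketch of the same byte-wise approach. `unquote_bytes` and `_HEXTOBYTE` are hypothetical names; the `except UnicodeDecodeError` branch is dropped because concatenating bytes with bytes cannot raise it. Python 3's standard `urllib.parse.unquote_to_bytes` does the same job, and the last lines compare against it.

```python
from urllib.parse import unquote_to_bytes

_HEXDIG = '0123456789ABCDEFabcdef'
# Map every two-character hex pair (either case) to the single byte it encodes.
_HEXTOBYTE = {(a + b).encode('ascii'): bytes([int(a + b, 16)])
              for a in _HEXDIG for b in _HEXDIG}


def unquote_bytes(text):
    """Percent-decode into raw bytes, mirroring the patched __unquote() above."""
    if isinstance(text, str):
        text = text.encode('utf-8')
    res = text.split(b'%')
    for i in range(1, len(res)):
        item = res[i]
        try:
            res[i] = _HEXTOBYTE[item[:2]] + item[2:]
        except KeyError:
            # Not a valid escape: keep the '%' and the text after it as-is.
            res[i] = b'%' + item
    return b''.join(res)


raw = unquote_bytes('Sigur%20R%C3%B3s')
print(raw)                                           # b'Sigur R\xc3\xb3s'
print(raw == unquote_to_bytes('Sigur%20R%C3%B3s'))   # True
print(raw.decode('utf-8') == 'Sigur R\xf3s')         # True
```

Returning raw UTF-8 bytes rather than text is what made the value display correctly in XBMC in this thread; in Python 3 code you would normally decode those bytes once, at the boundary.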
- bossanova808 - 2012-02-27
Give that man a cigar...
Yep, that works, and has the by-product of changing some other funky-ness in my code to something much simpler & neater.
Many many thanks mate, you went above and beyond.