I have some code that uses this string:
(python repr())
...which would appear to be a UTF-8 encoded unicode string (although I am very weak in this area!)
and I am setting that to a window property via:
Code:
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTARTIST", artist)
(in a WindowXML)
...however, this results in gobbledygook on screen.
I suspect I am going wrong somewhere basic, but an arvo of researching various encoding things has got me no closer...
Anyone have ideas??
Try to convert it to a bytestring
s = u'Sigur R\xc3\xb3s'.encode('utf-8')
Unfortunately that doesn't work...same result.
Any other ideas? I think the info IS unicode UTF-8, but I think maybe XBMC isn't interpreting it as such.
Hmmm, OK, passing it just artist = 'Sigur R\xc3\xb3s' WITHOUT making it a unicode string works!
That's odd...must be a double translation thing I guess?
Now, how to get the unicode strings into a basic string in Python, i.e. cast them I guess? I find this area a bit confusing....
The problem is I am using a downstream library that is returning strings with these characters in them, e.g. 'Sigur R\xc3\xb3s', and these are typed as unicode.
If I then pass them as this type, they come out wonky in XBMC. I need to just cast them or get the literal value of the string...but I can't seem to just get the literal value from a unicode string in a variable...
I think I am missing something obvious but have been missing it for two days now and it's driving me nuts!
Any python experts know how to do this??
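A minimal sketch of the "double translation" described above, written in Python 3 syntax for illustration (the thread itself is Python 2, where the text type is unicode and the byte type is str). It assumes the library's value is text whose code points are really UTF-8 byte values. The variable names are hypothetical.

```python
# Text as the library returned it: each code point is actually one UTF-8 byte.
# (A plain literal in Python 3 is text, like u'...' in Python 2.)
mangled = 'Sigur R\xc3\xb3s'

# Encoding to UTF-8 encodes the already-encoded bytes a second time.
double_encoded = mangled.encode('utf-8')

# Latin-1 maps code points 0-255 straight to single bytes,
# which recovers the original UTF-8 byte sequence.
raw_bytes = mangled.encode('latin-1')

# Decoding those bytes once as UTF-8 gives the intended text.
repaired = raw_bytes.decode('utf-8')
```

In Python 2 terms, artist.encode('latin-1') should yield the same plain byte string as the artist2 literal that displayed correctly, which is probably the "cast" being asked for here.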
I thought it looked like a unicoded utf-8 string...
I use the following Python code to ensure that the string is in UTF-8 encoding.
Code:
def get_unicode( to_decode ):
    final = []
    try:
        temp_string = to_decode.encode('utf8')
        return to_decode
    except:
        while True:
            try:
                final.append(to_decode.decode('utf8'))
                break
            except UnicodeDecodeError, exc:
                # everything up to crazy character should be good
                final.append(to_decode[:exc.start].decode('utf8'))
                # crazy character is probably latin1
                final.append(to_decode[exc.start].decode('latin1'))
                # remove already encoded stuff
                to_decode = to_decode[exc.start+1:]
        return "".join(final)
Then I send the string to XBMC with a '.decode("utf-8")'. This shows the artist in the proper format (usually...).
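For readers on Python 3, the same fallback idea (decode as UTF-8, and treat any byte that fails as Latin-1) might look like the sketch below. The name to_text is hypothetical and this is an illustration of the approach, not the code used in the addon.

```python
def to_text(raw):
    """Decode bytes as UTF-8, falling back to Latin-1 for any undecodable byte."""
    if isinstance(raw, str):
        # Already text, nothing to decode.
        return raw
    pieces = []
    while True:
        try:
            pieces.append(raw.decode('utf-8'))
            break
        except UnicodeDecodeError as exc:
            # Everything before the offending byte is valid UTF-8.
            pieces.append(raw[:exc.start].decode('utf-8'))
            # Assume the offending byte is Latin-1.
            pieces.append(raw[exc.start:exc.start + 1].decode('latin-1'))
            # Carry on with the remainder.
            raw = raw[exc.start + 1:]
    return ''.join(pieces)
```

Note that this only helps with byte strings in mixed encodings. It does nothing for a value that is already a text string containing mis-decoded characters, which would explain getting the same results with it here.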
mmm, that seemed to give me the same results. This might make it clearer (perhaps)!
Code:
title, artist, album = self.player.getCurrentTrack()
print "artist (raises exception about ordinal out of range if printed as is) "
print repr(artist)
artist2 = 'Sigur R\xc3\xb3s'
print "artist2 is " + artist2
print type(artist2)
#newa =self.get_unicode(artist)
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTTITLE", title)
xbmcgui.Window(xbmcgui.getCurrentWindowId()).setProperty("CURRENTARTIST", artist)
and output:
Code:
14:06:58 T:756 NOTICE: artist (raises exception about ordinal out of range if printed as is)
14:06:58 T:756 NOTICE: u'Sigur R\xc3\xb3s'
14:06:58 T:756 NOTICE: artist2 is Sigur Rós
14:06:58 T:756 NOTICE: <type 'str'>
If I pass artist2 - correct onscreen display
If I pass artist - gobbledygook
What's the code in self.player.getCurrentTrack()? I think the problem is there. Without the u' prefix it works properly, as you say, but nothing seems to be able to strip it out.
Code:
artist = self.playlist[currentIndex]['artist']
...which is looking at the result of getplaylist:
self.playlist = self.sb.playlist_get_info()
...
def playlist_get_info(self):
    """Get info about the tracks in the current playlist"""
    amount = self.playlist_track_count()
    response = self.request('status 0 %i' % amount, True)
    encoded_list = response.split('playlist%20index')[1:]
    playlist = []
    for encoded in encoded_list:
        data = [self.__unquote(x) for x in ('position' + encoded).split(' ')]
        item = {}
        for info in data:
            info = info.split(':')
            key = info.pop(0)
            if key:
                item[key] = ':'.join(info)
        item['position'] = int(item['position'])
        item['id'] = int(item['id'])
        item['duration'] = float(item['duration'])
        playlist.append(item)
    return playlist
and __unquote is:
def __unquote(self, text):
    try:
        import urllib.parse
        return urllib.parse.unquote(text, encoding=self.charset)
    except ImportError:
        import urllib
        return urllib.unquote(text)
(It does raise the exception and fall through to just urllib.unquote(text) rather than the .parse version.)
I wrote basically none of those functions; they are from pysqueezecenter and I use this in lots of places, so ideally I want to fix it externally if I can, as changing the output will likely break other things.
I even tried using repr() on it and then stripping off the u' and the final ' in a gross hack but that didn't work...which surprised me.
I know you really don't want to change the code, but can you change the response line to the following:
Code:
response = self.request('status 0 %i' % amount, False)
Unfortunately that breaks the entire function...the data that comes back from the server looks like:
Code:
response = self.request('status 0 %i' % amount, True)
print "response" + str(response)
encoded_list = response.split('playlist%20index')[1:]
playlist = []
for encoded in encoded_list:
    print "encoded" + encoded
    data = [self.__unquote(x) for x in ('position' + encoded).split(' ')]
    print "data" + str(data)
20:08:06 T:5232 NOTICE: response1 player_name%3ASqueezeslave player_connected%3A1 player_ip%3A192.168.1.9%3A49712 power%3A1 signalstrength%3A0 mode%3Astop time%3A0 rate%3A1 duration%3A603.826 can_seek%3A1 mixer%20volume%3A50 playlist%20repeat%3A0 playlist%20shuffle%3A0 playlist%20mode%3Aoff seq_no%3A0 playlist_cur_index%3A1 playlist_timestamp%3A1330160627.81035 playlist_tracks%3A11 playlist%20index%3A0 id%3A11144 title%3AIntro genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A100.493 playlist%20index%3A1 id%3A11145 title%3ASvefn-g-englar genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A603.826 playlist%20index%3A2 id%3A11146 title%3AStar%C3%A1lfur genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A406.933 playlist%20index%3A3 id%3A11147 title%3AFlugufrelsarinn genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A467.84 playlist%20index%3A4 id%3A11148 title%3AN%C3%BD%20batter%C3%AD genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A489.533 playlist%20index%3A5 id%3A11149 title%3AHjarta%C3%B0%20hamast%20(bamm%20bamm%20bamm) genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A430.546 playlist%20index%3A6 id%3A11150 title%3AVi%C3%B0ar%20vel%20tl%20loft%C3%A1rasa genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A617.013 playlist%20index%3A7 id%3A11151 title%3AOlsen%20Olsen genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A484.24 playlist%20index%3A8 id%3A11152 title%3A%C3%81g%C3%A6tis%20byrjun genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A474.653 playlist%20index%3A9 id%3A11153 title%3AAvalon genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A246.146 playlist%20index%3A10 id%3A19959 title%3ASvefn-G-Englar genre%3APop artist%3ASigur%20R%C3%B3s 
album%3AThe%20Pitchfork%20500 duration%3A604.081
20:08:06 T:5232 NOTICE: encoded%3A0 id%3A11144 title%3AIntro genre%3APop artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun duration%3A100.493
20:08:06 T:5232 NOTICE: data[u'position:0', u'id:11144', u'title:Intro', u'genre:Pop', u'artist:Sigur R\xc3\xb3s', u'album:\xc3\x81g\xc3\xa6tis byrjun', u'duration:100.493', u'']
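To make the log above easier to follow, here is a rough Python 3 sketch of how one record from that response turns into a dictionary when the percent-escapes are decoded as UTF-8. The helper name parse_record is hypothetical, and the sample string is the "encoded" line from the log with a trailing space assumed (the log's data list ends in an empty field).

```python
from urllib.parse import unquote

def parse_record(encoded):
    """Split one space-separated record into a dict of unquoted key/value pairs."""
    item = {}
    # playlist_get_info() prepends 'position' because the split removed it.
    for field in ('position' + encoded).split(' '):
        # Only the first ':' separates key from value (values may contain ':').
        key, _, value = unquote(field, encoding='utf-8').partition(':')
        if key:
            item[key] = value
    return item

# One record as it appears in the log output.
encoded = ('%3A0 id%3A11144 title%3AIntro genre%3APop '
           'artist%3ASigur%20R%C3%B3s album%3A%C3%81g%C3%A6tis%20byrjun '
           'duration%3A100.493 ')
item = parse_record(encoded)
```

With the escapes decoded as UTF-8, the artist comes out as the intended two-code-point "ó" rather than the \xc3\xb3 pair seen in the log.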
Found the problem... It's a bug in Python's urllib.unquote() function ->
http://bugs.python.org/issue8136.
Now to find the way to correct it...
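The effect of that bug can be imitated in Python 3, where urllib.parse.unquote() takes an explicit encoding. This is only an illustration of the symptom, assuming (as the issue describes) that Python 2's urllib.unquote() treats each percent-escape in a unicode string as a Latin-1 character.

```python
from urllib.parse import unquote

quoted = 'artist%3ASigur%20R%C3%B3s'

# Escapes interpreted as UTF-8 bytes: the intended result.
correct = unquote(quoted, encoding='utf-8')

# Escapes interpreted one byte at a time as Latin-1: the mangled
# value seen in the log (u'artist:Sigur R\xc3\xb3s').
mangled = unquote(quoted, encoding='latin-1')
```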
The easiest fix is to modify the __unquote() in server.py from:
Code:
def __quote(self, text):
    try:
        import urllib.parse
        return urllib.parse.quote(text, encoding=self.charset)
    except ImportError:
        import urllib
        return urllib.quote(text)
to:
Code:
def __quote(self, text):
    try:
        import urllib.parse
        return urllib.parse.quote(text, encoding=self.charset)
    except ImportError:
        #import urllib
        #return urllib.quote(text)
        if isinstance(text, unicode):
            text = text.encode('utf-8')
        res = text.split('%')
        for i in xrange(1, len(res)):
            item = res[i]
            try:
                res[i] = _hextochr[item[:2]] + item[2:]
            except KeyError:
                res[i] = '%' + item
            except UnicodeDecodeError:
                res[i] = unichr(int(item[:2], 16)) + item[2:]
        return "".join(res)
This puts the patched code that fixes urllib.quote() in place of the call to urllib.quote().
That looks like some amazing searching, and this is indeed the issue...
However, you seem to have modified __quote instead of __unquote - is that right?
I tried it as __unquote (change the name and the call to __unquote) - I am currently stuck on _hextochr not being recognised....
Yep, my bad... It should be in the __unquote() section.
Here's the real code (found the missing _hextochr):
Code:
def __unquote(self, text):
    try:
        import urllib.parse
        return urllib.parse.unquote(text, encoding=self.charset)
    except ImportError:
        #import urllib
        #return urllib.unquote(text)
        _hexdig = '0123456789ABCDEFabcdef'
        _hextochr = dict((a+b, chr(int(a+b,16))) for a in _hexdig for b in _hexdig)
        if isinstance(text, unicode):
            text = text.encode('utf-8')
        res = text.split('%')
        for i in xrange(1, len(res)):
            item = res[i]
            try:
                res[i] = _hextochr[item[:2]] + item[2:]
            except KeyError:
                res[i] = '%' + item
            except UnicodeDecodeError:
                res[i] = unichr(int(item[:2], 16)) + item[2:]
        return "".join(res)
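The key step in the patch is encoding the text to a byte string before splitting on '%', so every escape is replaced by a raw byte and the Latin-1 fallback is never reached. A rough Python 3 rendering of the same idea, using hypothetical names (unquote_utf8, _HEX_TO_BYTE) and returning bytes as the Python 2 version returns str:

```python
_HEXDIG = '0123456789ABCDEFabcdef'
# Map every two-digit hex pair (as bytes) to the single byte it represents.
_HEX_TO_BYTE = {(a + b).encode('ascii'): bytes([int(a + b, 16)])
                for a in _HEXDIG for b in _HEXDIG}

def unquote_utf8(text):
    """Percent-decode text at the byte level, returning UTF-8 encoded bytes."""
    if isinstance(text, str):
        # Work on bytes throughout so no escape is decoded on its own.
        text = text.encode('utf-8')
    parts = text.split(b'%')
    for i in range(1, len(parts)):
        item = parts[i]
        try:
            parts[i] = _HEX_TO_BYTE[item[:2]] + item[2:]
        except KeyError:
            # Not a valid escape: put the '%' back unchanged.
            parts[i] = b'%' + item
    return b''.join(parts)

artist_bytes = unquote_utf8('Sigur%20R%C3%B3s')
artist_text = artist_bytes.decode('utf-8')
```

The result is the same kind of plain UTF-8 byte string as the artist2 literal that displayed correctly earlier in the thread.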
Give that man a cigar...
Yep, that works, and has the by-product of changing some other funky-ness in my code to something much simpler & neater.
Many many thanks mate, you went above and beyond.