Convert HTML language characters to UTF8
#1
Hi.
How can I simply convert a string containing HTML entities like &oacute; to ó?
Thanks.
#2
I found this here: http://www.php2python.com/wiki/function....ty-decode/ but I haven't tried it yet.

Code:
import htmlentitydefs
import re

# Match named entities such as &amp; or &oacute;
pattern = re.compile(r"&(\w+?);")

def html_entity_decode_char(m, defs=htmlentitydefs.entitydefs):
    try:
        return defs[m.group(1)]
    except KeyError:
        return m.group(0)

def html_entity_decode(string):
    return pattern.sub(html_entity_decode_char, string)

print html_entity_decode("&lt;spam&amp;eggs&gt;")  # <spam&eggs>
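
Note that the snippet above only looks up named entities, so numeric references such as &#243; or &#xF3; pass through unchanged. Below is a rough sketch of the same regex approach extended to cover them. It assumes Python 3 (where htmlentitydefs became html.entities); the names decode_entity and _entity are just illustrative and not from the linked page.

```python
import re
from html.entities import name2codepoint

# Matches &name;, &#123; and &#x7B; style references.
_entity = re.compile(r"&(#[xX]?[0-9a-fA-F]+|\w+);")

def decode_entity(match):
    """Return the character for one entity, or the original text if unknown."""
    body = match.group(1)
    try:
        if body[:2].lower() == "#x":
            return chr(int(body[2:], 16))      # hexadecimal reference
        if body.startswith("#"):
            return chr(int(body[1:]))          # decimal reference
        return chr(name2codepoint[body])       # named entity
    except (KeyError, ValueError, OverflowError):
        return match.group(0)                  # leave unrecognised input alone

def html_entity_decode(text):
    return _entity.sub(decode_entity, text)

print(html_entity_decode("&lt;spam&amp;eggs&gt;"))
print(html_entity_decode("&oacute; &#243; &#xF3;"))
```

Because name2codepoint maps names to Unicode code points, the result is an ordinary str that can then be encoded to UTF-8 with .encode("utf-8") if bytes are needed.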
#3
Thanks.
I use HTMLParser like this:
Code:
import HTMLParser

self.parseHtml = HTMLParser.HTMLParser()
self.movieTitle = self.parseHtml.unescape(unicode(matchesTitle[0],'utf-8'))
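
For anyone on Python 3: HTMLParser.unescape is no longer available in recent versions, and the standard library offers html.unescape instead. A minimal sketch of the equivalent, assuming the scraped title arrives as UTF-8 bytes (decode_title and the sample title are made up for illustration):

```python
import html

def decode_title(raw_bytes):
    """Decode UTF-8 bytes to text, then resolve HTML entities."""
    text = raw_bytes.decode("utf-8")
    return html.unescape(text)

print(decode_title(b"Am&eacute;lie &amp; Co"))
```

html.unescape handles named, decimal and hexadecimal references, so no custom regex is needed in that case.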
