Encoding issue
#1
Hi everyone,

I have a problem with the encoding of the website my scraper parses.

This is the website's meta tag:
Code:
<meta http-equiv="content-type" content="text/html; charset=iso-8859-15" />
Some umlauts are not read correctly. Those escaped as HTML entities, like &auml;, work perfectly. Unfortunately, the website has not escaped every such character, so there is a raw ä (byte value 0xE4 in ISO-8859-15, not an ASCII character) directly in the source code.

This character is not read correctly if my result XML is UTF-8 encoded. If I return an ISO-8859-15 encoded document, the ä character is displayed correctly, but the HTML entities are broken.
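To illustrate why the two kinds of umlaut behave differently, here is a minimal Python sketch. It is an illustration of the general principle, not XBMC's internal code, and the sample bytes are made up: one ä is written as the entity &auml; and one as the raw ISO-8859-15 byte 0xE4. Decoding with the charset from the meta tag first and resolving entities afterwards gives consistent text, which can then be re-encoded as UTF-8.

```python
import html

# Made-up page fragment: one umlaut escaped as an HTML entity,
# one written as the raw ISO-8859-15 byte 0xE4.
raw_page = b"K&auml;se und K\xe4se"

# Treating the raw bytes as UTF-8 fails on the lone 0xE4 byte.
try:
    raw_page.decode("utf-8")
except UnicodeDecodeError as err:
    print("UTF-8 decode fails:", err.reason)

# Decode with the charset declared in the meta tag, then resolve entities,
# so both umlauts end up as the same Unicode character.
text = html.unescape(raw_page.decode("iso-8859-15"))
print(text)  # Käse und Käse

# Re-encode the now-consistent text for a UTF-8 result document.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'K\xc3\xa4se und K\xc3\xa4se'
```

The order matters: decoding the bytes as UTF-8, or resolving entities into UTF-8 while leaving raw ISO-8859-15 bytes in place, produces exactly the mixed result described above.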

Is there a way to convert the encoding, or can XBMC do it automatically? Any other ideas on how to solve this?

Kind regards
Larry_Lobster
#2
Same topic, different question: is it possible to use something other than UTF-8 for the URL encoding? The site I want to parse does not support UTF-8 encoded URLs.

I hope there will be an answer this time. Otherwise it would be great if you could tell me whether you don't understand the question, the question is too dumb to answer, or you just don't know the answer.
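For illustration, the same search term percent-encodes differently depending on which character encoding is applied before escaping. This minimal Python sketch uses only the standard `urllib.parse.quote` function; the base URL is a hypothetical placeholder, and the sketch says nothing about how XBMC builds its URLs internally.

```python
from urllib.parse import quote

# Hypothetical search endpoint, for illustration only.
BASE_URL = "http://www.example.com/search?q="

term = "Käse"

# quote() encodes the string as UTF-8 by default before percent-escaping.
utf8_url = BASE_URL + quote(term)
# Passing encoding= percent-escapes the ISO-8859-15 bytes instead.
iso_url = BASE_URL + quote(term, encoding="iso-8859-15")

print(utf8_url)  # http://www.example.com/search?q=K%C3%A4se
print(iso_url)   # http://www.example.com/search?q=K%E4se

# The euro sign exists in ISO-8859-15 (byte 0xA4) but not in ISO-8859-1.
print(quote("€", encoding="iso-8859-15"))  # %A4
```

A server expecting ISO-8859-15 would read `%C3%A4` as the two characters "Ã¤" rather than "ä", which is why the encoding used for the search string has to match what the site expects.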
#3
yes you can. e.g.
Code:
<CreateSearchUrl SearchStringEncoding="iso-8859-15" dest="3">
this will also be used for conversions when we replace the html chars, so it should solve both your issues. remember to mark the returned xml as iso!
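The point about marking the returned XML as ISO is that the declared encoding must match the bytes actually returned. A minimal Python sketch of that principle with the standard `xml.etree.ElementTree` module follows; the element names are hypothetical, and it does not reproduce how an XBMC scraper assembles its output.

```python
import xml.etree.ElementTree as ET

# Hypothetical result structure; element names are illustrative only.
root = ET.Element("results")
entity = ET.SubElement(root, "entity")
ET.SubElement(entity, "title").text = "Käse"

# Serialise as ISO-8859-15. Because the encoding is neither UTF-8 nor
# US-ASCII, ElementTree prepends a matching XML declaration, and the
# ä is written as the single byte 0xE4.
xml_bytes = ET.tostring(root, encoding="iso-8859-15")
print(xml_bytes.decode("iso-8859-15"))
```

If the same bytes were labelled as UTF-8 instead, a parser would reject or misread the 0xE4 byte, which matches the symptoms described in the first post.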
#4
spiff wrote:
yes you can. e.g.
Code:
<CreateSearchUrl SearchStringEncoding="iso-8859-15" dest="3">
this will also be used for conversions when we replace the html chars, so it should solve both your issues. remember to mark the returned xml as iso!

Thank you so much, spiff! I will try it this evening. This is an undocumented attribute, isn't it? I've googled a lot for this without success, and the wiki didn't answer my question either.
#5
lol. is any of this documented? :D
#6
spiff wrote:
lol. is any of this documented? :D

hehe, can you tell me where I can find the scraper parsing code in the XBMC source? Maybe I can use it to see which features (tags/attributes) are available.
#7
xbmc/utils/ScraperParser.cpp is the parser, xbmc/addons/Scraper.cpp the fluff that uses the parser.
#8
spiff wrote:
xbmc/utils/ScraperParser.cpp is the parser, xbmc/addons/Scraper.cpp the fluff that uses the parser.

Thanks!

spiff wrote:
yes you can. e.g.
Code:
<CreateSearchUrl SearchStringEncoding="iso-8859-15" dest="3">
this will also be used for conversions when we replace the html chars, so it should solve both your issues. remember to mark the returned xml as iso!

You were right! Everything works fine now.