Encoding issue

Larry_Lobster Offline
Member
Posts: 95
Joined: Oct 2010
Reputation: 0
Post: #1
Hi everyone,

I have a problem concerning the encoding of the web site my scraper parses:

This is the meta tag of the web site:
[HTML]<meta http-equiv="content-type" content="text/html; charset=iso-8859-15" />
[/HTML]
There are some umlauts which are not read correctly. Those which are escaped using HTML entities like &auml; work perfectly. Unfortunately there are some characters which have not been correctly escaped by the website, so there is an ä (byte 0xE4 in ISO-8859-15) directly in the source code.

This character is not read correctly if my result XML is utf-8 encoded. If I return an iso-8859-15 encoded document, the ä character is displayed correctly, but the HTML entities are broken.
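
The mismatch described above can be reproduced outside XBMC. The following is a minimal Python sketch, not scraper code: the helper name `decode_page` and the sample word are made up for illustration. It shows that the single byte 0xE4 is a valid ä in iso-8859-15 but is not a valid utf-8 sequence on its own.

```python
def decode_page(raw: bytes, charset: str) -> str:
    """Decode page bytes, replacing undecodable bytes with U+FFFD."""
    return raw.decode(charset, errors="replace")


# Hypothetical page fragment: 'ä' served as the single byte 0xE4.
raw = b"M\xe4dchen"

as_latin9 = decode_page(raw, "iso-8859-15")  # byte 0xE4 maps to 'ä'
as_utf8 = decode_page(raw, "utf-8")          # 0xE4 alone is malformed utf-8

print(as_latin9)
print("\ufffd" in as_utf8)  # True: a replacement character was inserted
```

Reading the page with the charset its meta tag declares gives the expected text; treating the same bytes as utf-8 damages exactly the unescaped characters.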

Is there a way to convert the encoding or can it be done by xbmc automatically? Any other ideas how to solve this?

Kind regards
Larry_Lobster
Larry_Lobster Offline
Member
Posts: 95
Joined: Oct 2010
Reputation: 0
Post: #2
Same topic, different question: Is it possible not to use utf-8 for the url-encoding? The site I want to parse does not support utf-8 encoded urls.
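
To illustrate why the charset matters for the search URL, here is a small Python sketch using the standard library's `urllib.parse.quote`. The helper name `encode_search_term` is an assumption for this example and has nothing to do with XBMC's internals; it only shows how the same search term percent-encodes differently per charset.

```python
from urllib.parse import quote


def encode_search_term(term: str, charset: str = "iso-8859-15") -> str:
    """Percent-encode a search term using the given character set."""
    return quote(term, safe="", encoding=charset, errors="strict")


term = "Mädchen"
print(encode_search_term(term, "utf-8"))        # M%C3%A4dchen
print(encode_search_term(term, "iso-8859-15"))  # M%E4dchen
```

A server that expects iso-8859-15 would read the utf-8 form `%C3%A4` as two separate characters, which is why such a search finds nothing.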

I hope there will be an answer this time. Otherwise it would be great if you could tell me whether you don't understand the question, the question is too dumb to answer, or you just don't know the answer. 8)
spiff Offline
Grumpy Bastard Developer
Posts: 12,181
Joined: Nov 2003
Reputation: 82
Post: #3
yes you can. e.g.
Code:
<CreateSearchUrl SearchStringEncoding="iso-8859-15" dest="3">
this will also be used for conversions when we replace the html chars, so it should solve both your issues. remember to mark the returned xml as iso!
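
A rough Python sketch of the idea in this reply, under stated assumptions: it is not how XBMC implements it, the function name `build_result_xml` is hypothetical, and the `<results><entity><title>` layout is only an illustrative result document. It shows entities and raw bytes ending up in one charset, with the XML declaration naming that same charset.

```python
import html
from xml.sax.saxutils import escape


def build_result_xml(page_bytes: bytes, charset: str = "iso-8859-15") -> bytes:
    """Decode page text, resolve HTML entities, and emit XML marked with charset."""
    text = page_bytes.decode(charset)  # raw 0xE4 becomes 'ä'
    text = html.unescape(text)         # '&auml;' becomes 'ä' as well
    doc = (
        f'<?xml version="1.0" encoding="{charset}"?>\n'
        f"<results><entity><title>{escape(text)}</title></entity></results>"
    )
    # Encode with the same charset the declaration announces.
    return doc.encode(charset)


# One escaped and one unescaped umlaut, as on the site in question.
result = build_result_xml(b"M&auml;dchen und M\xe4nner")
print(result.decode("iso-8859-15"))
```

Both umlauts come out as the same byte, so a reader that honours the declaration displays them identically.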

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
Larry_Lobster Offline
Member
Posts: 95
Joined: Oct 2010
Reputation: 0
Post: #4
spiff Wrote:yes you can. e.g.
Code:
<CreateSearchUrl SearchStringEncoding="iso-8859-15" dest="3">
this will also be used for conversions when we replace the html chars, so it should solve both your issues. remember to mark the returned xml as iso!

Thank you so much, spiff! I will try this this evening. This is an undocumented attribute, isn't it? I googled a lot about this and had no success. The wiki didn't answer my question either.
spiff Offline
Grumpy Bastard Developer
Posts: 12,181
Joined: Nov 2003
Reputation: 82
Post: #5
lol. is any of this documented? :D
Larry_Lobster Offline
Member
Posts: 95
Joined: Oct 2010
Reputation: 0
Post: #6
spiff Wrote:lol. is any of this documented? :D

hehe, can you tell me where I can find the parsing algorithm of the scrapers in the xbmc source code? Maybe I can use it to see which features (tags/attributes) are available.
spiff Offline
Grumpy Bastard Developer
Posts: 12,181
Joined: Nov 2003
Reputation: 82
Post: #7
xbmc/utils/ScraperParser.cpp is the parser, xbmc/addons/Scraper.cpp the fluff that uses the parser.
Larry_Lobster Offline
Member
Posts: 95
Joined: Oct 2010
Reputation: 0
Post: #8
spiff Wrote:xbmc/utils/ScraperParser.cpp is the parser, xbmc/addons/Scraper.cpp the fluff that uses the parser.

Thanks!

spiff Wrote:yes you can. e.g.
Code:
<CreateSearchUrl SearchStringEncoding="iso-8859-15" dest="3">
this will also be used for conversions when we replace the html chars, so it should solve both your issues. remember to mark the returned xml as iso!

You were right! Everything works fine now.