Filmweb scraper

smuto Offline
Senior Member
Posts: 242
Joined: Sep 2004
Reputation: 2
Post: #31
can someone review and then commit this to the SVN?

smuto Offline
Senior Member
Posts: 242
Joined: Sep 2004
Reputation: 2
Post: #32
i have a script before every location url

ex.
details.1.html

how can i force scraper to skip this

smuto

smuto Offline
Senior Member
Posts: 242
Joined: Sep 2004
Reputation: 2
Post: #33
i added "spoof" to the url, maybe this helps
u can test my wip scraper

filmweb.xml_test

spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #34
spoof is for setting the referer. it probably does the trick indeed. sorry for the late response
smuto Offline
Senior Member
Posts: 242
Joined: Sep 2004
Reputation: 2
Post: #35
maybe it's not xbmc problem, but maybe u can help

Recently, in movie info from the filmweb scraper, accented characters are shown as entities

ex.
latin small letter o with acute
ó -> &oacute;

is there a way to fix this?
smuto

spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #36
hmm, it should convert those tags when you load the xml?
if not, make sure cleaning is performed on the field. latter would remove them though
smuto Offline
Senior Member
Posts: 242
Joined: Sep 2004
Reputation: 2
Post: #37
with or without "noclean" i still get the same result

ex
xbmc shows
który -> ktoacute;ry

in .xml from scrap.exe
który -> kt&oacute;ry

really don't know what to do

i need to update SVN (small url function link fix)
but for now, this one is good for testing entities
filmweb.xml_test

good for test is "Kingdom of the Crystal Skull"
tag title is OK
tags outline & plot are wrong

smuto Offline
Senior Member
Posts: 242
Joined: Sep 2004
Reputation: 2
Post: #38
for myself i edit source file HTMLUtil.cpp
edited HTMLUtil.cpp
Code:
strReturn.Replace("&ndash;", "-");
strReturn.Replace("&oacute;", "ó");

it's working, but i hope u help to fix this for all polish users

smuto

smuto Offline
Senior Member
Posts: 242
Joined: Sep 2004
Reputation: 2
Post: #39
i added fanart to the filmweb scraper

i use polish wikipedia to map filmweb.id to imdb.id

we still have a problem with entities, hope spiff finds time to help us

u can test new scraper from here
filmweb.xml_test_scraper

smuto
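
For illustration, the filmweb.id to imdb.id step could look roughly like this in C++. This is only a sketch and not the scraper's actual code (the scraper itself lives in filmweb.xml): `FindImdbId` is a hypothetical helper, and it assumes the Polish Wikipedia page for the movie contains a link of the form imdb.com/title/tt followed by digits.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not taken from the scraper: returns the first IMDb
// title id ("tt" followed by digits) found in a page's HTML, or an empty
// string if the page contains no such link.
std::string FindImdbId(const std::string& html)
{
  // Assumption: the page links to IMDb as ".../imdb.com/title/tt0123456/...".
  static const std::regex kImdbLink(R"(imdb\.com/title/(tt[0-9]+))");

  std::smatch match;
  if (std::regex_search(html, match, kImdbLink))
    return match[1].str();  // capture group 1 holds just the id

  return std::string();
}
```

An empty result would be the signal to fall back to filmweb-only data (no fanart), which might also cover the case where the wikipedia search returns nothing.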

spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #40
hi.

i see nothing wrong, nor any other way to handle this so i just committed your replaces along with the new scraper. please use trac in the future :)

spiff
haken Offline
Donor
Posts: 59
Joined: Jan 2008
Reputation: 0
Location: Poland - Krakow
Post: #41
@smuto: There are some problems with titles that start with numbers eg. "1410" or "27 dresses" - numbers are cut off from them. Fanart support is really great.
I hope that an xbmc build with the edited HTMLUtil.cpp will be ready soon. In the meantime, could you put your compiled xbmc default.xbe at smuto.w.interia.pl? (That would be great for me, because I want to rescan my movie library, and Polish plots with no entity problems are what I'm looking for...)
haken Offline
Donor
Posts: 59
Joined: Jan 2008
Reputation: 0
Location: Poland - Krakow
Post: #42
Even though entities have been fixed with changeset 15625, it seems that the "oacute problem" still exists (I checked the filmweb scraper on xbmc builds 15640 and 15728). Smuto - do you agree with me?
smuto Offline
Senior Member
Posts: 242
Joined: Sep 2004
Reputation: 2
Post: #43
@haken - u just need to update scraper
filmweb.xml

@spiff
Quote: i see nothing wrong, nor any other way to handle this so i just committed your replaces
but this is not a good idea - "oacute" & "ndash" are only the most common ones
this means i would have to add every entity to the replaces
next in my queue are
Code:
strReturn.Replace("&nbsp;", "");
strReturn.Replace("&rsquo;", "'");
smuto
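
One way around adding a Replace call per entity, sketched here under assumptions: a lookup table keyed by entity name. `DecodeEntities` and its table are hypothetical and not part of XBMC's HTMLUtil.cpp. The output is assumed to be UTF-8, and the table holds only a few example rows ("oacute" and "ndash" from this thread, plus "nbsp" and "rsquo" as further assumed examples, with the replacement text chosen for illustration).

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper, not part of XBMC: replaces known named entities
// (e.g. "&oacute;") with their text and leaves everything else untouched.
std::string DecodeEntities(const std::string& in)
{
  // Assumed example table; a real one would need many more rows.
  static const std::unordered_map<std::string, std::string> kEntities = {
    {"oacute", "\xC3\xB3"},  // o with acute (U+00F3) as UTF-8 bytes
    {"ndash",  "-"},         // assumption: map to a plain hyphen
    {"nbsp",   " "},         // assumption: map to a regular space
    {"rsquo",  "'"},         // assumption: map to an ASCII apostrophe
  };

  std::string out;
  out.reserve(in.size());

  std::size_t i = 0;
  while (i < in.size())
  {
    if (in[i] == '&')
    {
      const std::size_t end = in.find(';', i + 1);
      // Entity names are short, so ignore a ';' that is too far away.
      if (end != std::string::npos && end - i <= 10)
      {
        const auto it = kEntities.find(in.substr(i + 1, end - i - 1));
        if (it != kEntities.end())
        {
          out += it->second;
          i = end + 1;  // skip past the whole "&name;" sequence
          continue;
        }
      }
    }
    out += in[i++];  // ordinary character or unknown entity: copy as-is
  }
  return out;
}
```

Names that are not in the table pass through unchanged, so new entities can be added one row at a time without touching the loop.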

smuto Offline
Senior Member
Posts: 242
Joined: Sep 2004
Reputation: 2
Post: #44
i don't know why, but sometimes the wikipedia search doesn't work

i changed the way the link is scraped after the search - please test
filmweb.xml

is there a way to show a custom label in the skin?

something like this i need for testing
ListItem.IMDbID or ListItem.FilmwebID

smuto

haken Offline
Donor
Posts: 59
Joined: Jan 2008
Reputation: 0
Location: Poland - Krakow
Post: #45
@smuto: I think that there are some changes in the filmweb.pl website - descriptions and high-res posters cannot be scraped. I looked inside the scraper, but it is too complicated for me ;)

Update: Scraper is ok! It was something else - now everything works perfectly. I was surprised because previously the scraper either worked or didn't work at all... Sorry!
(This post was last modified: 2008-11-01 20:16 by haken.)