Looking for the ultimate HTTP fetch function
#31
(phunck @ april 25 2005,12:42 Wrote:enderw: you mentioned clientcookie. i can see that this is equivalent to cookielib in py2.4, so this is what i need. however, can i rely on clientcookie being installed?
funny, i was about to mention this but i see you've noticed it already. as for relying on it being installed, i am not sure what you mean by that. if you wonder whether it will be installed by the script service, i am not sure. the plan was to have libs in the script's folder and not in the lib part. otherwise you would need something to tell you what to put into lib and what to leave in the script's dir. afaik you can copy code from clientcookie to your own lib as long as the copyright notice is there... but i am really not sure. check out the bsd license at their site.

if the script service wasn't what you referred to then i've misunderstood completely... in that case, no, you can't rely on it being installed.
xbmcscripts.com administrator
#32
(phunck @ april 25 2005,09:36 Wrote:
Quote:can't the cache folder be placed somewhere on x:, y: or z:? aren't these partitions for temp file use?
they are for cache, but i wonder if it is only for game cache. does anybody have any experience with this? but a general logical location like that would be good. but maybe just q:\scripts\httpfetchcache\ ?
i believe the x: y: z: partitions are there for a reason.
if you store normal data and temp files on the same partition you can cause heavy fragmentation.
if you put all temp files on one single dedicated partition then you can delete them all, and you don't fragment your normal files when creating and deleting temp files rapidly.

but i don't know whether the x, y, z drives are cleared during reboot or whether they can be used as a temp folder for scripts.

can anybody comment on this?
maybe the xdk says something about it.
it's definitely worth thinking about.

but if it's not possible to place them on x, y, z then q:\scripts\httpfetchcache\ is quite good.

Quote:the permanent folder should probably be specified on a script by script basis. e.g. sometimes f:\videos\musicvideos might make sense.
why not let the caller pass a complete path for permanent storage in 'localfile'? this way every script can specify where to put the file.

bernd
#33
my strategy for parsing html is to use an xml dom parser to parse the html into a dom document, then query the dom tree for the nodes you are interested in. some pages have mismatched tags, so i usually blank the mismatched tags using regular expressions until the page becomes well-formed xml and can be parsed cleanly.

the other approach that i've used is to use regular expressions to pick out certain bits of information from the html. however, in my experience, it is very sensitive to change and i found myself maintaining the regular expressions on a frequent basis.
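
a minimal sketch of the first approach above (blank the troublesome tags, then parse the rest as xml). it is written for a current python 3 and only uses the standard library; the function names, the list of tags to blank and the sample page are all made up for illustration, and real pages will usually need more cleanup than this (entities like &nbsp; for instance).

```python
import re
from xml.dom import minidom

# tags that are often left unclosed in html; this list is an assumption,
# extend it for the pages you actually scrape
UNCLOSED_TAGS = r"<(br|hr|img)\b[^>]*>"


def html_to_dom(html):
    """blank the unclosed tags, then parse what is left as xml."""
    cleaned = re.sub(UNCLOSED_TAGS, "", html, flags=re.IGNORECASE)
    return minidom.parseString(cleaned)


def link_targets(doc):
    """query the dom tree for every <a> node and return its href."""
    return [a.getAttribute("href") for a in doc.getElementsByTagName("a")]


sample = (
    "<html><body><p>first<br>second</p>"
    "<a href='/one'>one</a><hr><a href='/two'>two</a></body></html>"
)
doc = html_to_dom(sample)
print(link_targets(doc))
```

if the page still isn't well formed after the cleanup, minidom.parseString raises an exception, which is a useful signal that the page layout has changed.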
#34
about xyz partitions. i remember reading that xbox has a cache for the last 3 games. i wonder if it is simply cycling through the xyz partitions.

edit: i just found this from a faq:
Quote:c is like on your computer, where the dashboard is installed (like windows).
d is your dvd-rom partition
e is your savegames partition
f is your extra 2gb partition if you have a retail hard disk, or the huge partition if you upgraded the hdd.
x / y / z are cache partitions, so games can cache data to load faster.

i wonder if xyz still could be used?
#35
i've changed the cachemanager to cachedhttp as bernd suggested and made the following changes:

* you can now flush the entire cache with .cleancache(0)
* removed the concept of a permanent dir. (you can specify an alternate location in the urlretrieve function)
* made it crash proof if metafiles are malformed.

i've upped it to http://www.xbmcscripts.com (ooba v0.8)
#36
(phunck @ april 26 2005,08:11 Wrote:i wonder if xyz still could be used?
fyi, i believe that all xbox games have the right to format (and use) x, y and z (without the user's consent or interaction), so it's ok for you all to use them, but keep in mind that any data written to those partitions might get formatted/deleted at any time, outside your or the user's control. if you instead want the data to be kept and only deleted/flushed on purpose by a user or script, then it is better to create a new subfolder under "e:\udata\0face008\" (xbmc user data, like game saves, that can be backed up by a user in ms-dash) or alternatively create a new subfolder under "e:\tdata\0face008\" (xbmc application data that cannot be backed up by ms-dash, but neither can it be deleted/flushed like one of the cache partitions without user/script interaction).
#37
i want to get started on this again as well... how do i read in a webpage first, before i can extract the data via regular expressions? can someone post a short hint?
#38
(fluidman @ april 26 2005,23:48 Wrote:i want to get started on this again as well... how do i read in a webpage first, before i can extract the data via regular expressions? can someone post a short hint?
import urllib  # urllib.urlopen is the python 2 api

f = urllib.urlopen("http://www.site.com")  # open the page
data = f.read()  # read the whole page into a string
f.close()  # not strictly necessary

the web page is now in the "data" object. to view it do "print data". good luck.
xbmcscripts.com administrator
#39
(gamester17 @ april 26 2005,14:49 Wrote:fyi, i believe that all xbox games have the right to format (and use) x, y and z (without the user's consent or interaction), so it's ok for you all to use them, but keep in mind that any data written to those partitions might get formatted/deleted at any time, outside your or the user's control,
so a folder on x: seems to be the perfect place for the http cache. i don't care if it may be deleted... it's just a cache.

phunck, what do you say?

and once the default cache folder has been chosen, can you make cachedhttp.py (alone) available to the public?
maybe on xbmcscripts, with a link in the convenience scripts thread.

i think there was someone on another thread asking about "how to read html pages" from python

bernd
#40
i made a new category called modules on xbmcscripts. it's in there now.

the script looks great, but i have a few questions...

i am not looking for any cache as i don't need it, but how exactly does the cache work? i assume it checks for a "not modified" code from the server, and if that is returned the cache is used? are there any times when a web page could have been updated and it would still use the cache?

also, would it be possible to include a function which returns the percentage of the download completed? in my humble opinion the xbmc dialogs shouldn't be in there, since they limit what the user can do with the module. say you wanted a label that just showed how far the download had progressed: your module wouldn't be that easy to use. you could include the dialog in the example script instead.

anyways, keep up the excellent work.
xbmcscripts.com administrator
#41
i changed it to x:\...\ and upped it to xbmcscripts:

http://www.xbmcscripts.com/modules....p%20v05
#42
thanks :lol:
#43
(enderw @ april 27 2005,09:48 Wrote:i am not looking for any cache as i don't need it, but how exactly does the cache work? i assume it checks for a "not modified" code from the server, and if that is returned the cache is used? are there any times when a web page could have been updated and it would still use the cache?
that is exactly how it works. the cache will only be used if it receives a 304 status code. so, even if you don't need the cache you can use this script.

i'm planning to make a manual override, whereby you can force it to use the cache (if available). that is useful if you have some almost static huge pages that take a long time to download. example: http://www.tvtome.com/tvtome/servlet/listshowsservlet/
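
for readers wondering what the 304 check described above looks like in code, here is a rough sketch. it is not taken from cachedhttp.py: the function name, the dict-based cache and the opener parameter are all assumptions made for illustration, and it is written against python 3's urllib.request (where a 304 answer surfaces as an HTTPError) rather than the python bundled with xbmc.

```python
import urllib.error
import urllib.request


def fetch_cached(url, cache, opener=urllib.request.urlopen, timeout=10):
    """fetch url, reusing the cached body when the server answers 304.

    cache is a plain dict mapping url -> (last_modified, body); a real
    module would keep this on disk. everything here is hypothetical.
    """
    request = urllib.request.Request(url)
    cached = cache.get(url)
    if cached is not None:
        # ask the server to send the page only if it changed since then
        request.add_header("If-Modified-Since", cached[0])
    try:
        response = opener(request, timeout=timeout)
    except urllib.error.HTTPError as err:
        if err.code == 304 and cached is not None:
            return cached[1]  # not modified: the cached copy is still good
        raise
    body = response.read()
    last_modified = response.headers.get("Last-Modified")
    if last_modified:
        # only pages that carry a Last-Modified header can be revalidated
        cache[url] = (last_modified, body)
    return body
```

note that a page is only served from the cache after the server has confirmed it is unchanged, so every call still costs one round trip. that is the reason a manual "use the cache anyway" override can pay off for huge, almost static pages.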

(enderw @ april 27 2005,09:48 Wrote:also, would it be possible to include a function which returned percentage of the download completed?
that is already implemented. the baseclass (cachedhttp) does not have a progressbar but instead calls ondatareceived and ondownloadfinished...

if you want to have the progressbar then you need to use cachedhttpwithprogress instead... (which inherits from cachedhttp). you can check that class to see how you can use the overrideable ondatareceived and ondownloadfinished events.
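
to illustrate the idea of overridable events, here is a small stand-alone sketch. the class names, method names and signatures below are invented for the example and are not the real cachedhttp api (check the module itself for that); the "download" reads from an in-memory stream so the example runs without a network.

```python
import io


class Downloader:
    """hypothetical stand-in for a base class with overridable events."""

    chunk_size = 4  # tiny on purpose, so the example fires several events

    def on_data_received(self, received, total):
        pass  # the base class reports nothing; subclasses override this

    def on_download_finished(self, success):
        pass

    def download(self, stream, total):
        chunks = []
        received = 0
        while True:
            chunk = stream.read(self.chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
            received += len(chunk)
            self.on_data_received(received, total)
        self.on_download_finished(received == total)
        return b"".join(chunks)


class PercentLogger(Downloader):
    """records percentages instead of showing a dialog."""

    def __init__(self):
        self.percentages = []
        self.finished = None

    def on_data_received(self, received, total):
        self.percentages.append(100 * received // total)

    def on_download_finished(self, success):
        self.finished = success


payload = b"0123456789"
logger = PercentLogger()
data = logger.download(io.BytesIO(payload), len(payload))
print(logger.percentages)
```

a subclass like this could just as well update a label or a progress dialog, which keeps the gui choice out of the base class.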
#44
instead of using urllib to get the data you could use cachedhttp.py from http://www.xbmcscripts.com . there is an example of how to use it in the package. the main problem with urllib is that it can freeze if the site is down. (there is no timeout)
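
if you do stay with plain urllib, one possible workaround for the missing timeout is a process-wide socket timeout, assuming your interpreter provides socket.setdefaulttimeout. the sketch below is written for python 3, where urlopen also accepts its own timeout argument; the 10 second value and the read_page helper are just examples.

```python
import socket
import urllib.request

# every socket created after this call gives up after 10 seconds
# instead of blocking forever (10 is an arbitrary example value)
socket.setdefaulttimeout(10)


def read_page(url):
    """hypothetical helper: fetch a page, failing fast if the site is down."""
    # python 3's urlopen takes a per-call timeout as well
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()
```

with a timeout in place a dead site makes the call raise an error instead of hanging, so the script can catch it and show a message.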
#45
(bernd @ april 24 2005,07:43 Wrote:-watch out for the greedy .* pattern
i never did understand the greediness stuff which is why i do it my way. my way more closely relates to how i would program a parser and is unambiguous in what can be in the tags.

though it doesn't look quite as pretty and is not extensible to more complicated structures. i think it's fine for html though...
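
for anyone else puzzled by the greediness point, a short comparison may help. the sample html is made up, and the third pattern is only a guess at what an "unambiguous" pattern could look like (a character class that cannot run past the next tag); it is not the pattern actually used earlier in the thread.

```python
import re

html = "<td>first</td><td>second</td>"

# greedy: .* grabs as much as it can, so it runs to the LAST </td>
greedy = re.findall(r"<td>(.*)</td>", html)

# non-greedy: .*? stops at the first </td> it can reach
lazy = re.findall(r"<td>(.*?)</td>", html)

# explicit: [^<]* can never cross a tag, so there is nothing to be greedy about
explicit = re.findall(r"<td>([^<]*)</td>", html)

print(greedy)    # ['first</td><td>second']
print(lazy)      # ['first', 'second']
print(explicit)  # ['first', 'second']
```

the explicit form does not match cells that contain nested tags, which is the "not extensible to more complicated structures" trade-off.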