Looking for the ultimate HTTP fetch function
#1
Question 
Since almost every script uses HTTP, wouldn't it be possible to create one single fetchhtml function that can be used by all scripts?

It should provide the following:
- timeout for connecting
- timeout for reading data (the function should return if the server hasn't sent any data for a specified amount of time)
- user agent (like quicktimebrowser)
- some kind of progress callback so UI updates/progress can be done

The timeouts are needed so the script won't hang if the server doesn't send anything or stops sending.

A user agent is needed by qtbrowser and should be used by more scripts. I don't know how long the sites will tolerate us script users viewing their content but not their ads Wink

Since I don't consider myself experienced enough for this: is anyone up for the challenge?
Any additions?

Maybe some script already has it implemented?

bernd
Reply
#2
I don't really consider HTTP GET complicated enough to warrant this. Python's HTTP GET is already pretty good, and a threaded function that does this takes only six or seven lines of code...

I think most are happy copying those lines from all the other scripts that do the same thing, as I don't think moving it to a separate module saves people much (although it adds complexity).

Modifying scripts' user agents is a good idea, though, and should be done for those that still identify themselves as libpython.
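For reference, a minimal sketch of that kind of threaded wrapper (the function name and shape are mine, not from any existing script): it runs any blocking call, such as an HTTP fetch, in a worker thread and gives up after a timeout.

```python
import threading

def call_with_timeout(func, timeout, *args):
    """Run a blocking call (e.g. an HTTP fetch) in a worker thread.

    Returns the call's result, or None if it did not finish within
    `timeout` seconds (the abandoned worker is left as a daemon).
    """
    result = []
    worker = threading.Thread(target=lambda: result.append(func(*args)))
    worker.daemon = True  # do not keep the interpreter alive on timeout
    worker.start()
    worker.join(timeout)
    return result[0] if result else None
```

The caveat is that a timed-out worker keeps running in the background until its own socket timeout fires, so this complements socket.setdefaulttimeout() rather than replacing it.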
Reply
#3
(bernd @ april 20 2005,17:13 Wrote:It should provide the following:
- timeout for connecting
- timeout for reading data (the function should return if the server hasn't sent any data for a specified amount of time)
- user agent (like quicktimebrowser)
- some kind of progress callback so UI updates/progress can be done

The timeouts are needed so the script won't hang if the server doesn't send anything or stops sending.

A user agent is needed by qtbrowser and should be used by more scripts. I don't know how long the sites will tolerate us script users viewing their content but not their ads
I think bernd makes a good point...
A module could be built and updated as needed if/when improvements are made by the scripting community's contributors.

That way, improvements in the module would apply to all scripts using it and improve them all at once; no need to wait for the script writers to improve each script individually.

I see many scripts hanging and having issues that this could address.
I'm not an expert but I play one at work.
Reply
#4
Can anybody please tell me how you do the second kind of timeout (the connection is open, but no data has been sent for a while)?

My code looks like this:

Quote:fcache = None
furl = None
oldtimeout = socket.getdefaulttimeout()
try:
    socket.setdefaulttimeout(timeout)

    request = urllib2.Request(url)
    request.add_header('User-Agent', useragent)
    opener = urllib2.build_opener()  # should I only create this once? (global)
    furl = opener.open(request)
    fcache = file(localfile, 'wb')
    progress.create("Downloading", url, "to: " + localfile)
    data = '...'  # dummy value so the loop is entered
    blocksize = 8192
    info = furl.info()
    try:
        totalsize = int(info['Content-Length'])
    except (KeyError, TypeError, ValueError):
        totalsize = None
    pval = 0
    while len(data) > 0:
        if totalsize is None:
            progress.update(random.randint(0, 100))  # just show that something is happening
        else:
            progress.update(int(pval * 100.0 / totalsize))
        data = furl.read(blocksize)
        pval = pval + len(data)
        if len(data) > 0:
            fcache.write(data)
        if len(data) < blocksize:
            break
        if progress.iscanceled():
            break
    urlcontext[0] = furl.url
finally:
    try:
        if fcache is not None:
            fcache.close()
        if furl is not None:
            furl.close()
        socket.setdefaulttimeout(oldtimeout)
        if progress.iscanceled():
            os.remove(localfile)
    except:
        pass
    progress.close()
Reply
#5
I'm currently implementing cache functionality in ooba... I could move that functionality to a separate module (if anybody is interested).

It works like this: for each URL that is retrieved, a file named "u954ea4ad24ff07b0acc2fd4440c3b5bc" is created (that is, "u<md5 of url>"). This file holds the following information: url, finalurlafterredirects, cachetime and localfile. cachetime=-1 means permanently in cache (or until it is manually cleared). Note that how long a file is kept in the cache is determined by the caller (i.e. not by any headers).
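That naming scheme can be reproduced in a couple of lines (a sketch; the real module may differ in details — the era's Python used the md5 module, hashlib exists since 2.5):

```python
import hashlib

def cache_filename(url):
    # "u" + the hex MD5 digest of the URL,
    # e.g. "u954ea4ad24ff07b0acc2fd4440c3b5bc"
    return "u" + hashlib.md5(url.encode("utf-8")).hexdigest()
```

The digest is deterministic, so the same URL always maps to the same cache file.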

The functions you can use are these:

* data=cachedurlopen(url,ext='')

Opens the url from either the cache or the net. ext is simply passed on to cachedurlretrieve.

* localfile=cachefilename(url)

Returns the local file associated with url, or an empty string if the url isn't in the cache.

* localfile=cachedurlretrieve(url,ext='',cachetime=defaultcachetime,localfile=none)

Here the ext parameter is the extension that should be used for the file (this is necessary because otherwise controlimage will only open images with the correct extension). cachetime is how long this file should stay in the cache. localfile is a way of overriding the automatic file-naming scheme. This function pops up a progress bar so that the user may cancel the download.

* cleancache()

Goes through the cache and deletes any files that are too old.
Reply
#6
(phunck @ april 21 2005,09:24 Wrote:Can anybody please tell me how you do the second kind of timeout (the connection is open, but no data has been sent for a while)?
Retrieving data over HTTP is done in two stages:
1. Make a connection with the remote host.
2. After the connection is established, the actual data is sent over this connection to the client.

Most scripts have a timeout for step 1.
But it may also happen that you wish to download a 30 MB file: you set up the connection, transfer 10 MB, and then the server stops sending data. In scripts this usually shows up as the progress bar no longer stepping forward. And we all know that cancelling the dialog doesn't work then.
This is where the second timeout should trigger. Normally you read data in chunks, e.g. 100k, and the time between retrieving these chunks should not be too long; otherwise the function should return with an error and not stall.

That is what I meant by the second timeout.

Maybe there is already something available in Python.
I don't know.
But the main goal is not to let the script hang when the server is not reliable.
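In Python this per-chunk limit can be had at the socket level: a timeout set on the socket applies to every individual recv(), not just the connect. A sketch on a raw socket (function name and error message are mine; with urllib2, socket.setdefaulttimeout() has the same per-read effect):

```python
import socket

def read_all(sock, stall_timeout, blocksize=8192):
    """Read until EOF, but abort if the peer sends nothing
    for `stall_timeout` seconds."""
    sock.settimeout(stall_timeout)  # applies to every individual recv()
    chunks = []
    while True:
        try:
            data = sock.recv(blocksize)
        except socket.timeout:
            raise IOError("server stalled: no data for %s seconds"
                          % stall_timeout)
        if not data:  # clean end of stream
            break
        chunks.append(data)
    return b"".join(chunks)
```

Because the timeout fires between chunks, the function returns control to the script, which can then update the UI or check iscanceled() instead of hanging forever.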

bernd
Reply
#7
Cancelling the dialog doesn't work?
Why do you say that? It works for me.
Have you remembered to end the download function if iscanceled()?

Or is cancel only failing when the server isn't reliable?

http://home.no.net/thor918/xbmc/xbmcgui....iscanceled
Reply
#8
(thor918 @ april 21 2005,23:09 Wrote:Or is cancel only failing when the server isn't reliable?
Not generally speaking, but when the script is hanging inside a read call because of an unreliable server, it cannot check iscanceled().
So to the user it looks like the cancel button isn't working properly.
Checking iscanceled() only works if the script is able to call it.

Though I must admit that I don't call it yet in my script Blush
I will add that in the next release.

bernd
Reply
#9
I just thought: what about proxies?

Do Python's HTTP functions take the HTTP proxy settings from XBMC into account?

The docs say that they are read from environment variables, but I don't know whether the Xbox even has env vars.

Since I'm not using a proxy I cannot test it myself.
Is anyone using a proxy with scripts?

If not, would it be possible to use the XBMC proxy settings, or must the proxy be specified directly in the script?
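If the Xbox has no environment variables, the proxy can at least be forced explicitly in the script instead of relying on the env-var lookup. A sketch using ProxyHandler (urllib2 in the Python of the time; the same API lives in urllib.request today — the proxy address shown is a made-up placeholder):

```python
try:
    import urllib2 as urlreq          # the module available back then
except ImportError:
    import urllib.request as urlreq   # same API in modern Python

def make_proxied_opener(proxy_url):
    # Bypass the environment-variable lookup and force a specific
    # HTTP proxy for everything fetched through this opener.
    handler = urlreq.ProxyHandler({"http": proxy_url})
    return urlreq.build_opener(handler)
```

A script could read the proxy address from its own settings file and pass it in; whether XBMC's own proxy setting is reachable from Python is exactly the open question here.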

bernd
Reply
#10
(asteron @ april 21 2005,07:07 Wrote:I think most are happy copying those lines from all the other scripts that do the same thing, as I don't think moving it to a separate module saves people much (although it adds complexity).
Code reuse through copy and paste is uncool.
It is very frustrating (for users and developers) when you fix some bug, and then fix it again and again because somebody (including myself) reused the code through copy and paste.
I develop software for a living, and I have fixed a lot of these issues :bomb:

Even if you just put those seven lines into a separate module it would be a good thing.
Maybe in the future Python 2.5 will be integrated into XBMC and provide an even better/faster/cooler urllib. Then you only have to change some lines in this one module and all scripts benefit.

And I also think there could be more utility functions in this module in the future (think of the generic parameter stuff discussed in the script package thread by enderw).

bernd
Reply
#11
Yes, I get what you mean by the second type of timeout... I just don't know how to code it (using urllib2).
Reply
#12
(bernd @ april 22 2005,00:12 Wrote:I just thought: what about proxies?

Do Python's HTTP functions take the HTTP proxy settings from XBMC into account?

The docs say that they are read from environment variables, but I don't know whether the Xbox even has env vars.

Since I'm not using a proxy I cannot test it myself.
Is anyone using a proxy with scripts?

If not, would it be possible to use the XBMC proxy settings, or must the proxy be specified directly in the script?

bernd
Look in the latest version of nrkbrowser for a proxy function (if you need an example, that is). It needs to be specified in the script, however. I have no idea whether it works or not (I didn't make it... heh), but I haven't heard any complaints yet; perhaps it can give some ideas. quicktimebrowser uses some kind of proxy thing too, but I can't say I understand it... I haven't really bothered to look closely at it either. A universal HTTP fetch would be very nice to have, btw, and coders who just know enough to parse sites (like me Wink) but not how to write advanced functions would gain from it.
xbmcscripts.com administrator
Reply
#13
I've cleaned up my code a lot and made a cachemanager.py...

Features:
* easy to use
* respects the ETag and Last-Modified headers
* automatically puts up a progress bar that can be cancelled during download
* you can download to a permanent location where cache functionality…
* extensions are based on MIME type and URL extension
* you can get access to the headers and the final URL after redirects
* you can set the user agent and timeout
* ...

I'm using it in my latest ooba, so you can get it from this zip: http://www.deviantart.com/view/16987987/. Would you please test it and come back with suggestions and bug reports? (Additions to the code are also very welcome.)

Example:
Quote:import cachemanager

cachemgr = cachemanager.cachemanager()
cachemgr.createfolders()  # create the cache folder and permanent folder if needed
cachemgr.cleancache()

localfilename = cachemgr.urlretrieve('http://www.phunck.com')
localfilename = cachemgr.urlretrieve('http://www.phunck.com')  # the second time it might come from the cache
data = cachemgr.urlopen('http://www.phunck.com')  # might come from the cache

# now download something permanently:
localfilename = cachemgr.urlretrieve('http://www.phunck.com', -1)

# or specify where you want it (the extension will be based on the MIME type)
localfilename = cachemgr.urlretrieve('http://www.phunck.com', -1, 'f:\\videos\\tester')

I'm also thinking about some simple, in-memory-only cookie support, but I likely won't bother if nobody is going to use this. So please let me know if you plan to try it out.
Reply
#14
(phunck @ april 22 2005,16:08 Wrote:I've cleaned up my code a lot and made a cachemanager.py...
phunck, your cachemanager.py looks really good on review. :thumbsup:
I will test it further on Sunday (hopefully).

There are two little things that would make it even better:
1. separation of the cache and HTTP-fetch functionality
2. the progress dialog should be optional, or better, passed in as an argument

Now the question:

where should we place this (and maybe other) universal modules?

Should they go into the python\lib folder, or be placed elsewhere so they will be accessible to all scripts?
Reply
#15
(enderw @ april 22 2005,13:38 Wrote:
(bernd @ april 22 2005,00:12 Wrote:I just thought: what about proxies?

Do Python's HTTP functions take the HTTP proxy settings from XBMC into account?

The docs say that they are read from environment variables, but I don't know whether the Xbox even has env vars.

If not, would it be possible to use the XBMC proxy settings, or must the proxy be specified directly in the script?

bernd
It needs to be specified in the script, however. I have no idea whether it works or not (I didn't make it... heh), but I haven't heard any complaints yet.
So the proxy configuration must be set in the script :hmm:

Maybe someone from the dev team can comment on this.

Is it possible for Python scripts to read the proxy configuration the user made in XBMC?
Or does it happen automatically?

bernd
Reply
