2013-02-05, 15:23
Hi folks,
I'm pretty new to Python and XBMC development but do have some programming knowledge.
I'm trying to build an addon which will scrape http://www.falkirkfc.tv with a legitimate, working username and password.
The login form on the site contains two hidden fields which act as tokens. One is randomly generated on page load and acts as a field name; its value is 1. The other is static and is the value of a field whose name is 'return'.
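To make the token handling concrete, here is a minimal sketch of pulling both hidden fields out of the login page. SAMPLE_FORM and extract_tokens are made up for illustration, and the markup is an assumption rather than the site's real HTML:

```python
import re

# Hypothetical login-form fragment; the real page's markup may differ.
SAMPLE_FORM = (
    '<form action="/login" method="post">'
    '<input type="text" name="username" />'
    '<input type="password" name="password" />'
    '<input type="hidden" name="return" value="aW5kZXgucGhw" />'
    '<input type="hidden" name="0123456789abcdef0123456789abcdef" value="1" />'
    '</form>'
)


def extract_tokens(html):
    """Return (value of the 'return' field, name of the random field)."""
    return_match = re.search(
        r'<input type="hidden" name="return" value="([^"]+)" />', html)
    name_match = re.search(
        r'<input type="hidden" name="([^"]+)" value="1" />', html)
    if return_match is None or name_match is None:
        raise ValueError("login tokens not found in page")
    return return_match.group(1), name_match.group(1)


RETURN_VALUE, RANDOM_FIELD = extract_tokens(SAMPLE_FORM)
```

The character class [^"]+ keeps each match inside a single attribute; .+? could run on into the next input tag if both fields happened to sit on one line.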
After passing the tokens at login, along with the other form data, it still wouldn't log me in. I then realised that cookies were also being sent with the POST request.
Once I'd passed the cookies, login was successful.
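A hand-built cookie is only sent if it sits in the jar that the opener uses. A minimal sketch of that step, where simple_cookie is a hypothetical helper and the username value is a placeholder:

```python
try:  # Python 2 module name, as in the code in this post
    import cookielib
except ImportError:  # Python 3 equivalent, so the sketch also runs there
    import http.cookiejar as cookielib


def simple_cookie(name, value, domain):
    """Build a plain version-0 cookie for the given domain (hypothetical helper)."""
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={})


JAR = cookielib.CookieJar()
# Creating a Cookie object is not enough; set_cookie() stores it in the jar.
JAR.set_cookie(simple_cookie('sb_username28', 'someuser', 'www.falkirkfc.tv'))
JAR.set_cookie(simple_cookie('sb_url28', 'http%3A%2F%2F', 'www.falkirkfc.tv'))
COOKIE_NAMES = sorted(cookie.name for cookie in JAR)
```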
Here's where I'm stuck.
Whenever I try to scrape a restricted page on the site after logging in, it logs me back out and outputs the original welcome page.
I've been using urllib and urllib2 with cookielib, and my code is as follows:
Code:
import cookielib
import re
import urllib
import urllib2

# settings, LOGINURL, ROOTURL, USERAGENT and addDir are not shown here.

def LOGIN():
    USERNAME = settings.getSetting(id="username")
    PASSWORD = settings.getSetting(id="password")
    # GET SITE COOKIE AND TOKENS
    CJ = cookielib.CookieJar()
    COOKIEHANDLER = urllib2.HTTPCookieProcessor(CJ)
    OPENER = urllib2.build_opener(COOKIEHANDLER)
    REQ = urllib2.Request(LOGINURL)
    REQ.add_header('User-agent', USERAGENT)  # a Request takes add_header(), not addheaders
    RESPONSE = OPENER.open(REQ)
    LINK = RESPONSE.read()
    RESPONSE.close()
    TOKEN1 = re.compile('<input type="hidden" name="return" value="(.+?)" />').findall(LINK)
    TOKEN2 = re.compile('<input type="hidden" name="(.+?)" value="1" />').findall(LINK)
    # ADD THE OTHER COOKIES
    c1 = cookielib.Cookie(version=0, name='sb_username28', value=USERNAME, port=None, port_specified=False, domain='www.falkirkfc.tv', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=None, discard=False, comment=None, comment_url=None, rest={'HttpOnly': None})
    c2 = cookielib.Cookie(version=0, name='sb_url28', value="http%3A%2F%2F", port=None, port_specified=False, domain='www.falkirkfc.tv', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=None, discard=False, comment=None, comment_url=None, rest={'HttpOnly': None})
    # A Cookie object is only sent once it has been stored in the jar
    CJ.set_cookie(c1)
    CJ.set_cookie(c2)
    OPENER.addheaders = [('User-agent', USERAGENT)]
    LOGINDATA = urllib.urlencode({'username': USERNAME, 'password': PASSWORD, 'return': TOKEN1[0], TOKEN2[0]: '1', 'Submit': 'Log in', 'option': 'com_users', 'task': 'user.login'})
    RESPONSE = OPENER.open(LOGINURL, LOGINDATA)
    print RESPONSE.read()  # OUTPUT SHOWS ME SUCCESSFULLY LOGGED IN

def INDEX(url):
    REQ = urllib2.Request(url)  # THE URL I'M PASSING IS TO A PROTECTED PAGE
    REQ.add_header('User-agent', USERAGENT)
    RESPONSE = urllib2.urlopen(REQ)
    LINK = RESPONSE.read()
    RESPONSE.close()
    print LINK  # THIS OUTPUT TAKES ME BACK TO THE HOMEPAGE, LOGGED OUT
    MATCH = re.compile('<div class="show-title-container"><a href="(.+?)" class="show-title-gray info_hover"> (.+?)</a></div>').findall(LINK)
    for href, title in MATCH:
        href = ROOTURL + href
        addDir(title, href, 2, '')
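One thing worth checking in the code above: LOGIN opens pages through OPENER, which carries the cookie jar, while INDEX calls urllib2.urlopen, which goes through urllib2's default opener and never sees that jar. A minimal sketch of sharing one cookie-aware opener between both functions, assuming that is the cause. COOKIE_JAR, fetch and the user-agent string are names made up for the illustration:

```python
try:  # Python 2 names, as in the code above
    import cookielib
    import urllib2
except ImportError:  # Python 3 equivalents, so the sketch also runs there
    import http.cookiejar as cookielib
    import urllib.request as urllib2

# One jar and one opener, created once and reused for every request.
COOKIE_JAR = cookielib.CookieJar()
OPENER = urllib2.build_opener(urllib2.HTTPCookieProcessor(COOKIE_JAR))
OPENER.addheaders = [('User-agent', 'Mozilla/5.0')]  # placeholder user agent


def fetch(url, data=None):
    """Open url through the shared opener and return the response body.

    data is an optional urlencoded POST body (must be bytes on Python 3).
    Cookies set by earlier responses are stored in COOKIE_JAR and sent again.
    """
    response = OPENER.open(url, data)
    try:
        return response.read()
    finally:
        response.close()
```

With this shape, LOGIN would post through fetch(LOGINURL, LOGINDATA) and INDEX would read the protected page through fetch(url). Alternatively, calling urllib2.install_opener(OPENER) once makes plain urllib2.urlopen use the same opener.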
If anyone could help me better understand where I am going wrong, I'd much appreciate it.
Thanks in advance.