2013-05-13, 21:24
I'm trying to create an addon as a project while I learn the fundamentals of Python, but I'm having trouble with the regex (I think).
Most of the output prints as I need, but a few titles print incorrectly and I can't see why. Any pointers would be warmly received.
Code:
import urllib2
import re
def OPEN_URL(url):
    req = urllib2.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
    response = urllib2.urlopen(req)
    link = response.read()
    return link

link = OPEN_URL('http://thenewboston.org/list.php?cat=36')
#(gets desired url)
#path = re.compile('href="(.+?)"').findall(link)
#for url in path:
#    url = 'http://thenewboston.org/' + url
#    if 'watch.php' in url:
#        print url
#and,
#(gets desired title)
#path = re.compile('>(\d+\s-\s\w+\s+\w+\s*\w*)').findall(link)
#for title in path:
#    print title
match = re.compile('href="(.+?)">(\d+\s-\s\w+\s+\w+\s*\w*)').findall(link)
for url, title in match:
    url = 'http://thenewboston.org/' + url
    title = str(title).replace('\n', '')
    if 'watch.php' in url:
        print title
        print url
# Tried capturing the title (bad regex?): titles 4, 6, 10, 19 and 37 are not printing correctly
Excuse the mess above.
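One possible explanation, as a minimal sketch rather than a confirmed diagnosis: the title part of the pattern, `\w+\s+\w+\s*\w*`, accepts at most three words and no punctuation, so longer titles would be cut short and titles containing a hyphen or similar would be skipped. The HTML below is a made-up sample (the real page markup is an assumption), and `RELAXED` and `extract` are hypothetical names used only for illustration.

```python
import re

# Hypothetical sample markup; the real page's HTML is an assumption.
SAMPLE_HTML = (
    '<a href="watch.php?cat=36&number=1">1 - Installing Python</a>\n'
    '<a href="watch.php?cat=36&number=4">4 - Modules and Functions in Python</a>\n'
    '<a href="watch.php?cat=36&number=6">6 - Slicing-up Strings</a>\n'
    '<a href="list.php?cat=36">Back to list</a>\n'
)

# The pattern from the post: the title group allows at most three
# "words" made of \w characters, separated only by whitespace.
ORIGINAL = re.compile(r'href="(.+?)">(\d+\s-\s\w+\s+\w+\s*\w*)')

# A relaxed alternative (an assumption, not the only fix): only accept
# watch.php links and capture everything up to the next "<" as the title.
RELAXED = re.compile(r'href="(watch\.php[^"]*)">\s*(\d+\s*-\s*[^<]+?)\s*<')


def extract(pattern, html):
    """Return (url, title) pairs, collapsing runs of whitespace in titles."""
    return [(url, " ".join(title.split())) for url, title in pattern.findall(html)]


for name, pattern in (("original", ORIGINAL), ("relaxed", RELAXED)):
    print(name)
    for url, title in extract(pattern, SAMPLE_HTML):
        print("  %s -> %s" % (title, url))
```

On this sample the original pattern truncates the five-word title and misses the hyphenated one, while the relaxed pattern returns all three watch.php entries. Whether that matches titles 4, 6, 10, 19 and 37 on the real page would need checking against the actual HTML.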