first attempt
#1
I'm trying to create an add-on as a project while I learn the fundamentals of Python, but I'm having trouble with the regex (I think).
I'm getting most of the print output I require, but a few items print incorrectly and I can't see why. Any pointers would be warmly received.
Code:
import urllib2
import re

def OPEN_URL(url):
    req = urllib2.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
    response = urllib2.urlopen(req)
    link = response.read()
    return link

link=OPEN_URL('http://thenewboston.org/list.php?cat=36')
#(gets desired url)
#path = re.compile('href="(.+?)"').findall(link)
#for url in path:
#    url = 'http://thenewboston.org/' + url
#    if 'watch.php' in url:
#        print url
#and,

#(gets desired title)
#path = re.compile(r'>(\d+\s-\s\w+\s+\w+\s*\w*)').findall(link)
#for title in path:
#    print title
match = re.compile(r'href="(.+?)">(\d+\s-\s\w+\s+\w+\s*\w*)').findall(link)
for url, title in match:
    url = 'http://thenewboston.org/' + url
    title = title.replace('\n', '')
    if 'watch.php' in url:
        print title
        print url
# Tried capturing the title (bad regex?); items 4, 6, 10, 19 and 37 don't print correctly

excuse the mess above.
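To illustrate what I mean, here's a minimal sketch of how that title pattern behaves on a few made-up titles (the real ones come from the page, so these are only assumptions):

```python
import re

# The title pattern from the code above: number, dash, then two or three words.
pattern = re.compile(r'\d+\s-\s\w+\s+\w+\s*\w*')

# Hypothetical titles -- not the real ones from the page.
titles = [
    "3 - Simple Numbers",               # two words after the dash: matches fully
    "19 - Nested Statements Part Two",  # four words: the tail gets cut off
    "6 - What's Next",                  # apostrophe: \w+ can't be followed by \s+, no match
]

for t in titles:
    m = pattern.search(t)
    print(repr(m.group() if m else None))
```

So titles with extra words get truncated, and titles with punctuation right after a word can fail to match at all, which would explain some items not printing.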
Reply
#2
This should get you started in the right direction:
Code:
import re
import urllib2


def OPEN_URL(url):
    req = urllib2.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
    response = urllib2.urlopen(req)
    body = response.read()
    return body

html = OPEN_URL('http://thenewboston.org/list.php?cat=36')

#first, let's get the list separate from the page
ul = re.search('<ul>(.+?)</ul>', html, re.DOTALL).group(1)  # DOTALL so the list can span newlines

#now lets find each item in the list
regex = re.compile('<li class="contentList"><a href="(.+?)">(.+?)</a></li>')

for item in re.finditer(regex, ul):
    link,title = item.groups()
    print title
    print link
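In case the page changes or goes down, here's the same idea run against a tiny made-up HTML snippet (the markup and titles are assumptions shaped like what the regex above targets, not the real page):

```python
import re

# Hypothetical page fragment, shaped like the list markup targeted above.
html = '''
<html><body>
<ul>
<li class="contentList"><a href="watch.php?id=1">1 - Installing Python</a></li>
<li class="contentList"><a href="watch.php?id=2">2 - Numbers and Math</a></li>
</ul>
</body></html>
'''

# First isolate the list; DOTALL lets .+? cross line breaks.
ul = re.search('<ul>(.+?)</ul>', html, re.DOTALL).group(1)

# Then pull the link and title out of each list item.
regex = re.compile('<li class="contentList"><a href="(.+?)">(.+?)</a></li>')

for item in re.finditer(regex, ul):
    link, title = item.groups()
    print(title)
    print('http://thenewboston.org/' + link)
```

Narrowing the search to the `<ul>` first keeps the item regex from matching stray links elsewhere on the page.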
Reply
#3
Hey Bstrdsmrk, thank you very much for the reply. That will keep me going for a while, I'm sure.
Reply
